[darcs-devel] confused questions about versioning structured data

Fri Aug 23 06:38:15 UTC 2013

Hi Matthias,

Some general thoughts which might provoke further discussion:

- Using a proper VCS seems like the right thing to do, particularly if you
want a history that's branchable and mergeable.

- Darcs lib is still somewhat immature, but it should give you a
programmatic interface to creating and commuting/merging patches if
that's what you want: http://darcs.net/Library

- Tagging every patch is the way I would personally recommend for
identifying specific versions. I don't know of any specific performance
problems that would create.

- Pretty printing to get a canonical JSON string would work, but one thing
you would need to think about is whether diffs of those strings would
merge nicely with each other.

- Darcs won't help you much with your external rules about changes
causing other things to change, except that you might encode them with
explicit Darcs patch dependencies.

- You could create your own patch types - it's "just" a matter of
implementing various type classes. That would be more up-front work to
encode your system but might get you something with nicer commute and
merge behaviour.
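
To make the pretty-printing point above a bit more concrete, here is a
minimal sketch in Python (the language of the application in question).
The helper names canonical_json and changed_lines are made up for this
illustration, and difflib from the standard library only stands in for a
line-based diff; it is not how Darcs computes or commutes hunks. The idea
is simply to sort keys and put one member per line, then look at which
lines two independent edits would touch.

```python
import difflib
import json


def canonical_json(obj):
    """Serialize obj deterministically: sorted keys, one member per line."""
    return json.dumps(obj, sort_keys=True, indent=1, ensure_ascii=False) + "\n"


def changed_lines(old, new):
    """Return the 0-based line indices of canonical `old` that a line diff touches."""
    a = canonical_json(old).splitlines()
    b = canonical_json(new).splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    touched = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag != "equal":
            # For a pure insertion i1 == i2; record the line it lands in front of.
            touched.update(range(i1, max(i2, i1 + 1)))
    return touched


base = {"title": "Draft", "body": "Hello", "votes": 3}

# Two edits to different attributes touch different lines.
edit_title = dict(base, title="Final")
edit_body = dict(base, body="Hello world")
print(changed_lines(base, edit_title), changed_lines(base, edit_body))

# Appending a new last key also rewrites the previous line (it gains a
# trailing comma), so it collides with an edit to that previous member.
add_key = dict(base, zeta=True)
edit_votes = dict(base, votes=4)
print(changed_lines(base, add_key) & changed_lines(base, edit_votes))
```

The second case is the kind of thing I mean by "merge nicely": two changes
that are logically independent can still end up touching the same line of
the canonical string, depending on the layout you pick.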

Hope this is of some help!

Cheers,

Ganesh

On 19/08/2013 16:52, Matthias Fischmann wrote:
> 
> Dear darcs community,
> 
> I had a discussion in a software project that needs versioning on
> structured data (think json objects).  So far, my favorite solution is
> hacking together a restful repository server using darcs and snap, but
> we are still struggling with understanding what we actually need.
> This is a summary of the status of the discussion and some open
> questions, in the hope that at least some of it makes sense.  Thanks
> for listening!  :-)
> 
> The data we are dealing with is content objects of a web application
> written in Python.  There are documents, paragraphs, comments, users,
> groups, authorizations, votes, collections of votes and a lot more.
> 
> Objects evolve along patch trees.  Also, they are associated with each
> other in a graph with different edge types (a document consists of
> several paragraphs; a vote is made by a user on a document).  The type
> of an edge decides what the source will do if the target grows a new
> version (if a user votes on a document and the document is updated,
> the vote relates to the old version; if a paragraph changes in a
> document, the document grows a new version in parallel with the
> paragraph).
> 
> The application will visualize patches, version trees, and arbitrary
> versions of objects.  (We have considered allowing for patches of
> patches, but have abandoned the idea as unnecessarily complex.)
> 
> Now, how would you do this using darcs as a library?
> 
> What would I have to do to process patches on sets of json objects
> rather than file trees?
> 
> What are the drawbacks of using a pretty-printer to get a canonical
> string representation of json objects, write them all to files, and
> use darcs to version-control the files?  (It seems like a weird idea
> to me, but I can only think of performance reasons not to do it.)
> 
> I have rules like this: "If attribute X in object A changes, then the
> object referenced in attribute Y of A must get a new version in which
> it refers to the new version of A."  (This implies that the contents
> of json objects are aware of darcs patches.)  Is darcs offering any
> tools to implement rules like this, or do I have to do this on foot,
> before I present new versions or patches to darcs lib?
> 
> To extract arbitrary versions (instead of patches), I would need to
> either retrieve all patches by timestamp filter, or tag every patch.
> Is either of the two a good idea?  Did I miss a better one?
> 
> If every patch comes with a tag that makes the associated state
> pullable, would that be a performance issue, or would it actually
> mitigate potential performance issues on large patch sets?  My
> uninformed guess is that the situations where complexity bites you
> with darcs involve large, unordered sub-sets of patches.
> 
> In general, do you have any opinion (emotional or rational) whether
> darcslib+snap is right for my use case?  The alternative would be to
> use an object database with linear version history (most likely ZODB)
> and implement additional version control features on top of that.
> 
> Looking forward to your feedback,
> cheers,
> matthias