[darcs-devel] confused questions about versioning structured data

Matthias Fischmann mf at zerobuzz.net
Mon Aug 19 15:52:12 UTC 2013


Dear darcs community,

I had a discussion in a software project that needs versioning on
structured data (think json objects).  So far, my favorite solution is
hacking together a restful repository server using darcs and snap, but
we are still struggling with understanding what we actually need.
This is a summary of the status of the discussion and some open
questions, in the hope that at least some of it makes sense.  Thanks
for listening!  :-)

The data we are dealing with is content objects of a web application
written in Python.  There are documents, paragraphs, comments, users,
groups, authorizations, votes, collections of votes and a lot more.

Objects evolve along patch trees.  Also, they are associated with each
other in a graph with different edge types (a document consists of
several paragraphs; a vote is made by a user on a document).  The type
of an edge decides what the source will do if the target grows a new
version (if a user votes on a document and the document is updated,
the vote relates to the old version; if a paragraph changes in a
document, the document grows a new version in parallel with the
paragraph).
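
To make the two edge behaviours concrete, here is a minimal in-memory
sketch in Python.  It does not use darcs at all; the names PIN, FOLLOW
and Store are hypothetical, and it assumes that "follow" edges never
form a cycle.

```python
# Hypothetical edge policies (illustrative names, not from the project).
PIN = "pin"        # source keeps pointing at the old target version (vote -> document)
FOLLOW = "follow"  # source grows a new version alongside the target (document -> paragraph)


class Store:
    def __init__(self):
        self.versions = {}  # object id -> list of (payload, refs) tuples
        self.edges = []     # (source id, target id, policy)

    def add(self, oid, payload):
        self.versions[oid] = [(payload, {})]

    def link(self, src, dst, policy):
        # Record the edge and pin the source's latest version to the
        # target's current version index.
        self.edges.append((src, dst, policy))
        _, refs = self.versions[src][-1]
        refs[dst] = len(self.versions[dst]) - 1

    def _new_version(self, oid, payload, refs):
        self.versions[oid].append((payload, refs))
        index = len(self.versions[oid]) - 1
        # Propagate only along FOLLOW edges; PIN sources stay untouched.
        # Assumes FOLLOW edges are acyclic, otherwise this would not terminate.
        for src, dst, policy in self.edges:
            if dst == oid and policy == FOLLOW:
                src_payload, src_refs = self.versions[src][-1]
                self._new_version(src, src_payload, {**src_refs, oid: index})
        return index

    def update(self, oid, payload):
        _, refs = self.versions[oid][-1]
        return self._new_version(oid, payload, dict(refs))
```

With a document following a paragraph and a vote pinned to the
document, updating the paragraph gives the document a second version
while the vote still refers to document version 0.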

The application will visualize patches, version trees, and arbitrary
versions of objects.  (We have considered allowing for patches of
patches, but have abandoned the idea as unnecessarily complex.)

Now, how would you do this using darcs as a library?

What would I have to do to process patches on sets of json objects
rather than file trees?

What are the drawbacks of using a pretty-printer to get a canonical
string representation of json objects, write them all to files, and
use darcs to version-control the files?  (It seems like a weird idea
to me, but I can only think of performance reasons not to do it.)
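
A minimal sketch of the pretty-printer idea, using only the Python
standard library.  The function names and the one-file-per-object
layout are assumptions made for illustration.  Sorting keys and
putting each attribute on its own line should keep line-based diffs
small and independent of key order.

```python
import json
from pathlib import Path


def canonical(obj):
    # Sorted keys plus fixed indentation give the same text for the
    # same data, with one attribute per line.
    return json.dumps(obj, sort_keys=True, indent=1, ensure_ascii=False) + "\n"


def write_object(root, oid, obj):
    # Hypothetical layout: one file per object, named after its id.
    path = Path(root) / (oid + ".json")
    path.write_text(canonical(obj), encoding="utf-8")
    return path
```

The resulting directory could then be recorded with any file-based
version control tool; whether that scales to many small objects is
exactly the open performance question.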

I have rules like this: "If attribute X in object A changes, then the
object referenced in attribute Y of A must get a new version in which
it refers to the new version of A."  (This implies that the contents
of json objects are aware of darcs patches.)  Does darcs offer any
tools to implement rules like this, or do I have to do this by hand,
before I present new versions or patches to the darcs library?
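
For illustration, doing such a rule by hand could look roughly like
the following Python sketch, run before anything is handed to the
version control layer.  The function apply_rules, the store layout,
and the "refers_to" field are all hypothetical.

```python
def apply_rules(store, a_id, new_a, rules):
    """Return the (object id, new version) pairs implied by changing A.

    store maps object id -> list of versions (dicts).  rules is a list
    of (x, y) attribute names: "if x changes in A, the object that A's
    attribute y points at gets a new version referring to the new A".
    """
    old_a = store[a_id][-1]
    store[a_id].append(new_a)
    a_version = len(store[a_id]) - 1
    changes = [(a_id, new_a)]
    for x, y in rules:
        if old_a.get(x) != new_a.get(x) and y in new_a:
            b_id = new_a[y]
            # "refers_to" is a made-up field holding (object id, version index).
            new_b = {**store[b_id][-1], "refers_to": [a_id, a_version]}
            store[b_id].append(new_b)
            changes.append((b_id, new_b))
    return changes
```

The returned list is the set of new versions that would have to be
recorded together, which suggests that one such rule application maps
to one patch.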

To extract arbitrary versions (instead of patches), I would need to
either retrieve all patches by timestamp filter, or tag every patch.
Is either of the two a good idea?  Did I miss a better one?

If every patch comes with a tag that makes the associated state
pullable, would that be a performance issue, or would it actually
mitigate potential performance issues on large patch sets?  My
uninformed guess is that the situations where complexity bites you
with darcs involve large, unordered sub-sets of patches.

In general, do you have any opinion (emotional or rational) whether
darcslib+snap is right for my use case?  The alternative would be to
use an object database with linear version history (most likely ZODB)
and implement additional version control features on top of that.

Looking forward to your feedback,
cheers,
matthias

