[darcs-users] Binary data

David Roundy droundy at abridgegame.org
Wed May 21 11:03:33 UTC 2003


On Wed, May 21, 2003 at 10:29:37AM +1000, BARBOUR Timothy wrote:
> > From: David Roundy [mailto:droundy at abridgegame.org]
> [...]
> > Well, the idea would be to do a binary delta.  There are some very
> > clever algorithms to do so (used, for example, by rsync).  It would be
> > purely a space optimization delta, since merging of binary changes
> > isn't possible.
> 
> That's interesting. I wonder if any of those algorithms are available in
> a C library that darcs could call?

There is librsync, which I think does what we'd want, but it looks like
it's not particularly well documented.
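
To make the idea of a space-only binary delta concrete, here is a rough
sketch in Python.  It is not librsync's API and not anything darcs does;
make_delta, apply_delta and the tiny block size are made up for
illustration.  The real rsync algorithm matches blocks using a rolling weak
checksum plus a strong hash, whereas this sketch simply looks up raw blocks
of the old file in a dictionary, which is enough when both versions are
available locally.

```python
BLOCK = 4  # hypothetical, deliberately tiny block size for readability


def make_delta(old: bytes, new: bytes, block: int = BLOCK):
    """Describe `new` as copies of aligned blocks of `old` plus literal bytes."""
    # Index every aligned block of the old file by its content.
    index = {}
    for off in range(0, len(old) - block + 1, block):
        index.setdefault(old[off:off + block], off)

    delta, literal, i = [], bytearray(), 0
    while i < len(new):
        chunk = new[i:i + block]
        off = index.get(chunk) if len(chunk) == block else None
        if off is not None:
            # Flush pending literal bytes, then reference the old block.
            if literal:
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", off))
            i += block
        else:
            # No match at this position: emit one literal byte and slide on.
            literal.append(new[i])
            i += 1
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old: bytes, delta, block: int = BLOCK) -> bytes:
    """Rebuild the new file from the old file and the delta."""
    out = bytearray()
    for kind, value in delta:
        out += old[value:value + block] if kind == "copy" else value
    return bytes(out)


old = b"abcdefghijkl"
new = b"abcdXXefghijkl"
delta = make_delta(old, new)
print(delta)
print(apply_delta(old, delta) == new)
```

Only the two inserted bytes are stored literally; everything else is a
reference into the old file.  Note that nothing here helps with merging: the
delta only says how to get from one version to the other.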

> > Darcs can't support third-party merges "internally", since it needs to
> > be able to know that the result of a series of merges is independent of
> > the order of merges.  However, as a conflict-resolution tool,
> > third-party merge tools could be used.
> 
> Perhaps it could, if the merge-order independence of patches was specified
> as part of the responsibilities of the external tool. In other words, to
> work with darcs (now we have gone beyond the usual plugins), it would have
> to do the same kind of thing darcs would do (or is that something that
> cannot be specified for the general case?).

The problem is that then any bug in those external tools could result in
corruption of the archive... while it is possible to recover from such a
situation, it is far from pleasant.  Moreover, it would mean that the
external tools would have to be able to deal with things like five-way
merges, which seems to me to be requiring a bit much of them.

> I have been wondering about the token renaming patches - how does darcs
> manage to avoid renaming the contents of a string or comment, etc.? Does it
> rely on parsing the language (and presumably work fine for Haskell, but not
> other languages)?

Actually, a token replace will happily modify comments and strings (I
suspect you'd want to modify at least the comments anyway...).  A token is
defined as any contiguous series of a given set of characters.  This
doesn't exactly map to either Haskell or C tokens, as (for example) a C
token can end but not begin with a digit, and Haskell behaves similarly
with single quotes.  Although the current interface doesn't support this,
the patch supports a regexp of the [a-z0-9'] style to specify the
characters allowed in a token.
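
As an illustration of that definition, here is a minimal sketch in Python.
It is not the darcs implementation; replace_token and its arguments are
hypothetical names.  A token is taken to be a maximal run of characters from
the given set, and a token is replaced only if it matches the old name
exactly.

```python
import re


def replace_token(text: str, token_chars: str, old: str, new: str) -> str:
    """Replace every whole token equal to `old` with `new`.

    `token_chars` is the inside of a regexp character class, for example
    "A-Za-z0-9_" (an assumption about how the set would be written).
    """
    token_re = re.compile("[" + token_chars + "]+")
    # Each match is a maximal run of token characters, so "foobar" and
    # "foo2" are tokens of their own and are left alone.
    return token_re.sub(
        lambda m: new if m.group(0) == old else m.group(0), text
    )


source = 'int foo = 1; /* foo counts */ char *s = "foo"; int foobar = foo2;'
print(replace_token(source, "A-Za-z0-9_", "foo", "bar"))
# prints: int bar = 1; /* bar counts */ char *s = "bar"; int foobar = foo2;
```

The comment and the string literal are both rewritten, since nothing here
knows about the language, while longer tokens that merely contain the old
name are untouched.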

> BTW, it is not only binary files that can need special diff and merge
> tools.  For example, Rational Rose (a CASE tool) stores its models in
> text files, but their structure is so complex that they can only be
> properly merged by its own model merging tools (which can plug in to
> ClearCase). I cannot think of any other examples, and I think that a tool
> like Rose is a poor substitute for a decent language, and language source
> can be diffed.

I think it is better for anything that requires knowledge of the content of
a file to be a conflict-resolution tool rather than a merge tool... to the
user this probably looks pretty much the same, but to darcs it is a totally
different beast (and much simpler).

> (OT, have you noticed Curry ?)

No.
-- 
David Roundy
http://www.abridgegame.org