[darcs-users] Binary data

BARBOUR Timothy Timothy_BARBOUR at rta.nsw.gov.au
Wed May 21 00:29:37 UTC 2003


> -----Original Message-----
> From: David Roundy [mailto:droundy at abridgegame.org]
[...]
> Well, the idea would be to do a binary delta.  There are some very clever
> algorithms to do so (used, for example, by rsync).  It would be purely a
> space optimization delta, since merging of binary changes isn't possible.

That's interesting. I wonder whether any of those algorithms are available
in a C library that darcs could call?
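For readers unfamiliar with the rsync approach David mentions, here is a
simplified, single-process sketch of the idea in Python. It is not darcs
code and not rsync's actual protocol. The old file is cut into fixed-size
blocks indexed by a cheap rolling checksum. A window then slides over the
new file one byte at a time, with the checksum updated in constant time,
and candidate matches are confirmed with a strong hash (MD5 here). The
names `weak_checksum`, `roll`, `make_delta` and `apply_delta`, the tiny
4-byte block size, and the op format are all assumptions for illustration.

```python
import hashlib

MOD = 1 << 16  # both checksum halves are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple:
    """Return the (a, b) halves of an rsync-style rolling checksum."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple:
    """Slide an n-byte window one byte: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def make_delta(old: bytes, new: bytes, n: int = 4) -> list:
    """Encode `new` as ("copy", offset, length) / ("data", bytes) ops against `old`."""
    # Index the non-overlapping n-byte blocks of `old` by weak checksum.
    table = {}
    for off in range(0, len(old) - n + 1, n):
        blk = old[off:off + n]
        table.setdefault(weak_checksum(blk), []).append((hashlib.md5(blk).digest(), off))

    ops, literal, i = [], bytearray(), 0
    sums = weak_checksum(new[:n]) if len(new) >= n else None
    while i + n <= len(new):
        match = None
        candidates = table.get(sums, [])
        if candidates:
            # A weak-checksum hit is only a candidate; confirm with MD5.
            digest = hashlib.md5(new[i:i + n]).digest()
            match = next((off for strong, off in candidates if strong == digest), None)
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match, n))
            i += n
            if i + n <= len(new):
                sums = weak_checksum(new[i:i + n])
        else:
            literal.append(new[i])
            if i + n < len(new):
                sums = roll(sums[0], sums[1], new[i], new[i + n], n)
            i += 1
    literal.extend(new[i:])  # trailing bytes shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new file from `old` and the ops produced by make_delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += old[off:off + length]
        else:
            out += op[1]
    return bytes(out)
```

Real implementations use much larger blocks and split the work between the
two ends, so that only block signatures and the delta cross the network. As
David says, the result is purely a space optimization and says nothing
about merging.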
 
> Darcs can't support third-party merges "internally", since it needs to be
> able to know that the result of a series of merges is independent of the
> order of merges.  However, as a conflict-resolution tool, third-party
> merge tools could be used.

Perhaps it could, if the merge-order independence of patches were specified
as part of the responsibilities of the external tool. In other words, to
work with darcs (now we have gone beyond the usual plugins), it would have
to do the same kind of thing darcs does (or is that something that cannot
be specified for the general case?).

I have been wondering about the token renaming patches: how does darcs
avoid renaming the contents of a string or a comment, etc.? Does it rely on
parsing the language (and so presumably work fine for Haskell, but not for
other languages)?
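To make the question concrete, here is what a purely token-based replace
would do. By token-based, I mean one that does not parse the language.
This sketch assumes tokens are maximal runs of `[A-Za-z0-9_]`. It is a
hypothetical illustration, not a description of how darcs implements its
replace patches. It respects token boundaries (`foobar` is untouched), but
because it has no notion of syntax, it also rewrites the occurrences inside
the comment and the string literal. That is exactly the concern raised
above.

```python
import re

# Assumed token definition: maximal runs of letters, digits and underscores.
TOKEN = re.compile(r"[A-Za-z0-9_]+")


def replace_token(text: str, old: str, new: str) -> str:
    """Replace whole-token occurrences of `old` with `new`, ignoring syntax."""
    return TOKEN.sub(lambda m: new if m.group(0) == old else m.group(0), text)


source = 'foo = 1  # foo counter\nprint("foo")\nfoobar = 2'
print(replace_token(source, "foo", "bar"))
```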

BTW, it is not only binary files that may need special diff and merge tools.
For example, Rational Rose (a CASE tool) stores its models in text files,
but their structure is so complex that they can only be properly merged by
its own model merging tools (which can plug in to ClearCase). I cannot think
of any other examples, and in any case I think a tool like Rose is a poor
substitute for a decent language, whose source can be diffed normally. (OT:
have you noticed Curry?)

> > (i) Extend the patch format to include binary data of substantial size,
> > perhaps using MIME. The message size limits imposed by MTAs need not be a
> > concern - if people really want to push large binary patches through
> > mail, they will have to configure their mail-server accordingly (or solve
> > their problem in a better way). Otherwise, perhaps embedded URLs could
> > eventually be used to satisfy such needs.
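Proposal (i) mentions MIME. As a rough illustration, Python's standard
`email` package can carry an arbitrary binary payload as a base64-encoded
attachment alongside a plain-text description, and then recover it byte for
byte. The function names, the subject line, and the filename are
hypothetical. This says nothing about darcs's actual patch format.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_patch_mail(description: str, filename: str, payload: bytes) -> bytes:
    """Wrap a text description and a binary payload in a MIME message."""
    msg = EmailMessage()
    msg["Subject"] = f"[patch] {filename}"  # hypothetical subject convention
    msg.set_content(description)            # text/plain body
    # Bytes attachments are base64-encoded by default.
    msg.add_attachment(payload, maintype="application",
                       subtype="octet-stream", filename=filename)
    return msg.as_bytes()


def extract_payload(raw: bytes) -> bytes:
    """Return the decoded bytes of the first attachment in `raw`."""
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.iter_attachments():
        return part.get_content()
    raise ValueError("message has no attachment")
```

Base64 turns every 3 bytes into 4 characters, so the encoded attachment is
roughly a third larger than the raw data. That is worth bearing in mind
given the MTA size limits mentioned above.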

> Probably what I should do, although it isn't much fun, and I definitely
> don't want to saddle myself with a patch type that I end up making obsolete
> by introducing a binary delta (since I'd have to keep supporting both patch
> types for all eternity in order for old repositories to remain valid--or I
> suppose if the two were semantically equivalent I could introduce a
> conversion tool).

I see your point here (I forgot that the patches are long-lived). That is
why good software engineers are often slow to add features: it is easier
to remove a restriction than to impose one. I suppose the question of
semantic equivalence has to do with darcs being able to reorder patches;
in that light, a binary diff seems superior to before-and-after images.
Otherwise, it seems to me that a before-and-after image resembles a
degenerate binary diff.
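The "degenerate binary diff" point can be made concrete with a hypothetical
hunk type that replaces `old` bytes with `new` bytes at a given offset. This
type is an assumption for illustration, not a darcs patch type. A
before-and-after image is the special case in which the hunk covers the
whole file. Trimming the common prefix and suffix gives a smaller hunk with
the same effect on that file, which is the sort of mechanical conversion
David alludes to. Because a smaller hunk touches fewer bytes, it is
plausibly what makes a real binary diff friendlier to reordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryHunk:
    """Replace `old` with `new` at byte `offset` (hypothetical patch type)."""
    offset: int
    old: bytes
    new: bytes

    def apply(self, data: bytes) -> bytes:
        end = self.offset + len(self.old)
        if data[self.offset:end] != self.old:
            raise ValueError("patch does not apply to this data")
        return data[:self.offset] + self.new + data[end:]

    def invert(self) -> "BinaryHunk":
        # Swapping old and new undoes the change.
        return BinaryHunk(self.offset, self.new, self.old)


def before_after(before: bytes, after: bytes) -> BinaryHunk:
    """The degenerate case: the whole file is one hunk."""
    return BinaryHunk(0, before, after)


def shrink(h: BinaryHunk) -> BinaryHunk:
    """Trim the common prefix and suffix, keeping the same effect."""
    old, new = h.old, h.new
    limit = min(len(old), len(new))
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    s = 0
    while s < limit - p and old[-1 - s] == new[-1 - s]:
        s += 1
    return BinaryHunk(h.offset + p, old[p:len(old) - s], new[p:len(new) - s])
```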

> > (ii) Use the above to implement CVS-style handling of binary files (user
> > tells CVS that a file type, or a particular file is binary, then CVS
> > treats it as a black box).
> 
> 
> Actually, I think the user shouldn't tell darcs which files are binary
> except as a last resort.  Darcs should check on its own whether it is
> reasonable to treat the file as text.
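As a sketch of what "check on its own" might look like, here is a simple
content heuristic. A file is treated as binary if its first few kilobytes
contain a NUL byte, or if an unusually large share of them are control
characters that rarely appear in text. The sample size, the 30% threshold,
and the set of allowed control characters are all assumed values, not what
darcs does. Bytes at or above 0x80 count as text, so UTF-8 and Latin-1 files
pass. An empty file gives no evidence either way and is treated as text.
For that reason, a heuristic like this cannot help with the cases Tim
describes next.

```python
SAMPLE_SIZE = 8000   # bytes inspected (assumed value)
MAX_ODD_RATIO = 0.3  # share of suspicious bytes tolerated (assumed value)
# Control bytes that commonly appear in text: BEL, BS, TAB, LF, FF, CR, ESC.
TEXT_CONTROLS = frozenset({0x07, 0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B})


def looks_binary(data: bytes, sample_size: int = SAMPLE_SIZE,
                 max_ratio: float = MAX_ODD_RATIO) -> bool:
    """Guess whether `data` is binary by inspecting its first bytes."""
    sample = data[:sample_size]
    if not sample:
        return False  # an empty file gives no evidence; treat it as text
    if b"\x00" in sample:
        return True   # NUL bytes almost never occur in text files
    odd = sum(1 for byte in sample
              if (byte < 0x20 and byte not in TEXT_CONTROLS) or byte == 0x7F)
    return odd / len(sample) > max_ratio
```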

Well, CVS can tell by file suffix, provided it is given a list of
appropriate suffixes, and that seems to work well in practice. With CVS,
there will always be cases where the user must tell it that a file (not
matching the suffixes) is binary, because the file currently contains only
ASCII characters (or is empty), but the user knows it will contain binary
data later.
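The CVS-style policy described here can be sketched as follows: a suffix
list, plus an explicit per-file override for files that do not yet look
binary. The suffix list and the `overrides` mapping are hypothetical. The
point is only the order of precedence: an explicit user decision wins, and
the suffix list is the fallback.

```python
import os
from typing import Mapping, Optional

# Illustrative suffix list (hypothetical, not taken from CVS or darcs).
BINARY_SUFFIXES = frozenset({".png", ".jpg", ".gif", ".pdf", ".zip", ".exe"})


def is_binary(path: str,
              overrides: Optional[Mapping[str, bool]] = None,
              suffixes: frozenset = BINARY_SUFFIXES) -> bool:
    """Decide whether `path` should be treated as binary.

    An explicit per-file user setting takes precedence; otherwise the
    (case-insensitive) file suffix is looked up in `suffixes`.
    """
    if overrides and path in overrides:
        return overrides[path]
    _, ext = os.path.splitext(path)
    return ext.lower() in suffixes
```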

Tim





