[darcs-users] Binary data (& XML)

Jan Scheffczyk jan.scheffczyk at gmx.net
Wed Jun 18 05:58:07 UTC 2003


Hi all,

IMHO there is definitely a need to handle binary data, especially in 
conjunction with XML.
E.g. OpenOffice stores its documents as zipped XML files.
Some folks at M$ also seem to like the XML idea ;-)
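
To make the zipped-XML point concrete, here is a minimal Python sketch. It
builds two tiny in-memory zip archives that differ in one line of XML, then
diffs the unpacked member instead of the compressed bytes. The member name
"content.xml" and the helpers make_archive and member_diff are assumptions
made up for this illustration; they are not taken from OpenOffice or darcs.

```python
import difflib
import io
import zipfile


def make_archive(content_xml: str) -> bytes:
    """Build an in-memory zip archive holding a single content.xml member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", content_xml)
    return buf.getvalue()


def member_diff(old_zip: bytes, new_zip: bytes, name: str) -> list:
    """Unpack one member from both archives and return its changed lines."""
    def read_lines(data: bytes) -> list:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return zf.read(name).decode("utf-8").splitlines()

    diff = list(difflib.unified_diff(read_lines(old_zip), read_lines(new_zip),
                                     lineterm=""))
    # Skip the two file-header lines, keep only added and removed lines.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


old = make_archive("<doc>\n<p>Hello</p>\n<p>World</p>\n</doc>")
new = make_archive("<doc>\n<p>Hello</p>\n<p>darcs</p>\n</doc>")
changes = member_diff(old, new, "content.xml")
```

The two archives differ as byte strings, but the change inside them is a
single replaced line, which is what a version control system would ideally
record.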

> I haven't gotten around to this largely because I haven't had need of it,
> but also because I haven't decided the best way to deal with binary files.
> For example, I could treat a tar.gz archive as a directory, which would
> provide version control of the files within the archive (assuming they are
> text).

Yes, that would definitely help in the office context.
But I'm afraid we need another patch type to handle XML files correctly.
Recently I came across the following proposal:

@inproceedings{585073,
 author = {Raymond K. Wong and Nicole Lam},
 title = {Managing and querying multi-version XML data with update logging},
 booktitle = {Proceedings of the 2002 ACM symposium on Document engineering},
 year = {2002},
 isbn = {1-58113-594-7},
 pages = {74--81},
 location = {McLean, Virginia, USA},
 doi = {http://doi.acm.org/10.1145/585058.585073},
 publisher = {ACM Press},
 }

Implementing their XML deltas as patches should be possible in Haskell, making 
use of XML parsers like HaXml, XmlToolbox, or HXML.
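
To give a rough idea of what a tree-level XML patch could look like, here is
a toy Python sketch. It is NOT the update-log scheme from the paper above;
the SetText type, its fields, and the sample document are all hypothetical
and chosen only for illustration. The patch addresses an element by path,
checks the old text before changing it, and can be inverted.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class SetText:
    """Hypothetical XML patch: change the text of the element at `path`."""
    path: str  # ElementTree path relative to the root, e.g. "body/p[2]"
    old: str   # text the element must currently contain
    new: str   # text the element contains after the patch

    def apply(self, root: ET.Element) -> None:
        node = root.find(self.path)
        # Refuse to apply if the element is missing or its text has changed.
        if node is None or (node.text or "") != self.old:
            raise ValueError(f"patch does not apply at {self.path!r}")
        node.text = self.new

    def invert(self) -> "SetText":
        # The inverse patch restores the old text.
        return SetText(self.path, self.new, self.old)


root = ET.fromstring("<doc><body><p>Hello</p><p>World</p></body></doc>")
patch = SetText("body/p[2]", "World", "darcs")

patch.apply(root)
after = ET.tostring(root, encoding="unicode")

patch.invert().apply(root)
restored = ET.tostring(root, encoding="unicode")
```

Because the patch names an element rather than a line, reformatting the
file's whitespace elsewhere would not stop it from applying.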

In sum, adding support for XML and compressed data would open the road to 
handling office documents, for which there is a huge market.

> A more normal (and more general) solution would be to introduce binary
> deltas, which would still be a bit of a pain, and much less entertaining.
> It also has the disadvantage that you lose a lot of the benefits of version
> control, since you can't merge binary patches that don't understand their
> content.

Huh, binary patches seem complex to me and I see no real benefit.
Maybe we should simply add a "dummy binary patch" saying

  "replace all content in <oldBinFile> by all content in <newBinFile>"

This would correspond to CVS, which AFAIK has no binary diffs and stores 
complete(!) files instead.
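
Such a dummy binary patch could be sketched roughly as follows. This is
Python for brevity, not darcs code; the ReplaceBinary name and its methods
are assumptions for illustration. The patch stores both complete contents,
so it only applies when the file matches the old content exactly, and it
can be inverted by swapping old and new.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplaceBinary:
    """Hypothetical whole-file patch: replace all old content by new content."""
    old: bytes  # complete content of <oldBinFile>
    new: bytes  # complete content of <newBinFile>

    def apply(self, current: bytes) -> bytes:
        # No merging is attempted: the file must match the old content.
        if current != self.old:
            raise ValueError("binary patch does not apply: content differs")
        return self.new

    def invert(self) -> "ReplaceBinary":
        # Undoing the patch means replacing the new content by the old.
        return ReplaceBinary(self.new, self.old)


v1 = b"\x89PNG\x00\x01"
v2 = b"\x89PNG\x00\x02"
patch = ReplaceBinary(v1, v2)

updated = patch.apply(v1)
rolled_back = patch.invert().apply(updated)
```

The price is that two such patches touching the same file always conflict,
since neither knows anything about the file's structure.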

> Another interesting type of patch I've considered is an image change patch
> for images files (I'd have to find a good image reading library) that
> change in content but not in size.  In that case you could perform rather
> interesting merges of changes, but while it might be fun to code, I'm not
> sure how useful it would actually be.

Fortunately, some image formats are pure text, e.g., SVG.

Cheers,
Jan




