[darcs-devel] Re: binary patches and diffing

David Roundy droundy at abridgegame.org
Wed Sep 7 04:34:18 PDT 2005


On Tue, Sep 06, 2005 at 07:48:15PM -0700, Bill Trost wrote:
> David Roundy wrote:
>     We've talked about this recently, and I think the consensus was that
>     we wouldn't mind for binary patches to be binary....
> 
> My only concern is the email encoding. "darcs send" currently just spits
> the uncompressed patch to the destination. Would someone have to write a
> MIME converter for binary patches?

Hmmm.  We could have an alternative textified patch format, or we could
modify send to (perhaps optionally?) compress and encode the entire patch
bundle.  The latter solution would probably be simplest.
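Here is a minimal sketch of that second option. It assumes gzip for the compression and base64 for the encoding, since neither is specified here, and `encode_bundle` and `decode_bundle` are hypothetical names rather than existing darcs functions:

```python
import base64
import gzip


def encode_bundle(bundle: bytes) -> str:
    """Compress a patch bundle and encode it as mail-safe ASCII text.

    Hypothetical helper: gzip + base64 is an assumed choice, not what
    "darcs send" actually does.
    """
    compressed = gzip.compress(bundle)
    # encodebytes() splits the output into lines of at most 76 characters
    # and adds a trailing newline, as MIME (RFC 2045) expects.
    return base64.encodebytes(compressed).decode("ascii")


def decode_bundle(text: str) -> bytes:
    """Undo encode_bundle: strip the base64 layer, then decompress."""
    return gzip.decompress(base64.decodebytes(text.encode("ascii")))


# Round trip: raw binary hunk data survives as plain ASCII text.
bundle = b"binary-diff ./foo.bin 10 -0 +3\n\x00\xff\x80\n"
assert decode_bundle(encode_bundle(bundle)) == bundle
```

Encoding the whole bundle, rather than only the binary hunks, keeps send simple. The trade-off is that the mailed patch is no longer human-readable without decoding, which a separate textified format would avoid.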

> > > I added DIFF-ALGORITHM so that "darcs optimize" has an easy way of
> > > deciding if there's something to optimize -- it's not actually needed
> > > for applying the patch.
> > 
> > I think I'd lean against including a diff-algorithm flag.  You're right
> > that it could be useful but it's also something we'd have to live with
> > indefinitely, and I don't like that.  I'd prefer to add a flag to
> > optimize to rediff binary patches and let the user decide.
> 
> Remember, this is supposed to be a generic binary diff format that can
> support a variety of different diff algorithms -- filewide-XOR, xdelta, a
> bsdiff-XOR are all plausible examples. "darcs optimize" needs some way of
> determining whether there's any point in trying to optimize the diff. A
> filewide-XOR could be profitably converted to an xdelta-generated diff,
> but not vice-versa. I don't think we want to force "darcs optimize" to
> generate a new diff for each patch when the patch has already been
> optimized.

By default, optimize won't recompute any diffs, and if the user asks it to
rediff their binary patches, rediffing all of them won't hurt.  I don't
like this flag because it has only one use, in optimize, and that use is
optional: I expect it to be rarely needed, and never needed in the long
run (once we have a good default diff).  So it seems silly to clutter up
the patch format with it.

> > I think I'd also lean towards sticking the starts and lengths all on
> > the same line with the "binary-diff" and not labelling them.  Or we
> > could label them with just a + and -.
> 
> I don't understand what you're proposing. Could you provide an example?

binary-diff ./foo.bin 10 -0 +50

would insert fifty bytes at byte 10.
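A sketch of how a reader might parse that header line. It assumes the single-start form `binary-diff FILENAME START -OLDLENGTH +NEWLENGTH`, decimal byte counts, and filenames without whitespace. `BinaryDiffHeader` and `parse_header` are hypothetical names for illustration:

```python
import re
from typing import NamedTuple


class BinaryDiffHeader(NamedTuple):
    filename: str
    start: int       # byte offset at which the hunk applies
    old_length: int  # number of bytes removed (the "-" field)
    new_length: int  # number of bytes inserted (the "+" field)


# Assumed grammar: binary-diff FILENAME START -OLDLENGTH +NEWLENGTH
_HEADER = re.compile(r"binary-diff (\S+) (\d+) -(\d+) \+(\d+)")


def parse_header(line: str) -> BinaryDiffHeader:
    match = _HEADER.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a binary-diff header: {line!r}")
    filename, start, old_length, new_length = match.groups()
    return BinaryDiffHeader(filename, int(start), int(old_length), int(new_length))


header = parse_header("binary-diff ./foo.bin 10 -0 +50")
assert header == BinaryDiffHeader("./foo.bin", 10, 0, 50)
```

Labelling the lengths with just - and + keeps the line short while still making it unambiguous which count is removed and which is inserted.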

> > And don't old-start and new-start have to be identical?
> >
> > binary-diff FILENAME START -OLDLENGTH +NEWLENGTH
> 
> Imagine a tar file in which a new component has been added at the
> front. Excluding tar meta-data, the optimal patch would consist of one
> hunk that adds the new component to the front, and another hunk that
> copies the old files (which started at offset zero, roughly) to the
> offset roughly equal to the length of the old file.
> 
> Then again, I may simply be missing something here. I don't understand
> how the text format works -- how do you invert a patch that only
> contains one line number?

In your example (adding N bytes to the front of the file), one patch will
suffice, which would be

binary-diff ./filename 0 -0 +N

The rest of the file would remain unmodified, since we only added data.  I
think perhaps what you're missing is that patches are applied in sequence,
so each patch's start position refers to the file as left by the previous
one.  For example, suppose we had two patches to a file (you can interpret
the following vague format as meaning "insert 5 a's"):

binary-diff ./filename 1 -0 +5
aaaaa
binary-diff ./filename 7 -0 +5
aaaaa

If the file originally was "123", the final result will be "1aaaaa2aaaaa3".
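To make the sequencing concrete, here is a small sketch that applies hunks one after another and reproduces the "123" example. It assumes a hypothetical in-memory hunk of (start, removed bytes, inserted bytes), which is an illustration and not darcs's actual representation:

```python
from typing import List, Tuple

# Hypothetical hunk: (start offset, bytes removed, bytes inserted).
Hunk = Tuple[int, bytes, bytes]


def apply_hunk(data: bytes, hunk: Hunk) -> bytes:
    start, old, new = hunk
    end = start + len(old)
    # Refuse to apply a hunk whose removed bytes aren't actually there.
    if start > len(data) or data[start:end] != old:
        raise ValueError(f"hunk does not match file contents at offset {start}")
    return data[:start] + new + data[end:]


def apply_in_sequence(data: bytes, hunks: List[Hunk]) -> bytes:
    # Each start offset refers to the file as left by the previous hunk,
    # not to the original file.
    for hunk in hunks:
        data = apply_hunk(data, hunk)
    return data


# binary-diff ./filename 1 -0 +5, then binary-diff ./filename 7 -0 +5
hunks = [(1, b"", b"aaaaa"), (7, b"", b"aaaaa")]
assert apply_in_sequence(b"123", hunks) == b"1aaaaa2aaaaa3"

# Adding N bytes to the front of a file is the single hunk "0 -0 +N".
assert apply_hunk(b"old contents", (0, b"", b"new ")) == b"new old contents"
```

The second hunk's offset, 7, only makes sense after the first hunk has grown the file from three bytes to eight.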

The two "start" positions can't be different unless you are encoding a
"hunk move" sort of operation, but that's considerably more complicated,
and isn't likely to gain you much in patch compression.  Hunk moves will be
interesting for code because of how they commute, but that machinery will
be complicated and is irrelevant for binary patches, which can't commute.
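Bill's question about inverting a patch with only one position has a short answer in the same terms. A sketch, assuming the hunk body carries both the removed and the inserted bytes (how the data would be laid out isn't settled here) and using the same hypothetical (start, removed, inserted) tuple: because both sides begin at the same offset, inversion is just a swap.

```python
from typing import Tuple

# Hypothetical hunk: (start offset, bytes removed, bytes inserted).
Hunk = Tuple[int, bytes, bytes]


def apply_hunk(data: bytes, hunk: Hunk) -> bytes:
    start, old, new = hunk
    end = start + len(old)
    if start > len(data) or data[start:end] != old:
        raise ValueError(f"hunk does not match file contents at offset {start}")
    return data[:start] + new + data[end:]


def invert_hunk(hunk: Hunk) -> Hunk:
    # Old and new data begin at the same offset, so the inverse hunk keeps
    # the same start and swaps the removed and inserted bytes.  A second
    # start position would only be needed for a move.
    start, old, new = hunk
    return (start, new, old)


original = b"abcdef"
hunk = (2, b"cd", b"XYZ")  # replace "cd" with "XYZ" at offset 2
patched = apply_hunk(original, hunk)
assert patched == b"abXYZef"
assert apply_hunk(patched, invert_hunk(hunk)) == original
```

For a sequence of hunks, the inverse applies the inverted hunks in reverse order.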
-- 
David Roundy
http://www.darcs.net



