[darcs-devel] unstable doesn't get consistently with 1.0.3

David Roundy droundy at abridgegame.org
Tue Aug 16 06:01:48 PDT 2005


On Mon, Aug 15, 2005 at 09:21:54PM +0100, Ian Lynagh wrote:
> On Fri, Aug 12, 2005 at 08:24:52AM -0400, David Roundy wrote:
> > The new format will still have the issue, which is that *between* lines
> > we need to include newlines, but *after* the last line we don't want a
> > newline.  So when applying, we need to behave differently, depending on
> > whether our patch is modifying the end of the file.
> 
> I'm not sure what you mean here. In my mind, the new hunk
> 
>     newhunk 8 -3 +6
>     old
>     foo
>     new
>     wibble
> 
> (or equivalent) means "Go to the 8th byte. The next 3 bytes will be
> "foo". Remove them. Insert the 6 bytes "wibble".

Indeed, I was thinking confusedly.  The tricky bit is to decide whether the
newline after the last line of the patch is included in the contents of the
patch itself.  i.e. does

hunk 1
+foo

translate (using very liberal notation) to

newhunk 1 "" "foo\n"

or

newhunk 1 "" "foo"

I think it must translate to the former.

But

hunk 1
-
+foo

translates to

newhunk 1 "" "foo"

not to

newhunk 1 "" "foo\n"

> There are various details you can fiddle with, like whether we note how
> many lines are added/removed, what the starting line number is etc
> (which comes down to "is the extra complexity worth it for how important
> it is for users") but I'd really like a format that is trivial to apply
> and coalesce.

If you leave out the constraint that the character preceding the hunk must
be a '\n' and the hunk must either be followed by EOF or by '\n', what
you've got is a character-hunk patch type, which is not a bad idea, but
also doesn't commute the same way a hunk patch does (for example, a
character-hunk patch could only commute with a replace patch if the first
and last characters of both new and old hunks cannot be part of a token).

I'd rather include the line-number information in the line based hunks.
It's pretty cheap to compute, and allows them to commute easily with
old-fashioned hunks.  Actually, this information is also needed in order to
commute two hunk patches, so it's definitely a good idea to include it
(since we do an awful lot of commutation, and it's better to count the
lines once and store them).

Actually, a nice way to implement your efficient hunk-apply might be to
allow conversion from a line-hunk patch to a character-hunk patch, and then
we'd just use the character-hunk patch version of apply.  That would put
the "trickiness" into a line-to-character patch conversion, which seems to
me like it might be easier to get right.  The character-hunk patch itself
is an interesting feature (and could be quite useful, especially for files
like makefiles).

I think I'd lean towards calling the new patch format "linehunk" (or
"lines"?) and then we could naturally add a similarly formatted "charhunk"
(or "chars"?).  "Binary" patches could also be implemented as a special
case of a "charhunk" patch... but it might be nice to have some annotation
stored to indicate whether it's binary or not, since the patch contents
should be invisible if it's binary, but shouldn't be invisible for a text
charhunk.

Jason wrote:
> I'm probably the last person that should chime in since I've not followed
> the complete discussion about the new hunk format (and if I'm preaching
> to the choir then please just ignore this)... I hope whatever you decide
> on allows for fseek() (or similar) to jump to the relevant parts of the
> patch files.  It sounds like it may be possible given the above format
> (although the offset computation might be a bit hairy, and easily broken
> if the formatting is tweaked).  I just want to stress that I think this
> is important for allowing large patches to be handled efficiently.  I'm
> just envisioning something where darcs can skip from one patch to the
> next without reading anything inbetween from the file.  I guess the next
> step would be to make a tool (as part of darcs or external) that allows
> upgrading from the old patch format?

Right, that's the idea.  We include the text as a raw sequence of
characters and store the number of characters, so we can skip right over
it.
-- 
David Roundy
http://www.darcs.net




More information about the darcs-devel mailing list