[darcs-devel] Requirements for merging

Sun Dec 12 06:13:40 PST 2004

On Sun, Dec 12, 2004 at 02:51:04PM +1000, Anthony Towns wrote:
> Hi,
> 
> Still trying to grok what merging is about :)
> 
> So, I think there are a range of different things implicit in "merging":
> 
>   a) That when you merge R+B into R+A to get R+A+(B)+C, you can easily 
> (automatically, with no conflicts) merge this back into R+B (to get 
> R+B+(A)+C) and have the same repository.
> 
>   b) That when you're in the process of merging R+A and R+B you end up 
> with a repository/working directory that reflects the changes from R in 
> both A and B, in a form that's easy to navigate and understand, and thus 
> resolve.
> 
>   c) That when you've merged R+A and R+B to get R+A+(B)+C, merging 
> further changes from either branch, such as D from R+A+D to get 
> R+A+(B)+C+(D), is as easy as possible.
> 
> 
> Do those sound reasonable? I'm trying to take a darcs-agnostic, 
> first-principles sort of approach. Should there be more, or is there a 
> better way of describing them?

Yes.

> AFAICS:
> 
> darcs gets (a) completely right; if it didn't it probably wouldn't be 
> able to cope with being so decentralised.

Yes, except for the hunk commutation bug that snuck into 1.0.0.  (Grrrr!)

> darcs gets (b) mostly right -- it's fine for conflicting hunks, and 
> conflicting addfile/moves, but breaks down for conflicting 
> addfile/addfiles. Having pretty GUI tools like Bitkeeper does would be 
> even better, of course.
> 
> darcs gets (c) completely right for non-conflicting merges, but 
> completely wrong for conflicting merges.

Sounds about right.  Both (b) and (c) can be fixed without affecting
backwards compatibility, since both come down to marking conflicts and/or
auto-resolving them, and within the darcs framework either process just
creates a new patch.

(c) is an instance of the multiple-conflict problem, which, when taken far
enough, is what causes darcs to hang on an exponential algorithm.  So I
don't expect (c) to be fixed until I've fixed that problem.

(b) is something people could hack on if they wanted, but I'm not likely to
do so soon, since I'll be working on the multiple-conflict problem.
Thursday I go home for Christmas, and my plan is to work on this while I'm
there.  Hopefully once that is resolved, working out the details of
conflict marking and resolution will be the next thing on the plate.

> I *think* arch/tla gets all of them right, by contrast. The disadvantage 
> arch has (aiui) is that the way it gets them right is by having fuzzy 
> patches, which probably limits it from supporting things like "replace"
> patches.

Actually, arch misses out on (a).  In any case where the three-line context
isn't sufficient to unambiguously perform the inexact patching (i.e. files
with repeated content--not all that rare), the order in which patches are
merged matters.
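
To make the order dependence concrete, here is a toy sketch in Python.  It
is not arch's actual algorithm: the patch format (a dict with "context",
"hint" and "add"), the single line of context, and the nearest-match rule
are all assumptions made up for illustration.  It only shows that when the
context occurs more than once, two patches that each apply cleanly can give
different trees depending on the order of application.

```python
def apply_fuzzy(lines, patch):
    """Insert patch["add"] after the context match nearest to patch["hint"]."""
    ctx = patch["context"]
    n = len(ctx)
    # Every position where the context lines match exactly.
    matches = [i for i in range(len(lines) - n + 1) if lines[i:i + n] == ctx]
    if not matches:
        raise ValueError("context not found: this would be a conflict")
    # Inexact step: take the match closest to the recorded line number.
    best = min(matches, key=lambda i: abs(i - patch["hint"]))
    at = best + n
    return lines[:at] + patch["add"] + lines[at:]


# Common ancestor R: a file with repeated content.
R = [
    "# module",
    "def f():",
    "    x = read()",
    "    use(x)",
    "def g():",
    "    x = read()",
    "    use(x)",
]

# Patch A (made against R): add three lines near the top of the file.
A = {"context": ["# module"], "hint": 0,
     "add": ["import a", "import b", "import c"]}

# Patch B (made against R): add a check in g(), whose read() is at index 5.
B = {"context": ["    x = read()"], "hint": 5, "add": ["    check(x)"]}

a_then_b = apply_fuzzy(apply_fuzzy(R, A), B)
b_then_a = apply_fuzzy(apply_fuzzy(R, B), A)

# Neither order raises, yet the two results are not the same file.
print(a_then_b == b_then_a)
```

Applying B first puts check(x) in g(), as its author intended.  Applying A
first shifts f() down by three lines, so B's recorded position now lands on
the identical context inside f(), and the check goes there instead.  No
conflict is reported in either order.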

In fact, one could devise an attack where two patches A and B are both
improvements when applied in one order, but introduce a security hole when
applied in the reverse order.  I think with two patches you might need some
luck, but with three patches it's pretty easy.  The trick of course is the
social engineering required to get a developer to apply both patches, and
in a particular order.  It would probably be most effective to send one to
each of two separate developers on a project.  Both patches are valid
improvements to the code, so both developers can read them as closely as
they like without seeing a problem.  If they trust each other (and arch)
enough not to look carefully at the result of the non-conflicting merge of
their changes, you've introduced a hole.
-- 
David Roundy
http://www.darcs.net