[darcs-devel] Alternative conflict resolution scheme.

Wed Jan 16 23:11:19 UTC 2008

On Wed, Jan 16, 2008 at 02:13:56AM +0100, Petr Rockai wrote:
> Hi!

Hi Petr,

I've looked over your email (and enjoyed it--but not by laughing), and
found a few places where I think there are problems.

(cutting bits I'm not commenting on)

> The step, where user selects "taken" and "cancelled" branches would
> record a patch of kind "R" (as in resolution), with this shape:
> 
>     R({A,B,...},C)
> 
> Where the set represents the cancelled branch (which we will see later
> can contain more than a single patch, but it is probably better
> thought of as a set than a list) and C represents the taken branch
> (which is necessarily a single patch only). It should be probably
> noted, that we record "resolution" patch for each "primitive"
> conflict, not a big single resolution patch.
> 
> Normally, darcs handles conflicts by not applying any of the patches
> involved in conflict. Say, A and B are in conflict here:
> 
>     A B
> 
> neither of A and B have any effect on working copy. If we add a resolution:
> 
>     A B R({A},B)
> 
> (B was selected as the taken branch), the R here has same effect as B
> (and thus the whole sequence, since A B alone have no effect at all).
> 
> Fairly obviously, B R(_,B) fail to commute. For
> 
>     A R(x,_) where A `elem` x
> 
> we cannot fail the commute, since we want the
> 
>     inv(R(x,_)) A
> 
> commute to succeed (I think this is a requirement of patch theory, but
> even if it was not, we will need it). Therefore, we define a new patch
> type to be a result of these commutes:
> 
>     A R(x,B) <-> R(x - A,B) D(A,B)

I don't think we actually want either of these commutes to succeed.  What
this commute means (if it succeeds) is that the resolution patch does not
depend on the conflicting patch, which is highly counter-intuitive (at
least to me).  Making

  A R(x,_) where A `elem` x

fail to commute should eliminate the D-style patch, which (I think) will
greatly simplify the picture.

Why is it that you think we'll need for the commute of

   inv(R(x,_)) A

to succeed?

For the moment, I'm going to skim over the discussion of D-patch
commutation, since I don't see why we would want to allow these patch
types.

I should perhaps point out that in darcs (either darcs-1 or darcs-2),
merging of conflicted patches does *not* occur by a commutation, since
it'll affect things below.

> The D-style patch carries both A (the cancelled patch) and B (which
> identifies the R-patch which caused the disablement). The D(_,B) patch
> has no effect and commutes freely with all non-R(_,B) patches (since
> it has no effect, ie. is a noop patch).
> 
> Now, we need the inverse-resolution commutation for D patches:
> 
>     inv(R(x,B)) A <-> D(A,B) inv(R(x \cup A,B))
>         iff
>             A fails to commute with any member of the set x
>         and
>     inv(R(x,B)) A <-> A inv(R(x,B))
>         otherwise.
> 
> There are now two possibilities with regards to D patches: we can
> either disallow their persistence, by forcing them to commute past
> their respective R-patch, which turns them into non-D patches again,
> or we could possibly try to retain them and define their inverses and
> commutations (although this is not required, since their inverses
> never actually exist if we require them to be commuted past the
> R-patch).
> 
> With "persistent D patches", we would get:
> 
>     inv(D(A,B)) X <-> D(X,B) inv(D(A,B))
>         iff
>             X and A fail to commute
>         and
>     inv(D(A,B)) X <-> X inv(D(A,B))
>         otherwise.
> 
> We see that the D-patches can effectively encode the same information
> that the "cancelled set" in R-patches, but the R-patches have to have
> this information and have to commute with D-patches as described above
> regardless of this, which makes it favourable to remove D patches by
> commutation and not ever store (or invert) them.

Here's an interesting test case below: (but I'll leave my comments for
after you come to your conclusion below)

> Now, take that scenario we had above:
> 
>     A B R({A},B)
> 
> and let's say, we want to merge C, from a repository:
> 
>     A C (where C depends on A)
> 
> we first get:
> 
>     A B R({A},B) inv(R({A},B) inv(B) C
> 
> now we need another commute, namely
> 
>     inv(R({A},B)) inv(B) <-> inv(B) inv(R({A},B))
> 
> (Sidenote: I spot a problem here, since B R(_,B) fail to commute, but
> inv(R(_,B)) inv(B) is required to commute and I am not sure what
> problems is this going to cause elsewhere. This is required to get C
> past inv(B) safely, even though they fail to commute by themselves.)
> 
> There, we get:
> 
>     A B R({A},B) inv(B) inv(R({A},B) C
> 
> now, we use the inv-R commutation:
> 
>     inv(R(x,B)) A <-> D(A,B) inv(R(x \cup A,B))
> 
> to get:
> 
>     A B R({A},B) inv(B) D(C,B) inv(R({A,B},B)
> 
> since D commutes freely, we now get past inv(B):
> 
>     A B R({A},B) D(C,B) inv(B) inv(R({A,B},B)
> 
> and the R/D commute gives:
> 
>     A B C R({A,C},B) inv(B) inv(R({A,B},B)
> 
> and provides a merge:
> 
>     A B C R({A,C},B)
> 
> which is a non-conflicted repository again. This demonstrates how the
> cancellation works and how it propagates to patches that depend on the
> cancelled branch.

I believe this will be a real problem.  Much better behavior would be to
have the merging of C have the result that the conflict is re-invigorated.
Having the change C silently disappear is likely to be confusing to users,
who are now pulling a separate change distinct from the one they previously
"resolved", and find that pulling this change has no effect, and doesn't
involve a conflict either!

I should point out that you've got a considerably-simplified notation which
ignores conflicts (i.e. you're using B to describe the primitive change B
as well as the conflicted patch B that has been merged with the conflicting
change A).  This obscures some of the behavior.

I would write the repository containing A and B as

A X([A] B)

where the bit in parentheses indicate that it is the change B which has
conflicted with the change A.

then the repository with A, B and C (but no resolution) would look like

A X([A] B) X({A,B}A:C)

where the second conflict is more confusing:  it indicates that C is
in a conflict involving A and B, and C depends on A.  And that A and B have
both already conflicted (with each other), so we don't need to reverse
their effects.  (Note: mentioning A twice in X({A,B}A:C) is indeed
redundant, but seems to make certain operations easier.)

At this point, we can imaging merging *this* X({A,B}A:C) with your
R({A}B).

The merge could put them in either order, and a commute must swap between
those orders.

I'd vote for a merge result that looks something like:

A X([A]B) X({A,B}A:C) X({A:C} R({A}B))

which is to say that the C patch should conflict with the resolution
patch.  I think this will be nicer behavior for our users.  Then a commute
would need to turn this into something like:

A X([A]B) R({A}B) X([R({A}B)] A:C)

But I haven't worked out how this would all work.

> It is however required, that conflicting resolutions conflict again:
> 
>     A B R({A},B) R({B},A)
> 
> is a conflicted repository.
>
>     R(x,A) R(y,B) fail to commute if B `elem` x
> 
> We therefore get R-patches that have R-patches as their components:
> 
>     R(x,R(y,A))
> 
> This looks problematic, since we see that the resolution patches grow,
> but there is a big advantage over current mechanism in that such
> patches only happen, when you do repeated pulls both ways and do
> opposed conflict resolutions. This is a kind of conflict fight, where
> both sides of the conflict pull the other side's resolution and
> re-resolve:
> 
>     A B R({A},B) R({B},A) R({R({A},B)},R({B},A)) R({R({B},A)},R({A},B)) 
> 
> et cetera ad infinitum. This is however a pathological case, since if
> a both-way pull happens, the sides should be able to negotiate a
> common resolution for the conflict in a single iteration (or in zero
> iterations). As in, oh, you have resolved the conflict other way
> around than myself, I should unpull my resolution and use yours (or
> maybe our branches should diverge instead and I unpull your resolution
> and hack on, not pulling your resolution nor any patches that depend
> on it until it is time to join the branches again).
>
> The mechanism (if it works at all) definitely fixes the unilateral
> conflict fight by not producing a new conflict when a new patch is
> pulled that depends on a conflicted patch (it also makes life easier
> for the conflicted downstream).

That's definitely appealing.

> Now, in defining the R-commutations, I have left out an important
> case, where the in R(x,A), either A is a R-patch itself, or x contains
> R-patches (well, it is probably necessary, that either *both* happen
> or *neither*, since there is no way a resolution-nonresolution
> conflict could arise).

I'll note here (again) that I think it's very important for a
non-resolution to be able to conflict with a resolution.  Making an *old*
conflict resolution to cause *new* changes to disappear is just asking for
confusion.

> Now, when commuting C past such a composed
> R-patch, the inner R-patches need to be adjusted to add C to their
> cancelled sets if appropriate. Another possibility, which may be more
> elegant (and more efficiently implemented) would be to always use
> empty cancelled set for nested R patches:
> 
>     A B R({A},B) R({B},A) R({R({},B)},R({},A)) R({R({},A)},R({},B))
> 
> which reduces the size of the patches and saves us trouble with
> special commutations required for the other approach (the R-patch is
> entirely defined by the taken-branch, the cancelled-branch is only
> required to facilitate disablement of patches depending on a cancelled
> patch, through the inv-R commutation). It may be beneficial, to add
> the cancelled-set of taken-resolution to the cancelled-set of the
> meta-resolution as well (I am probably getting muddy in
> terminology...):
> 
>     A B R({A},B) R({B},A) R({B,R({},B)},R({},A)) R({A,R({},A)},R({},B))
> 
> but I am not quite in a state where I can think about the details (it
> is 2 am already and I just intended to sum up the ideas and it is
> getting out of hand).
> 
> (Hm, the assumption, that a R-patch is entirely defined by its
> taken-branch may be mistaken. What are the implications of this? Do we
> need to make that set a list and treat its first item as a part of the
> patch identity (for purposes of nesting)? Consider the scenario where:
> A and B conflict, there is an R({A},B) resolution, inv(B) and C, then
> we pull R({B},A) inv(A) C... what happens here? Probably C is
> doppleganger and is therefore treated as just a single C without
> conflicts. The assumption may be valid after all.)
> 
> There are still some blurry spots in there and I may have holes in the
> theory, but I think I currently do not see any showstoppers, so please
> try to help test and debug the theory if you can -- it would be
> greatly appreciated.
> 
> Yours,
>     (half-asleep-by-now) Peter.
> 
> PS: Please do not laugh at me if I have made some obvious stupidity. I
> have tried and I am fairly new to patch theory and all...
> 
> r- 
> Peter Rockai | me()mornfall!net | prockai()redhat!com
>  http://blog.mornfall.net | http://web.mornfall.net
-- 
David Roundy
Department of Physics
Oregon State University