[darcs-devel] Alternative conflict resolution scheme.

Wed Jan 16 01:13:56 UTC 2008

Hi!

We have been doing some brainstorming in #darcs about a possible
alternative conflict resolution scheme. I will try to summarise it for
the list, and it would be very welcome, if you could find problems
with it.

I will start with the easy bit: UI. We would, when recording a patch
in a conflicted repository, if the would-be patch interacts with the
conflict, ask the user to select either branch of the conflict which
would be "taken" and the other would be "cancelled". The patch would
then be recorded on top of the taken branch, ignoring the cancelled
branch. This adds the extra step of, when doing a resolution,
selecting the "winning" side of the conflict and then adjusting that,
instead of working on top of both-branches-cancelled scenario.

Now about the implementation (in terms of patch theory) mechanics:

The step, where user selects "taken" and "cancelled" branches would
record a patch of kind "R" (as in resolution), with this shape:

    R({A,B,...},C)

Where the set represents the cancelled branch (which we will see later
can contain more than a single patch, but it is probably better
thought of as a set than a list) and C represents the taken branch
(which is necessarily a single patch only). It should be probably
noted, that we record "resolution" patch for each "primitive"
conflict, not a big single resolution patch.

Normally, darcs handles conflicts by not applying any of the patches
involved in conflict. Say, A and B are in conflict here:

    A B

neither of A and B have any effect on working copy. If we add a resolution:

    A B R({A},B)

(B was selected as the taken branch), the R here has same effect as B
(and thus the whole sequence, since A B alone have no effect at all).

Fairly obviously, B R(_,B) fail to commute. For

    A R(x,_) where A `elem` x

we cannot fail the commute, since we want the

    inv(R(x,_)) A

commute to succeed (I think this is a requirement of patch theory, but
even if it was not, we will need it). Therefore, we define a new patch
type to be a result of these commutes:

    A R(x,B) <-> R(x - A,B) D(A,B)

The D-style patch carries both A (the cancelled patch) and B (which
identifies the R-patch which caused the disablement). The D(_,B) patch
has no effect and commutes freely with all non-R(_,B) patches (since
it has no effect, ie. is a noop patch).

Now, we need the inverse-resolution commutation for D patches:

    inv(R(x,B)) A <-> D(A,B) inv(R(x \cup A,B))
        iff
            A fails to commute with any member of the set x
        and
    inv(R(x,B)) A <-> A inv(R(x,B))
        otherwise.

There are now two possibilities with regards to D patches: we can
either disallow their persistence, by forcing them to commute past
their respective R-patch, which turns them into non-D patches again,
or we could possibly try to retain them and define their inverses and
commutations (although this is not required, since their inverses
never actually exist if we require them to be commuted past the
R-patch).

With "persistent D patches", we would get:

    inv(D(A,B)) X <-> D(X,B) inv(D(A,B))
        iff
            X and A fail to commute
        and
    inv(D(A,B)) X <-> X inv(D(A,B))
        otherwise.

We see that the D-patches can effectively encode the same information
that the "cancelled set" in R-patches, but the R-patches have to have
this information and have to commute with D-patches as described above
regardless of this, which makes it favourable to remove D patches by
commutation and not ever store (or invert) them.

Now, take that scenario we had above:

    A B R({A},B)

and let's say, we want to merge C, from a repository:

    A C (where C depends on A)

we first get:

    A B R({A},B) inv(R({A},B) inv(B) C

now we need another commute, namely

    inv(R({A},B)) inv(B) <-> inv(B) inv(R({A},B))

(Sidenote: I spot a problem here, since B R(_,B) fail to commute, but
inv(R(_,B)) inv(B) is required to commute and I am not sure what
problems is this going to cause elsewhere. This is required to get C
past inv(B) safely, even though they fail to commute by themselves.)

There, we get:

    A B R({A},B) inv(B) inv(R({A},B) C

now, we use the inv-R commutation:

    inv(R(x,B)) A <-> D(A,B) inv(R(x \cup A,B))

to get:

    A B R({A},B) inv(B) D(C,B) inv(R({A,B},B)

since D commutes freely, we now get past inv(B):

    A B R({A},B) D(C,B) inv(B) inv(R({A,B},B)

and the R/D commute gives:

    A B C R({A,C},B) inv(B) inv(R({A,B},B)

and provides a merge:

    A B C R({A,C},B)

which is a non-conflicted repository again. This demonstrates how the
cancellation works and how it propagates to patches that depend on the
cancelled branch.

It is however required, that conflicting resolutions conflict again:

    A B R({A},B) R({B},A)

is a conflicted repository.

    R(x,A) R(y,B) fail to commute if B `elem` x

We therefore get R-patches that have R-patches as their components:

    R(x,R(y,A))

This looks problematic, since we see that the resolution patches grow,
but there is a big advantage over current mechanism in that such
patches only happen, when you do repeated pulls both ways and do
opposed conflict resolutions. This is a kind of conflict fight, where
both sides of the conflict pull the other side's resolution and
re-resolve:

    A B R({A},B) R({B},A) R({R({A},B)},R({B},A)) R({R({B},A)},R({A},B)) 

et cetera ad infinitum. This is however a pathological case, since if
a both-way pull happens, the sides should be able to negotiate a
common resolution for the conflict in a single iteration (or in zero
iterations). As in, oh, you have resolved the conflict other way
around than myself, I should unpull my resolution and use yours (or
maybe our branches should diverge instead and I unpull your resolution
and hack on, not pulling your resolution nor any patches that depend
on it until it is time to join the branches again).

The mechanism (if it works at all) definitely fixes the unilateral
conflict fight by not producing a new conflict when a new patch is
pulled that depends on a conflicted patch (it also makes life easier
for the conflicted downstream).

Now, in defining the R-commutations, I have left out an important
case, where the in R(x,A), either A is a R-patch itself, or x contains
R-patches (well, it is probably necessary, that either *both* happen
or *neither*, since there is no way a resolution-nonresolution
conflict could arise). Now, when commuting C past such a composed
R-patch, the inner R-patches need to be adjusted to add C to their
cancelled sets if appropriate. Another possibility, which may be more
elegant (and more efficiently implemented) would be to always use
empty cancelled set for nested R patches:

    A B R({A},B) R({B},A) R({R({},B)},R({},A)) R({R({},A)},R({},B))

which reduces the size of the patches and saves us trouble with
special commutations required for the other approach (the R-patch is
entirely defined by the taken-branch, the cancelled-branch is only
required to facilitate disablement of patches depending on a cancelled
patch, through the inv-R commutation). It may be beneficial, to add
the cancelled-set of taken-resolution to the cancelled-set of the
meta-resolution as well (I am probably getting muddy in
terminology...):

    A B R({A},B) R({B},A) R({B,R({},B)},R({},A)) R({A,R({},A)},R({},B))

but I am not quite in a state where I can think about the details (it
is 2 am already and I just intended to sum up the ideas and it is
getting out of hand).

(Hm, the assumption, that a R-patch is entirely defined by its
taken-branch may be mistaken. What are the implications of this? Do we
need to make that set a list and treat its first item as a part of the
patch identity (for purposes of nesting)? Consider the scenario where:
A and B conflict, there is an R({A},B) resolution, inv(B) and C, then
we pull R({B},A) inv(A) C... what happens here? Probably C is
doppleganger and is therefore treated as just a single C without
conflicts. The assumption may be valid after all.)

There are still some blurry spots in there and I may have holes in the
theory, but I think I currently do not see any showstoppers, so please
try to help test and debug the theory if you can -- it would be
greatly appreciated.

Yours,
    (half-asleep-by-now) Peter.

PS: Please do not laugh at me if I have made some obvious stupidity. I
have tried and I am fairly new to patch theory and all...

r- 
Peter Rockai | me()mornfall!net | prockai()redhat!com
 http://blog.mornfall.net | http://web.mornfall.net

"In My Egotistical Opinion, most people's C programs should be
 indented six feet downward and covered with dirt."
     -- Blair P. Houghton on the subject of C program indentation