[darcs-devel] conflict resolution: an efficient algorithm

Mon May 14 13:30:24 UTC 2018

Am 14.05.2018 um 11:58 schrieb Ben Franksen:
> Given a conflictor representing the primitive patch q at the end of a
> repo r, define the transitive set of conflicting (primitive) patches
> C(q) as the smallest set of patches in r such that
> 
>  1) q `elem` C(q)
>  2) if p `elem` C(q) and s conflicts with p then s `elem`C(q)
> 
> The notation here is a bit lax: I use the same name for a primitive
> patch and the catch/repopatch/conflictor representing it. Furthermore,
> names for primitive patches are supposed to ignore changes due to
> commutation: if (o,p) <-> (p',o') I regard p and p' as equivalent and
> use the same letter p to denote both. That is, p stands for a whole
> equivalence class of patches.
> 
> It is assumed here that the semantics of conflictors is such that for
> each p in C(q) we can construct a context for p, that is, there is a
> subsequence c⁻¹ of E(r) such that E(r) :> c :> p is well-typed and c is
> minimal in the sense that none of its elements can be commuted past p.
> Here, E(r) is the effect of the repo r.
> 
> Now, form the set M(q) of all maximal non-conflicting subsets of C(q).
> "Non-conflicting subset" means that for any two elements s and t of the
> subset, the sequences consisting of s and t, each together with their
> contexts, cleanly merge.
> 
> The set R(q) of sequences of primitive patches resulting from merging
> each maximally non-conflicting subset in M(q) is called the /conflict
> resolution/ of q. R(q) is uniquely determined up to commutation.

The spec as I gave it suggests that calculating R(q) from C(q) might be
exponential in the worst case. This is not so.

Let us start with an estimate of how costly the merges are. We assume
that commuting two primitive patches takes a constant amount of work, as
does inverting a primitive patch. Checking whether two prim patches
cleanly merge requires at most two inversions and one commute, so is
also constant. Merging two sequences of prim patches of length n and m
requires n*m primitive merges in the worst case (which is when the merge
succeeds).

Now remember that, up to commutation, contexts are (inverted)
subsequences of the effect of the repo. The patches that can occur in
any p in C(q) or its context must be a subset of E(r)⁻¹ up to
(including) the "oldest" patch contained in C(q). We define depth(q) to
be the size of this sequence. It is an upper bound on the length of any
merge of a non-conflicting subset of C(q).

The algorithm is as follows:

Pick any element s of C(q) (s is a sequence of prim patches). Try to
merge it with all other elements of C(q). The result is two new sets:
C0(s,q) contains all elements of C(q)\{s} that conflict with s, while
C1(s,q) contains the result of the merge for those that do not conflict
with s. If C1(s,q) is empty, then {s} is already maximal and we are done
with C1(s,q). If it has only one element, then this element is maximal
and we are also done with C1(s,q). Otherwise any maximal non-conflicting
subset of C(q) that contains s must also contain at least one of the
elements of C1(s,q) and none of the elements of C0(s,q). Suppose we have
u and v in C(q)\{s} that both do not conflict with s, let su and sv be
the result of the merges with s, then su and sv conflict if and only if
u and v conflict. Thus any maximal non-conflicting subset that contains
s must be among the results of solving the sub-problem for C0(s,q). On
the other hand, any maximal non-conflicting subset of C(q) that does
/not/ contain s must be a subset of C0(s,q). So we can recursively solve
the sub-problems for both C0(s,q) and C1(s,q) and the result is the
(disjoint) union of the two sub-results.

The algorithm clearly needs less than n*(n-1) merges where n = |C(q)|:
the first step needs n-1 merges, the second step needs (n-1-k-1) + (k-1)
= n-3 merges, etc, and there are at most n-1 steps. And as we saw above,
each merge is bounded by 3*depth(q)². Since we always have n <=
depth(q), we can guarantee that conflict resolution of q is bounded by
depth(q)⁴.

These bounds are rather pessimistic. The nearer n comes to depth(q), the
smaller the contexts will be. This is because contexts build up due to
patch dependencies, and conflictors always commute with the patches they
conflict with. So as n approaches depth(q), the contexts approach length
zero.

The algorithm above also gives us an estimate for the size of the result
set R(q). I claim that |R(q)| <= n = |C(q)|. For n = 2 this is obvious.
For n > 2, chose a pivot s and distinguish two cases: either C1(s,q) is
empty, then we add {s} to the result of the recursive solution for
C0(s,q) which by the induction hypothesis has no more than (n-1)
elements. Otherwise, we add no element and the result is the (disjoint)
union of the results of the two sub-problems. Since their sizes sum up
to (n-1), by induction we get no more than n-1<=n elements in the final
result.

This bound is attained for instance if all the patches in C(q) conflict
with each other.