[darcs-users] How to extend a patch theory to fully commute

Mon Jul 6 17:33:55 UTC 2020

> > I don't know how darcs figures out what patches need to be pulled,
> > especially if R1 is lazy.
>
> Don't let yourself be confused by lazy repos.
>
> Conceptually, darcs reads the complete remote repo, figures out the
> common patches, commutes all uncommon patches in both repos to the end,
> and then runs the actual merge algorithm only on the remaining
> (uncommon) sequences. The camp paper has a chapter with many nice
> pictures where this is explained in detail.
>
> In reality we almost never need to read the full remote repo. This is
> because the order of patches in a repositories is stored in so called
> inventories. An inventory is a file that lists patch names along with
> the content hash of each patch. Inventories can be nested and are stored
> hashed, too (i.e. the file name is equal to the content hash). This
> makes it easy to compare two repos efficiently to find the latest point
> at which they diverge. We only need to download remote inventories until
> we find an identical one in our local repo. Since cloning preserves
> inventories and operations deep inside a repo are uncommon (and also
> pretty expensive), this search usually terminates quickly.
>
> Lazy repos are missing only patches listed in closed inventories, which
> means that pulling from a lazy repo normally works fine. However, it
> /can/ fail if we actually need access to such a patch (e.g. in order to
> commute it) and we cannot download it from any source repository listed
> in the local repo (which includes the per-user cache).

Thanks, that was informative. Maybe it could be put on the wiki
somewhere? (I tried to check if something like that already is, but
darcs.net seems to be down.)

> > given a tree, you can turn it into a patch
> > sequence address by doing a depth first search (record positive
> > patches as you go deeper into the tree, and negative patches when you
> > return back upward). I suspect the other direction can be done too,
> > after you extend (Qi) in a patch sequence address to be a cycle and
> > then do some commuting, but I'm not sure.
>
> I guess you want to avoid proper cycles, i.e. cycles other than those of
> the form patches;patches^, but that's just an intuition.

A;A^;B;B^;C;C^ isn't a proper cycle in that sense but may be useful to
be able to have if you're merging or have merged non-commuting patches
A, B, C.

It may be good to avoid non-tree-like cycles like B';A';B^;A^. Maybe
it's possible to avoid creating them in the first place.

> > An advantage of the patch sequence address representation over the
> > tree representation is that I find it easier to think about how
> > commuting can modify a cycle. With the tree representation you'd want
> > to push conflicts as far down the tree as possible. This should be
> > related to the minimality condition for patch sequence addresses, but
> > I haven't wrapped my head around it.
>
> Well, suppose you have primitive A;(B\/C) where neither A;B nor A;C
> commute. If we start with the sequence A;B;B^;C and want to minimize
> that to A;C, we have no other choice but to cancel inverses. Otherwise
> you'd have the situation that A;B;B^;C and A;C are not equivalent as
> context addresses. (I am not using the full notation here because
> primitive contexts are implicit in primitive patches, assuming all
> patches are well-typed, and the sets X and Y are irrelevant here.)

That's an interesting point. Thinking about it more, I think it
completely replaces the existing minimality rule if we switch to a
tree or cycle representation. (You need to be a little careful about
what exactly you can eliminate once everything is a cycle.)

Here are my thoughts so far...

As a motivating example, suppose A;B;C don't commute at all, and we
have the context address (A;B;C, {B}, {A, C}). (I'm leaving out the
first two parts of the tuple, since as you point out they are
implied.)

>From the old point of view, we can delete C because it's in Y and
already at the end of the sequence, producing the minimal context
address (A;B, {B}, {A}).

But another way to do it is this: first extend the sequence to the
cycle A;B;C;C^;B^;A^. The address becomes (A;B;C;C^;B^;A^, {A^, B,
C^}, {A, B^, C}). (See below for some intuition about why I extended
the sets X and Y in that way.) Then, we eliminate C;C^ as a
consecutive pair. However, we have to be careful: if we just always
eliminated consecutive primitive patches we'd end up with an empty
sequence.

Viewed as a tree, this is going to look like the following: there is a
line A;B;C with three edges, and each edge has an orientation. The "A"
edge points left (away from B and C), the "B" edge points right, and
the "C" edge points left. We delete the C edge following a general
rule that leaves can be deleted when they are connected by edges that
point to the rest of the tree.

  o--<--o-->--o--<--o
     A     B     C

  o--<--o-->--o
     A     B

If all of a tree's edges point inward toward one "root" node, then all
the edges can be deleted and the tree just represents the primitive
context at its root.

Here's how it could work in general.

* A context address is a tuple (S, X, Y), where S is a "tree-like"
cycle. (I used to use the notation (Qi) but that's getting annoying.)

(Tree-like means following the cycle is the same as doing depth-first
search on some tree, or in other words, for every patch P with
start/end contexts a and b in S, the patch P^ appears with start/end
contexts b and a.)

* (S, X, Y) can be thought of as a rooted tree with an orientation on
each edge, with the following transformation.

Put the root at the start/end context of the cycle S. Draw a tree by
following all the patches in S. Each edge of the tree consists of a
pair P and P^. Orient the edge as pointing away from the root if P is
in X and P^ is in Y, and toward the root if P is in Y and P^ is in X.

* We can rotate S to produce an equivalent address. When we rotate,
patches move between X and Y. Specifically, if patch P is moved from
the start to end of S or vice versa, P and P^ switch places. In other
words, (P;S', {P} U X', {P^} U Y') becomes (P;S', {P^} U X', {P} U
Y').

>From the tree point of view, this preserves the orientation of all the
edges. All that happens is the root moves.

So, we can forget where the root was and just think of it as an unrooted tree.

* If P;P^ appear consecutively in S, and P is in Y and P^ is in X,
then they can be eliminated. Similarly if P^;P appears, P^ is in Y and
P is in X.

In other words, we can delete a leaf from a tree so long as its edge
points away from the leaf.

* More transformations on S should be allowed. I'm a bit fuzzy about
this point, For example, I'm pretty sure if S = T + U where T and U
are cycles, you should be able to swap the order U + T. And I guess
you should be able to reverse S, and also to do any commuting the
primitive patch theory allows. But I'm not sure whether that's enough,
and whether it's okay to restrict S to be tree-like during the
intermediate steps. It is probably simpler to just drop the cycle
version and think only of trees, but I want to make sure I understand
how the primitive patch theory's commuting rules turn into tree
transformations.

* If you want to extend this to extended patches / patch sequences,
you can allow three kinds of edge instead of two: an edge {u, v} can
be oriented as (u, v), or as (v, u), or it can be part of the patch
sequence (no orientation). There's an ambiguity here: this gives a
patch (sequence) the same representation as its inverse. In practice
we may only be interested in patch sequences that represent entire
repositories (starting context = empty repo) so this ambiguity might
not be a problem --- just root the tree at the known starting context.

--
James