[darcs-users] Write-up on "tree repositories" as an alternative to conflictors

Sat Dec 12 10:56:56 UTC 2020

Hi James,

there are a host of other problems with the idea of definining (clean)
merges via pushouts. Here is a very regular, completely non-pathologic
example: merging the singleton sequence a with the sequence b;c, where c
depends on b. (Paint yourself a picture with a rectangle consisting of
two commuting squares.) If a\/b merges cleanly to b'/\a', and similarly
a'\/c to c'/\ a'', then it is *not* the case that the alternative merge
candidate for a\/b, namely b'c'/\ca'' factors through a common path.
This would only be true if we identified parallel paths, so we could
replace ca'' with a'c'.

This makes me think that it makes more sense to use the property

  name(pq') = name(p) U name(q) = name (qp')

directly in the definition of the merge of p\/q.

The definitions I proposed also allow far more pathological examples
than the ones you gave. For instance, I haven't excluded the possibility
that the same name appears more than once along a path. So a universe of
two patches and three contexts a:x->y, b:y->z, with name(a)=name(b) is
possible. One could of course add another axiom that requires |name(p)|
= length(p) for all paths p.

Yet another pathology is patches with the same source and name but
different target, or similarly patches with the same target and name but
different source.

Can we find additional axioms to exclude all pathological cases?

The intended meaning of the theory is that all parallel paths should be
merges. Or viewed from the commutation perspective: parallel paths must
be permutations achievable using repeated commutation.

Given two parallel paths, can we reconstruct all possible merge
histories that could have led to these two paths? That would perhaps
give us some sort of "richness condition" to use as the missing axiom.

Looking at two parallel paths p and q, we can write down the names of
the contained patches, in sequence. For a patch name n occurring in both
p and q the two paths assign a position to this name: pos(p,n) and
pos(q,n). Let i..j denote the range of integers between i and j,
inclusive, regardless of which of them is greater than the other.

We get a valid patch theory only if (at least) the following holds: for
every n in name(p)=name(q), and for each i in pos(p,n)..pos(q,n), there
exists a parallel path r, such that i=pos(r,n).

Going back to your first "pathological" counter example of a patch graph
with (only) two parallel paths with name sequences acb and bca, the
above axiom makes it clear that this cannot be the full graph. There
have to be additional parallel paths r, s, such that
pos(r,a)=2=pos(s,b). Here, r could be bac or cab and s could be abc or
cba. Minimal extensions are r=bac,s=abc, where pos(c) is in {2,3}, or
r=cab,s=cba, where pos(c) is in {1,2}. However, these minimal extensions
are still "pathological" in the sense that c depends on either a or b,
but commutes with both individually. So the axiom above is necessary but
not sufficient.

Another aspect not covered is that for permutivity to hold, we need not
only ensure a /minimum/ number of parallel paths, but also a /maximum/
number of intermediate contexts. The extreme case is the commuting
hypercube of dimension n with n! different paths of length n (all
permutations of n), sharing 2^n contexts (the corners of the hypercube).

The more I think about this, the more I appreciate that in your approach
with invertible names and patches all these properties can be derived
from the simple "name balancing" axiom.

But is that really the case? Can we derive all the required properties
(i.e. exclude all pathological cases) for the subset of paths consisting
only of positive patches? If this is indeed the case, then it would be
quite a feat and perhaps worth the bit of extra complexity in the
definition.

Cheers
Ben

Am 05.12.20 um 18:41 schrieb Ben Franksen:
> Am 05.12.20 um 00:09 schrieb James Cook:
>> On Thu, Nov 26, 2020 at 02:12:56PM +0100, Ben Franksen wrote:
>>> I would start the whole thing with a slightly different definition.
>>
>> This seems like a nice way to do things. I like the reuse of terms from
>> category theory.
>>
>> Do you happen to know how much this has in common with "A Categorical
>> Theory of Patches" which pijul is based on? I have not read the paper
>> in detail, but I know they too talk about pushouts.
>> https://arxiv.org/abs/1311.3903
> 
> I don't think it has a lot to do with it, apart from the idea to define
> clean merges as pushouts (which is indeed quite an obvious idea, once
> you realize that patch sequences form a category). The paper asks (and
> answers) the question "what would be the minimal extension of the
> category of file contents (lists of lines/tokens) and atomic changes
> (insert a line at given pos, remove a line at given pos), such that all
> pushouts exist in it?" This is about (a certain class of) primitive
> patches, whereas our discussion is about abstract patch theory.
> 
>>> We extend the three functions above to paths. The source of a path is
>>> the source of its first patch, the target is the target of its last
>>> patch, and the name of a path is the set of names of its patches:
>>>   name(p1,...,pn)={name(p1),...,name(pn)}
>>>
>>> This is indeed an extension if we identfy a name with the singleton set
>>> consisting of just that name.
>>
>> nit: The empty patch has no patches, so these definitions of source and
>> target don't work for it.
> 
> Right, that was sloppy. Indeed this should serve to remind us that there
> are lots of /different/ "empty sequences" (namely one for each context).
> 
>>> We can define the clean merge of two paths as follows. Let p and q be
>>> paths with the same source. A clean merge of (p,q) is a then a pushout,
>>> in other words, another pair of paths (q',p'), such that pq' and qp' are
>>> parallel, and such that q',p' are minimal in the sense that for any
>>> other such pair (q'', p'') there is a unique path r such that q'',p''
>>> factors through r; that is, p''=p'r and q''=q'r. It follows from the
>>> defining property of a patch theory that name(pq')=name(qp'), but I
>>> think it should also be possible to deduce
>>>
>>>   name(pq') = name(p) U name(q)
>>>
>>> using the universal property (minimality) of the merge. Indeed I think
>>> both definitions are equivalent. Note that any such merge must
>>> necessarily be unique (because pushouts are unique).
>>
>> I think the following is a counterexample. Suppose this is the whole
>> patch theory. All edges go downward. a and a' have the same name and so
>> do b and b' and c and c'.
>>
>>    *
>>  a/ \b
>>  /   \
>> *     *
>> |c    |c'
>> |     |
>> *     *
>>  \   /
>> b'\ /a'
>>    *
>>
>> In this case, the pushout of a and b is the whole diagram, so the merge
>> is acb' or bca', both of which include c's name.
> 
> You are right. Such a scenario must not be allowed and the "global"
> definition of patch graph I gave is too weak to exclude it. If you
> analyse the counter example from the POV of permutations of names, what
> we have here is (indeed the simplest example of) a permutation group
> that is not generated by adjacent transpositions. If we build a patch
> theory bottom-up starting with commutation of single patches, then this
> cannot happen: the set of parallel paths from s to t always maps to a
> permutation group that is generated by adjacent transpositions.
> 
> So one way to "fix" the theory could be to add that as a requirement.
> Formally, consider the map from paths to sequences of names. For each
> pair (s,t) of contexts, the image of the set of parallel paths from s to
> t under this map can be regarded as a subset of S_N (where N is the
> common path length). The requirement is that this is a subgroup
> generated by adjacent transpoisitions.)
> 
> At least I *think* that would fix the theory, i.e. allow to prove my
> conjectures about merges. However, it feels a bit heavy-weight and I
> would prefer a more elegant way to express this requirement.
> 
>> Maybe an axiom could be added that name(pq') = name(p) U name(q)
>> whenever pq is a clean merge of p and q. I suspect it is strictly
>> weaker than my Assumption 4.1 but it is more succinct and maybe it is
>> enough.
>>
>> (The example violates Assumption 4.1: conflict(a, b) must be true
>> because a and b can't be merged directly, but then the paths acb' and
>> bca' contain conflicting patches, which is not allowed. Maybe this can
>> be turned into a proof.)
>>
>> Here's an example showing the name union axiom doesn't imply 4.1:
>>
>>       *
>>     a/ \b
>>     /   \
>>    *     *
>>  d/|c  c'|\d'
>>  / |     | \
>> *  *     *  *
>> |   \   /   |
>> |  b'\ /a'  |
>> |     *     |
>>  \         /
>>   \       /
>> b''\     /a''
>>     \   /
>>      \ /
>>       *
>>
>> Now there is no pushout of a and b, so the name union axiom has nothing
>> to say, but 4.1 is still violated.
> 
> Yes, this doesn't surprise me. The above would of course be excluded by
> the (somewhat crude) fix I proposed above.
> 
> Cheers
> Ben
>