[darcs-users] Write-up on "tree repositories" as an alternative to conflictors

Sat Dec 19 18:07:26 UTC 2020

Am 19.12.20 um 05:00 schrieb James Cook:
>> Interesting! So corners (potential contexts) have an inherent partial
>> order, indeed they form a join-semilattice. You postulate that if two
>> contexts have an uppper bound that is also a context, then the minimal
>> upper bound must also be a context. I think the math term would be that
>> contexts form a sub-semilattice of the corners.
> 
> I should credit my friend Sridhar Ramesh here. I was asking them
> whether there's a name for some property involving pushouts, and
> casually mentioned we might not need to distinguish morphisms with the
> same domain and codomain, and Sridhar pointed out that's just a
> pre-order.

Absolutely right. Why did I not see this?

It just occurred to me that we can also interpret this from a logical /
type theory POV, think Curry-Howard. In this view, a path p:s->t in the
patch graph is a proof of the proposition s->t, which expresses that "if
s is a context, then so is t", or more concretely "s is a subset of t".
See the lemma below.

> Good points. So now with some rephrasing, the rules on the set of
> contexts are:
> 
> (a) The empty set is a context.
> 
> (b) If c1 and c2 are contexts, then c1 ∩ c2 is a context.
> 
> (c) If c1, c2, c3 are contexts and c1 ⊆ c3 and c2 ⊆ c3, then c1 ∪ c2 is
>     a context.
> 
> (d) For every non-empty context c there is at least one element n of c
>     such that c\{n} is also a context.

Yes, exactly. We then define an /abstract patch/ as a pair (c,n), where
c is a context and n∈c is a name; and write (c,n) : c\{n} -> c. Contexts
and patches form a connected acyclic directed graph, the /patch graph/
over the set of contexts C, and the finite paths of such a graph are the
arrows of its free category.

We can then prove e.g.

Lemma: There is a path between two contexts s and t if and only if s
  is a subset of t. The name of any such path is t\s.

  (Proof is by induction. I can post the details if you are interested.)

with corollaries such as:

Corollary: For each context c there is a path p:0->c.

Corollary: For any path p, |p|=|name(p)|. In other words, the names
  ocurring in a path are unique.

Corollary: For any two contexts s, t, all paths p:s->t have the same
  name and length.

>> Another interesting observation is that we can regard a name as a
>> partial bijection between contexts. The action of the function is to add
>> the name to the context and its inverse is to remove it. Its domain and
>> range are given by
>>
>>   domain(n)={c∈C|n∉c and c∪{n}∈C}, range(n)={c∈C|n∈c and c\{n}∈C}
>>
>> The reason I find this interesting is that it hints at a possibility to
>> re-introduce /concrete/ patches (that is, /patch content/) into the
>> picture. Concrete (primitive) patches can be seen as partial bijections
>> on a set S of states, represented as data.
> 
> I'm not sure I understand. I guess a state is the content of the files
> in the repository.

Plus the directory structure. One concrete representation of states is
the 'Tree' type in Darcs, see
https://hub.darcs.net/darcs/darcs-screened/browse/src/Darcs/Util/Tree.hs#59
up to line 84.

> Any concrete patch has exactly one start state and
> one end state.

Not at all. Unnamed prim patches can be applied to many states, indeed
the domain is always infinite. For instance the domain of "addfile ./f"
is the set of all trees which do not have "./f" as either file or
directory. Its codomain is the set of all trees that do have "./f" as a
file. Its inverse is "rmfile ./f" with domain and codomain swapped.

> Do you mean that it's a partial bijection where the
> domain and range are both sets of size 1?

No, see above.

> E.g. suppose the concrete patch P is:
> 
> hunk ./file.txt 2
> +world
> 
> and in the context of p, file.txt has just one line "Hello".
> I guess it's tempting to think of p as the following partial bijection
> f: f always inserts the line "world" after the first line of file.txt.

This is exactly how the above hunk is interpreted in Darcs.

> But that's not really a useful representation of p; e.g. it doesn't
> work if p gets commuted to a different context.

Why not? If you commute the patch to a different context then its domain
and codomain usually change, too. I would expect nothing less.

But perhaps what you mean is that it is not a useful /generic/
description of "what the patch does" /independent/ of commutation. This
is true. If we had a representation of concrete patches that is truly
context independent, then a lot of problems could be solved quite
easily, see Pijul. In particular, I think we would not need names for
patches.

> So to me it makes more
> sense to just say that concretely, p is the pair ("Hello\n",
> "Hello\nworld\n").

It is certainly possible to define hunks in this way. In Darcs, so
called binary patches are defined in this way: they contain the old
content and the new content.

However, even if you do that, your hunks are still "polymorphic"
relative to the "rest of the tree". And commuting such a hunk with a
file rename patch may still have to modify the hunk.

> I guess this is a long-winded way to just say I'm not sure what you
> mean by concrete patches being bijections.

As explained above, for each kind of prim patch in Darcs (hunk, binary,
addfile, etc... but please ignore setpref patches) one can precisely
define the set of trees that are their domain and codomain and they are
designed so that the partial function defined by applying the patch is
injective and thus can be inverted.

> I guess you'd want to be able to commute concrete patches without
> access to any information other than those two patches'
> representations. Does your definition capture that?

Absolutely. The full commute code for Darcs' prim patches is here:
https://hub.darcs.net/darcs/darcs-screened/browse/src/Darcs/Patch/Prim/V1/Commute.hs.
It could use a bit of streamlining but I think the overall picture of
how commutation works on prim patches is recognizable.

>> Here, too, we can generalize
>> to finite sequences of patches, which then naturally form a category,
>> and there is a (non-injective) functor (the "apply" functor) from that
>> category to the groupoid of partial bijections on states that maps
>> concatenation to function composition.
>>
>> We may want to study functors from an abstract patch theory to a
>> concrete one.
> 
> That's neat.

We'll see. I am trying to make this more concrete but it's difficult.

>> A functor maps both objects and arrows. For objects,
>> F:C->S associates a concrete state to each context (this mapping is also
>> not necessarily injective). For arrows it associates to each path p:s->t
>> a concrete patch F(p):F(s)->F(t). The functor laws say that
>> F(id_c)=id_F(c) and that F(p;q)=F(p);F(q).
> 
> So the patch is an arrow joining two states? This is more in line with
> my view above that a patch is a pair of states.

Right. I was confused when I wrote that. For concrete patches (or
sequences thereof) source and target are sets of states, not states.

>> This does not a priori mean that each abstract /patch/ (singleton path)
>> maps to a concrete patch,
> 
> Doesn't it, if a concrete patch is just a pair of states? But I guess I
> don't really know what you mean by concrete patch.

I mean a concrete representation as data of a partial bijection on
states. Like the Darcs V1 prim patches.

>> so I guess we have to require that, in
>> addition to the usual functor laws. There may be a few more extra
>> requirements that we want to impose. For instance, I guess we need some
>> sort of consistency requirement that relates commutation in the abstract
>> setting to that in the concrete setting. Needless to say, I haven't
>> thought these things through to the end, so I can't tell yet if it has
>> any practical relevance.
> 
>> This variant of abstract patch theory has clean merge and unmerge (of
>> contexts) already built-in, so to speak, since merging is just taking
>> the union and unmerging is taking the intersection. I find it
>> interesting to note that adding conflictors a la Darcs means that we
>> enlarge the set of contexts by adding /conflicted states/, which are
>> those corners that we get (in addition to the normal unconflicted
>> contexts) if we allow arbitrary unions of contexts. Seen in this way,
>> conflictors appear quite natural and simple, conceptually at least.
> 
> (Replying to this last since it depends on more context.)
> 
> To make this point of view work, I guess you would want to extend the
> functor so that the conflicted contexts also map to concrete states ---

Yes, modulo "state -> sets of states".

> this determines what state Darcs leaves the repo in when there are
> conflicting patches (if you leave out the conflict markers).

Yes.

> And
> similarly you'd want to extend concrete representations of patches.

Yes. That's the idea.

> I'm not sure what rules constrain us here. E.g. I could just declare
> that any "conflicted" context maps to the empty fileset, but would that
> work out? I guess the hard part is figuring out what how the concrete
> patches work (for example, how you can commute them without access to
> anything but the two patch representations).

Indeed. This is the reason why the conflictor commute and merge code is
highly non-trivial. A complicating factor is that a clean definition of
conflictors cannot be based on prim patches alone, they have to be
/named/ prim patches. How to resolve this tension is still largely unclear.

I think the constraints are the usual ones: the merge and commute laws,
in particular permutivity. E.g. the first thing you need to ensure is
that if you merge p\/q to q'/\p', then p;q' commutes to q;p', even if p'
and q' are conflicted.

> This reminds me of the function I built that mapped every
> pseudo-context to a primitive context, when I was working on making an
> arbitrary primitive patch theory commute. If the above could be
> extended to the entire hypercube (rather than just arbitrary unions of
> contexts) then I guess we would have arrived at my original goal of
> making a patch theory where patches always commute. (At this point I'm
> not sure how useful that is.)

Yes (to all you said in that paragraph).

Cheers
Ben