[darcs-users] Write-up on "tree repositories" as an alternative to conflictors

Sun Jan 24 11:03:05 UTC 2021

Am 22.01.21 um 23:42 schrieb James Cook:
> I think I am closer to understanding what you mean by "concrete patch",
> but I think I haven't got it completely.
> 
> Here's my understanding so far:
> 
> a) We begin with a set S of states. In the case of Darcs, that's the
>    Tree type.
> 
> b) A concrete patch is a bijection between two subsets of S. A
>    "concrete patch theory" gives a way of encoding some such bijections
>    as data. For example, every Darcs prim patch encodes such a
>    bijection, so Darcs prim patches (excluding names) are a "concrete
>    patch theory" in this sense.

Don't forget that only /sequences/ of patches form a category.

> c) If two Darcs prim patches are different (excluding names) then they
>    encode different bijections. This is probably not too hard to prove.
>    If it weren't true, Darcs wouldn't satisfy point (b) above.
> 
> d) Commuting two concrete patches (assuming it succeeds) may or may not
>    change the patches. I suppose the commute function is an essential
>    part of the description of a concrete patch theory.
> 
> e) We can turn any concrete patch theory into a category by taking all
>    subsets of S as objects and then taking the free category generated
>    by all the bijections in our concrete patch theory.

Up to here: yes.

> f) We're interested in functors from a concrete patch theory to an
>    abstract patch theory as described at the top of this email (e.g.
>    objects are sets of names).

I think it should be the other way around: a functor from the abstract
patch category to the concrete one.

> But that can't all be right. In particular I'm confused about (f).
> Suppose we have just two names n1 and n2, and our contexts are {}, {n1}
> and {n1, n2}. When I think of making that concrete, I imagine something
> like this:
> 
> * {} is an empty Tree.
> 
> * n1 creates a file called "file.txt" with one line, "Hello". So,
>   concretely, {n1} is a Tree with just "file.txt" with just "Hello".
> 
> * n2 appends "world" after "Hello", so {n1,n2} is a Tree with just
>   "file.txt" containing two lines "Hello and world".
> 
> But a functor from a concrete patch theory (as defined above) to that
> abstract patch theory couldn't work like that. The concrete patch
> theory objects are supposed to be sets of states, but above I assigned
> a single state to each context.

I think your objection here is valid and I find it quite illuminating.

Indeed, it seems that when we move from concrete patches (representing
partial bijections on states) to named patches, we implicitly pick a
single element from the domain and range. This is something I wasn't
really (consciously) aware of. I think this also clears up a lot of
misunderstanding I had with Ganesh over the years, when we discussed the
meaning of contexts.

The concrete patch you associated with n1 obviously has a much larger
domain than just the single empty state/tree (its domain is infinite,
consisting of every state in which no file "file.txt" exists), when we
tag this patch with the name n1 and posit that {n1} is a context, then
we (implicitly) assume that someone started with an empty tree and then
recorded n1 in that particular state. Thus our choice of the empty tree
as the common starting point for all repositories implies and determines
the choice of a single associated state for each context.

I think this means that the functor from abstract to concrete patches
must fulfill another requirement and I think this requirement is
equivalent to what the Camp paper and JJ's inverse semigroup paper
define as "sensible" patch sequences. Informally, we have to make sure
that all abstract paths that start from the empty context are mapped to
a bijection that is defined on our distinguished start state.

Here is a refined definition:

A /realization functor/ R is a functor from abstract patches (sequences
of names and contexts) to concrete patches (with a commute function),
such that

 (a) R maps abstract paths to concrete paths of the same length
 (b) for any two parallel abstract paths ns and ms of length 2,
     commute(R(ns))=Just(R(ms)) (here R is the mapping on arrows)
 (c) there is a distinguished state E, such that E is an element of
     R({}) (here R is the mapping on objects i.e. contexts)

I believe this is enough to ensure "sensibility", which means that we
can interpret contexts as single states, relative to some (global)
choice of E.

> Another thought: Could there be other concrete patch theories that
> violate (b) but are still interesting? E.g. maybe there could be two
> patches A and B that represent the same bijection, but that behave
> differently when commuted with a third patch C.

This does not contradict (b) which merely says that the mapping between
patches and bijections is a function. It does not need to be injective,
in fact in Darcs it is not: there are many different sequences of
unnamed prim patches that represent the same partial bijection. An
obvious example is commutation, but there are many others. For instance,
we have functions to coalesce adjacent hunks. You could also split a
multiline hunk into separate one-line hunks; or split any hunk into a
pure remove plus a pure add; etc. These operations do not respect
commutation properties, they only respect the effect i.e. the mapping to
a partial bijection. (Thus, these operations are allowed only before
attaching a name i.e. before we record them, or when we create new
patches out of existing ones "destructively" like in amend or rebase.)

>>> I'm not sure what rules constrain us here. E.g. I could just declare
>>> that any "conflicted" context maps to the empty fileset, but would that
>>> work out? I guess the hard part is figuring out what how the concrete
>>> patches work (for example, how you can commute them without access to
>>> anything but the two patch representations).
>>
>> Indeed. This is the reason why the conflictor commute and merge code is
>> highly non-trivial. A complicating factor is that a clean definition of
>> conflictors cannot be based on prim patches alone, they have to be
>> /named/ prim patches. How to resolve this tension is still largely unclear.
>>
>> I think the constraints are the usual ones: the merge and commute laws,
>> in particular permutivity. E.g. the first thing you need to ensure is
>> that if you merge p\/q to q'/\p', then p;q' commutes to q;p', even if p'
>> and q' are conflicted.
> 
> It would be interesting to discuss this further but I think I should
> figure out what you mean by concrete patch theory first.

I hope this is clearer now.

BTW, I think that for discussions like this we should use a simpler
concrete patch theory as an example, one that still exhibits the
features we care about. I propose we track the state of one (unnamed)
file, which we regard as a linear sequence of tokens, say characters:

  type State = [Char]

Our primitive concrete patches are:

  data PrimPatch = Add Int Char | Remove Int Char

Such a patch gets interpreted as a bijection via

  (1) apply (Add i c) cs = -- insert c at position i
  (2) apply (Remove i c) cs = -- remove c at position i

where I assume positions are zero based. The domain of (1) is the set of
all strings that have length at least i. The domain of (2) is the set of
strings that have c at position i (thus in particular have length at
least i+1).

Inversion swaps the constructor:

  invert (Add i c) = Remove i c -- and vice versa

and it is clear that apply p^ = (apply p)^.

The commute function must adapt the index, e.g.

commute (Remove i x, Add j y)
  | i+1 < j = Just (Add (j+1) y, Remove i x)
  | i > j+1 = Just (Add j y, Remove (i+1) x)
  | otherwise = Nothing
...

Cheers
Ben