From falsifian at falsifian.org  Fri Jan 22 22:42:14 2021
From: falsifian at falsifian.org (James Cook)
Date: Fri, 22 Jan 2021 22:42:14 +0000
Subject: [darcs-users] Write-up on "tree repositories" as an alternative
 to conflictors
In-Reply-To: <rrlfgu$pbb$1@ciao.gmane.io>
References: <rpo9ko$159o$1@ciao.gmane.io>
 <20201204230902.yybdxci6ghe7pty7@angel.falsifian.org>
 <rqggp1$117s$1@ciao.gmane.io> <rr27lp$l5b$1@ciao.gmane.io>
 <rr2gon$ng6$1@ciao.gmane.io>
 <20201212172232.si6u7kdzaj7y6p7t@angel.falsifian.org>
 <rr4pm1$fc2$1@ciao.gmane.io> <rr7suq$fp1$1@ciao.gmane.io>
 <20201219040040.i2jo6srrk5lz46qi@angel.falsifian.org>
 <rrlfgu$pbb$1@ciao.gmane.io>
Message-ID: <20210122224214.47uu62hh5nfo4zh6@angel.falsifian.org>

Hi Ben,

Happy 2021! I let this thread drop for a while, but I'm trying to catch up now.

I think we had defined an abstract patch theory as a set of names, and
a set of possible "contexts" which are sets of names (or corners on the
hypercube), where the set of possible contexts satisfies (copying and
pasting from my earlier email):

(a) The empty set is a context.

(b) If c1 and c2 are contexts, then c1 ∩ c2 is a context.

(c) If c1, c2, c3 are contexts and c1 ⊆ c3 and c2 ⊆ c3, then c1 ∪ c2 is
    a context.

(d) For every non-empty context c there is at least one element n of c
    such that c\{n} is also a context.
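
The four conditions can be checked mechanically for a finite family of
contexts. The following is a minimal Haskell sketch, not part of the
original discussion. It reads (b) as closure under intersection and (c)
as closure under union for contexts that share an upper bound; the names
Context, mkContext, isContextFamily, chain and gap are illustrative
assumptions.

```haskell
import Data.List (delete, intersect, nub, sort, union)

-- A context is a set of names, stored as a sorted duplicate-free list.
type Name = String
type Context = [Name]

-- Normalise a list of names into the canonical set representation.
mkContext :: [Name] -> Context
mkContext = sort . nub

subsetOf :: Context -> Context -> Bool
subsetOf a b = all (`elem` b) a

-- Check conditions (a)-(d) for a finite family of contexts.
isContextFamily :: [Context] -> Bool
isContextFamily raw = axiomA && axiomB && axiomC && axiomD
  where
    cs = nub (map mkContext raw)
    member c = mkContext c `elem` cs
    -- (a) the empty set is a context
    axiomA = member []
    -- (b) closed under intersection (assumed reading)
    axiomB = and [member (c1 `intersect` c2) | c1 <- cs, c2 <- cs]
    -- (c) closed under union when both contexts lie below a third
    axiomC = and [ member (c1 `union` c2)
                 | c1 <- cs, c2 <- cs, c3 <- cs
                 , c1 `subsetOf` c3, c2 `subsetOf` c3 ]
    -- (d) every non-empty context can drop some name and stay a context
    axiomD = and [any (\n -> member (delete n c)) c | c <- cs, not (null c)]

-- A linear history: {}, {n1}, {n1,n2}.
chain :: [Context]
chain = [[], ["n1"], ["n1", "n2"]]

-- Violates (d): neither {n1} nor {n2} is in the family.
gap :: [Context]
gap = [[], ["n1", "n2"]]
```

In GHCi, isContextFamily chain holds, while isContextFamily gap fails
on condition (d).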

This can be turned into a category: take your definition of an abstract
patch as a pair (c,n) (c is a context and n is a name) and then take
the free category over that.

Then the discussion turned to concrete representations of patches.
Long-winded (sorry) question below...

> > I'm not sure I understand. I guess a state is the content of the files
> > in the repository.
> 
> Plus the directory structure. One concrete representation of states is
> the 'Tree' type in Darcs, see
> https://hub.darcs.net/darcs/darcs-screened/browse/src/Darcs/Util/Tree.hs#59
> up to line 84.

Yes, that makes sense.

> > Any concrete patch has exactly one start state and
> > one end state.
> 
> Not at all. Unnamed prim patches can be applied to many states, indeed
> the domain is always infinite. For instance the domain of "addfile ./f"
> is the set of all trees which do not have "./f" as either file or
> directory. Its codomain is the set of all trees that do have "./f" as a
> file. Its inverse is "rmfile ./f" with domain and codomain swapped.
> 
> > Do you mean that it's a partial bijection where the
> > domain and range are both sets of size 1?
> 
> No, see above.
> 
> > E.g. suppose the concrete patch p is:
> > 
> > hunk ./file.txt 2
> > +world
> > 
> > and in the context of p, file.txt has just one line "Hello".
> > I guess it's tempting to think of p as the following partial bijection
> > f: f always inserts the line "world" after the first line of file.txt.
> 
> This is exactly how the above hunk is interpreted in Darcs.
> 
> > But that's not really a useful representation of p; e.g. it doesn't
> > work if p gets commuted to a different context.
> 
> Why not? If you commute the patch to a different context then its domain
> and codomain usually change, too. I would expect nothing less.
> 
> But perhaps what you mean is that it is not a useful /generic/
> description of "what the patch does" /independent/ of commutation. This
> is true. If we had a representation of concrete patches that is truly
> context independent, then a lot of problems could be solved quite
> easily, see Pijul. In particular, I think we would not need names for
> patches.
> 
> > So to me it makes more
> > sense to just say that concretely, p is the pair ("Hello\n",
> > "Hello\nworld\n").
> 
> It is certainly possible to define hunks in this way. In Darcs, so
> called binary patches are defined in this way: they contain the old
> content and the new content.
> 
> However, even if you do that, your hunks are still "polymorphic"
> relative to the "rest of the tree". And commuting such a hunk with a
> file rename patch may still have to modify the hunk.
> 
> > I guess this is a long-winded way to just say I'm not sure what you
> > mean by concrete patches being bijections.
> 
> As explained above, for each kind of prim patch in Darcs (hunk, binary,
> addfile, etc... but please ignore setpref patches) one can precisely
> define the set of trees that are their domain and codomain and they are
> designed so that the partial function defined by applying the patch is
> injective and thus can be inverted.
> 
> > I guess you'd want to be able to commute concrete patches without
> > access to any information other than those two patches'
> > representations. Does your definition capture that?
> 
> Absolutely. The full commute code for Darcs' prim patches is here:
> https://hub.darcs.net/darcs/darcs-screened/browse/src/Darcs/Patch/Prim/V1/Commute.hs.
> It could use a bit of streamlining but I think the overall picture of
> how commutation works on prim patches is recognizable.

I think I am closer to understanding what you mean by "concrete patch",
but I think I haven't got it completely.

Here's my understanding so far:

a) We begin with a set S of states. In the case of Darcs, that's the
   Tree type.

b) A concrete patch is a bijection between two subsets of S. A
   "concrete patch theory" gives a way of encoding some such bijections
   as data. For example, every Darcs prim patch encodes such a
   bijection, so Darcs prim patches (excluding names) are a "concrete
   patch theory" in this sense.

c) If two Darcs prim patches are different (excluding names) then they
   encode different bijections. This is probably not too hard to prove.
   If it weren't true, Darcs wouldn't satisfy point (b) above.

d) Commuting two concrete patches (assuming it succeeds) may or may not
   change the patches. I suppose the commute function is an essential
   part of the description of a concrete patch theory.

e) We can turn any concrete patch theory into a category by taking all
   subsets of S as objects and then taking the free category generated
   by all the bijections in our concrete patch theory.

f) We're interested in functors from a concrete patch theory to an
   abstract patch theory as described at the top of this email (e.g.
   objects are sets of names).

But that can't all be right. In particular I'm confused about (f).
Suppose we have just two names n1 and n2, and our contexts are {}, {n1}
and {n1, n2}. When I think of making that concrete, I imagine something
like this:

* {} is an empty Tree.

* n1 creates a file called "file.txt" with one line, "Hello". So,
  concretely, {n1} is a Tree with just "file.txt" with just "Hello".

* n2 appends "world" after "Hello", so {n1,n2} is a Tree with just
  "file.txt" containing two lines, "Hello" and "world".

But a functor from a concrete patch theory (as defined above) to that
abstract patch theory couldn't work like that. The concrete patch
theory objects are supposed to be sets of states, but above I assigned
a single state to each context.

What went wrong? Should I have assigned a set of states to each of {},
{n1} and {n1,n2}, and if so, can you give an example assignment?

Another thought: Could there be other concrete patch theories that
violate (b) but are still interesting? E.g. maybe there could be two
patches A and B that represent the same bijection, but that behave
differently when commuted with a third patch C.

I've probably misunderstood something basic; maybe you can figure out
where I stumbled.

> > I'm not sure what rules constrain us here. E.g. I could just declare
> > that any "conflicted" context maps to the empty fileset, but would that
> > work out? I guess the hard part is figuring out how the concrete
> > patches work (for example, how you can commute them without access to
> > anything but the two patch representations).
> 
> Indeed. This is the reason why the conflictor commute and merge code is
> highly non-trivial. A complicating factor is that a clean definition of
> conflictors cannot be based on prim patches alone, they have to be
> /named/ prim patches. How to resolve this tension is still largely unclear.
> 
> I think the constraints are the usual ones: the merge and commute laws,
> in particular permutivity. E.g. the first thing you need to ensure is
> that if you merge p\/q to q'/\p', then p;q' commutes to q;p', even if p'
> and q' are conflicted.

It would be interesting to discuss this further but I think I should
figure out what you mean by concrete patch theory first.

-- 
James

From ben.franksen at online.de  Sun Jan 24 11:03:05 2021
From: ben.franksen at online.de (Ben Franksen)
Date: Sun, 24 Jan 2021 12:03:05 +0100
Subject: [darcs-users] Write-up on "tree repositories" as an alternative
 to conflictors
In-Reply-To: <20210122224214.47uu62hh5nfo4zh6@angel.falsifian.org>
References: <rpo9ko$159o$1@ciao.gmane.io>
 <20201204230902.yybdxci6ghe7pty7@angel.falsifian.org>
 <rqggp1$117s$1@ciao.gmane.io> <rr27lp$l5b$1@ciao.gmane.io>
 <rr2gon$ng6$1@ciao.gmane.io>
 <20201212172232.si6u7kdzaj7y6p7t@angel.falsifian.org>
 <rr4pm1$fc2$1@ciao.gmane.io> <rr7suq$fp1$1@ciao.gmane.io>
 <20201219040040.i2jo6srrk5lz46qi@angel.falsifian.org>
 <rrlfgu$pbb$1@ciao.gmane.io>
 <20210122224214.47uu62hh5nfo4zh6@angel.falsifian.org>
Message-ID: <rujk59$d1d$1@ciao.gmane.io>

On 22.01.21 at 23:42, James Cook wrote:
> I think I am closer to understanding what you mean by "concrete patch",
> but I think I haven't got it completely.
> 
> Here's my understanding so far:
> 
> a) We begin with a set S of states. In the case of Darcs, that's the
>    Tree type.
> 
> b) A concrete patch is a bijection between two subsets of S. A
>    "concrete patch theory" gives a way of encoding some such bijections
>    as data. For example, every Darcs prim patch encodes such a
>    bijection, so Darcs prim patches (excluding names) are a "concrete
>    patch theory" in this sense.

Don't forget that only /sequences/ of patches form a category.

> c) If two Darcs prim patches are different (excluding names) then they
>    encode different bijections. This is probably not too hard to prove.
>    If it weren't true, Darcs wouldn't satisfy point (b) above.
> 
> d) Commuting two concrete patches (assuming it succeeds) may or may not
>    change the patches. I suppose the commute function is an essential
>    part of the description of a concrete patch theory.
> 
> e) We can turn any concrete patch theory into a category by taking all
>    subsets of S as objects and then taking the free category generated
>    by all the bijections in our concrete patch theory.

Up to here: yes.

> f) We're interested in functors from a concrete patch theory to an
>    abstract patch theory as described at the top of this email (e.g.
>    objects are sets of names).

I think it should be the other way around: a functor from the abstract
patch category to the concrete one.

> But that can't all be right. In particular I'm confused about (f).
> Suppose we have just two names n1 and n2, and our contexts are {}, {n1}
> and {n1, n2}. When I think of making that concrete, I imagine something
> like this:
> 
> * {} is an empty Tree.
> 
> * n1 creates a file called "file.txt" with one line, "Hello". So,
>   concretely, {n1} is a Tree with just "file.txt" with just "Hello".
> 
> * n2 appends "world" after "Hello", so {n1,n2} is a Tree with just
>   "file.txt" containing two lines, "Hello" and "world".
> 
> But a functor from a concrete patch theory (as defined above) to that
> abstract patch theory couldn't work like that. The concrete patch
> theory objects are supposed to be sets of states, but above I assigned
> a single state to each context.

I think your objection here is valid and I find it quite illuminating.

Indeed, it seems that when we move from concrete patches (representing
partial bijections on states) to named patches, we implicitly pick a
single element from the domain and range. This is something I wasn't
really (consciously) aware of. I think this also clears up a lot of
misunderstanding I had with Ganesh over the years, when we discussed the
meaning of contexts.

The concrete patch you associated with n1 obviously has a much larger
domain than just the single empty state/tree (its domain is infinite,
consisting of every state in which no file "file.txt" exists). When we
tag this patch with the name n1 and posit that {n1} is a context, then
we (implicitly) assume that someone started with an empty tree and then
recorded n1 in that particular state. Thus our choice of the empty tree
as the common starting point for all repositories implies and determines
the choice of a single associated state for each context.

I think this means that the functor from abstract to concrete patches
must fulfill another requirement and I think this requirement is
equivalent to what the Camp paper and JJ's inverse semigroup paper
define as "sensible" patch sequences. Informally, we have to make sure
that all abstract paths that start from the empty context are mapped to
a bijection that is defined on our distinguished start state.

Here is a refined definition:

A /realization functor/ R is a functor from abstract patches (sequences
of names and contexts) to concrete patches (with a commute function),
such that

 (a) R maps abstract paths to concrete paths of the same length
 (b) for any two parallel abstract paths ns and ms of length 2,
     commute(R(ns))=Just(R(ms)) (here R is the mapping on arrows)
 (c) there is a distinguished state E, such that E is an element of
     R({}) (here R is the mapping on objects i.e. contexts)

I believe this is enough to ensure "sensibility", which means that we
can interpret contexts as single states, relative to some (global)
choice of E.
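
To make the role of E concrete, here is a minimal Haskell sketch of the
n1/n2 example quoted above. It is an illustration under stated
assumptions, not Darcs code: State is a flat association list standing
in for the Tree type, and Concrete, addFileWith, appendLine and
stateAfter are hypothetical names. Each concrete patch is a partial
function with a large domain; fixing the start state is what picks out
a single state per context.

```haskell
import Control.Monad (foldM)

-- A state maps file names to their lines (a stand-in for Darcs' Tree).
type State = [(FilePath, [String])]

-- A concrete patch is a partial function on states; Nothing means the
-- state lies outside the patch's domain.
type Concrete = State -> Maybe State

-- Defined on every state that has no entry for f.
addFileWith :: FilePath -> [String] -> Concrete
addFileWith f ls st = case lookup f st of
  Nothing -> Just ((f, ls) : st)
  Just _  -> Nothing

-- Defined on every state in which f exists; appends one line to f.
appendLine :: FilePath -> String -> Concrete
appendLine f l st
  | any ((== f) . fst) st =
      Just [(g, if g == f then ls ++ [l] else ls) | (g, ls) <- st]
  | otherwise = Nothing

-- The single state that a path picks out, relative to a start state.
stateAfter :: State -> [Concrete] -> Maybe State
stateAfter = foldM (\st p -> p st)

-- Hypothetical realisations of the names n1 and n2 from the example.
n1, n2 :: Concrete
n1 = addFileWith "file.txt" ["Hello"]
n2 = appendLine "file.txt" "world"
```

With the empty tree as start state, stateAfter [] [n1, n2] yields the
single tree containing "file.txt" with the two lines, whereas
stateAfter [] [n2] is Nothing because the path is not defined on that
start state. Choosing a different start state, such as one that already
holds another file, gives a different state for the same context.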

> Another thought: Could there be other concrete patch theories that
> violate (b) but are still interesting? E.g. maybe there could be two
> patches A and B that represent the same bijection, but that behave
> differently when commuted with a third patch C.

This does not contradict (b) which merely says that the mapping between
patches and bijections is a function. It does not need to be injective,
in fact in Darcs it is not: there are many different sequences of
unnamed prim patches that represent the same partial bijection. An
obvious example is commutation, but there are many others. For instance,
we have functions to coalesce adjacent hunks. You could also split a
multiline hunk into separate one-line hunks; or split any hunk into a
pure remove plus a pure add; etc. These operations do not respect
commutation properties, they only respect the effect i.e. the mapping to
a partial bijection. (Thus, these operations are allowed only before
attaching a name i.e. before we record them, or when we create new
patches out of existing ones "destructively" like in amend or rebase.)

>>> I'm not sure what rules constrain us here. E.g. I could just declare
>>> that any "conflicted" context maps to the empty fileset, but would that
>>> work out? I guess the hard part is figuring out how the concrete
>>> patches work (for example, how you can commute them without access to
>>> anything but the two patch representations).
>>
>> Indeed. This is the reason why the conflictor commute and merge code is
>> highly non-trivial. A complicating factor is that a clean definition of
>> conflictors cannot be based on prim patches alone, they have to be
>> /named/ prim patches. How to resolve this tension is still largely unclear.
>>
>> I think the constraints are the usual ones: the merge and commute laws,
>> in particular permutivity. E.g. the first thing you need to ensure is
>> that if you merge p\/q to q'/\p', then p;q' commutes to q;p', even if p'
>> and q' are conflicted.
> 
> It would be interesting to discuss this further but I think I should
> figure out what you mean by concrete patch theory first.

I hope this is clearer now.

BTW, I think that for discussions like this we should use a simpler
concrete patch theory as an example, one that still exhibits the
features we care about. I propose we track the state of one (unnamed)
file, which we regard as a linear sequence of tokens, say characters:

  type State = [Char]

Our primitive concrete patches are:

  data PrimPatch = Add Int Char | Remove Int Char

Such a patch gets interpreted as a bijection via

  (1) apply (Add i c) cs = -- insert c at position i
  (2) apply (Remove i c) cs = -- remove c at position i

where I assume positions are zero based. The domain of (1) is the set of
all strings that have length at least i. The domain of (2) is the set of
strings that have c at position i (thus in particular have length at
least i+1).

Inversion swaps the constructor:

  invert (Add i c) = Remove i c -- and vice versa

and it is clear that apply (p^) = (apply p)^, where ^ denotes the inverse.
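
The definitions above can be filled in as runnable Haskell. This is one
possible sketch: representing "outside the domain" by Nothing, and the
helper roundTrips, are assumptions added for illustration.

```haskell
type State = [Char]

data PrimPatch = Add Int Char | Remove Int Char
  deriving (Eq, Show)

-- Nothing means the state lies outside the patch's domain.
apply :: PrimPatch -> State -> Maybe State
apply (Add i c) cs
  -- insert c at zero-based position i; needs length cs >= i
  | 0 <= i && i <= length cs = Just (take i cs ++ c : drop i cs)
  | otherwise                = Nothing
apply (Remove i c) cs = case splitAt i cs of
  -- remove c at position i; the character found there must be c
  (pre, x : post) | i >= 0 && x == c -> Just (pre ++ post)
  _                                  -> Nothing

-- Inversion swaps the constructor.
invert :: PrimPatch -> PrimPatch
invert (Add i c)    = Remove i c
invert (Remove i c) = Add i c

-- Wherever apply p is defined, apply (invert p) undoes it.
roundTrips :: PrimPatch -> State -> Bool
roundTrips p cs = case apply p cs of
  Nothing  -> True
  Just cs' -> apply (invert p) cs' == Just cs
```

For example, apply (Add 1 'x') "ab" gives Just "axb", and
apply (Remove 0 'z') "ab" gives Nothing because "ab" does not have 'z'
at position 0.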

The commute function must adapt the index, e.g.

  commute (Remove i x, Add j y)
    | i+1 < j = Just (Add (j+1) y, Remove i x)
    | i > j+1 = Just (Add j y, Remove (i+1) x)
    | otherwise = Nothing
...
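
One way to sanity-check the clause above is to confirm that the
commuted pair has the same effect as the original pair. The sketch
below is an illustration only: it copies the Remove-then-Add clause as
written, leaves every other constructor pair unhandled because the
message elides them, and the names commuteRemoveAdd, applyAll and
effectPreservedOn are hypothetical.

```haskell
import Control.Monad (foldM)

type State = [Char]

data PrimPatch = Add Int Char | Remove Int Char
  deriving (Eq, Show)

-- Nothing means the state lies outside the patch's domain.
apply :: PrimPatch -> State -> Maybe State
apply (Add i c) cs
  | 0 <= i && i <= length cs = Just (take i cs ++ c : drop i cs)
  | otherwise                = Nothing
apply (Remove i c) cs = case splitAt i cs of
  (pre, x : post) | i >= 0 && x == c -> Just (pre ++ post)
  _                                  -> Nothing

-- Apply a sequence of patches from left to right.
applyAll :: [PrimPatch] -> State -> Maybe State
applyAll ps cs = foldM (flip apply) cs ps

-- Only the Remove-then-Add clause from the message; all other pairs
-- are outside the scope of this sketch and yield Nothing here.
commuteRemoveAdd :: (PrimPatch, PrimPatch) -> Maybe (PrimPatch, PrimPatch)
commuteRemoveAdd (Remove i x, Add j y)
  | i + 1 < j = Just (Add (j + 1) y, Remove i x)
  | i > j + 1 = Just (Add j y, Remove (i + 1) x)
commuteRemoveAdd _ = Nothing

-- True when the commuted pair, if there is one, agrees with (p, q) on cs.
effectPreservedOn :: State -> (PrimPatch, PrimPatch) -> Bool
effectPreservedOn cs (p, q) = case commuteRemoveAdd (p, q) of
  Nothing       -> True
  Just (q', p') -> applyAll [p, q] cs == applyAll [q', p'] cs
```

For instance, on "abxcd" the pair (Remove 2 'x', Add 4 'y') commutes to
(Add 5 'y', Remove 2 'x'), and both orders produce "abcdy".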

Cheers
Ben


From ben.franksen at online.de  Tue Jan 26 19:49:30 2021
From: ben.franksen at online.de (Ben Franksen)
Date: Tue, 26 Jan 2021 20:49:30 +0100
Subject: [darcs-users] Write-up on "tree repositories" as an alternative
 to conflictors
In-Reply-To: <rujk59$d1d$1@ciao.gmane.io>
References: <rpo9ko$159o$1@ciao.gmane.io>
 <20201204230902.yybdxci6ghe7pty7@angel.falsifian.org>
 <rqggp1$117s$1@ciao.gmane.io> <rr27lp$l5b$1@ciao.gmane.io>
 <rr2gon$ng6$1@ciao.gmane.io>
 <20201212172232.si6u7kdzaj7y6p7t@angel.falsifian.org>
 <rr4pm1$fc2$1@ciao.gmane.io> <rr7suq$fp1$1@ciao.gmane.io>
 <20201219040040.i2jo6srrk5lz46qi@angel.falsifian.org>
 <rrlfgu$pbb$1@ciao.gmane.io>
 <20210122224214.47uu62hh5nfo4zh6@angel.falsifian.org>
 <rujk59$d1d$1@ciao.gmane.io>
Message-ID: <ruproa$gh7$1@ciao.gmane.io>

On 24.01.21 at 12:03, Ben Franksen wrote:
> A /realization functor/ R is a functor from abstract patches (sequences
> of names and contexts) to concrete patches (with a commute function),
> such that
> 
>  (a) R maps abstract paths to concrete paths of the same length
>  (b) for any two parallel abstract paths ns and ms of length 2,
>      commute(R(ns))=Just(R(ms)) (here R is the mapping on arrows)
>  (c) there is a distinguished state E, such that E is an element of
>      R({}) (here R is the mapping on objects i.e. contexts)

I just realized (pun, haha) that this functor has a name in Darcs: we
call it 'effect'. I think I like that name better.

Here is my picture of the overall structure of patch theory, when
extended to the lower levels. We have three categories: abstract patches
(AP), concrete patches (CP), and partial bijections (PB), and two functors:

  AP ---effect--> CP ---apply--> PB

(While PB is a groupoid, CP is not, contrary to what I stated in my last
mail, because that would mean pp^:s->s = id_s, which does not play well
with commutation).

The effect functor is understood to be relative to some choice of start
state, see condition (c). Neither functor is injective (on morphisms;
for objects I haven't checked this). I don't think the apply functor is
surjective in any strict sense, but we should reasonably expect that for
any two states s and t there is an arrow p in CP such that apply(p) is
defined at s and apply(p)(s)=t. Indeed, the treeDiff function in Darcs
gives a possible solution (the result depends on which diff algorithm is
selected).
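
For the single-file toy theory from the previous message, the claim
that any two states are connected by some arrow in CP can be sketched
directly. The function naiveDiff below is a hypothetical stand-in and
has nothing to do with how treeDiff is implemented: it removes every
character of s and then adds every character of t.

```haskell
import Control.Monad (foldM)

type State = [Char]

data PrimPatch = Add Int Char | Remove Int Char
  deriving (Eq, Show)

-- Nothing means the state lies outside the patch's domain.
apply :: PrimPatch -> State -> Maybe State
apply (Add i c) cs
  | 0 <= i && i <= length cs = Just (take i cs ++ c : drop i cs)
  | otherwise                = Nothing
apply (Remove i c) cs = case splitAt i cs of
  (pre, x : post) | i >= 0 && x == c -> Just (pre ++ post)
  _                                  -> Nothing

-- Apply a sequence of patches from left to right.
applyAll :: [PrimPatch] -> State -> Maybe State
applyAll ps cs = foldM (flip apply) cs ps

-- Remove the characters of s from the back, then add those of t from
-- the front. The result is defined at s and maps it to t.
naiveDiff :: State -> State -> [PrimPatch]
naiveDiff s t = removals ++ additions
  where
    removals  = [Remove i c | (i, c) <- reverse (zip [0 ..] s)]
    additions = [Add i c | (i, c) <- zip [0 ..] t]
```

Here applyAll (naiveDiff "abc" "xy") "abc" evaluates to Just "xy". A
smarter diff would give a shorter sequence with the same effect on that
state, which matches the remark that the result depends on the
algorithm chosen.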

The effect functor is surjective only if we restrict CP to sequences
that are "sensible" relative to the chosen start state. For any sensible
concrete patch sequence cp there is a sequence of names ap such that
effect(ap)=cp, since this is precisely the condition under which we are
able to record such a sequence.

Cheers
Ben


From falsifian at falsifian.org  Sat Jan 30 20:55:49 2021
From: falsifian at falsifian.org (James Cook)
Date: Sat, 30 Jan 2021 20:55:49 +0000
Subject: [darcs-users] Write-up on "tree repositories" as an alternative
 to conflictors
In-Reply-To: <ruproa$gh7$1@ciao.gmane.io>
References: <rr27lp$l5b$1@ciao.gmane.io> <rr2gon$ng6$1@ciao.gmane.io>
 <20201212172232.si6u7kdzaj7y6p7t@angel.falsifian.org>
 <rr4pm1$fc2$1@ciao.gmane.io> <rr7suq$fp1$1@ciao.gmane.io>
 <20201219040040.i2jo6srrk5lz46qi@angel.falsifian.org>
 <rrlfgu$pbb$1@ciao.gmane.io>
 <20210122224214.47uu62hh5nfo4zh6@angel.falsifian.org>
 <rujk59$d1d$1@ciao.gmane.io> <ruproa$gh7$1@ciao.gmane.io>
Message-ID: <20210130205549.dyse6e3e2ya2g4fa@moth.falsifian.org>

On Tue, Jan 26, 2021 at 08:49:30PM +0100, Ben Franksen wrote:
> On 24.01.21 at 12:03, Ben Franksen wrote:
> > A /realization functor/ R is a functor from abstract patches (sequences
> > of names and contexts) to concrete patches (with a commute function),
> > such that
> > 
> >  (a) R maps abstract paths to concrete paths of the same length
> >  (b) for any two parallel abstract paths ns and ms of length 2,
> >      commute(R(ns))=Just(R(ms)) (here R is the mapping on arrows)
> >  (c) there is a distinguished state E, such that E is an element of
> >      R({}) (here R is the mapping on objects i.e. contexts)

This seems reasonable. For (b) I guess you mean any two /different/
paths of length 2?

> I just realized (pun, haha) that this functor has a name in Darcs: we
> call it 'effect'. I think I like that name better.
> 
> Here is my picture of the overall structure of patch theory, when
> extended to the lower levels. We have three categories: abstract patches
> (AP), concrete patches (CP), and partial bijections (PB), and two functors:
> 
>   AP ---effect--> CP ---apply--> PB
> 
> (While PB is a groupoid, CP is not, contrary to what I stated in my last
> mail, because that would mean pp^:s->s = id_s, which does not play well
> with commutation).
> 
> The effect functor is understood to be relative to some choice of start
> state, see condition (c). Neither functor is injective (on morphisms;
> for objects I haven't checked this). I don't think the apply functor is
> surjective in any strict sense, but we should reasonably expect that for
> any two states s and t there is an arrow p in CP such that apply(p) is
> defined at s and apply(p)(s)=t. Indeed, the treeDiff function in Darcs
> gives a possible solution (the result depends on which diff algorithm is
> selected).

Thanks, this clears things up a bit for me. I hadn't realized you meant
CP and PB to be different, which might explain my confusion about
"patch theories that violate (b)" from my previous email.

One nitpick: the definition of "sensible" in the inverse semigroup
paper might be different from yours and from the Camp definition.
Definition 1.8 says a patch is sensible if its effect isn't "0", which
I think just means its domain and range are nonempty.

Let me check my understanding of the picture you've painted:

* A "patch theory" or maybe "implemented patch theory" is a tuple
  (State, Name, E, AP, CP, PB, commute, effect, apply) satisfying all
  the following conditions.

* State and Name are sets, intended to denote the set of possible
  states and names respectively. (E.g. type State = [Char] in your Jan
  24 email.)

* E is an element of State: the "starting state".

* AP we've discussed a fair bit. It is a category. Its objects are
  subsets of Name, its morphisms are labelled with sequences of names,
  and we've previously listed some axioms AP must satisfy.

* PB: the objects are all possible subsets of State, and the morphisms
  between any two objects are all possible bijections between those
  sets. (PB is fully determined by the set State, so it could be left
  out of the tuple defining the patch theory.)

* CP I'm less sure about. I guess it is any category whose objects are
  the same as PB's objects. There are no constraints on its morphisms
  other than what will be implied by our requirements on the effect
  functor.

  As a concrete example, in your toy patch theory, the morphisms in CP
  between any objects S and T are triples (S, T, e) where e is an
  element of [PrimPatch] for which apply(e)(S) = T (applying a function
  to a set in the natural way). I include S and T as part of the
  morphism so that the same e can appear in different morphisms.
  Concatenating two morphisms (S, T, e) and (T, U, f) produces (S, U, e
  ++ f). The identity on S is (S, S, []).

* commute is a partial function on length-two paths in AP. (Maybe
  "commute" could be left out of the tuple, and instead be implied by the
  effect functor. I haven't checked that this makes sense.)

* effect : AP -> CP is a functor satisfying the properties (a), (b) and
  (c) from your Jan 24 email ("realization functor").

* apply : CP -> PB is any functor.

Introducing conflictors to such a patch theory would mean the
following:

* Extending AP so the set of objects ("contexts") is closed under
  union.

* Adding new morphisms to CP, called "conflictors".

* Extending the effect functor so it's defined on the newly expanded
  AP, probably using the "conflictor" morphisms added in the previous
  step.

* Extending the apply functor so it's defined on the morphisms
  (conflictors) added to CP.

The first is a deterministic change; no decisions are required. The
rest requires careful design.

> The effect functor is surjective only if we restrict CP to sequences
> that are "sensible" relative to the chosen start state. For any sensible
> concrete patch sequence cp there is a sequence of names ap such that
> effect(ap)=cp, since this is precisely the condition under which we are
> able to record such a sequence.

That's an interesting statement. Giving names to patches seems a bit
tricky to reason about. I guess if we pretend that when Darcs makes up
a random name for a patch, it's actually "discovering" a previously
unknown mapping under some predetermined but inaccessible effect
functor, then what you're saying should be true. I hope that makes
sense. I don't know if there's a better way to look at it.

-- 
James