[darcs-users] How to extend a patch theory to fully commute

Thu Aug 13 20:53:58 UTC 2020

Am 13.08.20 um 19:54 schrieb James Cook:
> Question: is there a good place to read about how "darcs pull" works?

I am not aware of anything that you could read. What happens is
basically this:

We read the _darcs/hashed_inventory of both the local and remote repo.
These give us the latest know clean tag and the patches after that and
possibly (the hash of) a parent inventory, for both repos. The patch
list contains the name and hash of a patch; I use "name" here as a
short-hand for the full PatchInfo, including the hidden random number
that serves to make names unique; the hash here refers to the
representation of the patch i.e. the patch itself can be accessed as
_darcs/patches/<hash> or _darcs/inventories/<hash> for inventories. As I
explained in an earlier message, we now try to find the latest common
tag (by name, in the above sense). When we found that we have a point
from which two sequences of patches fork off. We are now commuting every
patch common to both sequences to before the forking point, ending up
with two sequences starting at the same state with no patch in common.
These two steps are done by taggedIntersection and findCommonAndUncommon
in src/Darcs/Patch/Depends.

These are then merged as described in the camp paper, I think in chapter
8; or you could read the code of merge2FL in src/Darcs/Patch/Ident.hs
and the instance Merge (FL p) in src/Darcs/Patch/Merge.hs. Once we
calculated the merge, we can concatenate one half of it with our repo
and all that remains to do is applying the new patches to our pristine
state (and marking conflicts in the working tree). A detailed
description of the algorithm and most of the details I glossed over here
can be found in src/Darcs/Repository/Merge.hs.

>>> I was concerned that adding just that one new kind of equivalence
>>> might not be enough.
>>
>> I think you first need a more precise definition of what inversion means
>> for primitive patches. For instance, if inverse patches come into play,
>> your definition of context addresses runs into problems because now the
>> same patch can occur multiple times in a sequence, so you can no longer
>> uniquely refer to the patches in a sequence by their names. For
>> instance, if you split A;A^;A;B after the A^, your patch name sets will
>> be X={A,A^} and Y={A,B}, where A now occurs in both X and Y.
> 
> I wonder whether I could forbid a context address from containing more
> than one patch with the same name. I will think about whether that
> would cause problems with the rest of the theory.

This might be possible.

>> A good way to see if your definition holds up is to see whether you can
>> prove your
>>
>> Conjecture 1: Every context address points to at most one context.
> 
> Sure, that's a nice goal.
> 
> I think I will start a latex file, try to define what a patch theory
> is and what context address is for a given patch theory, then try to
> prove Conjecture 1.
> 
> I know of two places with formal definitions of "patch theory": J.
> Jacobson's Inverse Semigroups paper and the Camp theory document. I'll
> probably start a new definition with just the properties I find I
> need. (A patch theory is a (possibly infinite) directed graph, with a
> "commute" relation and a "name" function, satisfying the following
> axioms: ...)

I think the approach you took so far is better: find out bit by bit what
you need, instead of trying to come up with a full formal definition up
front.

> Also, to make sure I don't lose track: last month you described a way
> to model patch sequences with inverse categories:
> https://lists.osuosl.org/pipermail/darcs-users/2020-July/027451.html .
> If I understood that right, that doesn't include a definition of what
> a patch theory is

I think there is not one patch theory but many. The theory for primitive
patches is certainly different from that for "extended" patches as you
call them. For instance, in Darcs extended patches are not invertible,
while prim patches certainly are. OTOH extended patches have a total
merge operation which prim patches don't have, in general.

I think the most important law that should be valid for all patch
theories is permutivity. Given inverses, commute and (partial i.e.
clean) merge for prim patches can be defined in terms of each other. My
point of view is that patches /represent/ partial 1:1 functions on the
set of possible states (source trees). Equivalent patch sequences (i.e.
sequences that differ only in the order of patches) must represent the
same function on states.

Patch names complicate that picture a bit: they add structure that is
independent of the original "meaning" of a patch as a partial 1:1 function.

> Here's another direction I want to think about: I suspect that when it
> comes time to implement this, all you really need are two things:
> primitive patches, and "tombstones" representing patches that have
> been deleted (if you like, representing pairs AA^ that have cancelled
> each other out). In particular, you don't need any kind of
> "conflictor" or "extended patch" or anything like that.
>
> To be a bit more precise: suppose we define a "repository" to be an
> extended patch sequence as defined in my HMS Patchfinder theory,
> starting at the primitive context O. Resolving conflicts means
> adjusting the repository so that the ending context is also primitive.
> I conjecture that an extended patch sequence that starts and ends at
> primitive contexts is just a sequence of primitive patches with some
> extra loops (involving patches that cancel each other out).
> 
> If I wanted to implement it, I think it would just become this:
> 
> * A repository consists of two things:
>   * A sequence S of primitive patches with distinct names, starting at
> O, with no inverses (i.e. only positive names).
>   * A set of names, called "tombstones", representing "deleted"
> patches. These names don't appear in S.

The problem I see here is that this looses the start (or ending) state
of the "deleted" patches. And you don't even remember their content,
just their name. But even supposing you remember a start state and the
patch representation of every deleted patch, this will still mess up
your commutation. Because deleting inverse pairs AA^ from the main
sequence basically means the same as stating that such a pair commutes
with any other patch, which is clearly wrong.

> * Pulling from another repository works as follows:
>   a) First, add the tombstones from the other repo to this repo's set
> of tombstones. If any of the patches in this repo's S now have
> tombstones, delete them from S. If any patches in S depended on those,
> delete them as well, and add them to the set of tombstones, and add
> conflict markers into the working directory.
>   b) Next, pull in the primitive patches from the other repo. This
> works just like it does in darcs (1 2 or 3; as far as I know it
> doesn't matter) except for two differences. First, any time darcs
> would have introduced a conflictor patch, we instead delete all
> patches involved in the conflict, add tombstones for the conflicting
> patches, and add conflict markers in the working directory as usual.
> Second, we have to deal with patches in the remote repo that depend on
> patches deleted in this repo; this should work something like step a).

In Darcs there is a notion of resolving a conflict by recording a new
patch that depends on the conflict. Such a conflict is no longer shown
as a conflict. It may be that your approach achieves something similar
but it is not yet clear to me how.

Cheers
Ben