[darcs-users] How to extend a patch theory to fully commute

James Cook jcook at cs.berkeley.edu
Wed Jul 1 15:38:51 UTC 2020


On Wed, 1 Jul 2020 at 10:12, Ben Franksen <ben.franksen at online.de> wrote:
> I haven't fully digested everything you wrote but I guess a remark may
> be in order, lest you waste time on something that cannot be implemented
> efficiently. I am not claiming this is equivalent to what you aim at,
> but I think it is at least related.
>
> A well-known and quite natural way to identify the equivalence class of
> all patches that a given patch P can be commuted to, is to use the so
> called "minimal context". A minimal context C(P) for some patch P is a
> sequence of patches starting at the empty state and preceding P, and
> that is minimal in the sense that no subset of it is a valid context. In
> other words, no patch in C(P) can be commuted past P. Another way to
> look at it is to start with a patch P in /any/ context and then commute
> everything you can past P. It is easy to show that this can always be
> done and that the result is unique i.e. you always get the same /set/ of
> patches for the minimal context.
>
> In principle, we could represent patches as they appear in their minimal
> context, together with the set of patches in the minimal context. This
> would be a unique representation which could be stored hashed and thus
> would be fully tamper-proof.
>
> The problem is that computing minimal contexts is prohibitively
> expensive. Nobody has bever come up up with an algorithm that is better
> than O(n^3) in the worst case, where n is the size of the context you
> start with. The reason it is so hard to improve on this is that before
> we can commute two patches, we need to first bring them into proximity.
> Suppose half the patches in the starting context commute past, and the
> other half does not. The patches that do not commute past will
> accumulate before the patch P and all earlier patches you try to commute
> must be commuted past all these accumulated patches, one by one.

Wouldn't that result in O(n^2) commuting operations rather than
O(n^2)? Even so, not nice.

I do think this is related. I'll admit I haven't thought much about
efficiency. My initial impression is that efficiency would be a
problem with the theory as described, but that that could probably be
fixed.

Here are some notes (the last two might not mean much yet, depending
on how deep into the document you've waded).

* If you only ever add primitive patches, and you never commute
patches that would not commute in the primitive theory, then all the
patches keep their representation as primitive patches (technically,
as embedded primitive patches of the form (a, b, [P], [name of P], {},
{}) as described in the "extended patch universe" definition), and
commuting is essentially just primitive commuting. So, efficiency
could only be an issue where conflicts are involved.

* On the other hand, if there are conflicts, converting all of the
patch addresses to "canonical" form (Chapter 3) could cause efficiency
problems. I suspect commuting two patches won't take longer than O(n),
where n is the length of the (Qi) sequences involved (related to how
many patches are involved in the overall conflict) but I'd need to
think about it more.

* I've been assuming a repository will be represented as a sequence of
individual "extended patches". However, there's also a notion of a
"minimal patch sequence address" which could be used to represent the
entire repository. One advantage of the representation is that
permuting patches is trivial: you just permute a stored list of names.
One disadvantage is that if you want to remove a name from the end of
the sequence, you'll need to permute the internally stored sequence
(Qi) so that it ends with that name, and if you wanted to pull from
another similar repo, you'd need to permute one of the (Qi) sequences
so they're compatible with each other. This might end up being similar
to how Darcs already works. (In the document, I talk about "extended
patch sequences". These are the same as minimal patch sequence
addresses, but with an added requirement that (Qi) is kept in
canonical order. You'd want to drop that requirement if you were doing
this, for efficiency reasons.)

* Come to think of it, the "canonical" requirements I put in the
document are probably a bad idea. I think I only put them in so that
the representation of a patch P commuted to a particular starting
context a will always be the same no matter how it got there. I don't
think that's really needed, though; if you want to know whether two
patches are the same, you can just compare their names.

James


More information about the darcs-users mailing list