[darcs-devel] Again: conversion to darcs-3

Ben Franksen ben.franksen at online.de
Sun Feb 23 21:25:12 UTC 2020

Here is a summary of my current thinking.

Back when V2 was introduced, we took the "safe, simple and quite
unfriendly" approach: patch identities are lost completely during the
upgrade ('darcs convert darcs-2'), so separately converted branches or
forks of a project cannot ever be merged again. This meant you had to
throw away all your branches except one, including /every/ piece of
work-in-progress you had lying around somewhere. The only way to avoid
that was to give each branch a unique tag, merge them all into one repo,
convert that, and afterwards restore your separate branches by pulling
(or cloning) the tags. In practice I think very few people actually did
that. For those who tried (I did, for some projects), the probability
that this merging and unmerging hits the exponential nested conflicts
problem is quite high, so you have to be very lucky to get this to
actually finish in finite time and without darcs crashing.

I am adamant that we need to offer users a better story this time!

Ideally users should be able to convert separate branches/forks of a
project separately and independently, retaining the identities
(meta-data) of all patches. Unfortunately, my initial attempts to
implement this failed miserably and the ensuing discussion with Ganesh
convinced me that this goal is unattainable in general. More precisely,
we may still end up with repos that are incompatible in a certain sense
I will detail below.

There is a deeper reason behind these troubles. Basically, while patch
commutation is a local operation, involving only (and requiring only
knowledge about) the patches to be commuted, mutual /compatibility/ of
repos containing /named/ patches depends on global invariants.

Here is the simplest example I can think of to demonstrate that. The
notation 'N:(p1;...)' means a patch with name N and primitive patches
p1;... (applied in that order).
Suppose we have two repos X and Y:

  X = A:(addfile ./f) ; B:(rmfile ./f)

  Y = C:(addfile ./f) ; B:(rmfile ./f)

This is something that is never supposed to happen, and normally it
cannot, because Darcs takes care of that. However, due to well-known
conceptual bugs in V1 and V2, slightly more complicated examples that
exhibit the same problem can be created using regular Darcs commands.

Conceptually the ./f created by patch A is a "different" file than the
./f created by C. There can only ever be one (named) patch that creates
the "same" file ./f. This means that patch B can remove either the ./f
in repo X (created by A) or the one in repo Y (created by C),
but not both. So the two repos have differing opinions about which ./f B
is removing, even though both repos are individually consistent and
perfectly valid.

More generally, the invariant here is that any two patches with the same
name B in two repos X and Y must depend on exactly the same set of named
patches. We call this set the /minimal context/ of B.
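To make the invariant concrete, here is a small Python sketch. It assumes a toy model that is not how darcs represents patches: a primitive is an (operation, path) pair, and two patches are taken not to commute exactly when they touch a common path. Under that crude rule minimal contexts are trivial to compute, which is not true for real patches. With it, the repos X and Y above violate the invariant for B.

```python
from dataclasses import dataclass


# Toy model, assumed for illustration only (not darcs' real types):
# a primitive patch is an (operation, path) pair, a named patch is a
# name plus a tuple of primitives.
@dataclass(frozen=True)
class Named:
    name: str
    prims: tuple[tuple[str, str], ...]


def touched(p: Named) -> set[str]:
    """Paths touched by any primitive of p."""
    return {path for _, path in p.prims}


def depends_on(later: Named, earlier: Named) -> bool:
    # Assumed, deliberately crude commute rule: patches touching a common
    # path do not commute, so the later one depends on the earlier one.
    return bool(touched(later) & touched(earlier))


def minimal_contexts(repo: list[Named]) -> dict[str, frozenset[str]]:
    """Map each patch name to the names of all patches it transitively depends on."""
    ctx: dict[str, frozenset[str]] = {}
    for i, p in enumerate(repo):
        deps: set[str] = set()
        for q in repo[:i]:
            if depends_on(p, q):
                deps |= {q.name} | ctx[q.name]
        ctx[p.name] = frozenset(deps)
    return ctx


def invariant_violations(x: list[Named], y: list[Named]) -> list[str]:
    """Names present in both repos whose minimal contexts differ."""
    cx, cy = minimal_contexts(x), minimal_contexts(y)
    return sorted(n for n in cx.keys() & cy.keys() if cx[n] != cy[n])


A = Named("A", (("addfile", "./f"),))
B = Named("B", (("rmfile", "./f"),))
C = Named("C", (("addfile", "./f"),))
X = [A, B]
Y = [C, B]
print(invariant_violations(X, Y))  # ['B']: B depends on A in X, but on C in Y
```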

Both V1 and V2 violate this invariant in certain corner cases. However,
even if that were not the case, a malicious party (or one who saw a
need to edit patches below the level of the darcs CLI) could easily
create a sibling repo with a different view on what B depends on and
there is no way for us to decide which is the correct view and which is
wrong. This is why I think we need a more general way to detect and fix
such problems. We should no longer bury our heads in the sand and act
like this simply cannot happen. It can and it will and we must deal with it.

We saw that we cannot decide which repo is the good one and which is the
bad one, because technically both are equally correct. So we must defer
to the user and ask them to make a decision!

The question now is

(a) how to reliably detect such inconsistencies, and
(b) how to fix the repo that the user has decided is broken.

I think (a) is not difficult. While computing minimal contexts is
well-known to be hard, I think we don't need to do that. We just have to
fail gracefully if any common patches cannot be commuted to the
common trunk, and inform the user about the problem. We display a hint
that points them to another command that can be used to fix one of the
repos, making clear that it is the user's choice /which/ repo is at
fault and should be fixed.
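A sketch of detection step (a) in the same toy model, again assuming that patches commute iff they touch disjoint paths, and approximating the common trunk as the longest common prefix by name. It never computes minimal contexts: it only tries to commute each common patch in the forks back to the trunk and fails with a hint if that is impossible. The names check_compatible and InconsistentRepos are hypothetical, not darcs functions, and the hint deliberately does not name a concrete command.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Named:
    name: str
    prims: tuple[tuple[str, str], ...]


class InconsistentRepos(Exception):
    """Raised when common patches cannot be commuted to the common trunk."""


def commutes(p: Named, q: Named) -> bool:
    # Toy rule (assumption): patches commute iff they touch disjoint paths.
    return not ({path for _, path in p.prims} & {path for _, path in q.prims})


def split_trunk(x: list[Named], y: list[Named]):
    """Approximate the common trunk as the longest common prefix by name;
    return (trunk, fork of x, fork of y)."""
    n = 0
    while n < min(len(x), len(y)) and x[n].name == y[n].name:
        n += 1
    return x[:n], x[n:], y[n:]


def stuck_common(fork: list[Named], common: set[str]) -> list[str]:
    """Names of common patches in fork that cannot be commuted back to the trunk."""
    stuck: list[Named] = []
    for p in fork:
        if p.name in common and all(commutes(p, q) for q in stuck):
            continue  # p commutes past every stuck patch, so it can join the trunk
        stuck.append(p)
    return [p.name for p in stuck if p.name in common]


def check_compatible(x: list[Named], y: list[Named]) -> None:
    """Fail gracefully (raise) if the two repos disagree about common patches."""
    _, fx, fy = split_trunk(x, y)
    common = {p.name for p in fx} & {p.name for p in fy}
    bad = sorted(set(stuck_common(fx, common)) | set(stuck_common(fy, common)))
    if bad:
        raise InconsistentRepos(
            f"patches {bad} exist in both repos but cannot be commuted to the "
            "common trunk; decide which repo is at fault and repair that one")


A = Named("A", (("addfile", "./f"),))
B = Named("B", (("rmfile", "./f"),))
C = Named("C", (("addfile", "./f"),))
try:
    check_compatible([A, B], [C, B])
except InconsistentRepos as err:
    print(err)
```

The cost is one pass per fork with pairwise commute attempts, which is what makes this cheaper than computing minimal contexts.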

This brings us to (b). So we have two repos, a good one X and a bad one
Y, as chosen by the user. Since we are going to modify Y, we assume it is
the "local" repo and X is "remote" (but it could of course be on the
same file system).

The first thing to do is to compute a maximal common trunk. Branching
off that trunk are two forking patch sequences, one in each repo, whose
intersection is non-empty. What I think we should do now is quite similar to what
happens when we 'rebase pull' or 'rebase apply': we should /suspend/ all
patches in Y that are also in X, but cannot be commuted to the common
trunk in either repo. The only remaining question is whether there is a best
choice for the order in which we do that. I think it may be best to
start with the oldest (in Y) common patch and suspend just that (and
those that depend on it). It may be that we can now commute more of the
remaining common patches to the trunk. We repeat that until there are no
more common patches in the forking sequences.
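The suspend loop for (b) could look roughly like this, in the same toy model and under the same assumptions (commute iff disjoint paths, trunk approximated as the longest common prefix by name). Here "oldest common patch" is taken to mean the oldest common patch in Y's fork that cannot be commuted to the trunk, since movable ones conceptually join the trunk anyway. "Suspending" is modelled as moving that patch, plus every later patch that fails to commute with an already-suspended one, out of Y into a separate list. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Named:
    name: str
    prims: tuple[tuple[str, str], ...]


def commutes(p: Named, q: Named) -> bool:
    # Toy rule (assumption): patches commute iff they touch disjoint paths.
    return not ({path for _, path in p.prims} & {path for _, path in q.prims})


def common_prefix_len(x: list[Named], y: list[Named]) -> int:
    # Approximates the common trunk as the longest common prefix by name.
    n = 0
    while n < min(len(x), len(y)) and x[n].name == y[n].name:
        n += 1
    return n


def stuck_common(fork: list[Named], common: set[str]) -> set[str]:
    """Names of common patches in fork that cannot be commuted back to the trunk."""
    stuck: list[Named] = []
    for p in fork:
        if not (p.name in common and all(commutes(p, q) for q in stuck)):
            stuck.append(p)
    return {p.name for p in stuck if p.name in common}


def suspend_from(fork: list[Named], start: int) -> tuple[list[Named], list[Named]]:
    """Suspend fork[start] and every later patch that depends on a suspended one."""
    kept, suspended = list(fork[:start]), [fork[start]]
    for q in fork[start + 1:]:
        if all(commutes(q, s) for s in suspended):
            kept.append(q)
        else:
            suspended.append(q)
    return kept, suspended


def repair_local(x: list[Named], y: list[Named]) -> tuple[list[Named], list[Named]]:
    """Suspend patches of the local repo y (the one the user judged broken)
    until no common patch in the forks is stuck. Returns (new y, suspended)."""
    y, suspended = list(y), []
    while True:
        n = common_prefix_len(x, y)
        fx, fy = x[n:], y[n:]
        common = {p.name for p in fx} & {p.name for p in fy}
        bad = stuck_common(fx, common) | stuck_common(fy, common)
        if not bad:
            return y, suspended
        # Oldest (in y) common patch that cannot reach the trunk.
        i = next(i for i, p in enumerate(fy) if p.name in bad)
        kept, newly = suspend_from(fy, i)
        y = y[:n] + kept
        suspended += newly


A = Named("A", (("addfile", "./f"),))
B = Named("B", (("rmfile", "./f"),))
C = Named("C", (("addfile", "./f"),))
D = Named("D", (("addfile", "./f"),))  # re-adds ./f, so depends on B
E = Named("E", (("addfile", "./g"),))  # independent of B
new_y, suspended = repair_local([A, B], [C, B, D, E])
print([p.name for p in new_y], [p.name for p in suspended])  # ['C', 'E'] ['B', 'D']
```

Each pass removes at least one patch from Y's fork, so the loop terminates. Suspending only the oldest stuck patch per pass, rather than all stuck patches at once, is what leaves room for later common patches to become commutable to the trunk.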

The user can then unsuspend the suspended patches or rebase obliterate
them or whatever.

Another option might be to just rename any offending patches in Y,
taking care to also rename any references to them. This is probably too
difficult to do for V1 and V2, but for V3 we could do that quite easily.
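A sketch of the renaming alternative, under two hypothetical assumptions: the only references to a patch are explicit dependencies stored as identities, and a fresh identity can be made by hashing the old one with a random salt. Real patches may carry other kinds of references that would need the same treatment; the point is only that every reference must be rewritten consistently. After renaming, the B in Y is a different patch from the B in X, so the invariant no longer relates them.

```python
import hashlib
import secrets
from dataclasses import dataclass, replace
from typing import Optional


# Toy model (assumption): a named patch has an identity, primitives, and
# a tuple of identities it explicitly depends on.
@dataclass(frozen=True)
class Named:
    ident: str
    prims: tuple[tuple[str, str], ...]
    deps: tuple[str, ...] = ()


def fresh_ident(old: str, salt: Optional[str] = None) -> str:
    """Hypothetical scheme: hash the old identity together with a random salt."""
    if salt is None:
        salt = secrets.token_hex(16)
    return hashlib.sha1((old + salt).encode()).hexdigest()


def rename_patches(repo: list[Named], offending: set[str],
                   salt: Optional[str] = None):
    """Give each offending patch a fresh identity and rewrite every reference to it."""
    mapping = {old: fresh_ident(old, salt) for old in offending}
    renamed = [
        replace(p,
                ident=mapping.get(p.ident, p.ident),
                deps=tuple(mapping.get(d, d) for d in p.deps))
        for p in repo
    ]
    return renamed, mapping


C = Named("C", (("addfile", "./f"),))
B = Named("B", (("rmfile", "./f"),))
D = Named("D", (("addfile", "./f"),), deps=("B",))
Y2, mapping = rename_patches([C, B, D], {"B"})
print(mapping["B"], [p.deps for p in Y2])
```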

Last but not least, this still leaves open the question of how to convert
patches that contain V2 Duplicates.
