[darcs-users] Fwd: Towards a conflict-free revision control system.

Mon Feb 2 18:47:22 UTC 2009

On Sun, Feb 1, 2009 at 7:21 PM, Eric Kow <kowey at darcs.net> wrote:
> Hi,
>
> I've had a chance to sit down and read this.  Here are some superficial
> comments, followed by some questions.
>
> I do like the idea of separating an 'internal' representation of state
> from the 'external' one.  It seems like the darcs pristine cache,
> especially in its hashed form, is an approximation of the internal
> state.  For what it's worth, I also get the impression also that this
> distinction is something that the darcs user manual (and the
> Understanding Darcs wikibook) attempt to make, at least informally.
>
> Unless I am mistaken, the unrecord command in the appear appears to be
> akin to the darcs obliterate/unpull command than to darcs unrecord.  Is
> that intentional?  (Unrecord only changes the internal state, leaving
> the external state untouched; see also revert which does the opposite).

The original intention was to stick to darcs terminology for commands, but
there was obviously some confusion at some point.

> I'm afraid I didn't understand the idea of "weights", nor did I fully
> understand the use of a tree structure to represent the lines in a
> file.  I am dimly aware that representing the file as a list of lines
> will not suffice if we want to have an internal representation that
> allows us to simultaneously represent conflicting states.  But in
> what is the representation tree shaped?  For example, are conflicting
> modifications to a line just different sublines?  Perhaps an example?

Let's take a simple example. Imagine we index lines by rationals for the moment.
The indices are picked randomly by the system as long as they preserve the order
of the user file. The fact that they are rationals ensure that one can
always insert something
between two other things.
We'll see how we can refine this later.

Initial state, each line is a triple (weight, index, contents)

1 1 a
1 2 b
1 3 c

User A changes to:

1 1 a
1 2.1 x
1 3 b

The patch is

-1 2 b
1 2.1 x

User B changes the initial state in his own repo to

1 1 a
1 2.2 y
1 3 c

The patch is

-1 2 b
1 2.2 y

Composing the initial state with both patches yields:

1 1 a
-1 2 b
1 2.1 x
1 2.2 y
1 3 c

The external representation for this merge can be:

a
<<<
b
>>>
x
y
c

Here the "conflict" can be detected because a line has a negative weight.

There are a few catches with proceeding as such.

1. Line indices may clash
2. The notation for representing conflicts is non-standard
3. Merging concurrent insertions at the same point may yield
arbitrary interleavings.

1. Cannot really be fixed with 100% certainty; this is a matter of using the
appropriate precision when picking new indices.
2. To fix this the tool should warn about all potential conflicting
situations, see below...
3. Is fixed by indexing by list of rationals, yielding the tree structure.
(A new node is created to ensure contiguity of a block)

e.g. if user C inserts a big block he could have:

1 1 a
1 2.3;1 u
1 2.3;2 v
1 2.3;3 w
1 3 c

This is only one step of refinement though. The real challenge of
going with this
basis of a "theory" is to find a structure that is efficient enough
and can detect all
type of conflicts that we can reasonably find. (Although as I said
before, this is not
as serious as one might think -- your compiler/testsuite/whatever
should detect conflicts,
because no VC system can detect all conflicts at the semantics level)

>> > Second question.  The paper says:
>> > | Darcs will flag this scenario as a conflict, i.e. it cannot represent such a
>> > | merge. The result of a merge in Darcs is dependent on the order that the
>> > | changes are applied, Darcs will try all possible "non conflicting" permutations
>> > | of the changes which will result, in the worst case, in exponential complexity.
>> >
>> > What does this mean, please?  I thought the whole point of the darcs approach
>> > to conflicts is that it be completely independent of order (i.e. we cancel out
>> > both patches).
>
>> Surely you know more about darcs than us, so we can clear it up
>> together. Note that we refer to darcs 1.0. Isn't it true that the
>> patch representation (potentially) changes after a reordering?
>
> Well, I must warn you that I'm really a patch theory novice and very
> easily in over my head.
>
> I had the understanding that the exponential nature of darcs 1.0 was due
> to a use of nested structures.  That is, if there is a conflict, the
> representation of one of the combatants must be modified to include a
> copy of the patch it is conflicting with.  That means that each conflict
> doubles the size of a patch.  If there are conflicts of conflicts, we
> get yet more doubling and the rest is history.

Ok, I do not understand the details, but I don't think it matters very much
in the end: the fact that the applicability of a patch depends on the context
will necessarily introduce some sort of a search. The problem
boils down to this: let's say you have 'n' patches
in your repository, that potentially conflict with each other. Can they be
applied in an order that produces no conflict (or least conflicts)? I
don't see anyway to do this
other than by exhaustive search. I admit that in most cases we can be
lucky (or have good heuristics) and find it quickly though.

> Darcs 2, if I understand correctly, still uses some amount of nesting,
> but it helps by including enough information to avoid the nesting in the
> most common cases, i.e. with spurious conflicts of conflicts (I don't
> have a precise definition of this, but rather a vague understanding that
> a lot of things that darcs 1.0 would consider a nested conflict are
> actually just by-products of its conflicts resolution, and can be
> avoided).  As a result, the conflict explosion is avoided, except in
> cases of conflict fights, where we have case of a genuinely nested
> conflict.

There seem to be a lack of people understanding this... which is why I
advocate rebuilding on more stable basis.

Cheers,
JP.