[darcs-users] darcs conflicts/dependencies -- is patch theory the place to start?

AntC anthony_clayden at clear.net.nz
Thu Sep 13 06:14:46 UTC 2012


Stephen J. Turnbull <stephen <at> xemacs.org> writes:

> 
Thanks you Stephen, you've given me a lot to think about, so here's some 
initial reactions ...

> 
>  > - All the complexity of commutation is exactly to do with sequencing.
>  > - I've now learnt there's extra complexity to do with (attempting to) pull
>  >   duplicate patches. (And how duplicates interact with conflicts.)
>  >   If a repo were really just a set of patches,
>  >      managing duplicates would be easy(?).
> 
> Unfortunately, I think not.  The problem is that unless you know the
> semantics of the document, you can't know what is a duplicate (ie,
> applying it twice results in invalid semantics) and what is a valid
> second patch of the same appearance.  For example, consider token
> replace: replace 'foo' with 'bar', merge a branch containing 'foo',
> replace 'foo' with 'bar'.  That has obvious and probably correct
> semantics; the second replace is not a duplicate.
> 

What I meant by duplicates here is as discussed earlier in the thread by Owen -
- that is, pulling the same patch twice from a source repo.

What I meant by "... would be easy" is:
If the repo I'm pulling in to is just a set, and I pull a patch twice, the VCS 
should be able to say: I've already got that, I'll ignore it. But the 
algorithm Owen (and Kevin) described seemed to be a lot more difficult, and 
got messed up where there were conflicts.

I agree that in your example, the second replace is not a duplicate, because 
it wasn't pulled from anywhere.


> You could argue that a patch should include preconditions. ...

Yes, and I did start to explore that idea later in the post. As you go on to 
say, the CVS could define the pre-conds so tightly that only one DAG could 
ever be valid.

What I'm aiming for, though, is supporting 'cherry picking' -- which I see as 
the attractive feature of darcs over monotone, git, etc. (Also 
supporting 'undoing' some patch back in history -- which is just another 
variety of cherry-picking.)

> 
> In the theory, you may want to distinguish between *repo* (at its
> simplest, a [multi]set of patches) and *branch* (repo + semantics for
> constructing a working tree). ...

Having though some more, I'm tending to the view that realistically you have 
to do (a bit of) both. L/S/L have a nice section where they say that you can 
arbitrarily shuffle the sequence of patches providing you end up with a valid 
repo.

That's too idealistic, IMHO, but to support cherry-picking properly, it must 
be possible to pull all the patches from a branch in arbitrary order and end 
up with the same result (or end up with a conflict).

> 
> The examples I've seen require you to know at least the syntax, and
> often the semantics, of the document's language. ...

Good point. VCS's usually apply for text-based files where line breaks are 
semantically significant. Contrast that for word-processor documents, 
paragraph breaks are significant but auto-wrapped lines are not.

My take on L/S/L is that their line-id and relative sequence pointers are a 
mini-relational database to capture the semantics of text-based files. 
Arguably there could be different data models for different document 
semantics. Perhaps we should be capturing the Abstract Syntax Tree ? ;-)
(And darcs' token replace is a nod to a typical semantically significant 
change.)


>  > 
>  > This bites darcs with file moves (renames):
>  >    you can't detect a move vs. remove-then-add-then-paste-content.
> 
> (I still don't understand why anyone cares to detect this difference.
> The two are observationally equivalent in a single Darcs patch.  But
> that's just me.)

I was thinking about the AddAdd Makefile example that Owen pointed to. If in 
the target repo the file has been moved/renamed, then any hunk operation 
should apply to the moved file, not to whatever is currently in the directory 
named "Makefile".


> 
>  > In darcs:
>  > - To think of a patch is the *same* even though it morphs through
>  >   commutation, can we be more precise on what about the patch is
>  >   the same?  (apart from its id)?  Something about the sameness of
>  >   effect from a patch?
> 
> That's exactly how it's defined, as the same effect.  However, effect
> is difficult to define explicitly, it's like the judge's definition of
> pornography: "I know it when I see it."

So L/S/L have set up a model of effect, in terms of inserts/deletes and 
persistent id's for files and lines.

The file GUID that Owen pointed to is along the same lines, for at least the 
file structure part of the repo.


> 
>  > So, how to look at primitive patches?
>  > (as identified by a ppid, which might be a GUID):
>  > - Aim to represent the effect of the patch in a context-independent 
format.
> 
> This is hard.  For example, the classic example is
> 
> - enum foo { foo1=0, foo2, ... fook };
> + enum foo { foo1=0, foo2, ... fook, fool };
> 
> - #define FOO_MAX K
> + #define FOO_MAX L
> 
> The *programmer's* intended effect is not to replace K by L; ...

I agree it's hard, and there's no generally applicable solution. Perhaps the 
minimum to aim for is to detect when something's not right, and help the 
programmer work through a resolution. I think this means getting the pre-
conditions tight, but not too tight(!) -- topic for a research project?


> 
>  >              In general, if I'm pulling a bunch of patches,
>  >              they should stay in the same sequence as in the source repo.
> 
> This is not at all obvious.  ...

I'm thinking pragmatically: if we don't try to reorder patches, then we don't 
risk combinatorial explosion.
And I'm thinking about the mental stress on the programmers (both the one 
pulling the patches, and the original author who's going to get asked what 
they're about). It's at least going to be easier if patches don't 
get 'shuffled' en route.


> ...  Finally, one might deliberately
> randomize the order to ensure that order-dependency bugs are
> discovered quickly.

Aha! good point.

> 
>  > So let's apply the same thinking to the content of a text file: ...
> 
> I don't see where you're going with this.  This is a nice contribution
> to the L/S/L theory ...

Yes, I meant it only as an idea within L/S/L. As I said, this is a long way 
from darcs.


> 
> However, the Darcs operation of merging *repos* is already
> conceptually trivial: the union of two sets of patches.

Conceptually trivial maybe. But practically very complex and intractible when 
it comes to conflicts and performance. It's really sad that ghc no longer uses 
darcs, but I can understand why they turned away. And I haven't seen any 
reason why they'd want to come back.

> ... the recursive nature of your representation
> still suggests combinatorial explosion in some cases.
> 

I didn't intend my representation of hunk changes to be recursive. (And I 
don't think L/S/L's line-id concept is recursive, but their line sequencing is 
a linked list.) I'll give it some more thought.

Thank you
AntC




More information about the darcs-users mailing list