[darcs-users] darcs conflicts/dependencies -- is patch theory the place to start?
AntC
anthony_clayden at clear.net.nz
Thu Sep 13 06:14:46 UTC 2012
Stephen J. Turnbull <stephen <at> xemacs.org> writes:
>
Thanks you Stephen, you've given me a lot to think about, so here's some
initial reactions ...
>
> > - All the complexity of commutation is exactly to do with sequencing.
> > - I've now learnt there's extra complexity to do with (attempting to) pull
> > duplicate patches. (And how duplicates interact with conflicts.)
> > If a repo were really just a set of patches,
> > managing duplicates would be easy(?).
>
> Unfortunately, I think not. The problem is that unless you know the
> semantics of the document, you can't know what is a duplicate (ie,
> applying it twice results in invalid semantics) and what is a valid
> second patch of the same appearance. For example, consider token
> replace: replace 'foo' with 'bar', merge a branch containing 'foo',
> replace 'foo' with 'bar'. That has obvious and probably correct
> semantics; the second replace is not a duplicate.
>
What I meant by duplicates here is as discussed earlier in the thread by Owen -
- that is, pulling the same patch twice from a source repo.
What I meant by "... would be easy" is:
If the repo I'm pulling in to is just a set, and I pull a patch twice, the VCS
should be able to say: I've already got that, I'll ignore it. But the
algorithm Owen (and Kevin) described seemed to be a lot more difficult, and
got messed up where there were conflicts.
I agree that in your example, the second replace is not a duplicate, because
it wasn't pulled from anywhere.
> You could argue that a patch should include preconditions. ...
Yes, and I did start to explore that idea later in the post. As you go on to
say, the CVS could define the pre-conds so tightly that only one DAG could
ever be valid.
What I'm aiming for, though, is supporting 'cherry picking' -- which I see as
the attractive feature of darcs over monotone, git, etc. (Also
supporting 'undoing' some patch back in history -- which is just another
variety of cherry-picking.)
>
> In the theory, you may want to distinguish between *repo* (at its
> simplest, a [multi]set of patches) and *branch* (repo + semantics for
> constructing a working tree). ...
Having though some more, I'm tending to the view that realistically you have
to do (a bit of) both. L/S/L have a nice section where they say that you can
arbitrarily shuffle the sequence of patches providing you end up with a valid
repo.
That's too idealistic, IMHO, but to support cherry-picking properly, it must
be possible to pull all the patches from a branch in arbitrary order and end
up with the same result (or end up with a conflict).
>
> The examples I've seen require you to know at least the syntax, and
> often the semantics, of the document's language. ...
Good point. VCS's usually apply for text-based files where line breaks are
semantically significant. Contrast that for word-processor documents,
paragraph breaks are significant but auto-wrapped lines are not.
My take on L/S/L is that their line-id and relative sequence pointers are a
mini-relational database to capture the semantics of text-based files.
Arguably there could be different data models for different document
semantics. Perhaps we should be capturing the Abstract Syntax Tree ? ;-)
(And darcs' token replace is a nod to a typical semantically significant
change.)
> >
> > This bites darcs with file moves (renames):
> > you can't detect a move vs. remove-then-add-then-paste-content.
>
> (I still don't understand why anyone cares to detect this difference.
> The two are observationally equivalent in a single Darcs patch. But
> that's just me.)
I was thinking about the AddAdd Makefile example that Owen pointed to. If in
the target repo the file has been moved/renamed, then any hunk operation
should apply to the moved file, not to whatever is currently in the directory
named "Makefile".
>
> > In darcs:
> > - To think of a patch is the *same* even though it morphs through
> > commutation, can we be more precise on what about the patch is
> > the same? (apart from its id)? Something about the sameness of
> > effect from a patch?
>
> That's exactly how it's defined, as the same effect. However, effect
> is difficult to define explicitly, it's like the judge's definition of
> pornography: "I know it when I see it."
So L/S/L have set up a model of effect, in terms of inserts/deletes and
persistent id's for files and lines.
The file GUID that Owen pointed to is along the same lines, for at least the
file structure part of the repo.
>
> > So, how to look at primitive patches?
> > (as identified by a ppid, which might be a GUID):
> > - Aim to represent the effect of the patch in a context-independent
format.
>
> This is hard. For example, the classic example is
>
> - enum foo { foo1=0, foo2, ... fook };
> + enum foo { foo1=0, foo2, ... fook, fool };
>
> - #define FOO_MAX K
> + #define FOO_MAX L
>
> The *programmer's* intended effect is not to replace K by L; ...
I agree it's hard, and there's no generally applicable solution. Perhaps the
minimum to aim for is to detect when something's not right, and help the
programmer work through a resolution. I think this means getting the pre-
conditions tight, but not too tight(!) -- topic for a research project?
>
> > In general, if I'm pulling a bunch of patches,
> > they should stay in the same sequence as in the source repo.
>
> This is not at all obvious. ...
I'm thinking pragmatically: if we don't try to reorder patches, then we don't
risk combinatorial explosion.
And I'm thinking about the mental stress on the programmers (both the one
pulling the patches, and the original author who's going to get asked what
they're about). It's at least going to be easier if patches don't
get 'shuffled' en route.
> ... Finally, one might deliberately
> randomize the order to ensure that order-dependency bugs are
> discovered quickly.
Aha! good point.
>
> > So let's apply the same thinking to the content of a text file: ...
>
> I don't see where you're going with this. This is a nice contribution
> to the L/S/L theory ...
Yes, I meant it only as an idea within L/S/L. As I said, this is a long way
from darcs.
>
> However, the Darcs operation of merging *repos* is already
> conceptually trivial: the union of two sets of patches.
Conceptually trivial maybe. But practically very complex and intractible when
it comes to conflicts and performance. It's really sad that ghc no longer uses
darcs, but I can understand why they turned away. And I haven't seen any
reason why they'd want to come back.
> ... the recursive nature of your representation
> still suggests combinatorial explosion in some cases.
>
I didn't intend my representation of hunk changes to be recursive. (And I
don't think L/S/L's line-id concept is recursive, but their line sequencing is
a linked list.) I'll give it some more thought.
Thank you
AntC
More information about the darcs-users
mailing list