[darcs-users] Interesting article about how Darcs does the merge better than the most other DVCS

Stephen J. Turnbull stephen at xemacs.org
Sat Apr 30 18:01:56 UTC 2011


Michael Olney writes:

 > The article points out a case where Git can produce multiple
 > results (which IMO would ideally trigger a manual resolution), but
 > does not demonstrate that Darcs does any better in general.

It's easy to give examples where any existing VCS, merging two
semantically correct patches, produces a program that is syntactically
correct but semantically wrong, and such examples don't even depend on
the "changes in different files are guaranteed independent" heuristic.
So there just is no real substitute for reviewing every merge anyway,
and you can't say "VCS X is correct, so the right thing to do is to
improve VCS X."
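To make that concrete, here is a minimal sketch that assumes a toy line-by-line three-way merge handling only edits that replace lines in place (no insertions or deletions). `merge3` and the sample program are hypothetical, not any real VCS's algorithm. One branch renames a function, and the other adds a call to it under the old name. The two edits touch different lines of the same file, so the merge is clean, yet the merged program fails when it runs.

```python
def merge3(base, ours, theirs):
    """Toy three-way merge for equal-length line lists (in-place edits only).

    Returns the merged lines, or raises ValueError on a textual conflict.
    """
    assert len(base) == len(ours) == len(theirs)
    merged = []
    for b, o, t in zip(base, ours, theirs):
        if o == t or t == b:
            merged.append(o)  # both sides agree, or only ours changed
        elif o == b:
            merged.append(t)  # only theirs changed
        else:
            raise ValueError(f"conflict: {o!r} vs {t!r}")
    return merged


base = ["def area(r):", "    return 3.14159 * r * r", "", "result = None"]
# Branch 1: rename area -> circle_area (correct on its own).
ours = ["def circle_area(r):", "    return 3.14159 * r * r", "", "result = None"]
# Branch 2: start calling area() (also correct on its own).
theirs = ["def area(r):", "    return 3.14159 * r * r", "", "result = area(2)"]

merged = "\n".join(merge3(base, ours, theirs))  # no textual conflict
print(merged)

error = None
try:
    exec(merged, {})  # run the merged program
except NameError as err:
    error = err
    print("clean merge, broken program:", err)
```

No textual merge rule can catch this, because the bug lives in the relationship between the two edits, not in either edit's lines.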

Nor has anyone yet come up with a theory of VCS that encompasses the
imperfect duality of git and Darcs.  Git (like other DAG-based VCSes)
takes revisions as primitive; arcs in the universal revision graph are
lifted into arcs in the history DAG.  Git then defines merge in terms
of merging revisions.  Although the merge need not be semantically or
even syntactically correct (in fact, it is often extremely bogus because
of conflict markers), the result is always a revision.  In this
sense it is algebraically complete.
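A minimal sketch of what "algebraically complete" means here follows. It assumes a toy line-wise three-way merge (in-place edits only). The name `merge_revisions` and the marker labels are hypothetical, and this is not git's actual algorithm. The point is that the function is total: even a conflict yields a revision, just one containing conflict markers.

```python
from __future__ import annotations


def merge_revisions(base: list[str], ours: list[str], theirs: list[str]) -> list[str]:
    """Always returns a revision (a list of lines); it never fails on a conflict."""
    assert len(base) == len(ours) == len(theirs), "toy model: in-place edits only"
    merged: list[str] = []
    for b, o, t in zip(base, ours, theirs):
        if o == t or t == b:
            merged.append(o)
        elif o == b:
            merged.append(t)
        else:
            # Conflict: still emit lines, wrapped in git-style markers.
            merged += ["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"]
    return merged


base = ["x = 1", "y = 2"]
ours = ["x = 10", "y = 2"]
theirs = ["x = 100", "y = 2"]
result = merge_revisions(base, ours, theirs)
print("\n".join(result))  # a "revision", though not a valid program
```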

OTOH, Darcs takes patches as primitive and derives revisions as paths
of patches, but this is *not* node/arc-dual to the git model.  A Darcs
patch does not correspond to an arc in the revision graph for two
reasons.  First, some patches are considered to be primitive, while
others have non-trivial structure as composites of primitive patches.
All arcs in git are primitive, although of course they can be
composed.  Second, a Darcs patch is not a single arc in the revision
graph, but rather a set of "parallel"[1] arcs.  In patch theory (some)
patches are taken to be primitive and revisions are partially ordered
sets of patches.  Worse, a patch is often considered to have a
restricted domain (you can't apply it to every revision).  In the
theory, this means that the Darcs merge operator is not, and cannot
be, algebraically complete within the class of revisions; some merges
of revisions (posets of patches) result in conflictors, not revisions.
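The contrast can be sketched with a deliberately crude model. It assumes that patches are single-line replacements with a precondition on the old text (their restricted domain), and that a revision is a set of such hunks on distinct lines (the simplest poset, with no ordering constraints). This is not Darcs's real patch theory: there is no commutation, and `Hunk`, `Conflictor`, `merge`, and `checkout` are hypothetical names. It only shows why the merge's result type must include something other than a revision.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hunk:
    """Replace line `line` with `new`; defined only where that line equals `old`."""
    line: int
    old: str
    new: str

    def apply(self, text: list[str]) -> list[str]:
        if text[self.line] != self.old:
            # Outside the patch's domain: there is nothing sensible to produce.
            raise ValueError(f"{self} does not apply to this revision")
        return text[: self.line] + [self.new] + text[self.line + 1 :]


@dataclass(frozen=True)
class Conflictor:
    """Records two patches that cannot both be applied; not a runnable tree."""
    ours: Hunk
    theirs: Hunk


def merge(ours: frozenset[Hunk], theirs: frozenset[Hunk]) -> frozenset[Hunk] | Conflictor:
    """Union of two revisions, unless they contain different hunks on the same line."""
    for a in ours - theirs:
        for b in theirs - ours:
            if a.line == b.line:
                return Conflictor(a, b)  # the merge result is not a revision
    return ours | theirs


def checkout(base: list[str], revision: frozenset[Hunk]) -> list[str]:
    """Apply a revision's hunks to the base text to get something you can run."""
    text = base
    for hunk in sorted(revision, key=lambda h: h.line):
        text = hunk.apply(text)
    return text


base = ["x = 1", "y = 2"]
p = Hunk(0, "x = 1", "x = 10")
q = Hunk(1, "y = 2", "y = 20")
r = Hunk(0, "x = 1", "x = 100")

clean = merge(frozenset({p}), frozenset({q}))
print(checkout(base, clean))
conflicted = merge(frozenset({p}), frozenset({r}))
print(type(conflicted).__name__)

try:
    r.apply(p.apply(base))  # r's domain excludes the revision produced by p
except ValueError as err:
    print(err)
```

Unlike the git-style merge, `merge` here has two possible result types, and only one of them can be checked out and run.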

These are *different* theories, and making them comparable is not
trivial.

All this leaves advocates of Darcs who claim superiority based on
patch theory, or simply on the idea of taking patches as the primitive
notion, in a very uncomfortable place.  Any criticism of git from this
point of view can be rebutted simply with "You can't run a conflictor
or even a poset of patches, only a revision.  I care about revisions,
and git manages them nicely for me, ThankYouVeryMuch!"

 > > http://r6.ca/blog/20110416T204742Z.html

and http://bramcohen.livejournal.com/74462.html both get this wrong.
I'm somewhat surprised that Bram Cohen gets it wrong.  The point is
that "merge associativity" is a patch-based desideratum that simply
doesn't apply directly to a revision-based model.  The revisions being
merged are different, so the merge is different -- why are you so
surprised?<wink/>

To unpack that a bit, consider this graph, with 1 as the tip commit:

    1
    |\
    | \
    *  B
    |. |
    | .|
    C  A
    | /
    |/
    0

If the dotted merge to * is present, then I would presume that A is a
"complete" changeset, and the gatekeeper deliberately chose to merge
it.  In that case git gets the same answer as Darcs (IIUC).  But if
there is no merge at *, then A+B is a single changeset as far as the
merge is concerned, and I see no good reason to prefer one answer over
the other; it would depend on the intent of the merging developer,
whether she perceives 0->A and A->B as separate changes or not.
History matters!  (BTW, what does Darcs do if 0->A and A->B are
coalesced into a single patch, 0->B?)
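One way to see why the dotted merge matters is to compute the merge base, which is what a three-way merge diffs each side against. The sketch below encodes the graph above as a hypothetical parent map. It uses a simple "best common ancestor" rule in the spirit of `git merge-base`, though not git's implementation. With the merge at *, the base for producing 1 is A, so only A->B arrives as the incoming change. Without it, the base is 0, and A+B arrives as one changeset.

```python
from __future__ import annotations


def ancestors(parents: dict[str, list[str]], commit: str) -> set[str]:
    """All commits reachable from `commit`, including itself."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(parents: dict[str, list[str]], x: str, y: str) -> set[str]:
    """Common ancestors of x and y that are not ancestors of another common ancestor."""
    common = ancestors(parents, x) & ancestors(parents, y)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }


# 0 is the root; A and C branch from it; B follows A; * merges A into C's line.
without_star = {"0": [], "A": ["0"], "B": ["A"], "C": ["0"]}
with_star = {**without_star, "*": ["C", "A"]}

print(merge_bases(with_star, "*", "B"))     # {'A'}: the other side brings only A->B
print(merge_bases(without_star, "C", "B"))  # {'0'}: the other side brings 0->B, i.e. A+B
```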

I also think it's arguable that having multiple merge strategies which
give different answers is useful.  While applying the same merge
strategy to every change in a changeset is merely a heuristic, it can
be useful to the developer, who can choose a different strategy and
reduce the work needed.  For example, just yesterday "git merge
--strategy=recursive --strategy-option=patience" saved me hours
vs. the default "git merge" when updating a lightly-modified server
to upstream after a year of fairly active upstream development.  (That
could easily just be a matter of presentation of the conflicts, but
not so for the "ours" and "theirs" strategy options; those will
definitely produce different results in the case of conflicts.)
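The effect of the "ours" and "theirs" options can be illustrated with a toy line-wise merge that assumes in-place edits only. `merge_with_option` and its `favor` parameter are hypothetical. They loosely mirror `git merge -X ours` / `-X theirs`, which settle conflicting hunks in favor of one side while still taking non-conflicting changes from both. With `favor=None`, conflicts are left as markers.

```python
from typing import Optional


def merge_with_option(base, ours, theirs, favor: Optional[str] = None):
    """Line-wise three-way merge; `favor` picks a side only where edits conflict."""
    merged = []
    for b, o, t in zip(base, ours, theirs):
        if o == t or t == b:
            merged.append(o)
        elif o == b:
            merged.append(t)  # a non-conflicting change from theirs is always kept
        elif favor == "ours":
            merged.append(o)
        elif favor == "theirs":
            merged.append(t)
        else:
            merged += ["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"]
    return merged


base = ["timeout = 30", "retries = 3"]
ours = ["timeout = 60", "retries = 3"]    # local tweak
theirs = ["timeout = 45", "retries = 5"]  # upstream changes both lines

print(merge_with_option(base, ours, theirs, favor="ours"))    # ['timeout = 60', 'retries = 5']
print(merge_with_option(base, ours, theirs, favor="theirs"))  # ['timeout = 45', 'retries = 5']
```

This option differs from the `ours` *strategy* (`git merge -s ours`), which ignores the other branch's changes entirely.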

To me, the superiority of Darcs rests on two pragmatic considerations.
It fits the programmer's mental model of what's happening (abstractly
expressed as "patches *add* features or fix defects" vs. git's
"patches *replace* a version with a better one"), and it has a nice
UI.  But not all programmers like this mental model, and some have
higher priorities than the parts of the UI that Darcs does well.

Vive la difference!

P.S. I don't find Zooko's example, or the "merge associativity law"
for that matter, convincing.  I agree with Zooko that in the case of
his C program Darcs gets it right and git, wrong.  But that's clear
from the context.  In the abstract case, I just don't see it at all.
And in the Backes example, while I'm not sure why git gets it right
(pure luck, I imagine), Darcs gets it wrong because it does the wrong
thing.  It literally doesn't see the forest (context lines) in its
concentration on the tree (changed line).  The context makes it clear
that the *function block* has moved -- concentrating on *hunk changes
at locations* is just wrong in the presence of block moves.  Larry
Wall's patch couldn't miss this, for example.
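The "forest vs. tree" contrast can be sketched by applying one hunk in two ways. The sketch assumes a hunk is just a recorded line number, one context line, an old line, and a new line, and the helper names are hypothetical. The context search is a much simplified version of what `patch` does: the real tool uses several context lines, searches outward from the expected position, and supports fuzz. After a block move, checking only the changed line at its recorded position silently edits the wrong function, while matching on context finds the moved block.

```python
def apply_at_location(text, line, old, new):
    """Checks only the changed line at its recorded position."""
    if text[line] != old:
        raise ValueError("hunk does not apply")
    return text[:line] + [new] + text[line + 1:]


def apply_with_context(text, context, old, new):
    """Finds the line preceded by `context` anywhere in the file, then edits it."""
    for i in range(1, len(text)):
        if text[i - 1] == context and text[i] == old:
            return text[:i] + [new] + text[i + 1:]
    raise ValueError("context not found")


# Upstream moved greet() below other(); their bodies happen to be identical.
moved = ["def other():", '    print("hi")', "def greet():", '    print("hi")']

# The hunk was recorded against the old layout, where line 1 was greet()'s body.
wrong = apply_at_location(moved, 1, '    print("hi")', '    print("hello")')
right = apply_with_context(moved, "def greet():", '    print("hi")', '    print("hello")')
print(wrong)  # other() was edited by mistake
print(right)  # greet() was edited, as intended
```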


Footnotes: 
[1]  "Parallel" patches are often defined as diffs computed against
the same first revision, but here I mean two diffs starting from
different revisions that represent "making the same changes".

