[darcs-users] so long and thanks for all the darcs

Ben Franksen ben.franksen at online.de
Wed Mar 7 13:40:13 UTC 2018


Am 05.03.2018 um 04:40 schrieb Stephen J. Turnbull:
> Executive summary:
> 
> Darcs is an elegant system with a very simple underlying repository
> model (set of patches) and an implementation that is faithful to that
> model.  This makes Darcs easy to understand and use.
> 
> Although git and Mercurial (and Bazaar) share a repository model that
> is somewhat more complex (DAG of versions), only git's implementation
> is faithful with an immutable representation of history.  Mercurial's
> and Bazaar's implementations violate faithfulness and immutability, so
> their designers chose to restrict the operations available for safety
> and simplicity.

Do you have any references supporting this assertion? It is the first
time I have seen it made and I am somewhat perplexed (and intrigued).

(edit: you do explain most of it below)

>  Paradoxically, perhaps, I believe this is why git
> dominates the revision control landscape today despite its widely
> reviled UI.

An interesting thought.

>  (I very briefly address the social process that led to
> this outcome.)
> 
> I have two questions about current darcs:
> 
> 1.  How do people handle configuration management?  Just keep separate
> repos for every configuration of the software to be distributed?

Yes, more or less. For small differences I sometimes make a patch with a
'LOCAL DONT PUSH configuration adapted bla bla' prefix and keep it in
the same repo. Such a patch usually commutes with the vast majority of
the development patches, but you have to be careful not to push it by
accident.

It would be nice if I could tag such patches more explicitly as "don't
push this when pushing (and don't pull when being pulled)", at least by
default (i.e. overridable with a switch). You can emulate this by
keeping a copy of the repo with this patch and using --intersection
<other-repo> when pushing but that is awkward to use.

> 2.  What features or extensions for forensic work (localization) are
> available for Darcs?  I'm thinking about features like bisect commands
> (common in DAG-based systems) and git's "pickaxe".

We have the 'darcs test' command. I believe it is similar to git's
bisect, see 'darcs test --help' for details. I don't use it often
because usually I am faster (less unneeded compilation time) with the
manual process (obliterate in big chunks, run test, repeat until
successful, pull in somewhat smaller chunks, etc). If you go far enough
back in time, compilation can fail due to changes in build-dependencies
or the tool chain and such problems require manual interaction anyway.
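
To make the search idea concrete, here is a minimal sketch of bisecting
over a linear sequence of patches. It is not how 'darcs test' is
implemented; the function name, the patch names and the test predicate
are all made up for illustration. It assumes the test passes with no
patches applied, fails with all of them, and never goes back to passing
once it has started to fail.

```python
from typing import Callable, Sequence


def first_failing(patches: Sequence[str],
                  passes: Callable[[Sequence[str]], bool]) -> int:
    """Return the index of the first patch whose inclusion breaks the test.

    Invariant: the test passes on patches[:lo] and fails on patches[:hi].
    """
    lo, hi = 0, len(patches)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(patches[:mid]):
            lo = mid   # still good with the first `mid` patches
        else:
            hi = mid   # already broken with the first `mid` patches
    return hi - 1


# Hypothetical history in which patch "p3" introduced the failure.
patches = ["p0", "p1", "p2", "p3", "p4", "p5"]
culprit = first_failing(patches, lambda applied: "p3" not in applied)
print(patches[culprit])
```

Each call to the predicate stands for "build and run the tests", which
is exactly the expensive step mentioned above.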

> On to the tl;dr part.  I ended up having to work through a lot of
> stuff to figure out what I think about the various VCSes, and record
> it here for those interested in understanding why git seems so wrong-
> headed to many users, and yet persists in its idiosyncratic ways.  I'm
> sure there will be many points of disagreement, but I hope it will be
> useful.
> 
> Ben Franksen writes:
>  > Am 04.03.2018 um 05:03 schrieb Karl O. Pinc:
> 
>  > > It's not that the command line interface is more sensible.  It's
>  > > that the mental model of a repo with which the mercurial commands
>  > > interact is simple.
> 
> Unfortunately, that just doesn't hold up.  Mercurial repos are subject
> to a whole bunch of constraints that you need to understand to deal
> with its edge cases or to extend its operations.  And some of it is
> just plain ugly (tags are commits, for example).

I found this out the hard way. If you clone a tag you don't get the
patch that sets the tag, so you are one patch behind. This is just
idiotic (IMHO) and should be fixed.

In darcs tags are also just patches, but if you pull a tag you get the
tag itself, too (since this is how it works: a tag artificially depends
on all patches present when you create it).

>  It's DAG-based, but
> there's a lot of other crap there.  A *correct* mental model of git is
> extremely simple, almost as simple as Darcs.  (Hint: without merges,
> it's just a singly-linked list, like the fundamental structure in
> Lisp.)  The problem with git is that people like to reify branches (cf
> Stephane's comments about branch-per-repo, also Mercurial's named
> branch misfeature), and that is incompatible with what actually
> happens in git, where a "branch" is just a name for the head of a
> chain of commits.  git does *nothing* to cater to this natural
> instinct.

I must say I don't really understand what you are saying. What does
"people like to reify branches" mean, exactly?

If you have a DAG "without merges" then it becomes an (inverted) tree and
every leaf has a unique path from the root to it, a singly-linked list
if you prefer. This should be true whenever you have a DAG of revisions.
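
As a small illustration (a sketch with made-up revision names, not tied
to any particular VCS): if every revision records exactly one parent,
then following the parent pointers from any leaf gives one unique chain
back to the root.

```python
from typing import Dict, List, Optional

# Hypothetical merge-free history: revision -> its single parent
# (None for the root). "B" and "C" are two leaves branching off "A".
parent: Dict[str, Optional[str]] = {"0": None, "A": "0", "B": "A", "C": "A"}


def path_to_root(rev: str) -> List[str]:
    """Follow parent pointers from rev to the root, like walking a linked list."""
    chain = []
    current: Optional[str] = rev
    while current is not None:
        chain.append(current)
        current = parent[current]
    return chain


print(path_to_root("B"))
print(path_to_root("C"))
```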

> It's very hard to get Darcs' "set of patches" model wrong.  I think
> that's part of the appeal of Darcs.

The "set of patches" model can be misleading. Nowadays I tend to explain
darcs to people in terms of a single sequence of patches, some of which
can be and will be re-ordered (automatically). This is closer to how it
works, internally, and better explains some of the operational behavior
that users see (e.g. patches are presented in a certain order, some
operations are fast, some are not, etc).

> Another part is the appeal to fans of the so-called "user-friendly"
> VCSes that there's very little you can do except move straight ahead,
> but that also means everybody else has to move straight ahead, too.
> This simplifies life dramatically most of the time.

Yes, I see this attitude a lot with (some of) my co-workers.

> But if you look at the evolution of both Mercurial and Bazaar, you'll
> realize that they now have UIs that are as complex as git's to handle,
> with internal complexity to match (and in the case of bzr, an insanely
> baroque internal API).  They're relatively difficult to extend, and
> they don't expose the plumbing, so you have to program extensions, you
> can't script them very easily.  git, on the other hand, internally is
> extremely simple: it's just a universal DAG, no more, no less.  Refs
> are just repo-local variables, with branch refs having the magic
> property of being automatically updated by the commit operation.

Yes. The simplicity of git's "kernel" is appealing.

> So git wins big on both efficiency and extensibility.  Neither really
> appeals to most users: the other VCSes are fast enough in normal use,
> and users want to write their software, not fiddle with their VCS.

Yes. Efficiency and extensibility are the two major weak points of
Darcs. This, apart from a large user base and the usual network effects,
is part of why git attracts developers and Darcs does not. We don't have
a "kernel" with a stable API, it is very much a monolithic thing. This
is bad because it makes building something like darcshub much more
difficult. (There is darcsden, see hub.darcs.net, but it suffers from
poor maintenance partly because of the above and partly because we are
just too few people; as Evan so aptly put it "you can only die on so
many hills at once"...)

>  > There are much more fundamental differences in the model.
> 
> That's right.  The DAG-based VCSes (any system with an essential
> notion of "parent") fundamentally are about managing versions of a
> monolithic object that need have no relation to each other.  In git
> it's possible to completely change the identity of a project
> seamlessly by grafting an arbitrary sequence of versions from a
> different project on to HEAD (using git filter-branch).

I should mention here that the "set" of patches in darcs is also a DAG,
just not /explicitly/ so. The "branching" in this DAG is (mostly, except
we have tags and --ask-deps) fully automatic and in any case fully
determined by the underlying partial order (the dependency relation).
You can look at this DAG nowadays, using 'darcs show dependencies',
though this is of limited practical use since many patches are
independent and thus you get a very "wide" DAG with many "branches".
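
A toy model of this, assuming nothing about Darcs internals: take a
dependency relation between some invented patch names and check which
linear orders respect it. Every such order is a legitimate sequence for
the same set of patches. (In real Darcs, re-ordering also changes the
patch representations; this sketch only models the ordering.)

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical patches, each mapped to the patches it directly depends on.
depends_on: Dict[str, Set[str]] = {
    "add-file-a": set(),
    "add-file-b": set(),
    "edit-a": {"add-file-a"},
    "edit-b": {"add-file-b"},
    "tag-1.0": {"edit-a", "edit-b"},  # a tag depends on what was present
}


def is_valid_order(order: List[str]) -> bool:
    """True if every patch appears exactly once, after all its dependencies."""
    seen: Set[str] = set()
    for patch in order:
        if patch in seen or not depends_on[patch] <= seen:
            return False
        seen.add(patch)
    return seen == set(depends_on)


# One order computed from the partial order (dependencies come first).
one_order = list(TopologicalSorter(depends_on).static_order())
print(one_order)
```

The two "add-file" patches and the two "edit" patches are mutually
independent, which is why more than one order is valid.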

> Darcs (and other change-based systems like GNU Arch) encourage
> thinking of versions as collections of features, features which can be
> added and subtracted to build structured objects.  To make a radical
> "identity change", there needs to be a patch or patches that does
> the dirty work.

Yes, exactly.

>  > Darcs is the only tool I know of that can automatically re-order
>  > changes without changing patch identities.
> 
> Darcs is the only tool that knows patch identities.

Sigh. I should know better than to write something like that on a public
mailing list. The term "patch identity" is too easily misunderstood if
taken out of the context of the underlying theory.

The point is: Darcs is ("actively", if you will) identifying things that
/are/ different underneath. This is easy to understand from a
mathematical POV, where we are used to doing that. The reason darcs can get
away with presenting "the" patch

patch 752a81cb1f26d1ffb440adaefb3049599f5f8d8d
Author: Ben Franksen <ben.franksen at online.de>
Date:   Wed Feb 28 13:26:37 CET 2018
  * harness: add instance Check p => Check (p:>p)

to the user as if it were one immutable thing is the same reason we can
talk about "the" rational number 3/7, even though there is an infinite
number of other representations (6/14, ...). The patch above in reality
is a whole (infinite) class of underlying representations, and we get
away with this precisely because the laws of the underlying patch
algebra state that each representative, if applicable in a given
context, leads to the same result.
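
Here is a toy sketch of that analogy. The 'Insert' patch type and the
'commute' rule below are invented for illustration and are much simpler
than Darcs' real primitive patches (here any two insertions commute,
which is not true of real hunks). The point is only that the "same"
patch gets a different representation in a different context, yet both
orders lead to the same result.

```python
from fractions import Fraction
from typing import List, NamedTuple, Tuple


class Insert(NamedTuple):
    """Toy patch: insert one line of text at a given index."""
    index: int
    text: str


def apply(lines: List[str], patch: Insert) -> List[str]:
    return lines[:patch.index] + [patch.text] + lines[patch.index:]


def commute(p: Insert, q: Insert) -> Tuple[Insert, Insert]:
    """Given p followed by q, return (q2, p2) with the same overall effect."""
    if q.index <= p.index:
        # q lands at or before p's line, so p's line shifts down by one.
        return q, Insert(p.index + 1, p.text)
    # q lands after p's line, so without p its index is one smaller.
    return Insert(q.index - 1, q.text), p


# Two representations of one rational number.
same_number = Fraction(3, 7) == Fraction(6, 14)

base = ["a", "b", "c"]
p, q = Insert(1, "X"), Insert(3, "Y")
q2, p2 = commute(p, q)
via_pq = apply(apply(base, p), q)     # p first, then q
via_qp = apply(apply(base, q2), p2)   # re-ordered representatives
print(via_pq == via_qp, q2 != q)
```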

So, yes! You are right. Darcs is the only VCS with a notion of patch
identity.

> Even Arch, which
> gave names to patches, didn't allow you to work with sets of patches
> the way Darcs does.
> 
> git rebase certainly automatically reorders changes, but you can't say
> what happens to patch identities because that just isn't a concept
> that git has.

Yes.

> To be honest, as elegant as it is, I really don't miss Darcs' explicit
> and more general capabilities for automatic patch reordering in
> practice.

I miss it (dearly) every time I have to use git or mercurial. YMMV.

>  > Patches that don't depend on each other are freely commuted
>  > whenever necessary or, in certain situations, at the user's behest.
>  > 
>  > I agree with Evan that mercurial's UI is much more sensible than
>  > git's and that their mental models are pretty similar.
> 
> I don't think the second part is true.  It's true that they are closer
> to each other than either is to Darcs, but the difference is still
> huge: in Mercurial branches are first class objects (and there's only
> one per repo in practice), and in git they are not.

Yes, there is a difference when it comes to branches. I am not sure I
know what you mean by "in Mercurial [...] there's only one per repo in
practice".

>  > One thing that makes git unnecessarily complicated to use IMO is
>  > the very loose coupling of local and remote branches. I have never
>  > understood this design decision
> 
> There's that reification again.  git doesn't even have branches as
> understood by Darcs, Mercurial, Bazaar, and Subversion users.  It only
> has commits and refs ("tags").  In the presence of a merge:
> 
>                                         master
>                                   C     |
>                                  / \    V
>                             0---A   D---1
>                                  \ /
>                                   B <- feature
> 
> it's not possible to say whether B or C is on master; both are.

But isn't that exactly the same in Mercurial?

>  Nor
> can you say which originated on the feature branch if that ref is
> deleted.  (It's also impossible to determine where "feature"
> originated: any of 0, A, or B could be the first commit on the
> "feature" branch.)  The same is true of Mercurial in the way most
> people use it (both will be on default).  The difference is that in
> Mercurial "branch" is implemented as a data structure, namely, the
> repo itself.

And I think this is what makes Mercurial easier to use in practice.

>  git, OTOH, allows the fundamental data of the DAG to be
> distributed across multiple object databases.  The user need not know
> or care where an object (commit, tree, or blob) is located.  They just
> need a ref or a SHA1, and git will find it.  (In a sense, you could
> say that there's a single distributed universal repository shared by
> all git projects, similar to the way there's only one "the Internet"!)

This is nice and appropriate for the low-level underlying machinery. It
is not a good model for users when branches are involved. This is my
experience, at least.

In Darcs, all patches in all repos are also "compatible", in principle,
though due to current limitations this is mostly useless in practice
(changes to a file always depend on the file being added).

> In Mercurial, if you use the named branch misfeature, then you *do*
> know which branches B and C are on.  It's a misfeature because in
> Mercurial's design A and D can't be on both branches, which leads to
> hard to diagnose weirdness in bisection and other forensic operations
> when restricted to a named branch.

I would like to know more about these problems.

>  > and see absolutely no sensible use case for giving local branches a
>  > different name than the remote ones or let the user change which
>  > local branch is tracking which remote branch. Mercurial's main
>  > fault (IMO) is that it does not support editing history natively.
> 
> Mercurial can't really afford to do that, because unlike git it
> actually rewrites history; it's built into the storage model, where
> branches are first-class mutable objects.  Bazaar has the same
> problem: it would be really dangerous.  Git never rewrites history,
> because commits are first-class, but immutable.  When people ask git
> to "edit history", what git *does* is to create *new* history, and
> then modify a ref to point to it.  The "rebase problem" is that git
> used to drop the old history on the floor.  (I never had this problem
> myself because I always tagged the old HEAD before rebasing -- now the
> so-called reflog does this automatically, and it was always possible
> to recover names of "loose heads" using git-fsck.  git detractors
> point to this feature as yet another wart in git's UI.)

I certainly find it hard to understand.

> This is why git *must* allow "loose coupling": what git calls "branch"
> is just a ref, and committing to a branch means changing the ref to
> point to a new object.  *There's no structural way to enforce
> "continuity" of such changes* because the ref doesn't remember where
> it used to point, and there's no containing data structure that
> remembers it either.

Hm. I think I begin to understand.
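
My understanding so far, written down as a toy model (this is not git's
actual object format or API; the store, the ref table and the log are
plain Python containers with made-up names): commits are immutable and
addressed by a hash of their content, a branch is just a name that
points at one of them, and "editing history" creates a new commit and
moves the name. Only a separate log remembers where the name used to
point.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

objects: Dict[str, Tuple[Optional[str], str]] = {}  # hash -> (parent, message)
refs: Dict[str, str] = {}                           # branch name -> hash
reflog: List[Tuple[str, Optional[str], str]] = []   # (name, old, new)


def commit(parent: Optional[str], message: str) -> str:
    """Store an immutable commit and return its content hash."""
    key = hashlib.sha1(f"{parent}:{message}".encode()).hexdigest()
    objects[key] = (parent, message)
    return key


def move_ref(name: str, target: str) -> None:
    """Point the ref at target; the ref itself keeps no memory of the old value."""
    reflog.append((name, refs.get(name), target))
    refs[name] = target


root = commit(None, "initial")
old_tip = commit(root, "feature, first attempt")
move_ref("master", old_tip)

# "Rewriting" the last commit: a new object is created, the ref is moved.
new_tip = commit(root, "feature, rewritten")
move_ref("master", new_tip)
print(old_tip in objects, refs["master"] == new_tip)
```

Without the log, nothing in 'refs' or 'objects' would tell you that
"master" ever pointed at the old commit.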

> Of course it's possible for git to use a separate database to remember
> those changes, and it now does (the reflog).  But there's no way at
> all to enforce global uniqueness of branch names: by default every git
> repo has a branch ref named "master", and there's no way to ensure
> that they share history when merging because the ref doesn't know what
> project it points into (the DAG is "universal").  This isn't a problem
> in Mercurial because the branch *object* is the repo itself, which is
> mutable.
> 
> OTOH, it's perfectly reasonable in git to merge two branches named
                                            ^^^^^
> "master" which share only the empty version in their histories into
> the same repo.  I can't think of a good reason to do this, of course.
> But it's possible, and git handles it with ease.  You just need to
> change the name of one of them, and you do need to change the name
> (ie, you need loose coupling).  In Mercurial and Bazaar, it isn't
> possible to do this because of the repo = branch model: you'd have
> unmerged heads floating around, and the only option for handling the
> situation is a merge.
                 ^^^^^

So what? You need a merge in either of them. How does that make a
difference in practice? Do you mean that they do not allow this because
of conflicting branch names? Why not make this conflict resolvable like
any other conflict?

>  And of course in Darcs it's impossible because
> you couldn't physically instantiate a workspace like that (at least
> not if there were any filename collisions), and wouldn't want to.

It is important to realize that this is not a fundamental restriction of
the patch algebra. It is just a deficiency of the concrete set of
underlying ("primitive") patches on which Darcs is (currently) built.

There is an old idea, partly implemented in Darcs but not yet used, for
a different set of primitive patches, where every object in the tree
(file or directory) gets a universally unique ID when created for the
first time. With such a model you can join completely unrelated
repositories easily; the conflicts between objects with the same name
can be resolved simply by renaming one or both of them. In the simplest
case, both trees go into their own new sub-directories at the top.
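
A rough sketch of that simplest case, under the assumption that a tree
is nothing more than a mapping from object IDs to paths (the real
design is more involved, and all names here are hypothetical): because
the IDs never collide, joining two unrelated trees only requires
resolving the name clashes, here by moving each tree into its own
top-level directory.

```python
import uuid
from typing import Dict

Tree = Dict[str, str]  # universally unique object ID -> current path


def make_tree(*paths: str) -> Tree:
    """Give every object a fresh unique ID when it is first created."""
    return {uuid.uuid4().hex: path for path in paths}


def join_unrelated(a: Tree, b: Tree, dir_a: str, dir_b: str) -> Tree:
    """Join two trees, renaming each side into its own sub-directory."""
    joined = {oid: f"{dir_a}/{path}" for oid, path in a.items()}
    joined.update({oid: f"{dir_b}/{path}" for oid, path in b.items()})
    return joined


repo_a = make_tree("README", "src/main.hs")
repo_b = make_tree("README", "Makefile")   # same file name, unrelated object
joined = join_unrelated(repo_a, repo_b, "project-a", "project-b")
print(sorted(joined.values()))
```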

>  > To do [history-editing] operations [in Mercurial], one is forced to
>  > use low-level extensions like mq that convert changes between
>  > revisions into dumb patch files (the stuff you get with diff -u).
> 
> There are higher-level features.  Strip and the rebase extensions
> allow you to do these things, but they actually modify the DAG, unlike
> git's reset and rebase commands.  Mercurial's versions are very
> dangerous.  Now, with bookmarks you can preserve the DAG with an
> explicit bookmark, but AFAIK Mercurial still doesn't have a feature
> equivalent to git's reflog.

Okay.

> Darcs suffers from this problem as well, IIRC.  Certainly when
> "amend-patch" is used.

It's just 'darcs amend' nowadays. And no, Darcs does not suffer from any
problem in this regard that git doesn't also have.

Amending a patch is problematic insofar as you get a new patch that
looks similar to the old one. This can happen in exactly the same way in
git when you rebase a commit. In both cases there is a way to
distinguish them, though, namely by their "hash", see the "patch
752a81cb1f26d1ffb440adaefb3049599f5f8d8d" in the example patch above.

(Older versions of Darcs did not display this hash, so there could
indeed be confusion as to the identities of a patch and an amended
version of it.)

>  But more generally as well when you remove
> patches.  Obviously, Darcs users are used to modifying history in this
> way; they even expect their VCS to do it automatically and implicitly.
> However, this can be problematic in some "enterprise" scenarios,
> especially with code obtained under restrictive licensing.

I disagree with you here. Removing patches is exactly analogous to
rebasing and then deleting the ref to the old commit in git. Darcs does
/not/ delete the old patch files, only the reference to them (Darcs
keeps such references in linearly ordered structures called
"inventories", of which --at the moment-- there is just one at the
top-level), except when asked to do so ('darcs optimize clean'; I don't
think I have ever used that command). But, as in git, it is hard to work
with something you have no way to refer to.
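
In toy form (this models only the idea, not Darcs' on-disk layout; the
hashes and function names are invented, and a real obliterate also has
to commute the patch past everything that follows it): removing a patch
drops its entry from the inventory, while the stored patch stays around
until an explicit clean-up.

```python
from typing import Dict, List

# Hypothetical patch store (hash -> patch description) and inventory.
patch_store: Dict[str, str] = {
    "h1": "add README",
    "h2": "fix typo",
    "h3": "experimental change",
}
inventory: List[str] = ["h1", "h2", "h3"]


def obliterate(inv: List[str], patch_hash: str) -> List[str]:
    """Drop the reference only; the store is left untouched."""
    return [h for h in inv if h != patch_hash]


def clean(store: Dict[str, str], inv: List[str]) -> Dict[str, str]:
    """Explicit clean-up: keep only patches the inventory still refers to."""
    return {h: p for h, p in store.items() if h in inv}


inventory_after = obliterate(inventory, "h3")
still_stored = "h3" in patch_store          # unreferenced, but not deleted
store_after_clean = clean(patch_store, inventory_after)
print(inventory_after, still_stored, sorted(store_after_clean))
```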

It would not be hard to make Darcs follow (modern) git more closely
here, so that by default we keep a reference to patches that aren't
referred to any longer by the "mainline" (the "current" branch). This is
an idea I have been developing lately (see my reply to Karl elsewhere on
this thread).

> Here ends the brain dump. :-)

Thanks! I found it all very interesting. I learned a lot from it (and
from writing down my responses, too).

Cheers
Ben
-- 
"I tend to avoid fiction about dysfunctional urban middle-class people
written in the present tense." -- Ursula K. Le Guin

