[darcs-users] so long and thanks for all the darcs

Stephen J. Turnbull turnbull at sk.tsukuba.ac.jp
Thu Mar 8 08:52:34 UTC 2018


Another long one.  But we're converging!

Ben Franksen writes:
 > Am 05.03.2018 um 04:40 schrieb Stephen J. Turnbull:

 > > Although git and Mercurial (and Bazaar) share a repository model that
 > > is somewhat more complex (DAG of versions), only git's implementation
 > > is faithful with an immutable representation of history.  Mercurial's
 > > and Bazaar's implementations violate faithfulness and immutability, so
 > > their designers chose to restrict the operations available for safety
 > > and simplicity.
 > 
 > Do you have any references supporting this assertion? It is the first
 > time I have seen it made and am somewhat perplexed (and intrigued).

I don't have a reference to the intent of the designers of Mercurial
and Bazaar as such, but they have frequently made the point that their
tools are safe and criticized git for permitting dangerous operations,
and it's easy to observe that both are restricted relative to git.

 > > I have two questions about current darcs:

Thank you for your careful answers.

 > We have the 'darcs test' command. I believe it is similar to git's
 > bisect, see 'darcs test --help' for details.

Ah, I did know that, but got hung up on "bisect" which doesn't make
sense as a command name when patches don't always have dependencies or
dependents.

 > In darcs tags are also just patches, but if you pull a tag you get the
 > tag itself, too (since this is how it works: a tag artificially depends
 > on all patches present when you create it).

Yeah, this kind of thing is what I mean by saying Darcs's
implementation is faithful to its mental model.
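
To make the "a tag is just a patch" idea concrete, here is a minimal
sketch in Python.  It is a toy model, not Darcs's actual data
structures: the repo is a plain dict from patch name to the set of
names it depends on, and record_tag and pull are hypothetical helpers
invented for this illustration.

```python
# Toy model (an assumption, not Darcs internals): a repo maps each patch
# name to the set of patch names that patch depends on.

def record_tag(repo, tag_name):
    """Record a tag as a patch that depends on every patch present now."""
    repo[tag_name] = set(repo)  # right-hand side is evaluated before the insert
    return tag_name

def pull(source, target, name):
    """Copy `name` and, transitively, everything it depends on into target."""
    pending = [name]
    while pending:
        current = pending.pop()
        if current in target:
            continue
        target[current] = set(source[current])
        pending.extend(source[current])
    return target

upstream = {
    "add-readme": set(),
    "fix-typo": {"add-readme"},
    "new-feature": set(),
}
record_tag(upstream, "v1.0")
upstream["post-release"] = set()  # recorded after the tag, so not part of it

mine = pull(upstream, {}, "v1.0")
print(sorted(mine))
```

Pulling the tag drags in every patch that existed when it was
recorded, plus the tag itself, and nothing recorded afterwards.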

 > I must say I don't really understand what you are saying. What does
 > "people like to reify branches" mean, exactly?

It means that they think of them as separate entities, backed by some
data structure in the implementation.  For branch-per-repo models, the
repo is that data structure.  In addition, Bazaar has a whole internal
API that knows all about manipulating branches, as opposed to the
history they contain.  Mercurial has this odd branch name property on
commits.

By contrast, in git there's the history DAG and its components
(commits, trees, and blobs) and that's it.  The DAG may have multiple
heads, and what people think of as "the branch I'm working on" may not
even be entirely contained in the repository.  There is no data
structure that surely contains everything you need to know about a
branch, except the whole repository plus any external object databases
configured in the repository's metadata.
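
A minimal sketch of that view, in Python, assuming a toy commit graph
(the commit ids, ref names, and helper functions are all made up for
illustration; trees and blobs are left out).  A "branch" here is
nothing more than a name pointing at a commit, and its contents are
whatever is reachable from that commit.

```python
# Toy model of a history DAG: commit id -> list of parent ids.
commits = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],  # merge commit with two parents
    "E": ["A"],
}

# Refs are just names pointing at commits; no per-branch data structure.
refs = {"master": "D", "topic": "E"}

def reachable(commits, start):
    """Return the set of commits reachable from `start` via parent links."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(commits[commit])
    return seen

def branches_containing(commits, refs, commit):
    """Names of all refs whose history includes `commit`."""
    return sorted(
        name for name, head in refs.items()
        if commit in reachable(commits, head)
    )

print(branches_containing(commits, refs, "A"))
print(branches_containing(commits, refs, "B"))
```

Commit A belongs to both refs and B only to master; the answer is
computed from the graph each time, not stored anywhere.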

 > > It's very hard to get Darcs' "set of patches" model wrong.  I think
 > > that's part of the appeal of Darcs.
 > 
 > The "set of patches" model can be misleading.

You're right.  That was very sloppy of me.  What I meant was a partial
order (with patches being related if one is dependent on the other).
But that's not very useful to people who haven't gone down that
mathematical rabbit hole.  Everybody knows what a sequence is.
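
For anyone who wants a peek down the rabbit hole, a small Python
sketch of the difference between a sequence and a partial order.  The
patch names and the depends_on table are invented for the example;
this is not how Darcs computes dependencies.

```python
from itertools import permutations

# Hypothetical patches: each name maps to the patches it depends on.
depends_on = {
    "add-file": set(),
    "edit-file": {"add-file"},  # cannot come before the file exists
    "add-docs": set(),          # unrelated to the other two
}

def respects_dependencies(sequence, depends_on):
    """True if every patch appears after everything it depends on."""
    seen = set()
    for patch in sequence:
        if not depends_on[patch] <= seen:
            return False
        seen.add(patch)
    return True

# Every sequence that is a legal presentation of the same partial order.
valid_orders = [
    order for order in permutations(sorted(depends_on))
    if respects_dependencies(order, depends_on)
]

for order in valid_orders:
    print(" -> ".join(order))
```

Of the six possible sequences of these three patches, the three that
put add-file before edit-file are all equally valid views of the same
repository.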

 > > Another part is the appeal to fans of the so-called "user-friendly"
 > > VCSes that there's very little you can do except move straight ahead,
 > > but that also means everybody else has to move straight ahead, too.
 > > This simplifies life dramatically most of the time.
 > 
 > Yes, I see this attitude a lot with (some of) my co-workers.

I don't consider this an attitude, but rather a fact.  Specifically,
there are many costs to working on branches: redundant work
accomplishing the same task, arguing which implementation is the
redundant one, merge conflicts, etc.  I think the benefits exceed the
costs by far, but I admit there have been times when I've been
massively frustrated by a colleague pulling the rug out from under my
feature branch.  I'm comfortable with people who have fairly extreme
views in that direction; maybe they had even worse experiences than I
have had!

 > Yes. Efficiency and extensibility are the two major weak points of
 > Darcs. This, apart from a large user base and the usual network
 > effects, is part of why git attracts developers and Darcs does
 > not.

Well, there's also functional programming.  C, for all its faults, is
an imperative language, and people seem to find that easier to grasp.

 > We don't have a "kernel" with a stable API,

git doesn't have a stable API either.  I don't know about the current
developers, but Linus basically promised that the *UI* would be
backward compatible.  Any scripts you've written will continue to
work.  The internal APIs are subject to change, though, and that's why
there's never been a successful refactoring into a "libgit" plus CLI
module.

 > I should mention here that the "set" of patches in darcs is also a DAG,
 > just not /explicitly/ so.

Sure.  The DAG I'm talking about is a completely arbitrary history
DAG.  The dependency DAG of Darcs is a different thing.

 > Yes, there is a difference when it comes to branches. I am not sure I
 > know what you mean with "in Mercurial [...] there's only one per repo in
 > practice".

Mercurial now has "bookmarks", which are equivalent to git "branch
refs" as far as I know.  So you could have multiple branches in a
single repo.  However, the projects I know well that use or did use
Mercurial don't take advantage of that.  If you have multiple
supported versions (eg, Python maintains five for Python 3), they
create a new repo for maintenance patches to each version.

 > > it's not possible to say whether B or C is on master; both are.
 > 
 > But isn't that exactly the same in Mercurial?

If you care, you can use named branches.  I don't know anybody who
actually uses named branches though.

 > > The difference is that in Mercurial "branch" is implemented as a
 > > data structure, namely, the repo itself.
 > 
 > And I think this is what makes Mercurial easier to use in practice.

I don't have a compelling argument, but I don't think that's so.  For
the record, I think Mercurial's reputation for ease of use comes from
(1) a UI that can be improved over time (in git, Linus basically
promised that he wouldn't break any scripts, so the developers can add
to the UI, but not change or delete warts), and (2) the difficulty of
importing "rebase culture"
to a Mercurial or Bazaar project.  By "rebase culture" I mean the way
many git projects ask contributors to commit early and often, then use
git to squash "uninteresting" commits and then rebase for a linear
history.

 > > git allows the fundamental data of the DAG to be distributed
 > > across multiple object databases[, resulting in a "universal DAG"].
 > 
 > This is nice and appropriate for the low-level underlying
 > machinery. It is not a good model for users when branches are
 > involved. This is my experience, at least.

That's been my experience too.  I don't understand why, though, and
there is a subset of us who have very little trouble with the model
and really appreciate the power and efficiency it provides.

 > In Darcs, all patches in all repos are also "compatible", in principle,
 > though due to current limitations this is mostly useless in practice
 > (changes to a file always depend on the file being added).

Sure, but that's more analogous to "all trees in all git repos".  It's
precisely those dependencies that constrain the possible "branches" in
Darcs, as the DAG of commits defines branches in git.

 > > In Mercurial, if you use the named branch misfeature, then you *do*
 > > know which branches B and C are on.  It's a misfeature because in
 > > Mercurial's design A and D can't be on both branches, which leads to
 > > hard to diagnose weirdness in bisection and other forensic operations
 > > when restricted to a named branch.
 > 
 > I would like to know more about these problems.

It's nothing irremediable, but if you restrict bisection to a named
branch and the problem is created by the merge of that branch into
default, it will tell you "it's all good" (since all commits in the
named branch are older than the merge commit, which is not part of the
named branch).  This makes named branches vastly less useful.  I've
also had WTF moments when I hg log'd the branch and the merge commit I
thought I had done didn't appear.  Maybe it's just me? ;-)
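
Here is a toy Python reconstruction of that surprise.  It assumes the
merge was committed while default was checked out, so the merge commit
carries the name "default"; the commit ids are invented, and a simple
linear scan stands in for real bisection.

```python
# Each commit: (id, branch name stored on the commit, parent ids).
history = [
    ("c1", "default", []),
    ("c2", "feature", ["c1"]),
    ("c3", "feature", ["c2"]),
    ("c4", "default", ["c1"]),
    ("m", "default", ["c4", "c3"]),  # merge of feature into default
]

# Assumption for the example: only the merge result is broken.
is_bad = {"c1": False, "c2": False, "c3": False, "c4": False, "m": True}

def on_branch(history, branch):
    """Commits whose stored branch name equals `branch`, oldest first."""
    return [cid for cid, name, _parents in history if name == branch]

def first_bad(candidates, is_bad):
    """Linear stand-in for bisection: first failing commit, or None."""
    for cid in candidates:
        if is_bad[cid]:
            return cid
    return None

print(on_branch(history, "feature"))
print(first_bad(on_branch(history, "feature"), is_bad))
print(first_bad([cid for cid, _name, _parents in history], is_bad))
```

Restricted to "feature", the search never sees the merge commit and so
finds nothing wrong; searching the whole history finds it at once.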

 > I certainly find it hard to understand.

Sure, but I bet you'd find it hard to understand using the various
extensions to Mercurial and Bazaar that enable git-style rebasing.
You think of a Darcs "branch" as a poset of patches (= partially
ordered set of patches = sequence of patches that may be automatically
reordered subject to dependency restrictions), rather than as a
"history of development" as in the VCSes I call "DAG-based".

That's going to cause a lot of cognitive dissonance for you, or at
least some mental gear-grinding.

I like *both* models.  When I'm using git, I commit early and often,
without worrying about creating "coherent changesets" (or something
like that, Tom Lord's term).  Those micro-diffs often tell me
something about what I was thinking at the time when I'm debugging (or
just documenting!).  When publishing, usually I have to squash those,
but that's easy enough with git rebase --interactive (rebase on the
same commit, and edit the list of commits -- this does require
creating new history, but it causes little or no cognitive
dissonance).

When I was using Darcs, I would use amend to evolve a coherent
changeset (or a few), preferably independent of the other patches that
implement that feature.  But this does tend to cost me history that I
do use occasionally.

 > > OTOH, it's perfectly reasonable in git to merge two branches named
 >                                             ^^^^^

eh.  Nice catch.  This should have been "fetch".

 > So what? You need a merge in either of them.

No, git is perfectly happy to leave as many active heads as you like.
Most of the time you deal only with HEAD, and that implicitly.

 > How does that make a difference in practice? Do you mean that they
 > do not allow this because of conflicting branch names? Why not make
 > this conflict resolvable like any other conflict?

No, Bazaar and original Mercurial make it annoying to work with
multiple heads.  In Bazaar "pull" means "mirror", while all of
"fetch", "pull", and "merge" are implemented with the single command
"merge".  Mercurial complains incessantly if you don't attach a named
branch or bookmark to a loose head.  In both VCSes, loose heads
normally only arise in the case of a conflict, which you resolve by
picking a version and committing it (this is the merge commit).

 > It is important to realize that this is not a fundamental
 > restriction of the patch algebra. It is just a deficiency of the
 > concrete set of underlying ("primitive") patches on which Darcs is
 > (currently) built.

Granted.

 > With such a model you can join completely unrelated repositories
 > easily; the conflicts between objects with the same name can be
 > resolved simply by renaming one or both of them.

My point is that in git, there is no conflict until you explicitly ask
for a merge of the branches.  They can coexist indefinitely.

 > It's just 'darcs amend' nowadays. And no, Darcs does not suffer
 > from any problem in this regard that git doesn't also have.
[...]
 > Removing patches is exactly analogous to rebase and then delete the
 > ref to the old commit in git.

You don't need to delete: rebase by definition *moves* the ref,
leaving no ref to the old commit.  That's "the rebase problem" in a
nutshell.
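
A minimal Python sketch of what "moves the ref" means, on a toy graph
with invented commit ids.  The rebase function below is a drastic
simplification (first parents only, shared history assumed, no content
or conflicts), not git's algorithm.

```python
# Toy graph: commit id -> list of parent ids.
commits = {"A": [], "B": ["A"], "X": ["A"], "Y": ["X"]}
refs = {"master": "B", "topic": "Y"}

def ancestors(commits, start):
    """All commits reachable from `start`, including `start` itself."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(commits[commit])
    return seen

def rebase(commits, refs, branch, onto):
    """Replay commits unique to `branch` onto `onto`, then move the ref."""
    base = ancestors(commits, refs[onto])
    to_replay = []
    current = refs[branch]
    while current not in base:  # assumes the two refs share some history
        to_replay.append(current)
        current = commits[current][0]
    parent = refs[onto]
    for old in reversed(to_replay):
        new = old + "'"  # a new commit; the old one is left untouched
        commits[new] = [parent]
        parent = new
    refs[branch] = parent  # the only pointer to the old head is overwritten

def unreferenced(commits, refs):
    """Commits that no ref can reach any more."""
    live = set()
    for head in refs.values():
        live |= ancestors(commits, head)
    return sorted(set(commits) - live)

before = unreferenced(commits, refs)
rebase(commits, refs, "topic", "master")
after = unreferenced(commits, refs)
print(refs["topic"], before, after)
```

Nothing is deleted: X and Y are still in the object store after the
rebase, but no ref leads to them any more.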

 > Darcs does /not/ delete the old patch files, only the reference to
 > them (Darcs keeps such references in linearly ordered structures
 > called "inventories", of which --at the moment-- there is just one
 > at the top-level), except when asked to do so ('darcs optimize
 > clean'; I don't think I have ever used that command).

OK, so this is the same as git; only "git gc" ever removes data.

Two questions:

- What happens to the old patch when you do "darcs amend"?

- How does the inventory differ from a tag?

 > But, as in git, it is hard to work with something you have no way
 > to refer to.

One thing that I did before the git reflog was

    ls -lt `git fsck`

which efficiently gives you potential former heads for a rebased
branch.
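
The one-liner is presumably shorthand, since git fsck prints report
lines such as "dangling commit <sha>" rather than file names.  A
hedged Python sketch of the idea, assuming the dangling commits are
still loose objects stored under objects/<first two hex digits>/<rest>;
the function names and the sample ids are made up for illustration.

```python
import os

def dangling_commits(fsck_output):
    """Extract ids from report lines of the form 'dangling commit <sha>'."""
    ids = []
    for line in fsck_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[:2] == ["dangling", "commit"]:
            ids.append(parts[2])
    return ids

def loose_object_path(git_dir, sha):
    """Path of a loose object: objects/<first 2 hex digits>/<remaining digits>."""
    return os.path.join(git_dir, "objects", sha[:2], sha[2:])

def newest_first(shas, mtime):
    """Sort ids by a caller-supplied timestamp function, newest first."""
    return sorted(shas, key=mtime, reverse=True)

# Made-up sample report and timestamps, for illustration only.
sample = (
    "dangling blob 1111111111111111111111111111111111111111\n"
    "dangling commit aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n"
    "dangling commit bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb\n"
)
fake_mtime = {"a" * 40: 100.0, "b" * 40: 200.0}

candidates = newest_first(dangling_commits(sample), fake_mtime.get)
print(candidates)
```

In a real repository the timestamp function could be something like
lambda sha: os.path.getmtime(loose_object_path(".git", sha)), which
only works for objects that have not been packed.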

 > It would not be hard to make Darcs follow (modern) git more closely
 > here, so that by default we keep a reference to patches that aren't
 > referred to any longer by the "mainline" (the "current"
 > branch).

Great minds think alike!

I do have a use case for this that shouldn't offend even the most
ardent fan of repo-per-branch.  The most work I ever did with Darcs
was like 10 years ago.  Darcs has always had great facilities for
patch editing.  XEmacs had this one contributor who was in the habit
of going dark and resurfacing with a megapatch (literally: 35,000
lines and 2MB or so).  So I imported the current version into Darcs
and then applied the patch.  Then I started editing into reviewable
feature patches.  (Yes, I did this because the reviewers refused the
patch and I desperately wanted about 5,000 lines of it. ;-)

I made a lot of mistakes, though, and it would have been useful to be
able to "rewind" to older versions of the patches, and break them
apart differently.

 > > Here ends the brain dump. :-)
 > 
 > Thanks! I found it all very interesting. I learned a lot from it (and
 > from writing down my responses, too).

Funny how that second part works!

