[darcs-users] so long and thanks for all the darcs

Stephen J. Turnbull stephen at xemacs.org
Fri Apr 13 08:19:40 UTC 2018


I think we converged substantially in this round!

Benjamin Franksen writes:
 > On 04/10/2018 08:34 AM, Stephen J. Turnbull wrote:

 > > Any user who understands what a ref is will say "a Darcs tag is
 > > too a ref!" I think.
 > 
 > Perhaps (but you won't, right?).

I would, in the sense that it is a name that allows you to rebuild a
version exactly, just as a git tag or branch does.  It's not a ref
into a DAG, of course.

 > > How do you identify "official"?
 > 
 > I can't, unless it's an "official" repo to start with (e.g.
 > http://darcs.net/) and then I would assume that all branches are
 > "official" (assuming darcs had branches).

This is generally not true with git.  In corporate situations,
including large volunteer projects like Python or GHC, it probably is
true.  But in cases of smaller projects, or even projects with formal
organizations that translated repos from centralized systems where
public branches were an important form of communication, I would
expect a lot of detritus.

Also, many projects make official "release branches".  Python has
several score by now.  In the Mercurial days, each was a separate
repo, but in git there's been substantial merging.  I'm not sure if
they've *all* been aggregated into one repo, but the backports policy
suggests they might have, for convenience in cherry-picking.

 > > I doubt they'd be willing to make "export all branches on clone"
 > > a default, and it's not clear to me that the "I just want to see
 > > the mainline" aren't the majority.
 > 
 > How do they identify "the mainline"?

To the folks who just want the VCS to stay out of their way, it's
"whatever $VCS clone scheme://project.org/official checks out."

You mentioned "familiar and comfortable with Darcs".  I don't think
"comfortable" implies "familiar" (in the sense of how the internals
work and how it differs from other VCSes the user may be comfortable
with).  I think it means (to most users) that the VCS stays out of
their way.

 > Indeed. At work we use Darcs for development of several medium sized
 > control systems for scientific instruments.

Interesting to see that description.  Sounds like what I would expect
(notwithstanding the unfortunate experiment with git submodules; that
kind of thing happens in the best-run organizations).

 > > This means that from an
 > > individual developer's point of view, the state of master is a triple:
 > > (1) what's actually in the official repo (unknown; another dev may
 > >     have updated),
 > 
 > True. (Though it is easy to check if this is the case (hg incoming,
 > darcs pull --dry-run, git <whatever>)).

Your network is not run by the "MIT of Japan" (my employer, where the
abbreviation *really* expands to "minimally informed technicians"),
nor is it inside the Great Firewall (currently GitHub is blocked in
China, I am informed).  And it mostly matters in the last five minutes
before a feature freeze. ;-)

 > > This makes me unsure what you think the questions we discuss are.
 > 
 > I am unsure myself because I have lost some of the context.

Great minds think, or not :-), alike!

 > > I don't think this comparison is entirely accurate.  All DAG-based
 > > systems permit cherry-picking and rebasing, although those like
 > > Mercurial and Bazaar do try to deprecate rebase.  In git they are
 > > first-class operations.
 > 
 > Cherry-picking is an attempt to get the effect of patch commutation
 > without paying the price. You get what you pay for: an ad-hoc solution
 > that may or may not give you the results you expect.

You are willing to say that in public after denying that Darcs has, or
you even want, a semantic patch theory? ;-)

 > Making sure the results are what you expect is tedious and error
 > prone and I understand if people are nervous about it ("untested
 > versions, gaah!").

Every Darcs repo implies a number of untested versions which is
potentially exponential in the number of patches.  I have no idea in
practice how many versions are typically generated by repeated
obliteration respecting dependencies, but I imagine it's way larger
than the number of versions actually subjected to formal testing.  (I
would guess properly tested versions are approximately linear in the
number of patches).
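
As a rough illustration of the "potentially exponential" claim, here
is a toy sketch in Python (not Darcs code).  It assumes a repo can be
modelled as a set of named patches plus a table of direct
dependencies, and it counts the subsets that contain every dependency
of each member, i.e. the versions reachable by obliteration.  The
names closed_subsets, patches, and deps are hypothetical.

```python
from itertools import combinations


def closed_subsets(patches, deps):
    """Count subsets of patches that include all dependencies of each member.

    deps maps a patch name to the set of patches it directly depends on
    (an assumed, simplified stand-in for real patch dependencies).
    """
    count = 0
    for size in range(len(patches) + 1):
        for subset in combinations(patches, size):
            chosen = set(subset)
            if all(deps.get(p, set()) <= chosen for p in chosen):
                count += 1
    return count


patches = ["a", "b", "c", "d"]
independent = {}                                  # no patch depends on another
chain = {"b": {"a"}, "c": {"b"}, "d": {"c"}}      # strictly linear history

print(closed_subsets(patches, independent))  # 16, i.e. 2**4
print(closed_subsets(patches, chain))        # 5, i.e. n + 1
```

Under these assumptions, four independent patches imply 16 versions
while a strict chain of four implies only 5; where real repos fall
between those extremes is exactly the open question.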

 > > does patch algebra allow you to avoid some conflicts that would
 > > occur in a DAG-based system?
 > 
 > Some, yes. It depends a lot on the foundation i.e. the concrete
 > implementation of your patch algebra. It also depends on how conflicts
 > are detected in the DAG based system.

I don't know of any DAG-based systems with a substantial advance over
patch(1).

 > But that is not the main point. The main point is that the patch
 > algebra frees you from having to worry about history, /except/ when
 > it is relevant, i.e. when patches have dependencies.

But isn't that costly when you are trying to localize a bug by testing
which versions exhibit it?  When bisection works in a DAG-based
system, you have a logarithmic upper bound on search time.  (Also when
it fails, you find out in logarithmic time.)  It's not obvious to me
that you get that result in Darcs since its "mainline" is
fundamentally nonlinear.
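
The logarithmic bound is just the one you get from binary search.  A
minimal sketch in Python, assuming a linear list of versions, a
known-good first version, a known-bad last version, and a bug that
stays present once introduced (bisect_first_bad and is_bad are
hypothetical names, not the interface of any VCS):

```python
def bisect_first_bad(versions, is_bad):
    """Return (first bad version, number of versions tested).

    Assumes versions[0] is good, versions[-1] is bad, and that every
    version after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1   # lo is known good, hi is known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi], tests


history = list(range(1025))          # 1025 versions, numbered 0..1024
first_bad, tests = bisect_first_bad(history, lambda v: v >= 700)
print(first_bad, tests)              # 700 10
```

With 1025 versions the search needs 10 tests.  The sketch says
nothing about how to choose "the middle" when the candidate versions
form a partial order rather than a line.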

 > > If not, what is the great advantage of patch algebra from your
 > > point of view?  Is it related to the ability to claim the same
 > > branch identity for two workspaces that "haven't diverged too
 > > much", where a git rebase in a published branch all too often
 > > results in an unusable mess of conflicts?
 > 
 > Well, my experience tells me that "an unusable mess of conflicts" can
 > happen with Darcs in just the same way.

I don't think it's "just the same way".  My point is that a rebase
changes the "identity" of a branch in a nonlinear way because it's
version-based.  In Darcs (at least in theory) you can walk forward
applying the patches and fixing conflicts one patch at a time.  (I
guess this is exactly what "darcs rebase" implements.)  True, in Darcs
a megapatch can do you in, but *every* git rebase is a megapatch!

 > When I pull a patch from your repo and it doesn't conflict, I have
 > enlarged the intersection and reduced the (symmetric)
 > difference. When I repeat this, and also push, and everything
 > merges cleanly, then our repos are semantically identical,
 > period. I just don't have to care about the order, either one is
 > fine.

This is a useful explanation!
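
The set picture can be sketched in a few lines of Python.  This is a
toy model that treats a repo as nothing more than a set of patch
names and assumes every pull is conflict-free; pull, mine, and yours
are hypothetical names.

```python
def pull(dest, src, wanted):
    """Return dest plus those wanted patches that src actually contains."""
    return dest | (src & set(wanted))


mine = frozenset({"p1", "p2", "p3"})
yours = frozenset({"p1", "p4", "p5"})

# Pulling one patch enlarges the intersection and shrinks the
# symmetric difference.
step1 = pull(mine, yours, {"p4"})
print(len(mine & yours), len(step1 & yours))   # 1 2
print(len(mine ^ yours), len(step1 ^ yours))   # 4 3

# After pulling everything in both directions the two sets are equal,
# whichever side went first.
mine_synced = pull(step1, yours, yours)
yours_synced = pull(yours, mine, mine)
print(mine_synced == yours_synced)             # True
```

Sets have no order, which is the point of the explanation: in this
model "the same patches" and "the same repo" coincide.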

 > > [In scaling Darcs, s]torage blows up, the naming conflicts will
 > > be frequent unless you're willing to endure network outages and
 > > delays, and URLs for personal repos are often long and/or
 > > unintuitive.
 > 
 > Yes, storage blow-up is a problem, and another one is discoverability,
 > which is why I want to add branches to Darcs.
 > 
 > I don't understand what you mean with "naming conflicts will be
 > frequent".

Names like "test", "new", and in some cases feature names are likely
to be used independently across personal repos.

 > > This is what happens in git now, except that you are able to set
 > > your own defaults in .git/config, and provide aliases for URLs
 > > (the remotes).  You can argue that remotes provide more confusion
 > > than convenience if you like, but several years of experience
 > > have shown that for the vast majority of git users it's the other
 > > way around.
 > 
 > You sound so confident when you say that. As if the git we have
 > today was the result of incorporating years of user feedback. OTOH
 > you keep telling me that git is the way it is because the
 > developers have made and still make it for their own good,
 > primarily. And that the UI is more or less frozen because
 > Mr. T. said so many years ago.

There's no conflict between itch-scratching and Mr. T's decrees on one
hand, and general user satisfaction with the remote feature on the
other.  Unless you're doing something tricky, the workflow in most
projects is pretty simple: go to GitHub, fork the official repo to
your account on GitHub, clone your fork to your workstation, make
branches for each "piece of work" (defined by the project leadership),
push them to your fork when done and submit a "pull request".
Management of remotes in this scenario is completely transparent to
the ordinary contributor: "clone" does all the work.

It would work the same way with a gated corporate repo: the
gatekeepers need to know what they're doing, but to the ordinary dev,
"remote management" simply amounts to clone, push, and pull with
arguments defaulted.

 > > This is not true for branches.  "Colocated branches" (ie, the many
 > > branches per repo model) do seem to cause confusion.  My guess is that
 > > a Darcs-with-branches would have the same problem.
 > 
 > I hope we can avoid that.

Perhaps you can.  It will depend on how many users with a "centralized
VCS" mindset you attract.  I'm not sure whether that mindset is
"organic", or whether it's a matter of experience with centralized
systems.  (The canonical example is Richard "I'm a genius hacker and
I've always committed directly to the production repo" Stallman, who
obviously had decades of experience with RCS and CVS before Emacs
switched to Bazaar, and then git.  As people who grew up with DVCS
become the overwhelming majority, perhaps that mindset will just
f-f-f-fade away, as Pete Townshend sang.)

 > > In context, "short-lived deviation" is exactly the sense I meant:
 > > in case of a merge with way too many conflicts, you want to
 > > "rollback" to the pre-merge state.
 > 
 > But doesn't this lose the changes you made?

Which changes?  First, there should be no uncommitted changes in the
workspace when the merge is started.  If there are, commit them
(perhaps to another branch).  Second, if you've fixed a few files
before discovering the mess, you can commit them separately to an
appropriate branch (usually your mainline).  You'll have to redo
the merge, but for those files you always choose your existing fixed
version.

Perhaps it's not as good as it could be but you don't need to lose
work.  I grant that this is *not* the image you would get from
"rollback to the premerge state", but in my experience it's usually
pretty obvious when you've got a mess before trying to fix it, so
that's the majority of cases anyway.

 > In the situation where I have complicated conflicts, I usually use
 > 'darcs rebase' to resolve them one patch at a time. The work-flow is
 > like this: you say 'darcs rebase pull', which suspends any local patches
 > that conflict with remote ones.  [....]

 > My experience is that it is much easier to resolve complicated conflicts
 > in this step-wise fashion.

This sounds like the optimization I obliquely referred to above.

 > If you had unrecorded changes you are out of luck:

There's no good reason for having unrecorded changes in any of the
DAG-based systems.  They all provide stash or something like it.  I
can't see any reason for it in Darcs, either: a record followed by an
immediate reversion patch is effectively a stash, if Darcs doesn't
already have that feature.

 > > Sigh.  This simply isn't true.  *The DAG is immutable.* 
 > 
 > Ah, I never doubted that the DAG remains consistent in itself. What I
 > meant is the consistency of the changes to your tree. For instance, if
 > you use cherry-picking to re-order changes, can you be sure that after
 > picking all the commits in a branch the resulting tree will be the same
 > as in the original? I don't think so.

You can be sure in the same circumstances as in Darcs: when the
cherry-picking involves no manual resolution of conflicts.
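
A toy Python model of that condition, under loud assumptions: a tree
is a dict from path to content, a change is a (path, old, new)
triple, and a change applies cleanly only if the path currently holds
the old content.  This is not git's merge machinery, and all names
are hypothetical.

```python
def apply_change(tree, change):
    """Apply one (path, old, new) change, or raise on a conflict."""
    path, old, new = change
    if tree.get(path) != old:
        raise ValueError(f"conflict in {path}")
    result = dict(tree)
    result[path] = new
    return result


def apply_all(tree, changes):
    """Apply changes in the given order, like picking them one by one."""
    for change in changes:
        tree = apply_change(tree, change)
    return tree


base = {"a.txt": "1", "b.txt": "x"}
edit_a = ("a.txt", "1", "2")
edit_b = ("b.txt", "x", "y")
edit_a2 = ("a.txt", "2", "3")    # builds on edit_a

# Independent changes: either order gives the same tree.
print(apply_all(base, [edit_a, edit_b]) == apply_all(base, [edit_b, edit_a]))  # True

# Dependent changes: the swapped order does not apply cleanly.
try:
    apply_all(base, [edit_a2, edit_a])
except ValueError as err:
    print(err)                   # conflict in a.txt
```

In this model the only reorderings that can change the result are
those that stop with a conflict, which is where manual resolution
(and hence possible divergence) enters.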

 > Assuming a modernized version of Darcs with in-repo branches,
 > better (guaranteed to be efficient i.e. polynomial, ideally linear)
 > conflict handling, and a more efficient representation of binary
 > hunks: yes, I think [managing the Linux kernel or GCC with Darcs]
 > would be possible and would actually work better than git.

Good luck!  I hope you have the time and the help to get there.  (I
don't have time to learn enough Haskell for the foreseeable future.)

 > I still find it interesting that in Darcs I never missed remote tracking
 > branches yet.

I don't see why you would, since Darcs forces you to manage it
manually anyway.  That is, the only way you can keep a mirror of the
"official" repository's state is by keeping a pristine repository, as
you describe below.[1]  Keeping "pristines" is the way I have
historically managed my Mercurial and Bazaar projects, and still do
for those projects still using Mercurial (all my remaining Bazaar
projects are now sufficiently stable that I just work in the pristine
for my decennial patches ;-).

Otherwise, you just depend on network connectivity, and pull directly
into the working copy, or diff against it.

 > I guess the work-flow with Darcs is just different enough that some
 > concepts (or problems) simply do not transfer naturally.

I think so.  I think some of them will arise in a multibranch version
of Darcs, though.

 > > No, that default is only for a clone, and it's whatever is
 > > checked out in the source repo, which is usually "master" for a
 > > public repo.
 > 
 > But this is horrible.

Not in practice. :-)

 > "Whatever is checked out in the source repo" is completely
 > unpredictable (unless you make sure it is a bare repo so nobody
 > would checkout anything there).

There still needs to be a HEAD (which is what determines what is
checked out).  In any formalized workflow, it will be a bare repo, so
I'm not sure you would experience any problem.

 > >  > What about the sharing with colleagues? [...]  You really want
 > >  > a third repo in between upstream and local for that.
 > > 
 > > Yes, as I describe above these days it's typically on GitHub.
 > 
 > Unacceptable in many companies. Also unnecessarily slow, etc etc.

Sure, but it's trivial to create one in-house: any git repo reachable by
network will do.  Maintaining and managing that is *non*-trivial;
that's why GitHub is so successful, they're darn good at automating
that stuff.  But it's not *that* hard to create a reasonable workflow,
easier to teach it, and only the gatekeepers need to know the
necessary operations for acquiring and merging contributions.

 > > I guess; in practice on GitHub you can't work in it.  I suppose
 > > setting it up as a bare repo does help prevent "wrong cwd" boo-boos.
 > 
 > And clones where you get whatever branch the developer has just
 > checked out.

Sure, *if* you're working peer-to-peer.  If your organization is
pretty small, you're also probably in frequent enough communication,
and familiar enough with what each of you is doing that it's just not
a problem.  If it's larger, see "in-house intermediate repo", above.

 > Let's drop [the discussion of what's a URI] and agree to disagree.

OK, but remember you're also disagreeing with RFC 3986. :^)

 > >  > You said earlier that git represents a submodule as a tree object
 > >  > that is itself a commit. But it cannot be the commit that
 > >  > represents the current (pristine) tree in the submodule, else I
 > >  > could not make a commit in the submodule (or pull there) without
 > >  > making a commit in the containing repo/branch.
 > > 
 > > I'm not sure what you mean by this.
 > 
 > I am trying to understand how submodules work in git. So I have a subdir
 > "bar". The tree referenced by the current commit (of the supermodule)
 > has an entry for "bar" and its content object is not a file but another
 > commit. So suppose I pull a different commit inside the submodule. Would
 > that not mean that the supermodule needs to change, too, i.e. refer to
 > this new commit instead of the old one? But that cannot be, since the
 > commit of the supermodule is immutable.... ahh, I think I do understand:
 > git will show me this update as an uncommitted change! I can commit it
 > in the supermodule and then it "officially" refers to this new commit of
 > the submodule. Correct?

Exactly.  I have not needed to resolve this issue, so I don't know
what the current thinking is.  The last time I looked, basically
everybody said "it's case-by-case, you need to look where you are and
where you want to be, and that may not even be current upstream,
so...."
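
The mechanics you describe can be mimicked with a small Python
model.  It is only a sketch: the supermodule commit is reduced to an
immutable record of which commit each submodule path is pinned to,
and the commit ids "abc123" and "def456" are invented for the
example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuperCommit:
    """Immutable supermodule commit: (path, pinned commit id) pairs."""
    pins: tuple


def status(head, checked_out):
    """List submodule paths whose checked-out commit differs from the pin."""
    pinned = dict(head.pins)
    return [path for path, cid in checked_out.items() if pinned.get(path) != cid]


def commit(checked_out):
    """Record the currently checked-out submodule commits in a new commit."""
    return SuperCommit(tuple(sorted(checked_out.items())))


head = SuperCommit((("bar", "abc123"),))
worktree = {"bar": "abc123"}
print(status(head, worktree))        # []

worktree["bar"] = "def456"           # pull a different commit inside "bar"
print(status(head, worktree))        # ['bar']  (an uncommitted change)

new_head = commit(worktree)          # the supermodule now pins def456
print(status(new_head, worktree))    # []
print(head.pins)                     # (('bar', 'abc123'),)  old commit untouched
```

The old supermodule commit never changes; moving the submodule only
creates a difference that a new supermodule commit may or may not
record.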

For example, the one time I needed to deal with a submodule in anger,
I made a change in the external project's code, committed it to a
branch in the subrepo which I left checked out, and sent a PR.  In
the meantime, I did not update the parent project, and ignored the
occasional warnings about the discrepancy.  When my PR got refused I
checked the original master in the submodule out, worked around the
issue in the parent project, and that repo remains in that state
(politically orphaned unchecked out branch and all) to this day.

 > Thanks, I think this makes more sense now.

It's really complicated.  This is one of those features where "if you
don't know (1) *why* you need it (what specific workflow issues it
addresses) *and* (2) *how* you will modify your workflow to address
those issues using this feature, YOU DO NOT NEED IT and YOU WILL BE
SORRY if you try it anyway." :-)

 > There are some open questions with regard to the design [of "ghost
 > objects"] but I don't think this is the right place to discuss
 > them.

Nope.  I appreciate the hints you dropped, though.

Regards,
Steve

Footnotes: 
[1]  Theoretically you could use tags, but that would be difficult in
Darcs without cooperation from the official repo, AFAICS.



