[darcs-users] so long and thanks for all the darcs

Tue Mar 20 12:35:36 UTC 2018

Am 20.03.2018 um 10:33 schrieb Stephen J. Turnbull:
> Ben Franksen writes:
>  > Am 19.03.2018 um 09:12 schrieb Stephen J. Turnbull:
>  > But git chooses to not clone all the refs by default and there is a
>  > reason for that because it would have to pull all the referenced
>  > commits, too,
> 
> As far as I know, git clones the whole object database and sets up one
> tracking branch (which one depends on the checked-out branch in the
> source).  Optionally you can restrict to a single branch by explicitly
> specifying the --single-branch option.  It also copies all refs to
> $GITDIR/refs/remotes/origin, so in fact you have the whole state of
> the cloned repo at the time of cloning.

I was merely paraphrasing what you said earlier:

> I don't think this is possible with raw git on a remote repository.  I
> believe you need to fetch all the remote refs, and query locally.

and trying to find a good reason why this would be the case.

My experience with git tells me that when I make a clone what I get is
/not/ identical to upstream. Instead I get something that is supposed to
be used as a downstream working copy, similarly to how with Subversion
and CVS one works on a "checkout" of the central repository. And clones
of a clone are definitely second class, at least out of the box. I
know all these problems can be worked around if you know the right
magical incantations but I keep forgetting them because they are so
abstruse.

>  > > So the solution the git developers came up with was providing
>  > > namespaces (called "remotes" in git documentation) so that one
>  > > name could refer to several heads at the same time.
>  > 
>  > This is what I mean. Even following mentally what you wrote here gives
>  > me headaches. Not because of its complexity per se, but because of
>  > *unnecessary* complexity.
> 
> You've already admitted that it's necessary because of name
> collisions.  You just don't like it. :-)

I did not admit anything like that. There are no name collisions if
branch names are always relative to a repo URL. No need for namespacing
branches.

>  > Yeah, discourage the feature, but first add it because it's oh so
>  > cool and cheap, making everyone's life difficult because they now
>  > all have to cope with the resulting complexity.
> 
> I think you misunderstand how git works.

Probably, but I am trying, and I am not stupid (except sometimes).

>  These are just refs, tiny
> files that contain one SHA1 terminated with a newline character, no
> more and no less.

What I said: cheap.

>  People used to do things like name their tracking
> branches <remote>-<branch>, but that had two disadvantages.  One, many
> people did what you seem to find natural, and omit the "<remote>-" part.
> After all, it's not my branch, so I won't work on it, right?

No, this is not what I find natural. What I find natural is that in my
clone the beasts have the same name as in the remote repo from which I
cloned, at least by default.

>  Turns
> out that for various reasons people *do* unintentionally commit to those
> branches.

So what? I don't see how this is a problem.

>  Second, whatever the name, you don't want to commit to
> those branches,

Why not?

> so having them in the default namespace is pollution.

Exactly what I was saying: they started with a cheap and harmless
looking feature and then had to introduce lots of complexity to deal
with the consequences.

> The solution is elegant, IMO: move refs to tracking branches to a
> standard (and quite arbitrary) place, and teach the functions that
> have need to know (config, fetch, and push) where that is.
> Furthermore, since they're not in refs/heads anymore, git refuses to
> treat them as branches (checking out a remote ref puts you in
> "detached HEAD" mode).  IOW, since they're just files (and not R/O),
> you can edit them.  But the only way git will change them is on a
> successful fetch or push, to synchronize to the remote.

I guess we'll have to agree to disagree on this point.

>  > > But to get that message you need to explicitly checkout a commit that
>  > > is not the target of a branch ref. 
>  > 
>  > A tag, for instance.
> 
> Yes.  I guess that a lot of people would prefer that git add a bunch
> of implementation complexity so that it would warn when changing
> branches away from a detached HEAD, and only if there were commits to
> that branch or trying to push.  

I am not one of them.

>  > Copy & paste? It's 2018, not the 1970s.
> 
> I frequently drop characters at the beginning or end of a selection
> when using touchpads or handhelds, and occasionally with a mouse.
> Unfortunately git accepts prefixes....  SHA1s are also less than
> mnemonic (at least if your name is not Ramanujan!) -- how do you know
> you've got the right one when you can't query the remote for SHA1
> equivalents to branch refs?  Without the refs, all you would have is a
> bag of dangling heads among all your dangling heads from rebases etc.

I am not against refs but against remotes and how they mangled up both.

>  > > As I understand it, the patch knows about its dependencies, right?
>  > 
>  > Only the explicit ones. The implicit dependencies are, well,
>  > implicit.
> 
> Ouch, see below.
> 
>  > > So if you've been diligent about recording semantic dependencies,
>  > > you should be able to reconstruct the feature the patch helps
>  > > implement.
>  > > 
>  > > Do I have that right?
>  > 
>  > Yes, I think so.
> 
> But apparently requires a bit more luck than I estimated. :-)

Yes, or else you can remember well enough what you did and in which order.

>  > > I suspect you'd have some work to do to even get logging right,
>  > > since that presumably is based on inventories, not on the
>  > > dependency poset.
>  > 
>  > Doesn't matter. The dependencies (if done in this rather un-Darcsy way)
>  > enforce a single linear sequence per branch/repo and the inventory
>  > merely reflects that one fixed order...
> 
> Yes, I understand that.  My point is that you need to do some
> implementation, just as git would have to to emulate patches.

No you do not, as I explained in detail. To make this utterly clear:
Darcs never changes the order of patches in the inventory unless it has
first successfully commuted the patches. This is absolutely necessary
because (1) the commute may fail and (2) the content of the patch
representation may change and so the refs in the inventory have to
change, too.
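
A minimal sketch of that rule, in Python rather than Darcs's Haskell and
with made-up names throughout: `swap_adjacent` and `toy_commute` are
hypothetical, and a patch is modelled as a plain (file, description)
pair. The toy rule never rewrites a patch, whereas a real commute may.
The only point is that the inventory is updated after the commute has
succeeded, and with whatever patches the commute returned.

```python
from typing import Callable, List, Optional, Tuple

Patch = Tuple[str, str]  # (file, description); a stand-in for a real patch
CommuteFn = Callable[[Patch, Patch], Optional[Tuple[Patch, Patch]]]


def swap_adjacent(inventory: List[Patch], i: int, commute: CommuteFn) -> bool:
    """Swap inventory[i] and inventory[i + 1] only if they commute."""
    result = commute(inventory[i], inventory[i + 1])
    if result is None:
        return False  # commute failed: the recorded order stays as it was
    # Store the patches the commute returned; they may differ from the inputs.
    inventory[i], inventory[i + 1] = result
    return True


def toy_commute(first: Patch, second: Patch) -> Optional[Tuple[Patch, Patch]]:
    """Toy rule: patches to different files commute unchanged, others never."""
    if first[0] == second[0]:
        return None
    return second, first


inventory = [("a.txt", "edit 1"), ("b.txt", "edit 2"), ("b.txt", "edit 3")]
blocked = swap_adjacent(inventory, 1, toy_commute)  # same file: refused
moved = swap_adjacent(inventory, 0, toy_commute)    # different files: swapped
print(blocked, moved, inventory)
```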

And git /cannot/ emulate Darcs because the laws of the patch algebra are
/exactly/ what is required to make commutation of changes behave in a
consistent manner.

> I've changed the order of discussion of filter-branch and submodules.
> 
>  > > [Offhand, I can mention] git's filter-branch capabilities
>  > 
>  > I don't know anything about that feature.
> 
> Sort of rebase on steroids.  It allows you to walk all parents and
> automatically do DAG surgery based on arbitrary conditions (presented
> as Bourne shell scripts).  For example, you can split out a
> subdirectory as a separate project without ever checking out a file.

Okay, thanks.

> This would require a lot of implementation to do in terms of posets of
> patches, and it would be like doing surgery with a chainsaw.

Hm, yes, since we do not yet have proper in-repo branches it is hard to
tell what would be needed to support this style of history editing.

> OTOH, Darcs might not care since history is not the central thing, as
> long as the patches are right and in some "reasonable" order.
> (Mostly, just respecting dependencies.)  A lot of the things that git
> would do with filter-branch might be implemented in Darcs in terms of
> some similar feature that works directly on patches.

Yes, that is what I was thinking.

> Eg, in the split
> out a subdirectory example, you'd move all the file adds and hunks
> referring to the subdirectory to the new project, and duplicate
> token-replace patches.

The latter is taken care of automatically by the commutation rules. The
sequence

  [replace x y oldpath, move oldpath newpath]

commutes to

  [move oldpath newpath, replace x y newpath]

and similarly for hunks.
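
As a rough illustration of that one rule (a Python sketch, not Darcs
code; the `Move` and `Replace` classes and the `commute` function are
invented for the example, and every other patch combination is left
unmodelled):

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Move:
    src: str
    dst: str


@dataclass(frozen=True)
class Replace:
    old: str
    new: str
    path: str


Patch = Union[Move, Replace]


def commute(first: Patch, second: Patch) -> Optional[Tuple[Patch, Patch]]:
    """Swap two adjacent patches, returning the new pair or None.

    Only the replace-then-move case from the text is handled; None here
    just means "not covered by this sketch".
    """
    if isinstance(first, Replace) and isinstance(second, Move):
        if first.path == second.src:
            # The replace touched the file under its old name; once the
            # move goes first, it has to refer to the new name instead.
            return second, Replace(first.old, first.new, second.dst)
    return None


swapped = commute(Replace("x", "y", "oldpath"), Move("oldpath", "newpath"))
print(swapped)
```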

>  So probably you *can* do a Darcs-y
> filter-branch pretty efficiently as long as you don't worry too much
> about those parts of history that Darcs treats loosely anyway.

Yes.

> Interesting...
> 
>  > > [and] submodules (ie, attaching a separate repo
>  > > instead of a tree to represent a directory),
>  > 
>  > Yes that's something we do not support yet. Though I'd say the existing
>  > support in git is of the shallow sort.
> 
> I agree that the implementation is trivial in the data structures and
> quite manual in the operations, but I'm not sure what "deep" support
> would be, given the requirements that led to their implementation.

Sure. My comment wasn't meant as a criticism.

>  > IIUC it's more or less a file with some associations between
>  > subdirectories and subrepos (plus some information about their
>  > remotes) and the normal git commands ignore submodules
>  > completely. Correct me if I am wrong.
> 
> Almost correct (for a submodule, its subdirectory is represented in
> the DAG by a commit object rather than a tree object), but there is
> method behind this madness.  Specifically, the point of submodules is
> to create a sort of firewall between the VCS metadata of the main
> project and those of its prerequisites.  Commands normally do not
> recurse into submodules because in most cases those are going to be
> stable, not tracking upstream's bleeding edge, and likely not modified
> in this project, either.

Okay, this makes sense.

>  > BTW, a project I occasionally work on but very often work with (at
>  > work) recently decided to split into several submodules. This led
>  > to general and widespread confusion about how to handle these
>  > submodules and lots of criticism from users and contributors.
> 
> Yeah, submodules *are* complex.  They're also complex in Mercurial
> (called something else).
> 
> Most projects can (and do ;-) avoid them, but when you need them, they
> really make a difference.  They're basically a device to manage
> inter-project, not intra-project, communication.

Yes. Which means such a split really only makes sense if it is
accompanied by a split in the development team. And if changes are
normally contained to one submodule. Both of which weren't the case
here, so I guess it's been an abuse of submodules.

>  > > I'm not sure what submodules would mean in the context
>  > > of Darcs, which doesn't have the concept of tree as far as I know. 
>  > 
>  > Huh? Of course it has. If you mean tree as in "tree of files and dirs
>  > that make up a version". But the question is still a good one and I don't
>  > have an answer ready.
> 
> I mean a single object in the database that describes a tree of files
> and subdirectories.  As I understand it, in Darcs you have patches and
> inventories, and the tree represented by a Darcs repo is implicit in
> the sequential application of patches.

We do have that, even though it is "only" a cache. I mentioned it in
passing, it is called the 'pristine tree' in Darcs. It is a rather
important optimization, otherwise we'd have to reconstruct the tree
every time which would make Darcs extremely slow. (We currently have
only one per repo, but that would have to change when/if we add in-repo
branches).

>  > How does git cope with a conflict between a module and a submodule? 
>  > Say I have a submodule in a directory x and I add a file to the
>  > parent module with name x/y.
> 
> You can't do that.  There is no x subtree in the parent, that is, no
> tree object to add files or subtrees to; rather there is a commit that
> only the submodule command knows how to handle -- it's opaque to the
> mutation commands.  IOW, x *is* a submodule, an independent project
> whose working tree resides at x.  From the point of view of developers
> (and text editors :-), the working tree is just a subtree of the
> parent, but the VCS metadata are completely separate.  You *can* add a
> file x/y to the working tree of the parent, giving a modified x
> submodule.
> 
> You can have a subtree and replace it with a submodule or vice-versa.
> Diffs and things like that will do the right thing.

I think I understand. So if I merge a version that has x/y as a regular
blob and another version where x has been added as a submodule, then I
get a conflict?

> The complexity comes in if you change anything in a submodule.  Then
> you have to make a decision about whether and to which submodule
> branch you want to commit that change, and whether to propagate that
> commit to the parent (remember, the state of the submodule in the
> parent is represented by that commit), and to which branch or branches
> of the parent.  There doesn't seem to be a typical case, so at least
> for now it's entirely up to the user to figure it out, and the UI is
> multistep and therefore error-prone (aka "complex").

Yes. Sounds like something to avoid unless there is a compelling reason
not to.

>  > > Which reminds me: I've long thought it might be an interesting
>  > > experiment to use git's object database as a backing store for Darcs.
> 
>  > Commuting hunks in Darcs is already fast. We do optimize and handle the
>  > case of hunks in different files quickly (there is no need to change the
>  > patch rep in this case; we say they commute "trivially"). When they are
>  > in the same file, commutation more or less consists of a handful of
>  > comparisons, additions, and subtractions with machine integers. The
>  > actual content of what is removed and added is not needed, only the size
>  > (number of lines).
> 
> OK, so that's basically the same "big O", but the representation I
> suggested would involve more overhead, I'm pretty sure.
> 
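
To illustrate the integer arithmetic for two hunks in the same file,
here is a simplified Python sketch. It is not the actual Darcs code:
`Hunk` and `commute_hunks` are names invented for the example, a hunk
carries only line counts (no content), and hunks that overlap or touch
are simply treated as not commuting.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Hunk:
    line: int     # 1-based line where the hunk starts
    removed: int  # number of lines it removes
    added: int    # number of lines it adds


def commute_hunks(p: Hunk, q: Hunk) -> Optional[Tuple[Hunk, Hunk]]:
    """Turn the sequence [p, q] into an equivalent [q2, p2], or return None."""
    if p.line + p.added < q.line:
        # q lies below the lines p produced: undo p's shift on q's position.
        return Hunk(q.line - p.added + p.removed, q.removed, q.added), p
    if q.line + q.removed < p.line:
        # q lies above p: once q goes first, p moves by q's net growth.
        return q, Hunk(p.line + q.added - q.removed, p.removed, p.added)
    return None  # overlapping or touching: not handled in this sketch


p = Hunk(line=2, removed=1, added=2)
q = Hunk(line=6, removed=1, added=1)
print(commute_hunks(p, q))
# (Hunk(line=5, removed=1, added=1), Hunk(line=2, removed=1, added=2))
```
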
>  > It is not quite clear to me what the motivation behind this whole idea is.
> 
> Partial git compatibility and faster checkouts and other operations on
> arbitrary known versions.

Ah, yes. This is one of the properties of Darcs that I have observed
makes people nervous: there is no way to recover a certain state (tree)
that your repo had in the past unless you have tagged that state (or
made a clone and stored it somewhere). We have the 'weak hash' nowadays
but it is indeed weak (an xor of all abstract patch hashes). So the
idea is to take the whole chain of inventories at each step (command
that mutates the repo) and version control this file e.g. with git.
Interesting idea.

In fact, you need to do this only with the head inventory, since all the
"closed" parent inventories are immutable* so it is enough to reference
them. Such a thing already exists in Darcs and is called a "context
file". If we strip off the meta data, compress it, calculate a hash sum,
and then store it in some subdirectory under a name that consists of a
(fixed size) counter plus the hash, then we would have a way to reliably refer
to any past state, even without involving a foreign VCS such as git.

(*) If such an inventory is opened, mutated, and then stored again, it
gets a new hash sum and thus a new file name. This is exactly the same
as with patches and pristine trees.
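
A rough Python sketch of what I have in mind, with everything concrete
being an assumption made for the example: the function names, zlib as
the compressor, SHA-256 over the compressed bytes as the hash sum, a
10-digit counter, and the input being a context that already has its
metadata stripped. Nothing like this exists in Darcs today.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

COUNTER_WIDTH = 10  # assumed fixed width of the counter in the file name


def store_state(context: bytes, store: Path) -> Path:
    """Compress a (metadata-free) context and file it as <counter>-<hash>."""
    store.mkdir(parents=True, exist_ok=True)
    packed = zlib.compress(context)
    digest = hashlib.sha256(packed).hexdigest()
    counter = len(list(store.iterdir())) + 1  # next free slot; no locking here
    target = store / f"{counter:0{COUNTER_WIDTH}d}-{digest}"
    target.write_bytes(packed)
    return target


def load_state(path: Path) -> bytes:
    """Read a stored context back, checking it against the hash in its name."""
    packed = path.read_bytes()
    expected = path.name.split("-", 1)[1]
    if hashlib.sha256(packed).hexdigest() != expected:
        raise ValueError(f"{path.name}: content does not match its hash")
    return zlib.decompress(packed)


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "states"
    first = store_state(b"patch A\n", store)
    second = store_state(b"patch A\npatch B\n", store)
    names = [first.name, second.name]
    restored = load_state(first)

print(names)
```

The counter gives the states an order, and the hash makes each file name
verifiable against its content.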

Cheers
Ben