[darcs-users] Coalescing patches

Nik darcs at babel.homelinux.net
Thu Sep 24 01:55:04 UTC 2009


Hi Stephen,

Thanks for your really informative reply.

Stephen J. Turnbull wrote:
> Nik writes:
>
>  > True, git can efficiently edit the history DAG, but with Darcs there is 
>  > (effectively) no history DAG to edit, so the point is moot, isn't
>  > it?
>
> No, it's not.  Search the archives for discussions of "when to tag".
> Darcs does have to keep track of dependencies among patches, and those do
> form a DAG, although it may be more sparse than git's history DAG.
> The point about tagging is that once you hit a tag, you can stop
> searching for dependencies.  And editing the DAG in Darcs is not very
> easy.
>   

Agreed, darcs does have a DAG, although as I understand it, it's not a 
history DAG in the sense that Mercurial and Git have one. Darcs stores 
dependencies, but does not force everything to be sequential. Correct?

More to the point, I was not envisaging the coalesce option as a 
history-editing operation. Rather, it is a way of handling and 
formalising the association between multiple patches.

My view was that a coalesced patch is more like a patch container: a 
single patch that represents all the changes of the multiple patches it 
contains.

A coalesced patch represents *new* history information, not *changing* 
existing history information.
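To make the "container" idea concrete, here is a minimal sketch in 
Python. It is a toy model under my own assumptions, not how darcs 
stores patches: the Patch and CoalescedPatch types and the coalesce() 
function are hypothetical. The member patches are kept unchanged inside 
the container, dependencies among the members drop out, and any 
dependency on a patch outside the set is carried over to the container.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; this is not how darcs
# stores patches internally.

@dataclass(frozen=True)
class Patch:
    name: str
    depends_on: frozenset = frozenset()  # names of patches this one needs

@dataclass(frozen=True)
class CoalescedPatch:
    name: str
    members: tuple         # the original patches, kept unchanged and in order
    depends_on: frozenset  # dependencies on patches outside the container

def coalesce(name, patches):
    """Wrap patches in a container without rewriting any of them."""
    member_names = {p.name for p in patches}
    external = set()
    for p in patches:
        # Dependencies among the members are internal to the container
        # and drop out; anything else is carried over unchanged.
        external |= p.depends_on - member_names
    return CoalescedPatch(name, tuple(patches), frozenset(external))

if __name__ == "__main__":
    a = Patch("add parser")
    b = Patch("fix typo in parser", frozenset({"add parser"}))
    c = Patch("parser tests", frozenset({"add parser", "base utils"}))
    feature = coalesce("parser feature", [a, b, c])
    print(feature.depends_on)  # frozenset({'base utils'})
```

Note that the dependency on "base utils" survives: this naive form only 
removes dependencies among the coalesced patches themselves, and does 
nothing about dependencies on other patches.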

>  > I assumed that the coalesce was combining a number of smaller patches 
>  > into a single larger one. Darcs has the patch contents and the patch 
>  > dependency information which is sufficient to ensure the coalesced patch 
>  > is correct. Correct?
>
> Well, yes and no.  What if you want to get rid of certain
> dependencies?  It seems to me that the main point of coalescing in the
> naive view is to get rid of the trivial dependencies *among the
> coalesced patches*.  But I think an obvious and important
> generalization is to get rid of spurious dependencies on other
> patches.
>   

OK, your point is well taken. I had deliberately tried to keep the 
coalesce option simple to avoid this type of problem.

The main problem I was trying to solve is that of a consumer of my 
shared repo not wanting to know which 15 patches to pull to get 
everything required for a particular feature.

Once a coalesced patch is pushed to the shared repo, there is a single, 
(hopefully) well-identified patch there that can be pulled (and later 
unpulled) in one reliable operation.

This should make cherry-picking much more useful for users of that 
shared repo.

My assumption was that if we provide good enough tools to manage really 
fine-grained patches, and good enough tools to categorise those 
fine-grained patches when they are created (or later, interactively), 
then the spurious dependencies should be greatly reduced, or (hopefully) 
eliminated.

>  > So, assuming that the storage model and record semantics are unchanged, 
>  > you don't see any real benefit to darcs users in the suggested change?
>
> It's hard to say.  I've experimented with a workflow where I never
> commit; XEmacs has a call to "git commit"[1] on after-save-hook.
> Later I would go back and edit the DAG to split the autosave branch
> into topics, and coalesce related patches.
>   

Your information here is really useful!

> What I found was
> (1) I tended to save *much* more often, because "save->fix typo->save"
>     was a really easy idiom to keep extraneous changes separate from
>     the main effort.
>   

Yup.

> (2) Since each patch was small, I was generally willing to write a
>     short log for most saves.  (This surprised me.)  It was useful to
>     refer to these in creating coalesced logs.
>   

Interesting, and useful.
I had hoped this was possible, and your comments help confirm this.
Using the darcs concept of the "spontaneous branch", I wanted to rely on 
the log of each save to define the coalescing automatically.

> (3) In software development (mostly maintenance-type work, fixing
>     bugs, code cleanups, etc), I found myself spending about equal
>     amounts of time (a) cherry-picking trivial patches to push
>     immediately, (b) rebasing the "real work", and (c) merging, fixing
>     conflicts and other cleanup (pruning dead branches, etc).
>   

If the log messages you wrote at save time assigned those patches to 
the right "spontaneous branch", how much time would that save you?

With these spontaneous branches you would:
* be able to issue a single command to push and coalesce all patches in 
a particular spontaneous branch.
* be able to push the trivial patches without any "cherry-picking" 
effort (because they would all be on a single spontaneous branch)
* not have any merging or rebasing tasks caused by coalescing, 
because coalescing does not edit the DAG (the dependency DAG in darcs, 
the history DAG in git).

If you are unfamiliar with the "spontaneous branch", see here:
http://wiki.darcs.net/SpontaneousBranches
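As a rough sketch of "using the log of each save to define the 
coalescing automatically", the Python below groups save logs by a 
branch tag and builds one combined log per group. The "[branch] " 
log-prefix convention and the group_by_branch() and coalesced_log() 
helpers are hypothetical; darcs defines no such convention. The sketch 
also groups purely by log text and ignores the patch dependencies that 
darcs would still have to check before coalescing.

```python
import re
from collections import defaultdict

# Hypothetical convention: a save's log message starts with "[branch] ".
# Darcs does not define this; it only illustrates grouping by log.
BRANCH_TAG = re.compile(r"^\[([^\]]+)\]\s*")

def group_by_branch(logs):
    """Map each branch name to the log messages recorded on it, in order."""
    groups = defaultdict(list)
    for log in logs:
        m = BRANCH_TAG.match(log)
        branch = m.group(1) if m else "unsorted"  # untagged saves collect here
        groups[branch].append(BRANCH_TAG.sub("", log, count=1))
    return dict(groups)

def coalesced_log(branch, messages):
    """Build one log for the coalesced patch from the per-save logs."""
    return "\n".join([branch] + ["  * " + msg for msg in messages])

if __name__ == "__main__":
    saves = [
        "[parser] add tokenizer",
        "[trivial] fix typo in README",
        "[parser] handle empty input",
        "untagged save",
    ]
    for branch, messages in group_by_branch(saves).items():
        print(coalesced_log(branch, messages))
```

With this grouping, "push everything on the trivial branch" becomes a 
single selection by name rather than a patch-by-patch cherry-pick.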

> (4) Based on time logs, I was about as productive with this workflow
>     as otherwise, but (a) felt more productive, and (b) probably could
>     have achieved a significant gain with better tools.
>   

I'm hoping that the combination of spontaneous branches and automatic 
coalescing would achieve some or all of that significant gain.

> (5) In writing lectures, etc this workflow was actually a drag because
>     I didn't have very good tools.  (DAG editing was done with shell
>     functions and CLI commands.)  The work was very linear and the
>     burden of compacting the patches was greater than perceived
>     benefits.  With better tools it probably would have been a no-op.
>   

I'm having trouble visualising the issues here.
Would the spontaneous branch and coalesce options help here?
(It seems to me they should.)

> Footnotes: 
> [1]  git was the only one that was fast enough at that time.  All the
> others were slow enough that I found myself saving much less often.  I
> don't know about now.
>   

Yes, I had wondered about this as well.

This has led me to a separate train of thought on a "darcs.git" product, 
in which darcs patch theory is used to create and organise the 
patches, but they are stored in the underlying Git storage. Hopefully 
the best of both worlds...

Thanks again for your really useful information.

Cheers!
Nik

