[darcs-users] Coalescing patches

Stephen J. Turnbull stephen at xemacs.org
Thu Sep 24 10:06:17 UTC 2009


Nik writes:

 > Agreed, darcs does have a DAG, although as I understand it, it's not a 
 > history DAG in the way Mercurial or Git have them. Darcs stores 
 > dependencies, but does not force everything to be sequential. Correct?

Well, I'm not an internals hacker.  But I believe you are right.  This
is potentially an efficiency issue if you are going to be manipulating
dependencies among changes.  My personal belief is that That Is What
VCS Is All About. :-)  You should take that with a grain of salt, but
it will help you to evaluate where I'm coming from.

 > More to the point, I was not envisaging the coalesce option as
 > being a history editing operation. Rather, it is a way of handling
 > and formalising the association between multiple patches.

I see why others see your idea as basically a tag.  I don't really
have an opinion on whether they really differ (no time to think
carefully).

 > A coalesced patch represents *new* history information, not
 > *changing* existing history information.

That is certainly an intuitive definition for "coalesced patch".  What
I'm looking for is somewhat different.  Let's use your definition,
since you actually had the nerve to post your idea. :-)  (Except of
course as somebody pointed out "coalesce" is already used internally
for a different concept, so you need to be careful of that.)

 > Ok, your point is well taken.
 > I had deliberately tried to keep the coalesce option simple to avoid 
 > this type of problem.

That doesn't mean it's not useful as is (although the simpler you make
it the closer it gets to the existing notion of tag, I think).

 > The main problem I was trying to solve is that of a consumer of my
 > shared repo not wanting to know which 15 patches to pull to get
 > everything required for a particular feature.

This can be addressed (in an inconvenient way) by supplying
appropriate additional dependencies in your "stick a fork in it"
patch.  I guess you are seeing your coalesced patch as a *pure* "OK,
it's done!  Dinner time" patch.  This seems analogous to the feature
branch workflow, especially in Bazaar, where the "standard view" will
show the merge point, but *not* the revisions in the feature branch.
(Bazaar has a log option so that you can drill down to the feature
branch revisions, but the default shows only the merge itself.)
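
As a rough illustration of the "extra dependencies" approach, here is a
minimal Python sketch of what pulling one "it's done" patch would drag
in.  The patch names and the `deps` mapping are invented for the
example, and this is not how Darcs stores or resolves dependencies
internally; it only shows the transitive-closure idea.

```python
def closure(deps, target):
    """Return target plus every patch it depends on, directly or indirectly."""
    seen = set()
    stack = [target]
    while stack:
        patch = stack.pop()
        if patch in seen:
            continue
        seen.add(patch)
        # Follow the explicit dependencies recorded for this patch.
        stack.extend(deps.get(patch, ()))
    return seen


# Hypothetical repo: three small feature patches, a "done" marker patch
# that explicitly depends on all of them, and one unrelated patch.
deps = {
    "feature-done": ["add-api", "update-callers", "docs"],
    "update-callers": ["add-api"],
    "add-api": [],
    "docs": [],
    "unrelated-typo-fix": [],
}

wanted = closure(deps, "feature-done")
```

The consumer names only "feature-done"; the unrelated patch stays
behind because nothing in the closure depends on it.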

 > My assumption was that if we provide good enough tools to manage
 > really fine-grained patches, and good enough tools to categorise
 > those fine-grained patches when they are created (or later,
 > interactively), then the spurious dependencies should be greatly
 > reduced, or (hopefully) eliminated.

I think that's probably true.  However, you have to be careful about
getting too fine-grained.  For example, in a C program you want to
have the prototype change in the mymodule.h file, the definition
change in the mymodule.c file, and the updates to callers in all the
*module.c depending on each other ... but Darcs won't detect that
automatically because they aren't textually proximate.  That could
require the programmer to do a lot of work to group them.
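
To make the grouping problem concrete, here is a small Python sketch
of one possible heuristic: treat hunks in different files as related
when they mention the same function name.  The function
`group_by_symbol`, the regular expression, and the sample hunks are
all assumptions made up for this example; nothing here is something
Darcs does.

```python
import re
from collections import defaultdict


def group_by_symbol(hunks, symbol_pattern=r"\b(\w+)\s*\("):
    """Map each called or declared name to the set of files whose hunks mention it.

    Only names that show up in more than one file are returned, since
    those are the cross-file groupings a textual diff would miss.
    """
    groups = defaultdict(set)
    for path, text in hunks:
        for name in re.findall(symbol_pattern, text):
            groups[name].add(path)
    return {name: files for name, files in groups.items() if len(files) > 1}


# Hypothetical hunks: a prototype change, the matching definition change,
# a caller update, and an unrelated line in the same caller file.
hunks = [
    ("mymodule.h", "int frob(int x, int flags);"),
    ("mymodule.c", "int frob(int x, int flags) {"),
    ("othermodule.c", "    total += frob(n, 0);"),
    ("othermodule.c", "    log_total(total);"),
]

related = group_by_symbol(hunks)
```

A crude name match like this would of course produce false positives
in real code; the point is only that the grouping information is
semantic, not positional.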

I wonder if a repo (actually, darcs changes and maybe darcs whatsnew)
should have a notion of "my patches" and "your patches", or even a
more fine-grained (word for the day!) hierarchical structure
reflecting the developers' organization.  But at this point that's a
new thread, I guess.

 > Interesting - and useful.
 > I had hoped this was possible, and your comments help confirm this.
 > Using the darcs concept of the "spontaneous branch", I wanted to rely on 
 > the log of each save to define the coalescing automatically.

Well, as I understand it more powerful implementations of a Darcs repo
are being developed, so that they can contain more than one branch at
a time.  Then it might be efficient enough to actually switch
branches.  Of course, the spontaneous branch idea is a good one;
people are good at learning to do things like write "typo fix: " as a
prefix to a log message, not so good (at least, *I* am not) at
remembering to change branches using a VCS command.  In fact, that's
why I developed this workflow.

 > > (3) In software development (mostly maintenance-type work, fixing
 > >     bugs, code cleanups, etc), I found myself spending about equal
 > >     amounts of time (a) cherry-picking trivial patches to push
 > >     immediately, (b) rebasing the "real work", and (c) merging, fixing
 > >     conflicts and other cleanup (pruning dead branches, etc).
 > >   
 > 
 > If the log messages you created at save time correctly grouped those 
 > patches to the correct "spontaneous branch", then how much time would 
 > this save you?

Not much, I think.  Selecting and applying the patches doesn't take
that much time.  It's cleaning up spurious conflicts (eg, whitespace
changes that occurred earlier in the session before fixing a typo),
checking style, initiating a build and test cycle, and reviewing
results that take up the bulk of the cherrypicking time.  In rebasing,
yes, it would often help if cherry-picked patches were "subtracted
from the feature branch", but again that tends to cause spurious
conflicts.  Darcs might handle these better; I haven't tested.
Finally, there's a lot of work in category (c) that I haven't
automated because it's delicate (switching branches and stuff like
that behind the user's back can lead to a very confused DAG!)
Automating that stuff is where I expect to get efficiency gains.

 > * be able to push the trivial patches without any "cherry-picking" 
 > effort (because they would all be on a single spontaneous branch)

Ah, it's not that easy for me because I often need to send those
patches to multiple branches (specifically stable and dev).

 > * would not have any merging or rebasing tasks caused by coalescing, 
 > because coalescing does not edit the DAG (darcs=dependency DAG; 
 > git=history DAG).

I'm not so sure about that.  Is Darcs smart enough to realize that

-    # A cmoment indented four spaces.
+    # A comment indented four spaces.

and

-        # A cmoment indented eight spaces.
+        # A comment indented eight spaces.

are actually the same patch?  This is the most common kind of conflict
I ran into with that workflow; git isn't smart enough.
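
One way to picture "the same patch" for the two hunks above is to
compare them as word-level replacements instead of line-level ones.
The following Python sketch uses the standard `difflib` module;
`word_changes` is a name invented here, and this is only an
illustration of the idea, not a description of what Darcs or git
actually do.

```python
import difflib


def word_changes(old_line, new_line):
    """Return the word-level replacements between two lines, ignoring spacing."""
    old_words, new_words = old_line.split(), new_line.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    return [
        (tuple(old_words[i1:i2]), tuple(new_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


four = word_changes(
    "    # A cmoment indented four spaces.",
    "    # A comment indented four spaces.",
)
eight = word_changes(
    "        # A cmoment indented eight spaces.",
    "        # A comment indented eight spaces.",
)
same_fix = four == eight
```

Seen this way both hunks reduce to the single replacement of
"cmoment" by "comment", even though the lines differ in indentation
and in the rest of their text.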

 > > (5) In writing lectures, etc this workflow was actually a drag because
 > >     I didn't have very good tools.  (DAG editing was done with shell
 > >     functions and CLI commands.)  The work was very linear and the
 > >     burden of compacting the patches was greater than perceived
 > >     benefits.  With better tools it probably would have been a no-op.
 > 
 > I'm having trouble visualising the issues here.
 > Would the spontaneous branch and coalesce options help here?
 > (It seems to me they should.)

Spontaneous branching is irrelevant; what I mean by "very linear" is
that there *are* no interesting branches, not even for cherrypicking.
I just want to coalesce the patches.  This is precisely where pure
coalescing would be very efficient, but for me it wasn't a very
interesting application, so I just switched back to my previous workflow.

 > This had led me to a separate train of thought on a "darcs.git" product 
 > in which the darcs patch theory is used to create and organise the 
 > patches, but they are stored in the underlying Git storage. Hopefully 
 > the best of both worlds...

I've thought about that.  git can generate patches and analyze the DAG
really fast by human standards, so in theory you could represent a
patch as a pair of trees, or as a single commit (with the "from tree"
taken from the parent commit).  But you'd need to do all the patch
transformation stuff on the fly, then, and just generating lots of
patches might be a substantial burden.  I don't know how much of that
is cached by Darcs; if it's a lot, there is a potential problem there.
And you would need to treat commits that represent the same patch
starting from different trees as equivalent.  Sounds messy.
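
A minimal Python sketch of the "patch as a pair of trees" idea, under
heavy simplifying assumptions: a tree is just a dict mapping path to
file text, and two commits count as "the same patch" when the diff
between their before and after trees is textually identical apart
from line offsets.  The names `tree_diff` and `patch_id` are invented
for this example (git's own `git patch-id` command is similar in
spirit), and none of the real patch-commutation work is attempted.

```python
import difflib
import hashlib


def tree_diff(before, after):
    """Unified diff between two trees, each a dict of path -> file text."""
    lines = []
    for path in sorted(set(before) | set(after)):
        old = before.get(path, "").splitlines()
        new = after.get(path, "").splitlines()
        if old != new:
            lines.extend(difflib.unified_diff(old, new, path, path, lineterm=""))
    return lines


def patch_id(before, after):
    """Hash of the diff body, skipping hunk headers so line offsets don't matter."""
    body = [line for line in tree_diff(before, after) if not line.startswith("@@")]
    return hashlib.sha1("\n".join(body).encode("utf-8")).hexdigest()


# Two different starting trees (b.txt differs) receiving the same change to a.txt.
base1 = {"a.txt": "one\ntwo\n", "b.txt": "x\n"}
base2 = {"a.txt": "one\ntwo\n", "b.txt": "y\n"}
after1 = dict(base1, **{"a.txt": "one\nTWO\n"})
after2 = dict(base2, **{"a.txt": "one\nTWO\n"})

same_patch = patch_id(base1, after1) == patch_id(base2, after2)
```

Every such comparison regenerates the diff from the two trees, which
is the "on the fly" cost mentioned above; a real design would need
some form of caching.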
