[darcs-users] Coalescing patches

Thu Sep 24 11:48:16 UTC 2009

Hi Stephen,

Well, I'm still learning lots about the problem space I'm looking at, 
and so far I'm still happy with my original design.

Stephen J. Turnbull wrote:
> Nik writes:
>
> My personal belief is that That Is What
> VCS Is All About. :-)  You should take that with a grain of salt, but
> it will help you to evaluate where I'm coming from.
>   

Ok. The main reason I went with darcs all those years ago was that it 
didn't force me to consider everything as a historical tree as CVS did, 
and I was *so* happy to get away from that. I guess I've always seen the 
historical bent of Mercurial and Git as "the dark side showing through 
again". :o)

> I see why others see your idea as basically a tag.  I don't really
> have an opinion on whether they really differ (no time to think
> carefully).
>   

I replied to this in more detail in another post, but basically a darcs 
tag is a new patch that describes the association of other patches also 
in the repo. My concept of the coalesced (unified?) patch is that it is 
a single patch that *replaces* the other patches it associates.

> (Except of
> course as somebody pointed out "coalesce" is already used internally
> for a different concept, so you need to be careful of that.)
>   

Yup - I'm onto that ... well me and my thesaurus :o)

> This can be addressed (in an inconvenient way) by supplying
> appropriate additional dependencies in your "stick a fork in it"
> patch.  I guess you are seeing your coalesced patch as a *pure* "OK,
> it's done!  Dinner time" patch. 

Why am I now suddenly hungry? :o)

> This seems analogous to the feature
> branch workflow, especially in Bazaar, where the "standard view" will
> show the merge point, but *not* the revisions in the feature branch.
> (Bazaar has a log option so that you can drill down to the feature
> branch revisions, but the default shows only the merge itself.)
>   

Yes, it does sound similar.

> I wonder if a repo (actually, darcs changes and maybe darcs whatsnew)
> should have a notion of "my patches" and "your patches", or even a
> more fine-grained (word for the day!) hierarchical structure
> reflecting the developers' organization.  But at this point that's a
> new thread, I guess.
>   

Interesting ... when you say "developers' organization", do you mean 
something like an org-chart of the developers, or more the way a single 
developer organises their repo?

> Well, as I understand it more powerful implementations of a Darcs repo
> are being developed, so that they can contain more than one branch at
> a time.  Then it might be efficient enough to actually switch
> branches.

I'm not a big fan of switching branches.

I do, however, have a separate suggestion for making spontaneous 
branches a little more corporeal, by storing their names somewhere in 
the repo or its metadata, which would yield a number of benefits:

* the user could list the current spontaneous branches
* darcs could avoid creating a new spontaneous branch by mistake when 
the user simply supplies the wrong name, due to a typo or brain-fade.
* the spontaneous branch information could be removed from the patch 
name, and stored in a bespoke location.
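
To make that a bit more concrete, here is a rough sketch in Python (not 
darcs code, and nothing like this exists in darcs as far as I know). The 
BranchRegistry class, its method names and the 0.8 similarity cutoff are 
all made up for illustration; the only real library call is difflib's 
get_close_matches.

```python
import difflib


class BranchRegistry:
    """Hypothetical record of the spontaneous-branch names known to a repo."""

    def __init__(self, names=()):
        self.names = set(names)

    def list_branches(self):
        """Benefit 1: the user can list the current spontaneous branches."""
        return sorted(self.names)

    def resolve(self, name, create=False):
        """Benefit 2: refuse to create a branch by accident.

        An unknown name is only added when the caller asks for it explicitly.
        """
        if name in self.names:
            return name
        if create:
            self.names.add(name)
            return name
        # Look for a known name that is close to what was typed (assumed cutoff).
        close = difflib.get_close_matches(name, sorted(self.names), n=1, cutoff=0.8)
        if close:
            raise ValueError(f"unknown branch {name!r}; did you mean {close[0]!r}?")
        raise ValueError(f"unknown branch {name!r}; pass create=True to add it")
```

With "fix-typos" registered, resolve("fix-typso") raises an error that 
suggests "fix-typos" instead of quietly starting a new branch.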

>  > If the log messages you created at save time correctly grouped those 
>  > patches to the correct "spontaneous branch", then how much time would 
>  > this save you?
>
> Not much, I think.  Selecting and applying the patches doesn't take
> that much time.  It's cleaning up spurious conflicts (eg, whitespace
> changes that occurred earlier in the session before fixing a typo),
>   

Hmm, good point.
A feature that could help here might be an ability within darcs to 
separate formatting changes from content changes, and save them in 
separate patches, even in response to a single record operation.
The developer can still put those separate patches into the same 
spontaneous branch if they really belong together.

It seems to me that would help a lot - your thoughts?

Of course, such a feature would require that the definition of 
"formatting" be configurable per file type, so that Python and possibly 
Makefiles, where whitespace is significant, are handled correctly.
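
As a minimal sketch of the idea (Python, purely illustrative, and 
assuming the simplest case where lines are changed in place rather than 
added or removed): the function names and the "normalise" parameter are 
my own invention, and the parameter is where the per-file-type 
definition of formatting would plug in.

```python
def split_changes(old_lines, new_lines, normalise=str.split):
    """Split line-for-line changes into formatting-only and content changes.

    `normalise` defines what "formatting" means: two lines that normalise
    to the same value are taken to differ only in formatting. The default
    ignores all whitespace differences.
    """
    formatting, content = [], []
    for number, (old, new) in enumerate(zip(old_lines, new_lines), start=1):
        if old == new:
            continue  # unchanged line, belongs to neither patch
        if normalise(old) == normalise(new):
            formatting.append((number, old, new))
        else:
            content.append((number, old, new))
    return formatting, content


def python_normalise(line):
    """A deliberately conservative rule for Python sources: indentation is
    significant, so only trailing whitespace is treated as formatting."""
    return line.rstrip()
```

A single record operation would then yield two patches, one built from 
each list. With the default rule, re-indenting a line is a formatting 
change; with python_normalise the same edit counts as content.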

> checking style, initiating a build and test cycle, and reviewing
> results that take up the bulk of the cherrypicking time.

Hmmm, sounds like some or all of this could be pushed off to a CI tool?
The developer marks the patches (or hopefully the spontaneous branches) 
to group, and the CI tool looks for combinations that pass the CI tests 
while the developer has dinner (because it's now ready).
After dinner, the developer reviews the combinations that passed, and 
pushes coalesced (unified) patches from those, or makes appropriate 
changes to try for a better combination.
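
Roughly what I have in mind for the CI side, as a Python sketch. The 
passes_tests callback stands in for "apply these branches to a scratch 
repo, build, and run the tests"; it is an assumption, not an existing 
darcs or CI interface. Trying every combination is exponential in the 
number of branches, so this only makes sense for a handful of them.

```python
from itertools import combinations


def passing_combinations(branches, passes_tests):
    """Yield each non-empty combination of branches that passes, largest first.

    `passes_tests` is a caller-supplied function taking a tuple of branch
    names and returning True when that combination builds and tests cleanly.
    """
    for size in range(len(branches), 0, -1):
        for combo in combinations(branches, size):
            if passes_tests(combo):
                yield combo
```

The developer would come back from dinner to the list this produces, 
with the largest passing combinations at the top.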

>   In rebasing,
> yes, it would often help if cherry-picked patches were "subtracted
> from the feature branch", but again that tends to cause spurious
> conflicts.  Darcs might handle these better, I haven't tested.
>

To the best of my understanding of rebasing, I believe I am trying to 
avoid it with my suggested feature.

> Finally, there's a lot of work in category three that I haven't
> automated because it's delicate (switching branches and stuff like
> that behind the user's back can lead to a very confused DAG!)
> Automating that stuff is where I expect to get efficiency gains.
>
>  > * be able to push the trivial patches without any "cherry-picking" 
>  > effort (because they would all be on a single spontaneous branch)
>
> Ah, it's not that easy for me because I often need to send those
> patches to multiple branches (specifically stable and dev).
>   

The feature as I envisaged it would be able to do exactly that:

* push --unify from my-repo to stable
* push --unify from my-repo to dev, or push unified patch from stable to 
dev.

>  > * would not have any merging or rebasing tasks caused by coalescing, 
>  > because coalescing does not edit the DAG (darcs=dependency DAG; 
>  > git=history DAG).
>
> I'm not so sure about that.  Is Darcs smart enough to realize that
>
> -    # A cmoment indented four spaces.
> +    # A comment indented four spaces.
>
> and
>
> -        # A cmoment indented eight spaces.
> +        # A comment indented eight spaces.
>
> are actually the same patch?  This is the most common kind of conflict
> I ran into with that workflow; git isn't smart enough.
>   

I don't know.

I do know that the kdiff tool I have been using recently (or more 
properly, the diff program that it uses) has a mode that does exactly 
that, and it's perfect for development work.

In what context is this difference or similarity being detected?

I presume the situation that you are talking of is some form of update, 
where a patch in an external repo conflicts with a notionally identical 
content change in the local repo?

The suggestion I made earlier of separating formatting patches from 
content patches would, at worst, consider one of these changes to be two 
patches (one for the whitespace change, and one for the content change), 
and therefore would detect that the content change was identical, and 
the format change different.
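
Applied to your two hunks, the decomposition I am imagining looks 
something like this Python sketch. It handles only one-line changes and 
only leading whitespace, and the decompose function is hypothetical, 
not a description of how darcs represents patches.

```python
def decompose(old, new):
    """Split a one-line change into a whitespace part and a content part.

    Returns ((old_indent, new_indent), (old_text, new_text)).
    """
    def indent(line):
        return line[:len(line) - len(line.lstrip())]

    whitespace = (indent(old), indent(new))
    content = (old.strip(), new.strip())
    return whitespace, content


# The two hunks from the quoted example, at four and eight spaces.
four = decompose("    # A cmoment indented four spaces.",
                 "    # A comment indented four spaces.")
eight = decompose("        # A cmoment indented eight spaces.",
                  "        # A comment indented eight spaces.")
```

As written, the content parts still differ because the comment text 
itself says "four" versus "eight"; for two hunks whose text is the same 
and only the indentation differs, the content parts compare equal and 
only the whitespace parts disagree.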

> Spontaneous branching is irrelevant; what I mean by "very linear" is
> that there *are* no interesting branches, not even for cherrypicking,
> I just want to coalesce the patches.  This is precisely where the pure
> coalescing would be very efficient, but for me it wasn't a very
> interesting application, I just switched back to my previous workflow.
>

Ah, ok. Yes, it seems the pure "coalesce" would work perfectly here.

>  > This had led me to a separate train of thought on a "darcs.git" product 
>  > in which the darcs patch theory is used to create and organise the 
>  > patches, but they are stored in the underlying Git storage. Hopefully 
>  > the best of both worlds...
>
> I've thought about that.  git can generate patches and analyze the DAG
> [...] you would need to equivalence commits that represent the same patch
> starting from different trees.  Sounds messy.
>   

Just to be clear: As I understand it, git is really two separate things 
which now have acquired the same name (confusingly):

1. a high-performance object-storage system;
2. a DCVS built on that high-performance storage system.

I was proposing a combination of darcs' patch theory as the DCVS, using 
only the high-performance object-storage part of git. So there would be 
no git history DAG, no git index, etc. Just darcs patches stored in a 
high-speed object storage system.
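
To illustrate what "just the storage part" means, here is a toy 
content-addressed store in Python. The ObjectStore class is hypothetical 
and keeps everything in memory; the one detail taken from git is the way 
a blob's id is computed (SHA-1 over a "blob <size>\0" header followed by 
the data). Real git also compresses objects and has trees, commits and 
packfiles, none of which are modelled here.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store keyed the way git keys its blobs."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        """Store `data` (e.g. a serialised darcs patch) and return its id."""
        header = b"blob " + str(len(data)).encode("ascii") + b"\0"
        key = hashlib.sha1(header + data).hexdigest()
        self._objects[key] = data  # storing identical data twice is a no-op
        return key

    def get(self, key: str) -> bytes:
        """Fetch previously stored data by id."""
        return self._objects[key]
```

The patch-theory layer would then only ever deal in ids and bytes, and 
identical patches pulled from different repos would be stored once.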

With luck this could be implemented by porting the high-level storage 
API within darcs to the git storage system - but I presume that it 
wouldn't be quite that easy :o/

Such a combination should allow high-efficiency branches within a repo, 
as per the git dcvs, the file-id-separate-from-pathname feature, the use 
of links, etc. In short, all of the git storage performance and robustness.

What would remain would be darcs' processing overheads, whatever they 
may be. However, for those operations which are particularly slow in 
darcs, I don't know if that is caused by darcs chewing CPU calculating 
patches, or whether it's darcs thrashing a less efficient storage system.

Thanks for your thoughts.

Cheers!
Nik