[darcs-users] conflict misery....

Sat Dec 31 05:40:49 UTC 2005

Okay, I think this has been hashed out before, I think even in a  
thread started by me long ago....  But alas it never made it into the  
Wiki or other permanent store...  So here I go again:

I'm going to describe my current usage model and a "darcs spins  
forever" disaster that it took me all day to deal with.  I'm looking  
for a) comments on the usage model, and b) ideas on how to best  
handle the spinning situation.  I promise, promise, promise that this  
time I'll write both of these things up and put them in the wiki.

I apologize for this being so long, but I want to be very clear about  
what is going on.

=== The Set Up ===

At work, my team uses CVS to manage our source.  Not counting 3rd  
party libraries checked into the tree, there is about 23 Meg of
source in 2000 files, and another 150 Meg of stuff in 1000 files.
For various reasons (like really fast branching), I want to use darcs  
with very frequent checkins for work on my cluster of machines, and  
commit completed work back to CVS every day or so.  Disk space isn't  
an issue (my machines have tons), but speed is.  I want my source  
operations to be fast.  I like to record often, branch often, and  
find out what currently changed often.  These operations are fast in  
darcs and miserably slow in CVS (especially if there are branches.)

I start by checking out some branch of the CVS tree into a base:

[1]	mkdir ~/base
	cd ~/base
	cvs ... checkout -r Branch_Foo product

Then I build a darcs repo on this

[2]	cd ~/base/product
	darcs init
	darcs add -r .
	darcs record

Then I get that into a working area

[3]	mkdir ~/working
	cd ~/working
	darcs get ~/base/product

Now I can work along for awhile:

[4]	cd ~/working/product
	repeat {
	    #edit files
	    darcs record
	}

When I need to update my working area to the latest version of the  
CVS branch (before I can commit my changes to CVS) I do this:

[5]	cd ~/base/product
	cvs update
	darcs record

	cd ~/working/product
	darcs pull

This generally works just fine - even if I commit the sin of not  
recording some outstanding changes in working first.

When my work is ready for committing back to CVS, I do:

[6]	cd ~/base/product
	darcs pull ~/working/product
	cvs commit

Occasionally, I'll add a darcs tag in ~/base/product when what I've
pulled from CVS is an important release:

[5a]	cd ~/base/product
	cvs update
	darcs record
	darcs tag

	cd ~/working/product
	darcs pull
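
Steps [5] and [5a] could be wrapped in one small shell function so
that the update is a single command.  This is only a sketch: the name
sync_from_cvs and the BASE and WORK variables are made up for
illustration, and the record flags (--all, --look-for-adds, -m) are
assumed to behave the same way in whatever darcs version is installed.

```shell
# Hypothetical locations; override BASE and WORK to match your layout.
BASE=${BASE:-$HOME/base/product}
WORK=${WORK:-$HOME/working/product}

# sync_from_cvs [TAG]: refresh the CVS checkout, record it in darcs,
# optionally tag it (step [5a]), then pull into the working repo.
sync_from_cvs() {
    tag_name=$1
    stamp=$(date -u '+%Y-%m-%d %H:%M UTC')
    (
        cd "$BASE" || exit 1
        cvs update || exit 1
        # record may have nothing to do if CVS brought no changes,
        # so its exit status is deliberately not checked here.
        darcs record --all --look-for-adds -m "CVS update $stamp"
        if [ -n "$tag_name" ]; then
            darcs tag "$tag_name" || exit 1
        fi
    ) || return 1
    # Left interactive on purpose, so each incoming patch can be reviewed.
    ( cd "$WORK" && darcs pull )
}

# Usage:
#   sync_from_cvs                # plain step [5]
#   sync_from_cvs Release_Foo    # step [5a]; the tag name is made up
```

The subshells keep the caller's current directory untouched, and the
pull is skipped if the CVS update or the tag fails.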

=== Questions on the set up ===

Does this seem rational?
Are there some good tricks for making this fast?
Should I be adding 'darcs tag --checkpoint' and using 'darcs get --partial'?
Should I be doing 'darcs optimize' at some point?

=== The Conflict Misery ===

Yesterday, there was a big shift in which branch will be the one for
current development.  But, no problem-o I say:  This is just a
variant of operation [5] above:

[5b]	cd ~/base/product
	cvs update -r Branch_Bar
	darcs record
	darcs tag

	cd ~/working/product
	darcs pull

While this is a big operation (several hundred files changed),
nothing I was working on changed in any significant way, and indeed  
there were just a few tiny conflicts.  Darcs was a little slow to do  
this, but acceptable given the size of the change, and the  
infrequency of this kind of thing.

I resolve the conflicts:

[7]	cd ~/working/product
	#edit files with 'v v v v' in them
	darcs record

Now, someone else discovers there is a bit of a mess in Branch_Bar,  
and fixes something.  This amounts to about 30 one-line changes in each
of several files.  One of those files is one I've been working on,
and there will be conflicts.  Indeed about 4 of those lines are in  
code I've deleted, and about 4 more are lines I've edited.  And this  
is one of the files fixed in step [7].  Remember, all my edits are  
recorded in the ~/working repo.  I'll need this change to get  
Branch_Bar compiling:

[5]	cd ~/base/product
	cvs update
	darcs record

	cd ~/working/product
	darcs pull

This never completes.  Never, nada, not ever.  I'm expecting some  
conflicts in my working area.  I'm expecting I'll have to go in and
fiddle with a few conflict markers and re-apply those one-line
changes myself.  But it never gets there.

I try about three or four different ways to make this happen: What if
I branch the base repo and pull the working patches into it?  What
if I do partial repos?  Perhaps the Mac port isn't up to the task, so
I try it on a fast Windows box and on a fast Linux box.  Nothing works.

In the end I resort to doing this:

[8]	cd ~/working/product
	darcs unpull --patch 'patch-recorded-in-7'

	mkdir ~/old-base
	cd ~/old-base
	darcs get --to-patch 'last-patch-pulled-in-5b' ~/base/product

	diff -r -u ~/old-base/product ~/working/product > ~/work-patch.diff
	#details omitted here in making sure the diff ignores the _darcs dir

	mkdir ~/working2
	cd ~/working2
	darcs get ~/base/product

	cd ~/working2/product
	patch < ~/work-patch.diff

	#manually compare diffs:
	darcs whatsnew
	opendiff _darcs/current .

	#edit and fix as needed

	darcs record

Of course, I lose all the history of the patches that led to my
current work state.  And if there are any branches of the ~/working
repo, they must be abandoned and I must "darcs get" the ~/working2
repo.  In my particular case, neither of these is too bad: The
branches are just copies on other machines so I can check my work on
other architectures and build tools (the code must run on three);
they don't contain code that hasn't already been propagated to
~/working, and now patched into ~/working2.

(I have omitted here that I wanted to and was able to pull some  
particular patches from ~/working to ~/working2 that I really wanted  
to keep distinct.  It did, of course, complicate the sequence: Once I  
established that those patches wouldn't send darcs into a tailspin, I  
had to pull them into both ~/old-base and ~/working2, and *then*
generate the patch file.)

=== Questions on the Misery ===

Is this the correct procedure to recover from a pull that darcs just  
can't get its head around?
Was there something I could have done to convince darcs that it  
really wasn't all that hard?
Was there something I could have done that would have made this  
operation go faster?
Is it possible that frequently touched files like makefiles and  
project files exacerbated this?  Should those changes be in patches  
by themselves so just those changes can be abandoned since they are  
easy to recreate by hand?
Lastly, why (I know, this is rhetorical), why can't darcs handle such  
a simple conflict?

=== Final thoughts ===

I'm a real fan of darcs and I use it for all my personal projects and  
some open source ones. This is the first truly large tree I've ever  
used it on.

At work, they all realize that CVS is at its limits for us... but are  
not sure what to go to next.  If folks at work can try darcs out for  
their own work, like my set up above, it would be a safe way for them  
to explore if darcs is for them.  However, if they hit a snag like I  
did today, they (and I) are going to need a pattern for what to do  
and how to recover.  That's what I'd like to write up.

Thanks for reading this far!

	- Mark

Mark Lentczner
http://www.ozonehouse.com/mark/
markl at glyphic.com