[darcs-users] conflict misery....
Mark Lentczner
markl at glyphic.com
Sat Dec 31 05:40:49 UTC 2005
Okay, I think this has been hashed out before, I think even in a
thread started by me long ago.... But alas it never made it into the
Wiki or other permanent store... So here I go again:
I'm going to describe my current usage model and a "darcs spins
forever" disaster that it took me all day to deal with. I'm looking
for a) comments on the usage model, and b) ideas on how to best
handle the spinning situation. I promise, promise, promise that this
time I'll write both of these things up and put them in the wiki.
I apologize for this being so long, but I want to be very clear about
what is going on.
=== The Set Up ===
At work, my team uses CVS to manage our source. Not counting 3rd
party libraries checked into the tree, there is about 23 Meg of
source in 2000 files, and another 150 Meg of stuff in 1000 files.
For various reasons (like really fast branching), I want to use darcs
with very frequent checkins for work on my cluster of machines, and
commit completed work back to CVS every day or so. Disk space isn't
an issue (my machines have tons), but speed is. I want my source
operations to be fast. I like to record often, branch often, and
find out what currently changed often. These operations are fast in
darcs and miserably slow in CVS (especially if there are branches).
I start by checking out some branch of the CVS tree into a base:
[1] mkdir ~/base
    cd ~/base
    cvs ... checkout -r Branch_Foo product
Then I build a darcs repo on this:
[2] cd ~/base/product
    darcs init
    darcs add -r .
    darcs record
Then I get that into a working area:
[3] mkdir ~/working
    cd ~/working
    darcs get ~/base/product
Now I can work along for a while:
[4] cd ~/working/product
    repeat {
        #edit files
        darcs record
    }
When I need to update my working area to the latest version of the
CVS branch (before I can commit my changes to CVS) I do this:
[5] cd ~/base/product
    cvs update
    darcs record
    cd ~/working/product
    darcs pull
This generally works just fine - even if I commit the sin of not
recording some outstanding changes in working first.
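Step [5] could be wrapped in one shell function along these lines. This is only a sketch under a few assumptions: BASE and WORK default to the paths used in this message, DRY_RUN is a hypothetical switch that prints the commands instead of running them, and the --all, --look-for-adds and -m options are additions to the steps as written (a plain "darcs record" does not pick up files that "cvs update" has newly created).

```shell
#!/bin/sh
# Sketch of step [5] as a single function.
# BASE and WORK default to the paths used in this message.
# DRY_RUN is a hypothetical switch: when non-empty, commands are printed, not run.
BASE=${BASE:-$HOME/base/product}
WORK=${WORK:-$HOME/working/product}

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

sync_from_cvs() {
    # Update the base tree from CVS and record the result as one darcs patch.
    # --look-for-adds also records files that the CVS update created.
    ( cd "$BASE" &&
      run cvs update &&
      run darcs record --all --look-for-adds -m "cvs update" ) || return 1
    # Pull the newly recorded patch into the working repo.
    ( cd "$WORK" && run darcs pull "$BASE" )
}
```

Running it once with DRY_RUN=1 shows what would be executed before letting it touch either repo.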
When my work is ready for committing back to CVS, I do:
[6] cd ~/base/product
    darcs pull ~/working/product
    cvs commit
Occasionally, I'll add a darcs tag in product when what I've pulled
from CVS is an important release:
[5a] cd ~/base/product
     cvs update
     darcs record
     darcs tag
     cd ~/working/product
     darcs pull
=== Questions on the set up ===
Does this seem rational?
Are there some good tricks for making this fast?
Should I be adding 'darcs tag --checkpoint' and use
'darcs get --partial'?
Should I be doing 'darcs optimize' at some point?
=== The Conflict Misery ===
Yesterday, there was a big shift in which branch will be the one for
current development. But, no problem-o I say: This is just a
variant of operation [5] above:
[5b] cd ~/base/product
     cvs update -r Branch_Bar
     darcs record
     darcs tag
     cd ~/working/product
     darcs pull
While this is a big operation (several hundred files changed),
nothing I was working on changed in any significant way, and indeed
there were just a few tiny conflicts. Darcs was a little slow to do
this, but acceptable given the size of the change, and the
infrequency of this kind of thing.
I resolve the conflicts:
[7] cd ~/working/product
    #edit files with 'v v v v' in them
    darcs record
Now, someone else discovers there is a bit of a mess in Branch_Bar,
and fixes something. This amounts to about 30 one-line changes in each
of several files. One of those files is one I've been working on,
and there will be conflicts. Indeed about 4 of those lines are in
code I've deleted, and about 4 more are lines I've edited. And this
is one of the files fixed in step [7]. Remember, all my edits are
recorded in the ~/working repo. I'll need this change to get
Branch_Bar compiling:
[5] cd ~/base/product
    cvs update
    darcs record
    cd ~/working/product
    darcs pull
This never completes. Never, nada, not ever. I'm expecting some
conflicts in my working area. I'm expecting I'll have to go in and
fiddle with a few conflict markers and re-apply those one-line
changes myself. But it never gets there.
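One way to stop a runaway pull from tying up a whole session is to put a time limit on it. The sketch below assumes the GNU coreutils "timeout" command is installed (it is not part of a stock Mac OS X system); the function name "bounded" and the 600-second limit in the usage comment are made up for illustration. It does nothing to make the pull succeed; it only turns "spins forever" into a failure after a known delay.

```shell
#!/bin/sh
# bounded LIMIT CMD [ARGS...]
# Run CMD, but give up after LIMIT seconds.
# Relies on GNU coreutils `timeout`, which exits with status 124
# when the time limit is reached.
bounded() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "gave up after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Usage (hypothetical limit of ten minutes):
#   cd ~/working/product && bounded 600 darcs pull --all
```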
I try about three or four different ways to make this happen: What if
I branch the base repo now and pull the working patches into it? What
if I do partial repos? Perhaps the Mac port isn't up to the task, so
I try it on a fast Windows box and on a fast Linux box. Nothing works.
In the end I resort to doing this:
[8] cd ~/working/product
    darcs unpull --patch 'patch-recorded-in-7'
    mkdir ~/old-base
    cd ~/old-base
    darcs get --to-patch 'last-patch-pulled-in-5b' ~/base/product
    diff -r -u ~/old-base/product ~/working/product > ~/work-patch.diff
    #details omitted here in making sure the diff ignores the _darcs dir
    mkdir ~/working2
    cd ~/working2
    darcs get ~/base/product
    cd ~/working2/product
    patch < ~/work-patch.diff
    #manually compare diffs:
    darcs whatsnew
    opendiff _darcs/current .
    #edit and fix as needed
    darcs record
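The diff step, with the _darcs directories left out, might look something like the sketch below. It is not necessarily what was actually run: the function name "tree_diff" is invented here, and the -N option (so that files present in only one tree show up in the diff) is an addition to the plain "diff -r -u" shown above.

```shell
#!/bin/sh
# tree_diff OLD NEW
# Print a recursive unified diff of two trees on stdout.
tree_diff() {
    # -x _darcs skips any file or directory named _darcs in either tree.
    # -N treats a file missing from one tree as empty there,
    #    so added and deleted files are part of the diff.
    diff -r -u -N -x _darcs "$1" "$2"
}

# Usage, with relative paths so the patch can be applied with -p2:
#   cd ~ && tree_diff old-base/product working/product > ~/work-patch.diff
#   cd ~/working2/product && patch -p2 < ~/work-patch.diff
```

With the relative paths in the usage comment, file names in the diff start with "old-base/product/" or "working/product/", and -p2 strips those two leading components so the patch applies from inside ~/working2/product.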
Of course, I lose all my history of the patches that got me to my
current work state. And if there are any branches of the ~/working
repo, they must be abandoned and I must "darcs get" the ~/working2
repo. In my particular case, neither of these is too bad: The
branches are just copies on other machines so I can check my work on
other architectures and build tools (the code must run on three);
they don't contain code that hasn't already been propagated to
~/working, and now patched into ~/working2.
(I have omitted here that I wanted to and was able to pull some
particular patches from ~/working to ~/working2 that I really wanted
to keep distinct. It did, of course, complicate the sequence: Once I
established that those patches wouldn't send darcs into a tailspin, I
had to both pull them into ~/old-base and ~/working2, and *then*
generate the patch file.)
=== Questions on the Misery ===
Is this the correct procedure to recover from a pull that darcs just
can't get its head around?
Was there something I could have done to convince darcs that it
really wasn't all that hard?
Was there something I could have done that would have made this
operation go faster?
Is it possible that frequently touched files like makefiles and
project files exacerbated this? Should those changes be in patches
by themselves so just those changes can be abandoned since they are
easy to recreate by hand?
Lastly, why (I know, this is rhetorical), why can't darcs handle such
a simple conflict?
=== Final thoughts ===
I'm a real fan of darcs and I use it for all my personal projects and
some open source ones. This is the first truly large tree I've ever
used it on.
At work, they all realize that CVS is at its limits for us... but are
not sure what to go to next. If folks at work can try darcs out for
their own work, like my set up above, it would be a safe way for them
to explore if darcs is for them. However, if they hit a snag like I
did today, they (and I) are going to need a pattern for what to do
and how to recover. That's what I'd like to write up.
Thanks for reading this far!
- Mark
Mark Lentczner
http://www.ozonehouse.com/mark/
markl at glyphic.com