[darcs-users] Bitkeeper and Eclipse questions

Sun Mar 20 17:59:04 UTC 2005

Juliusz Chroboczek wrote:
>>bk wins for having nice merge tools, revision history tools and
>>trigger / script support. In general bk has nice tools (gui or not)
>>for getting at a lot of information.
> 
> 
> Sean,
> 
> Could you please expand on that (either here or on the Wiki)?  As you
> are surely aware, the Darcs developers cannot legally play with the
> free version of BK, and hence we might not know what's missing.
> 

Even if you cannot use bk, you can read their docs. (-:

Let me give an idea of what my work scenario was like. Then we can 
discuss what this can mean for darcs.

Background. The repos have revision history in bk going back to circa 
2001. The section I dealt with most was a C++ control daemon weighing 
in at around 20-50 kloc. 20+ programmers had come and gone, and most of 
them had pushed changes to this repo (let's call it MAIN). A *LOT* of 
churn had occurred. People had added entire source directories from 
other projects to MAIN and then found out it was a bad idea or that the 
code was no longer used. For instance, someone added a GPL library, so 
it had to be removed. A clone of MAIN was several gigs in size. Of 
course, it also contained the entire 7.2 Postgres source tree, shrug.

We had a scripted system which:
* prevented pushes to the tree unless your user name was in a 
permissions file. This was used when the product hit alpha to control 
who changed the tree.
* propagated changes: if you pushed to version 1.0 of the tree, your 
change would flow to 1.1, 1.5, 2.0, etc. as defined by a propagation 
file. If the merge broke, an email was sent out, and the person who 
pushed the code, along with the tree maintainer, was in charge of triage.
* automatically sent out change notification emails.
* automatically scheduled a fresh checkout for the build system and 
notified the build system that a new checkout was in the queue. It was a 
100% automated queue, build and test system, and it worked well. 
Unfortunately there were no unit tests; this was all black box testing.
* read changelog messages and updated Bugzilla if there was a Fixes: #XXX 
entry.
* read changelog messages and refused a commit without a "ReviewedBy" 
field.
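These hooks were custom scripts, and the post does not say how they were 
written. As a rough illustration only, here is a minimal Python sketch of 
the checks a pre-push hook along these lines might run. Everything in it 
is an assumption: the function names (parse_permissions, 
parse_propagation, check_push), the permissions file format (one user 
name per line), the propagation file format ("1.0: 1.1 1.5 2.0"), and the 
changelog conventions ("Fixes: #<number>" and "ReviewedBy: <name>" on 
lines of their own). It does not call bk; wiring it into bk's triggers is 
left out.

```python
import re

# Hypothetical conventions; the original scripts' formats are unknown.
#   permissions file:  one user name per line, '#' starts a comment
#   propagation file:  "<version>: <target> <target> ...", one rule per line
#   changelog:         "Fixes: #<number>" and "ReviewedBy: <name>" lines
FIXES_RE = re.compile(r"^Fixes:[ \t]*#(\d+)[ \t]*$", re.MULTILINE)
REVIEWED_RE = re.compile(r"^ReviewedBy:[ \t]*\S+", re.MULTILINE)


def _strip_comment(line):
    """Drop a trailing '#' comment and surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def parse_permissions(text):
    """Return the set of user names allowed to push."""
    return {name for name in map(_strip_comment, text.splitlines()) if name}


def parse_propagation(text):
    """Map each source version to the versions its changes flow into."""
    rules = {}
    for line in map(_strip_comment, text.splitlines()):
        if not line:
            continue
        source, _, targets = line.partition(":")
        rules[source.strip()] = targets.split()
    return rules


def check_push(user, version, changelog, allowed, rules):
    """Validate one push.

    Returns (bug ids to update in the bug tracker, versions to merge into).
    Raises ValueError if the push should be rejected.
    """
    if user not in allowed:
        raise ValueError(f"{user} is not allowed to push to this tree")
    if not REVIEWED_RE.search(changelog):
        raise ValueError("changelog is missing a ReviewedBy field")
    bugs = [int(n) for n in FIXES_RE.findall(changelog)]
    return bugs, rules.get(version, [])


if __name__ == "__main__":
    allowed = parse_permissions("alice\nbob  # tree maintainer\n")
    rules = parse_propagation("1.0: 1.1 1.5 2.0\n")
    log = "Fix crash on reconnect\nFixes: #1234\nReviewedBy: bob\n"
    print(check_push("alice", "1.0", log, allowed, rules))
    # prints ([1234], ['1.1', '1.5', '2.0'])
```

Rejections are reported as exceptions so that a hook wrapper can turn 
them into a non-zero exit status and an explanatory message.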

I feel like I am missing something, but this should give you the idea.
Scripting is fundamental to automation. Without the automation the 
product would not have made it anywhere near as far as it has.
Notifications via email, irc bots, IM, etc. are also key. We had 
developers in different states here in the US as well as in India. So it 
was not as easy as walking over to someone's cube to find out information.

Because of the large amount of history, deleted files, etc., clones are 
slow and large. This is where a distributed system shows its weakness. 
Until recently, bk did not have a good way to cut off this detritus. Even 
with BitMover's help in cleaning up the tree, handling the merge 
from versionA -> versionB -> versionC is going to take work, so it keeps 
getting pushed further away. Disk may be cheap, but not when you have 
20+ people who average 40+GB home directories (that is already 800+GB). 
The home directories were managed on a SAN, but adding new space is not 
easy or cheap when you reach the terabyte level. "Cheap" is also 
relative: when a company refuses to spend money on infrastructure in a 
particular quarter, nothing is cheap enough.

Revision history. OK, bk uses ugly, 1995-style Tk apps. However, they 
work and work well. bk revtool launches a GUI browser which shows you 
the entire repo's history as multiple timelines, so branches, multiple 
users, etc. are all represented. You can ask for the difference between 
node N and node M. You can ask for revision X.Y.Z. When looking at an 
individual file, you get something analogous to what Mr. Schwern is 
asking for on the annotate thread. Really handy during a bug hunt.

Merging. bk did a really good job of dealing with conflicts. When an 
issue arose, bk mergetool was a real help. For each file in which a 
conflict occurred, you could:
* keep the local version
* take the new version
* run diff
* exit to a shell
* perform a 2-way merge using the GUI difftool
* perform a 3-way merge using the GUI difftool
* (one or two others I am forgetting)

The GUI tools let you cherry-pick diffs from both files as well as 
hand-edit the resulting file. Their UI was obviously made by a coder and 
took quite a bit of learning. But once you became used to them, they 
were very useful.

Now there is one place where bk did a horrible job at merges. If I 
added a function to the bottom of a file and someone else added a 
function to the bottom of the same file, bk would realize that the 
changes could stack in either order:
A
B

or even
B
A

so it caused a merge conflict which 99.9% of the time involved you 
saying "take my change, then take their change, commit".
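To make the append-at-the-end case concrete, here is a minimal Python 
sketch of a hypothetical helper, merge_appends (not part of bk or darcs), 
that recognizes when both sides only appended lines to the end of the 
common base and applies the usual "take my change, then take their 
change" resolution. A line-based three-way merge sees both additions 
landing at the same spot (the end of the file) and cannot tell which 
order is intended, which is why bk flagged it. The helper simply picks 
local-first order.

```python
def merge_appends(base, ours, theirs):
    """Merge two versions of a file that each only appended lines to base.

    Hypothetical helper, not bk's or darcs' algorithm. Returns base followed
    by our appended lines and then theirs ("take my change, then take their
    change"), or None if either side edited anything before the end of base,
    in which case a real three-way merge is needed.
    """
    n = len(base)
    if ours[:n] != base or theirs[:n] != base:
        return None
    return base + ours[n:] + theirs[n:]


if __name__ == "__main__":
    base = ["main"]
    ours = base + ["A"]    # I added function A at the bottom
    theirs = base + ["B"]  # someone else added function B at the bottom
    print(merge_appends(base, ours, theirs))  # prints ['main', 'A', 'B']
```

Note that if both sides appended the very same lines, this sketch would 
keep both copies, so a real tool would still want a human to glance at it.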

All is not roses with bk. But it is the only system I have seen keep up 
with the pain we put it through. Subversion looks like it is getting 
close, but its binary database backend is worrying. If things go really, 
really wrong, bk stores its history in SCCS files, so you can find a way 
to fix it. Last I heard, the svn people were developing a simple, 
file-based backend.

The GUIs are ugly and look worse on Windows. *BUT* they do run on 
Windows, Unix with the X Window System, and Mac OS X.

Workflow. You have to: bk edit, <make changes>, bk checkin, bk commit, 
bk push. The bk edit step makes applying patches a bit of a pain. 
However, bk unedit returns the file to the state it was in before you 
ran bk edit, so it made quick testing of ideas easy and safe. I suppose 
you could do the same thing in darcs with undo / unrecord, etc. checkin 
takes a changelog message. commit wraps up individual changes into a 
changeset and adds a changelog message to the changeset. While this is 
handy, you always have to remember that bk wants changesets, not patches 
and not file revisions.

bk also makes it hard to generate a patch once you have checked in a 
revision (the analogue of darcs' record). Until the 3.0 series, 
generating a patch after commit was REALLY a daunting task because you had to tell bk 
which file revision to start from. Finding the right revision number is 
often not a trivial task.

Summary:
bk is robust and dependable. It handles really, really big repos. The 
GUIs work and really make certain jobs easy. Scripting and automation 
are vital in groups bigger than a few people especially with distributed 
programmers.

darcs *MUST* have a way to clean out the revision tree. There should be 
a way to get rid of 4-year-old garbage. I am not saying darcs cannot do 
it today; maybe it can. If it can, this ability should be very clearly 
spelled out in the docs.

bk's GUIs are possible because some of the functionality lives in 
libraries that the CLI and GUIs share. Simply wrapping shell programs in 
GUIs is a recipe for pain; ask the arch people. I believe this means 
that either we move some of the darcs core to a C library or all of the 
GUI work will need to happen in Haskell, perhaps using wxHaskell.

I hope this helps. If I have not sufficiently explained something, 
please let me know.