[darcs-users] General questions.

Wed Apr 27 11:10:57 UTC 2005

> curious, how well does Darcs scale to large projects? Like, suppose I'm 
> really Linus Torvalds under a secret identity :) and I want to use Darcs 
> for the Linux kernel. Could Darcs handle the job?

No.  Here are a few data points.  Keep in mind that there are several
partially-independent axes of scalability: number of source lines of code
("SLOC"), size of files, number of active hackers, number of patches.

data point 1:

The darcs project itself uses darcs.  The stable branch has less than 2 MB
worth of code, in 200 files, and 19,000 SLOC.  There are almost 2000 patches.
There are around a dozen active hackers (plus or minus 50% precision).  Darcs
works very well for the darcs project.

data point 2:

I use darcs for a project which has 30 MB worth of code and binaries, in 2500
files (almost all of which are imported from 3rd party libraries and never get
touched by me) and 25,000 SLOC (excluding 3rd party libraries).  My project
has almost 600 patches in the current stable branch, although an earlier
branch had around a thousand patches and was folded into this stable branch.
There have been three active hackers for most of the project's life, one of
whom refuses to use darcs and uses SVN instead.  Darcs works acceptably well
for this project, but it gets stuck often enough that it needs frequent user
intervention.

data point 3:

I've been trying to use darcs for another project which has 1.4 GB (including
large binary files and many .tar.gz copies of 3rd party libraries), 36,000
files, 130,000 SLOC (excluding 3rd party libraries), 700 patches (in the
current stable branch -- the actual project history stretches back many years
and tens of thousands of revisions), and five active hackers.  So far, I
haven't succeeded at importing the project history from SVN into darcs through
the tailor script, despite many hours of trying.  (Now, I could and in fact do
use darcs on this project without importing the entire history, and without
sharing darcs patches with the other hackers.  But this data point is still
relevant to the scalability question.)

data point 4 ???

The Linux kernel weighs 232 MB, in 18,000 files and 4,000,000 SLOC.  Its
history, after conversion from BK->CVS->darcs, comprises 21,000 patches, but 
I think the original BK history has several times that number of patches.  It
has hundreds of active hackers.

A few people have tried using darcs to manipulate the Linux kernel, including
myself (as an idle experiment) and Linus Torvalds (as an attempt to replace
BitKeeper), and as far as I am aware everyone who has tried it has quickly
given up.
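
To put the data points side by side, here is a rough Python sketch that
tabulates the figures quoted above and computes how much larger one project
is than another on each axis.  The names (DATA_POINTS, ratios) are made up
for illustration, the rounded values ("less than 2 MB" as 2, "almost 2000" as
2000, 1.4 GB as 1400 MB) are approximations, and the number of active hackers
is left out because "hundreds" is not a precise figure.

```python
# Figures copied from the data points in this post.  Values are rounded
# approximations of the wording used there, not measurements.
DATA_POINTS = {
    "darcs":     {"size_mb": 2,     "files": 200,    "sloc": 19_000,    "patches": 2_000},
    "project 2": {"size_mb": 30,    "files": 2_500,  "sloc": 25_000,    "patches": 600},
    "project 3": {"size_mb": 1_400, "files": 36_000, "sloc": 130_000,   "patches": 700},
    "linux":     {"size_mb": 232,   "files": 18_000, "sloc": 4_000_000, "patches": 21_000},
}


def ratios(target, baseline):
    """Return target/baseline for every axis, as a dict keyed by axis name."""
    base = DATA_POINTS[baseline]
    return {axis: DATA_POINTS[target][axis] / base[axis] for axis in base}


def main():
    # Compare the Linux kernel with the project where darcs works very well.
    for axis, factor in ratios("linux", "darcs").items():
        print(f"{axis:8s} linux/darcs = {factor:.1f}x")


if __name__ == "__main__":
    main()
```

The axes are kept separate on purpose: data point 3 is bigger than the kernel
in bytes and in file count but far smaller in SLOC and patches, so a single
"size" number would hide the differences.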

> Another question: Could someone in the know offer a brief comparison 
> between Darcs and Monotone? This is my first time trying a modern SCM. 
> All I know is CVS, and I'm eager to find something better. So right now 
> I'm toying with Darcs and Monotone. I think I've narrowed my search down 
> to those two. But I'm having trouble comparing Darcs and Monotone.

If you try Codeville, please let me know what you think in comparison to Darcs
and Monotone.

> With Monotone you generate an RSA key, and Darcs doesn't. Does that mean 
> that Monotone patches are signed and Darcs' aren't? Does this have an 
> effect in security?

It depends on what your security concerns are.  In general, darcs controls
access through URLs and darcs commands.  For example, do you want to pull all
patches from the bkcvs darcs repo of Linux?  Then run "darcs pull -a
http://darcs.net/linux".  If you do not want to pull all those patches, then
do not run that command.  If you want to be sure that the repo you are pulling
from is really the darcs.net/linux repo, then use https (although that
functionality might be broken currently).

Monotone lets you express those same policies through keys and policy files
instead of through URLs and command-line commands.  For example, if you want
to accept patches from that repo, then get its public key, add it to your
local database, and edit your configuration file to specify that all patches
signed by that key should be allowed into your repo (although much of that
functionality is not yet implemented, IIUC).  Overall, I strongly prefer the
former approach to the latter.

By the way, there is currently a security problem in darcs: multiple patches
can masquerade under the same patch ID.  If you are concerned about
security issues then you should learn about this and other issues and
construct appropriate workarounds.

Regards,

Zooko

note: SLOC data generated using David A. Wheeler's 'SLOCCount'.