[darcs-users] Hashed GHC (and dependencies) repositories

Matthias Kilian kili at outback.escape.de
Thu Jun 11 06:26:23 UTC 2009

On Wed, Jun 10, 2009 at 12:23:34PM +0200, Matthias Kilian wrote:
> > Any chance we could have a timing test for the old fashioned repo on the same
> > server just to rule out confounding factors?  So we'd be comparing
> > old-fashioned d.h.org  with old-fashioned d.v.de with hashed d.v.de...
> The old-fashioned versions are at http://darcs.volkswurst.de/unhashed.
> I'm running some benchmarks now. And I'm explicitly using --no-cache,
> the super-fast results I noticed came from my cache ;-)
> I'll send the results later today.

Sorry for the delay, here are some first results (fetching ghc,
ghc-6.10 and dependencies via http):

darcs get --no-cache --complete from the hashed repo:
  199m5.50s real     4m6.96s user     5m40.76s system

darcs get --no-cache --complete from the unhashed repo:
   93m38.54s real     8m13.50s user     5m4.94s system

darcs get --no-cache --lazy from the hashed repo:
   33m42.11s real     0m38.86s user     0m48.42s system

darcs get --no-cache --partial from the unhashed repo:
   87m48.18s real     6m51.21s user     2m28.39s system
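
To put the wall-clock figures above on a common scale, here is a
small Python sketch that converts the time(1)-style stamps to seconds
and computes the ratios. The helper names (to_seconds, ratio) are my
own and only for illustration; the numbers are copied from the
results above.

```python
import re

def to_seconds(stamp: str) -> float:
    """Convert a time(1)-style stamp such as '199m5.50s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Wall-clock ("real") times copied from the measurements above.
REAL_TIMES = {
    "hashed --complete": "199m5.50s",
    "unhashed --complete": "93m38.54s",
    "hashed --lazy": "33m42.11s",
    "unhashed --partial": "87m48.18s",
}

def ratio(slower: str, faster: str) -> float:
    """Return how many times longer `slower` took than `faster`."""
    return to_seconds(REAL_TIMES[slower]) / to_seconds(REAL_TIMES[faster])

if __name__ == "__main__":
    complete = ratio("hashed --complete", "unhashed --complete")
    reduced = ratio("unhashed --partial", "hashed --lazy")
    print(f"hashed --complete vs unhashed --complete: {complete:.2f}x")
    print(f"unhashed --partial vs hashed --lazy:      {reduced:.2f}x")
```

By this measure the hashed --complete get took a bit more than twice
as long as the unhashed one, while the hashed --lazy get was roughly
2.6 times faster than the unhashed --partial get.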

That means that an initial darcs get --complete from a hashed
repository is actually much *slower* than from an old-fashioned
repository (roughly twice the wall-clock time). That's a little bit
surprising, since on the remote site, the _darcs directory of the
hashed repository is actually smaller, and the _darcs/patches
directories are identical in size:

On the remote repository:

$ du -hs {,unhashed/}ghc/_darcs        
107M    ghc/_darcs
120M    unhashed/ghc/_darcs

$ du -hs {,unhashed/}ghc/_darcs/patches
94.2M   ghc/_darcs/patches
94.2M   unhashed/ghc/_darcs/patches

Anyway, a lazy get is much faster, and with caching enabled (and
$HOME on a local disk), subsequent gets are of course faster still
(the complete set takes between seven and nine minutes). I'll try to
repeat the hashed tests with an NFS-mounted ~/.darcs/cache.

Does anyone have suggestions for more benchmarking? Are there any
particular actions that are painful on old-fashioned repositories
and that should be faster on hashed repositories?


PS: http://darcs.volkswurst.de/bin/fetchrepos is the script I used
for the tests, with modified values for GET and BASE.
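
The fetchrepos script is not reproduced here. As a rough sketch of
how real/user/system figures like the ones above could be collected
for an arbitrary command, something along these lines should work on
a Unix-like system. The function name time_command is hypothetical,
and the default command is a harmless placeholder, not a darcs
invocation.

```python
import os
import subprocess
import sys
import time

def time_command(argv):
    """Run argv and return (real, user, system) seconds, like time(1).

    User and system times are taken from the child-process counters of
    os.times(), so they cover the command that was run, not this script.
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # raises if the command fails
    real = time.perf_counter() - start
    after = os.times()
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return real, user, system

if __name__ == "__main__":
    # Placeholder default: run a no-op Python process when no command
    # is given on the command line.
    command = sys.argv[1:] or [sys.executable, "-c", "pass"]
    real, user, system = time_command(command)
    print(f"{real:.2f}s real  {user:.2f}s user  {system:.2f}s system")
```

Saved under some name of your choosing, it would be called with the
command to measure as its arguments, for example darcs get --no-cache
--lazy followed by the repository URL.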
