[darcs-users] Benchmarking "get"

Stephen J. Turnbull stephen at xemacs.org
Wed Mar 18 01:23:28 UTC 2009

Ian Lynagh writes:

 > 275656  camp
 > 172464  darcs1
 > 331924  darcs1un
 > 172464  darcs1darcs2
 > 331976  darcs1darcs2un
 > 160476  darcs1hashed
 > 326780  darcs1hashedun
 > 162020  darcs2
 > 330000  darcs2un
 > 431632  git
 > So all the darcs compressed and uncompressed repos are about the same
 > size as each other. camp (uncompressed) sits in the middle. git is
 > surprisingly (to me) large, but I don't know what it stores.


Until you do a "pack" (implied by git-gc), git stores a
zlib-compressed copy of each version of each file, plus a
zlib-compressed copy of each git tree (~ Unix directory), plus a
zlib-compressed copy of each commit.  Not only is that a lot of
redundancy, but on "classic" block-oriented filesystems like ext2,
those trees and commits typically take up about 1% of a 16KiB disk
block, and for typical C projects the block wastage for file storage
is also quite high.
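
As a rough illustration of the storage scheme described above, here is
a minimal Python sketch.  The encoding (a "<type> <size>" header, a NUL
byte, then the payload, hashed with SHA-1 and zlib-compressed) is git's
loose-object layout.  The function names and the 160-byte object size
are hypothetical, picked only to make the arithmetic concrete, and the
block size is a parameter, not a claim about any particular filesystem.

```python
import hashlib
import zlib


def loose_object(kind: str, payload: bytes) -> tuple:
    """Return (object id, on-disk bytes) for one loose git object."""
    # Header is "<type> <size>" followed by a NUL byte.
    header = f"{kind} {len(payload)}".encode() + b"\x00"
    store = header + payload
    # The object id is the SHA-1 of header + payload; the file on disk
    # is the zlib-compressed form of the same bytes.
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


def allocated_bytes(size: int, block_size: int) -> int:
    """Space actually consumed when files are stored in whole blocks."""
    blocks = -(-size // block_size)  # ceiling division
    return blocks * block_size


def used_fraction(size: int, block_size: int) -> float:
    """Fraction of the allocated space that holds real data."""
    allocated = allocated_bytes(size, block_size)
    return size / allocated if allocated else 0.0


if __name__ == "__main__":
    oid, data = loose_object("blob", b"int main(void) { return 0; }\n")
    print(oid, len(data))
    # Hypothetical 160-byte commit object on a 16 KiB block.
    print(used_fraction(160, 16 * 1024))
```

With those assumed numbers, the object fills just under 1% of the block
it is allocated, and every loose tree and commit pays that cost
separately.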

The pack operation takes all the objects and does a (somewhat
optimized) delta compression exercise, then puts them into archives
called packs, with separate index files.  ("Optimized" means that
deltas are taken against most similar versions, not necessarily
parents, that no version of a file is "too many deltas away" from a
full text version, and some other criteria that have to do with
efficient access -- the net result is somewhat better than gzipping a
CVS ,v file.)
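
The two criteria named above (delta against the most similar version,
and keep every version within a bounded number of deltas of a full
text) can be sketched in Python as follows.  This is only an
illustration of the idea, not git's actual algorithm: the function
name, the use of difflib as the similarity measure, the 0.5 similarity
threshold, and the default depth limit are all assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def plan_deltas(versions: List[str], max_depth: int = 3,
                min_similarity: float = 0.5) -> List[Optional[int]]:
    """Pick a delta base for each version.

    Returns a list where entry i is the index of the version that
    version i is stored as a delta against, or None if version i is
    stored as full text.
    """
    bases: List[Optional[int]] = []
    depth: List[int] = []  # deltas between each version and a full text
    for i, text in enumerate(versions):
        best, best_score = None, min_similarity
        for j in range(i):
            # Skip bases already at the limit, so no chain grows
            # longer than max_depth deltas.
            if depth[j] >= max_depth:
                continue
            score = SequenceMatcher(None, versions[j], text).ratio()
            if score > best_score:
                best, best_score = j, score
        bases.append(best)
        depth.append(0 if best is None else depth[best] + 1)
    return bases


if __name__ == "__main__":
    history = ["a\nb\nc\n", "a\nb\nc\nd\n", "a\nb\nc\nd\ne\n", "zzz"]
    print(plan_deltas(history, max_depth=3))
    print(plan_deltas(history, max_depth=1))
```

Note that the base is chosen by similarity, not by ancestry, so an
unrelated version falls back to full text, and a tighter depth limit
forces later versions to delta against an earlier, shallower base.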

Packing not only makes a big difference to storage space, it also
makes a huge difference to remote cloning speed (the git: protocol
automatically packs before transmitting, even repacking an already
packed remote if it's gotten stale).
