[darcs-users] Benchmarking "get"
Stephen J. Turnbull
stephen at xemacs.org
Wed Mar 18 01:23:28 UTC 2009
Ian Lynagh writes:
> 275656 camp
> 172464 darcs1
> 331924 darcs1un
> 172464 darcs1darcs2
> 331976 darcs1darcs2un
> 160476 darcs1hashed
> 326780 darcs1hashedun
> 162020 darcs2
> 330000 darcs2un
> 431632 git
>
> So all the darcs compressed and uncompressed repos are about the same
> size as each other. camp (uncompressed) sits in the middle. git is
> surprisingly (to me) large, but I don't know what it stores.
FYI:
Until you do a "pack" (implied by git-gc), git stores a
zlib-compressed copy of each version of each file, plus a
zlib-compressed copy of each git tree (~ Unix directory), plus a
zlib-compressed copy of each commit. Not only is that a lot of
redundancy, but on "classic" block-oriented filesystems like ext2,
those trees and commits typically take up about 1% of a 16KiB disk
block, and for typical C projects, the amount of block wastage for
file storage is also quite high.
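
To make the loose-object layout concrete, here is a minimal Python sketch of how one object ends up on disk before packing. The header format ("type size" followed by a NUL byte), the SHA-1 naming, and the zlib compression are git's; the helper names and the 4096-byte block size in the example are assumptions made for illustration only.

```python
import hashlib
import zlib


def loose_object(kind: bytes, payload: bytes) -> tuple:
    """Return (object id, compressed bytes) for one loose git object.

    git hashes and stores "<type> <size>\\0<payload>"; the stored
    form is that same byte string, zlib-compressed.
    """
    store = kind + b" " + str(len(payload)).encode() + b"\0" + payload
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


def disk_usage(nbytes: int, block_size: int) -> int:
    """Bytes allocated when a file is rounded up to whole blocks
    (hypothetical helper; ignores inodes and directory entries)."""
    return -(-nbytes // block_size) * block_size


oid, data = loose_object(b"blob", b"hello\n")
print(oid)  # ce013625030ba8dba906f756967f9e9ca394464a

# Assumed block size, purely for illustration.
allocated = disk_usage(len(data), 4096)
print(f"{len(data)} bytes stored, {allocated} bytes allocated")
```

The object id printed above should match what "git hash-object" reports for the same content. The second line shows the wastage: a tiny object still occupies a whole block.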
The pack operation takes all the objects and does a (somewhat
optimized) delta compression exercise, then puts them into archives
called packs, with separate index files. ("Optimized" means that
deltas are taken against the most similar versions, not necessarily
parents, that no version of a file is "too many deltas away" from a
full-text version, and some other things that have to do with efficient
access -- the net result is somewhat better than gzipping a CVS ,v file.)
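
As a rough illustration of why delta storage with a bounded chain length beats one compressed copy per version, here is a small Python sketch. It is not git's algorithm: it uses difflib's unified diff as a stand-in for git's binary delta format, always deltas against the previous version rather than the most similar object, and the synthetic history, function names, and max_depth value are all assumptions.

```python
import difflib
import zlib


def make_versions(n: int) -> list:
    """Hypothetical history: each version appends one line to the last."""
    lines = [f"int f{i}(void) {{ return {i}; }}\n" for i in range(200)]
    versions = []
    for v in range(n):
        lines = lines + [f"/* change {v} */\n"]
        versions.append("".join(lines))
    return versions


def size_loose(versions: list) -> int:
    """Total size when every version is zlib-compressed on its own."""
    return sum(len(zlib.compress(v.encode())) for v in versions)


def size_delta(versions: list, max_depth: int = 10) -> int:
    """Total size with a full text every max_depth versions and a
    compressed diff against the previous version in between, so no
    version is more than max_depth - 1 deltas from a full text."""
    total = 0
    for i, text in enumerate(versions):
        if i % max_depth == 0:
            total += len(zlib.compress(text.encode()))
        else:
            old = versions[i - 1].splitlines(True)
            new = text.splitlines(True)
            diff = "".join(difflib.unified_diff(old, new))
            total += len(zlib.compress(diff.encode()))
    return total


history = make_versions(20)
print("one copy per version:", size_loose(history))
print("deltas, bounded depth:", size_delta(history))
```

On a history like this the delta figure comes out much smaller, because each diff carries only the changed line plus a little context. A smaller max_depth trades space for faster reconstruction of any given version.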
Packing not only makes a big difference to storage space, it also
makes a huge difference to remote cloning speed (the git: protocol
automatically packs before transmitting, even repacking an already
packed remote if it's gotten stale).