[darcs-devel] a measurement of using 7z instead of gz to compress patches
zooko
zooko at zooko.com
Fri Jan 18 20:03:03 UTC 2008
> Well, the tar version compresses much better than zip because it is
> streaming and doesn't have an index. Since you tar then gzip, it
> doesn't compress each patch independently but rather compresses a
> single stream containing all the patches. Since patches are extremely
> similar, this results in substantial improvements over individually
> compressing each file like zip does in order to individually access
> them via its index.
Really? Perhaps you were talking about "zip" [1], but I am talking
about "7z" a.k.a. "7-zip" [2].
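The effect described in the quoted paragraph (compressing each patch on its own versus compressing one stream that contains all of them) can be illustrated with a small sketch. This is not how darcs stores patches and it does not read or write real 7z archives: the patch texts are invented, `make_patches` and `measure` are hypothetical helper names, and Python's standard `zlib` and `lzma` modules stand in for gz and for the LZMA compression that 7z typically uses.

```python
import lzma
import zlib


def make_patches(count=200):
    """Build hypothetical, highly similar patch texts (not real darcs patches)."""
    patches = []
    for i in range(count):
        text = (
            f"[hypothetical patch {i}\n"
            f"author at example.com**20080118{i:06d}] {{\n"
            f"hunk ./src/module.py {i + 1}\n"
            f"-old_value = {i}\n"
            f"+new_value = {i + 1}\n"
            "}\n"
        )
        patches.append(text.encode("utf-8"))
    return patches


def measure(patches):
    """Return compressed sizes in bytes for each strategy."""
    whole = b"".join(patches)  # stand-in for "tar everything, then compress"
    return {
        "raw": len(whole),
        # Each patch compressed separately: no redundancy shared between patches.
        "deflate-individual": sum(len(zlib.compress(p, 9)) for p in patches),
        "lzma-individual": sum(len(lzma.compress(p)) for p in patches),
        # One stream: the compressor can reuse what it saw in earlier patches.
        "deflate-all-at-once": len(zlib.compress(whole, 9)),
        "lzma-all-at-once": len(lzma.compress(whole)),
    }


if __name__ == "__main__":
    for name, size in measure(make_patches()).items():
        print(f"{size:8d} {name}")
```

With inputs this repetitive the all-at-once variants should come out far smaller than the per-patch ones for both compressors. The exact numbers depend entirely on the made-up data, so they say nothing about a real repository.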
I decided to measure this on my allmydata.org repository. Hopefully
the names are self-explanatory. The first two are ones that are
actually implemented by darcs at this time.
$ du -sk * # So the numbers are in KiB
81592 trunk-nocompress-individual-patches
48980 trunk-gz-individual-patches
44528 trunk-7z-individual-patches
18784 trunk-tar-gz-all-at-once
10644 trunk-tar-7z-all-at-once
10600 trunk-7z-all-at-once
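To make the comparison easier to read, here is a small sketch that expresses each figure as a percentage of the uncompressed size. The numbers are copied from the `du` output above; the dictionary and the `percent_of_baseline` helper are only illustrative names.

```python
# Sizes in KiB, copied from the du -sk output above.
sizes_kib = {
    "trunk-nocompress-individual-patches": 81592,
    "trunk-gz-individual-patches": 48980,
    "trunk-7z-individual-patches": 44528,
    "trunk-tar-gz-all-at-once": 18784,
    "trunk-tar-7z-all-at-once": 10644,
    "trunk-7z-all-at-once": 10600,
}

baseline = sizes_kib["trunk-nocompress-individual-patches"]


def percent_of_baseline(name):
    """Size of one variant as a percentage of the uncompressed patches."""
    return 100.0 * sizes_kib[name] / baseline


if __name__ == "__main__":
    for name in sizes_kib:
        print(f"{percent_of_baseline(name):6.1f}%  {name}")
```

Per-patch gz comes to roughly 60% of the uncompressed size, while 7z over everything at once is about 13%. Note that `du` reports disk usage, so the per-patch figures also include whatever per-file block overhead the filesystem adds.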
So basically, it would be interesting if some Haskell hacker wanted
to make 7z archives (or else 7z-compressed tarballs) available
through a Haskell interface. Depending on your filesystem and your
usage, it might well be that the result is faster as well as more
compact. The key to being faster would be to avoid a seek() that
wasn't already cached by your filesystem, and tighter compression
can obviously help with this.
Regards,
Zooko
[1] http://en.wikipedia.org/wiki/ZIP_(file_format)
[2] http://en.wikipedia.org/wiki/7z