[darcs-devel] a measurement of using 7z instead of gz to compress patches

zooko zooko at zooko.com
Fri Jan 18 20:03:03 UTC 2008


> Well, the tar version compresses much better than zip because it is
> streaming and doesn't have an index. Since you tar then gzip, it doesn't
> compress each patch independently but rather compresses a single stream
> containing all the patches. Since patches are extremely similar, this
> results in substantial improvements over individually compressing each
> file like zip does in order to individually access them via its index.

Really?  Perhaps you were talking about "zip" [1], but I am talking  
about "7z" a.k.a. "7-zip" [2].

I decided to measure this on my allmydata.org repository.  Hopefully
the names are self-explanatory.  The first two are the ones that
darcs actually implements at this time.

$ du -sk * # So the numbers are in KiB
81592	trunk-nocompress-individual-patches
48980	trunk-gz-individual-patches
44528	trunk-7z-individual-patches
18784	trunk-tar-gz-all-at-once
10644	trunk-tar-7z-all-at-once
10600	trunk-7z-all-at-once
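
For anyone who wants to see the same effect without a darcs repository
at hand, here is a rough Python sketch.  Everything in it is an
assumption made for illustration: the patches are synthetic text made
up by make_patches(), zlib stands in for gz, and Python's lzma module
(which writes .xz containers, not .7z archives) stands in for 7z.  It
counts compressed bytes in memory rather than disk usage as du does,
so it only shows the shape of the comparison, not my numbers.

```python
import lzma
import zlib


def make_patches(count=200):
    """Build synthetic, highly similar patches (hypothetical stand-ins
    for real darcs patch files)."""
    patches = []
    for i in range(count):
        body = (
            "[example change number %d\n"
            "someone@example.com**20080118200303] {\n"
            "hunk ./src/module.py %d\n"
            "-    return compute(value)\n"
            "+    return compute(value, %d)\n"
            "}\n" % (i, i + 1, i)
        )
        patches.append(body.encode("utf-8"))
    return patches


def individual_size(patches, compress):
    """Total size when every patch is compressed on its own."""
    return sum(len(compress(p)) for p in patches)


def solid_size(patches, compress):
    """Size when all patches are concatenated and compressed as one stream."""
    return len(compress(b"".join(patches)))


def measure(patches):
    """Return byte counts for each layout, keyed by a descriptive name."""
    return {
        "nocompress": sum(len(p) for p in patches),
        "zlib-individual": individual_size(patches, zlib.compress),
        "lzma-individual": individual_size(patches, lzma.compress),
        "zlib-all-at-once": solid_size(patches, zlib.compress),
        "lzma-all-at-once": solid_size(patches, lzma.compress),
    }


if __name__ == "__main__":
    for name, size in measure(make_patches()).items():
        print("%8d  %s" % (size, name))
```

With inputs this similar, the all-at-once variants should come out far
smaller than the per-patch ones, because the compressor can reuse what
it has already seen in earlier patches.  How large the gap is on a
real repository depends on the patches themselves.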

So basically, it would be interesting if some Haskell hacker wanted
to make 7z archives (or else 7z-compressed tarballs) available
through a Haskell interface.  Depending on your filesystem and your
usage, it might well be that the result is faster as well as more
compact.  The key to being faster would be to avoid a seek() that
wasn't already cached by your filesystem, and obviously tighter
compression can help with this.

Regards,

Zooko

[1] http://en.wikipedia.org/wiki/ZIP_(file_format)
[2] http://en.wikipedia.org/wiki/7z

