[darcs-devel] Compressing Patches, LZO, and that .gz.

David Roundy droundy at darcs.net
Sat Apr 16 04:47:04 PDT 2005


On Sat, Apr 16, 2005 at 12:44:46PM +0100, Ralph Corderoy wrote:
> So a 65% CPU time saving is at the expense of 25% growth in compressed
> file.  Obviously it needs trying with the actual compression rate darcs
> uses but it suggests it's worth investigating.  Given that darcs is CPU
> and memory bound rather than I/O bound I'd suggest it's worth it.

But keep in mind that although darcs is CPU-bound, it isn't
compression-bound: compression takes only a very small fraction of
darcs' time.  Git also uses zlib compression for all its files, and git is
astoundingly fast.
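
To put rough numbers on that: if compression accounts for a fraction f of
total run time and a faster compressor saves a fraction s of the
compression time, the overall saving is only f * s.  The sketch below uses
the 65% figure quoted above for s; the 5% share for f is a purely
hypothetical value chosen for illustration, not a measurement of darcs.

```python
def overall_saving(compress_fraction: float, compress_saving: float) -> float:
    """Fraction of total run time saved when only compression gets faster."""
    return compress_fraction * compress_saving


# 0.65 is the CPU saving quoted in the thread.
# 0.05 is an ASSUMED share of run time spent compressing (not measured).
saving = overall_saving(compress_fraction=0.05, compress_saving=0.65)
print(f"overall run time saved: {saving:.2%}")
```

Under that assumption the total saving is a little over 3%, which is the
sense in which a CPU-bound program need not be compression-bound.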

> On a related note, the `.gz' suffix of patch files should go given that it
> doesn't indicate they're compressed.  I'd suggest that regardless of
> whether the patch is uncompressed, gzip'd, lzop'd, or something else, it
> has no suffix.  This allows the patch to be accessed without having to
> search the directory to see what suffix it may have, or attempting
> several opens.  It should be possible to determine the format by the
> first few bytes.  A darcs command, perhaps under query, could be a
> substitute for cat(1).  Yes, I know it's a repository change but that
> seems better than leaving a .gz around that zgrep, etc., fail on.
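
The "first few bytes" check suggested above could look roughly like the
following.  This is a minimal sketch in Python, not darcs code: the
function names sniff_format and read_patch are made up for illustration,
and anything that is neither gzip nor lzop is simply assumed to be an
uncompressed patch.  The magic numbers are those of the gzip and lzop
file formats.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"                 # first two bytes of any gzip stream
LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"    # nine-byte lzop file signature


def sniff_format(head: bytes) -> str:
    """Guess the storage format from the leading bytes of a patch file."""
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(LZOP_MAGIC):
        return "lzop"
    # Assumption: everything else is an uncompressed patch.
    return "plain"


def read_patch(path: str) -> bytes:
    """Return the patch contents, whatever the on-disk format (hypothetical helper)."""
    with open(path, "rb") as f:
        data = f.read()
    kind = sniff_format(data[:len(LZOP_MAGIC)])
    if kind == "gzip":
        return gzip.decompress(data)
    if kind == "lzop":
        # The Python standard library has no LZO decoder.
        raise NotImplementedError("lzop decoding needs an external library")
    return data
```

With a check like this the file name carries no format information, so a
single open is enough regardless of how the patch was stored.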

I agree that leaving the .gz out would have been a good idea, but I don't
think we should change it now: for backwards compatibility we'd have to
try both filenames, and that would be a pain.  I think this is a
transition we can make when we switch to a hashed inventory and start
storing patches according to a hash of the patch contents, since at that
time we'll have to deal with a repository format transition anyway.
-- 
David Roundy
http://www.darcs.net



