[darcs-users] Is darcs optimize --compress still useful?

Trent W. Buck trentbuck at gmail.com
Tue Mar 24 00:37:11 UTC 2009

On Mon, Mar 23, 2009 at 10:55:01AM +0100, Petr Rockai wrote:
> trentbuck at gmail.com (Trent W. Buck) writes:
> > Petr Rockai <me at mornfall.net> writes:
> >> Not really. The difference is that uncompressed patches can be mmaped,
> >> and are therefore more efficient than compressed (with saved space
> >> being on the edge of measurability, since quite many compressed
> >> patches take the same number of filesystem blocks as their
> >> uncompressed counterparts...).
> >
> > So if I've understood correctly, we have *three* options (remove
> > compression, put it back again, and dither) in order to support
> > uncompressed patches -- a feature which
> >
> >   - increases disk consumption considerably;
> See above: the compressed patches save very little actual space. You need to
> cross a filesystem block boundary with your compression to be of any use.

Sorry, I misread your initial comment.  So supposing a block-oriented
filesystem with 512-byte blocks (quite small), you only save space if
compression reduces the number of blocks by at least one, e.g. from a
1024 KiB uncompressed patch file to a 1023.5 KiB compressed patch file.
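
To make that arithmetic concrete, here is a minimal sketch in POSIX
shell.  The function names blocks_needed and saves_space are made up
for illustration and are not part of Darcs; the sketch also assumes a
file occupies exactly ceil(size / block size) blocks, ignoring any
filesystem-specific overhead.

```shell
# blocks_needed SIZE_BYTES BLOCK_BYTES
# Print how many whole blocks are needed to hold SIZE_BYTES
# (integer ceiling division).
blocks_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# saves_space UNCOMPRESSED_BYTES COMPRESSED_BYTES BLOCK_BYTES
# Succeed (exit status 0) only if the compressed file occupies
# at least one block fewer than the uncompressed file.
saves_space() {
    [ "$(blocks_needed "$2" "$3")" -lt "$(blocks_needed "$1" "$3")" ]
}

# The example from the text: 1024 KiB versus 1023.5 KiB, 512-byte blocks.
if saves_space 1048576 1048064 512; then
    echo "compression frees at least one block"
else
    echo "compression frees no blocks"
fi
```

Under these assumptions, a 1000-byte patch compressed to 600 bytes
saves nothing on a filesystem with 4096-byte blocks, because both
sizes round up to a single block.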

As a case study, let's examine a --complete checkout of Darcs' repo,
both compressed and uncompressed.

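    # Files that fit in a single 512-byte block (GNU find: b = 512 bytes):
    $ find _darcs -type f \( -size 0b -o -size 1b \) | wc -l
    # Files that need more than one 512-byte block:
    $ find _darcs -type f -not -size 0b -not -size 1b | wc -l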

With gzip compression, approximately one third of all patches are more
than one 512-byte block.

    $ darcs optimize --uncompress
    # The same two counts, now with the patches stored uncompressed:
    $ find _darcs -type f \( -size 0b -o -size 1b \) | wc -l
    $ find _darcs -type f -not -size 0b -not -size 1b | wc -l

So the number of files that benefit by at least one block from
compression is around one quarter.

By default, ext3 uses 4096-byte (not 512-byte) blocks, so the above
findings are skewed AGAINST decompression: with larger blocks, fewer
files would cross a block boundary when compressed.  They also only
apply to block-oriented filesystems.
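
The same counts could be repeated for 4096-byte blocks.  The sketch
below assumes GNU find, where the b suffix means 512-byte units and
sizes are rounded up, so 4096 bytes is 8 units.  The function names
are hypothetical, and like the commands above they look at file
length rather than at the blocks actually allocated on disk.

```shell
# count_fits_one_block DIR
# Count regular files of at most 4096 bytes (8 units of 512 bytes),
# i.e. files that fit in a single 4096-byte block.
count_fits_one_block() {
    find "$1" -type f -size -9b | wc -l
}

# count_spans_blocks DIR
# Count regular files larger than 4096 bytes, i.e. files that
# need more than one 4096-byte block.
count_spans_blocks() {
    find "$1" -type f -size +8b | wc -l
}
```

Running both functions on _darcs before and after darcs optimize
--uncompress would give the 4096-byte equivalent of the comparison
above.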

> >   - decreases RAM consumption negligibly; and
> >   - decreases CPU time negligibly.
> Well, it probably does help with both, about as much as compression helps
> with disk space usage. So I might actually be in favour of removing
> compression for patches.

So compression is (significantly) useful when and only when we get packs?


I notice that page strongly references *your* work on hashed-storage,
which is also on the agenda for this month's sprint.

> [...] unless you maybe use one of those modern filesystems).

Do you mean non-block-oriented filesystems?  Which are they, stuff
like XFS, ZFS and Btrfs?
