[darcs-users] Is darcs optimize --compress still useful?
Trent W. Buck
trentbuck at gmail.com
Tue Mar 24 00:37:11 UTC 2009
On Mon, Mar 23, 2009 at 10:55:01AM +0100, Petr Rockai wrote:
> trentbuck at gmail.com (Trent W. Buck) writes:
> > Petr Rockai <me at mornfall.net> writes:
> >> Not really. The difference is that uncompressed patches can be mmaped,
> >> and are therefore more efficient than compressed (with saved space
> >> being on the edge of measurability, since quite many compressed
> >> patches take the same number of filesystem blocks as their
> >> uncompressed counterparts...).
> >
> > So if I've understood correctly, we have *three* options (remove
> > compression, put it back again, and dither) in order to support
> > uncompressed patches -- a feature which
> >
> > - increases disk consumption considerably;
> See above: the compressed patches save very little actual space. You need to
> cross a filesystem block boundary with your compression to be of any use.
Sorry, I misread your initial comment. So, supposing a block-oriented
filesystem with 512-byte blocks (quite small), you only save space if
compression reduces the number of blocks by at least one, e.g. from a
1024 KiB uncompressed patch file to a 1023.5 KiB compressed patch file.
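To make the rounding concrete: a file of n bytes needs ceil(n / B) data
blocks of B bytes each. A minimal sketch in POSIX shell arithmetic; the
helper name `blocks_needed` is hypothetical, and it ignores inodes and
indirect blocks.

```shell
#!/bin/sh
# blocks_needed BYTES BLOCKSIZE
# Hypothetical helper: prints how many BLOCKSIZE-byte data blocks a file of
# BYTES bytes occupies (ceiling division; metadata blocks are ignored).
blocks_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

blocks_needed 1048576 512   # 1024 KiB uncompressed: prints 2048
blocks_needed 1048064 512   # 1023.5 KiB compressed: prints 2047 (one block saved)
blocks_needed 1048100 512   # only 476 bytes smaller: prints 2048 (nothing saved)
```

Shrinking a file only pays off when the new size falls below a multiple
of the block size; anything less is absorbed by the last, partly filled
block.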
As a case study, let's examine a --complete checkout of Darcs' repo,
both compressed and uncompressed.
$ find _darcs -type f \( -size 0b -o -size 1b \) | wc -l
5093
$ find _darcs -type f -not -size 0b -not -size 1b | wc -l
2728
With gzip compression, approximately one third of all files under _darcs
(2728 of 7821) occupy more than one 512-byte block.
$ darcs optimize --uncompress
$ find _darcs -type f \( -size 0b -o -size 1b \) | wc -l
3026
$ find _darcs -type f -not -size 0b -not -size 1b | wc -l
4795
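So compression brings around one quarter of the files (4795 - 2728 =
2067 of 7821) down to a single block. Files that stay above one block
either way may also save blocks, which this count does not capture.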
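The fractions above can be checked from the four `find` counts. A small
sketch, assuming no file grows past a block boundary when gzipped, so
that the difference in "more than one block" counts equals the number of
files compression pushes down to a single block. Percentages use integer
shell arithmetic and are truncated.

```shell
#!/bin/sh
# Counts reported above for a --complete checkout (all files under _darcs).
small_gz=5093;  large_gz=2728     # gzip-compressed: <= 1 block / > 1 block
small_raw=3026; large_raw=4795    # after `darcs optimize --uncompress`

total=$((small_gz + large_gz))
# Both runs should see the same set of files.
[ "$total" -eq $((small_raw + large_raw)) ] || echo "file counts differ" >&2

# Files that fit in one 512-byte block only when compressed.
crossed=$((large_raw - large_gz))

echo "total files: $total"
echo "over one block when compressed: $((large_gz * 100 / total))%"
echo "pushed into one block by compression: $crossed ($((crossed * 100 / total))%)"
```

This prints 7821 files in total, 34% over one block when compressed, and
2067 files (26%) pushed into a single block.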
By default, ext3 uses 4096-byte (not 512-byte) blocks, so the above
findings are skewed AGAINST decompression. They also only apply to
block-oriented filesystems.
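`find -size Nb` only distinguishes 512-byte units, so it cannot directly
show the effect of 4096-byte blocks. One way to estimate it is to sum
ceil(size / B) over every file for a chosen block size B. A sketch
assuming GNU find (for `-printf`) and awk; the helper name `total_blocks`
is hypothetical, and it uses apparent file sizes, so tail packing, inline
data, sparse files and metadata blocks are not modelled.

```shell
#!/bin/sh
# total_blocks DIR BLOCKSIZE
# Hypothetical helper: sums ceil(size / BLOCKSIZE) over every regular file
# under DIR, i.e. the data blocks those files would need on a filesystem
# with that block size. Requires GNU find for -printf.
total_blocks() {
    find "$1" -type f -printf '%s\n' |
        awk -v bs="$2" '{ n += int(($1 + bs - 1) / bs) } END { print n + 0 }'
}
```

Running `total_blocks _darcs 512` and `total_blocks _darcs 4096` in the
compressed checkout, then again after `darcs optimize --uncompress`, would
show how much of the saving survives at the larger ext3 block size.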
> > - decreases RAM consumption negligibly; and
> > - decreases CPU time negligibly.
> Well, it probably helps with both about as much as compression helps
> with disk space usage. So I might actually be in favour of removing
> compression for patches.
So compression is (significantly) useful when and only when we get packs?
http://wiki.darcs.net/index.html/PacksSpecification
I notice that page strongly references *your* work on hashed-storage,
which is also on the agenda for this month's sprint.
> [...] unless you maybe use one of those modern filesystems).
Do you mean non-block-oriented filesystems? Which are those: things
like XFS, ZFS and btrfs?