[darcs-users] breaking hashed files into multiple directories
Petr Rockai
me at mornfall.net
Tue Sep 2 00:00:08 UTC 2008
Hi,
Dan Pascu <dan at ag-projects.com> writes:
> If you want to go this path, why not use a database to store the data?
> Berkeley DB or SQLite should do just fine and they are most likely much
> faster in accessing the data than any archive, while giving you the same
> single file storage advantage (well maybe 3: pristine, patches and
> inventories). It won't give you the compression advantage (though you
> could store compressed data in them if that is really desired), but that
> is less of an issue I believe, as disk space is cheap and some stuff is
> already compressed (patches). Such a solution will definitely avoid the
> limited number of files per directory issue and can even offer the
> benefits of a hashed repository (no direct access to pristine to
> accidentally modify files), but without the need to hash the files, since
> they can be stored verbatim in the database. This would also make it
> slightly faster as the need to hash the files will disappear.
It should also be noted that it would make http repository access next to
impossible and other remote access pretty tricky. Also, hashing the files gives
us much better consistency and robustness guarantees than any of the mentioned
"database" engines ever pretended to offer.
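To illustrate the consistency point: when a file is named after the hash of its
contents, any reader can detect corruption or tampering just by re-hashing what
it fetched. The sketch below is only an illustration under assumptions. It
uses SHA-256 and the hypothetical helper names store() and fetch(), and it is
not the actual darcs on-disk format.

```python
import hashlib
import os
import tempfile

def store(directory, data):
    """Write data to a file named after its SHA-256 digest; return that name."""
    name = hashlib.sha256(data).hexdigest()
    path = os.path.join(directory, name)
    if not os.path.exists(path):  # identical content is stored only once
        with open(path, "wb") as f:
            f.write(data)
    return name

def fetch(directory, name):
    """Read a stored object and check that its contents still match its name."""
    with open(os.path.join(directory, name), "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() != name:
        raise ValueError("corrupt object: " + name)
    return data

if __name__ == "__main__":
    repo = tempfile.mkdtemp()  # throwaway directory standing in for a repository
    key = store(repo, b"hunk ./README 1\n+hello\n")
    print(key, len(fetch(repo, key)))
```

Since each object is a plain file with a predictable name, a dumb http server
is enough to serve it, and the client does the verification itself.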
(As a sidenote, I'd probably say that compressing into a biggish, indexed and
compressed file is likely to give a radically better compression ratio than
compressing individual patches. However, it would reduce any hardlink-sharing
possibilities to virtually zero, so the actual space advantage is very
dubious. A compromise solution, like the one Git adopts, where "oldish" data is
packed into bigger chunks and distributed as "packs" (of which there are often
several) might actually bring the advantages of both. It is, however, not at
all clear to me how to associate hashes with pack files -- probably through an
external index that could be fetched independently by http, and used to
determine if the pack is needed or useful; it would complicate the cache code a
fair bit I guess, and make the lazy repositories a little more tricky overall,
but could give impressive speedups for initial, non-cached "get" over
http... Or might not, depending on how well pipelining actually works in
practice.)
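A rough sketch of how such an external index might be used, purely as an
assumption on my part: the index is taken to be a mapping from pack name to the
hashes that pack contains, and useful_packs() is a hypothetical helper that
decides which packs are worth fetching given what is wanted and what is already
in the cache. Nothing like this exists in darcs as far as this message is
concerned; the names and the threshold parameter are made up for illustration.

```python
def useful_packs(index, wanted, cached, min_useful=1):
    """Pick packs worth downloading.

    index      -- dict mapping pack name to an iterable of hashes in that pack
    wanted     -- hashes the client needs
    cached     -- hashes the client already has locally
    min_useful -- fetch a pack only if it supplies at least this many
                  still-missing hashes (hypothetical tuning knob)

    Returns (packs to fetch, hashes that must still be fetched one by one).
    """
    missing = set(wanted) - set(cached)
    chosen = []
    for pack in sorted(index):  # fixed order keeps the result deterministic
        gain = missing & set(index[pack])
        if len(gain) >= min_useful:
            chosen.append(pack)
            missing -= gain
    return chosen, missing

if __name__ == "__main__":
    index = {"pack-a": ["h1", "h2", "h3"], "pack-b": ["h4"], "pack-c": ["h2"]}
    packs, loose = useful_packs(index, ["h1", "h2", "h4", "h5"], ["h4"])
    print(packs, sorted(loose))
```

Whatever is left over after the packs are chosen would still go through the
ordinary per-file http fetch, which is where the extra cache complexity comes
from.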
Yours,
Petr.
--
Peter Rockai | me()mornfall!net | prockai()redhat!com
http://blog.mornfall.net | http://web.mornfall.net
"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
-- Blair P. Houghton on the subject of C program indentation