[darcs-users] breaking hashed files into multiple directories
Jason Dagit
dagit at codersbase.com
Tue Sep 2 00:02:47 UTC 2008
On Mon, Sep 1, 2008 at 5:00 PM, Petr Rockai <me at mornfall.net> wrote:
> Hi,
>
> Dan Pascu <dan at ag-projects.com> writes:
>> If you want to go this path, why not use a database to store the data?
>> Berkeley DB or SQLite should do just fine and they are most likely much
>> faster in accessing the data than any archive, while giving you the same
>> single file storage advantage (well maybe 3: pristine, patches and
>> inventories). It won't give you the compression advantage (though you
>> could store compressed data in them if that is really desired), but that
>> is less of an issue I believe, as disk space is cheap and some stuff is
>> already compressed (patches). Such a solution will definitely avoid the
>> limited number of files per directory issue and can even offer the
>> benefits of a hashed repository (no direct access to pristine to
>> accidentally modify files), but without the need to hash the files, since
>> they can be stored verbatim in the database. This would also make it
>> slightly faster as the need to hash the files will disappear.
> It should also be noted that it would make http repository access next to
> impossible and other remote access pretty tricky. Also, hashing the files gives
> us much better consistency and robustness guarantees than any of the mentioned
> "database" engines ever pretended.
>
> (As a sidenote, I'd probably say that compressing in a biggish, indexed and
> compressed file is likely to give a radically better compression ratio than
> compressing individual patches. However, it would reduce any hardlink-sharing
> possibilities to virtually zero, so the actual space advantage is very
> dubious. A compromise solution, like the one GIT adopts, where "oldish" data is
> packed into bigger chunks and distributed as "packs" (of which there are often
> several) might actually bring advantages of both. It is however not clear at
> all to me, how to associate hashes to pack files -- probably through an
> external index that could be fetched independently by http, and used to
> determine if the pack is needed or useful; it would complicate the cache code a
> fair bit I guess, and make the lazy repositories a little more tricky overall,
> but could give impressive speedups for initial, non-cached "get" over
> http... Or might not, depending on how well pipelining actually works in
> practice.)
Darcs tags might correspond rather well to packs.
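
To make the external-index idea quoted above a bit more concrete, here is a rough sketch in Python. It is not how darcs or Git actually store packs: the SHA-256 hashing, the in-memory index layout, and the names build_pack, read_object and pack_is_useful are all assumptions made for illustration. The point is only that a small index, fetched separately, is enough to decide whether a pack is worth downloading.

```python
import hashlib
import zlib


def build_pack(objects):
    """Concatenate objects into one compressed blob plus a separate index.

    The index maps each object's SHA-256 hex digest (an assumed hash
    choice) to its (offset, length) in the uncompressed payload.
    """
    index = {}
    payload = bytearray()
    for data in objects:
        digest = hashlib.sha256(data).hexdigest()
        if digest in index:
            continue  # identical content is stored only once
        index[digest] = (len(payload), len(data))
        payload += data
    return zlib.compress(bytes(payload)), index


def read_object(pack_bytes, index, digest):
    """Extract one object from a pack and verify it against its hash."""
    offset, length = index[digest]
    payload = zlib.decompress(pack_bytes)
    data = payload[offset:offset + length]
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("pack content does not match its hash")
    return data


def pack_is_useful(index, wanted, cached):
    """Decide from the index alone whether fetching the pack would help.

    wanted: hashes the client needs; cached: hashes it already has.
    """
    missing = set(wanted) - set(cached)
    return any(h in index for h in missing)
```

A client would fetch only the index first, call pack_is_useful with the hashes it needs and the hashes already in its cache, and download the pack itself only when the answer is yes. If one pack were built per tag, the index for that pack would simply list the hashes of everything up to that tag.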
Jason