[darcs-users] breaking hashed files into multiple directories
Dan Pascu
dan at ag-projects.com
Mon Sep 1 23:36:06 UTC 2008
On Tuesday 02 September 2008, zooko wrote:
> On Sep 1, 2008, at 16:23 PM, David Roundy wrote:
> > I suppose this would be a good entry-level project for someone, doing
> > the standard create 256 or so subdirectories based on the first
> > couple (or more?) hex characters in the hash, and then distribute the
> > files into those directories.
>
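For reference, the directory split David describes could look roughly like
the sketch below. It is only an illustration, not darcs code: the function
names and the two-character prefix are assumptions, and it simply moves
every file found directly under a directory into a subdirectory named after
the first characters of its filename.

```python
import os
import shutil


def sharded_path(root, hash_name, prefix_len=2):
    """Return root/<first prefix_len characters>/<hash_name>."""
    return os.path.join(root, hash_name[:prefix_len], hash_name)


def shard_directory(root, prefix_len=2):
    """Move each regular file directly under root into its prefix subdirectory.

    Two hex characters give at most 256 subdirectories. Returns the number
    of files moved. (Hypothetical helper, not part of darcs.)
    """
    moved = 0
    for name in sorted(os.listdir(root)):
        src = os.path.join(root, name)
        # Skip subdirectories and names too short to be split safely.
        if not os.path.isfile(src) or len(name) <= prefix_len:
            continue
        dst = sharded_path(root, name, prefix_len)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.move(src, dst)
        moved += 1
    return moved
```

Lookups then go through sharded_path() instead of joining the directory
and the hash directly.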
> If somebody out there wants a trickier and more innovative project
> that is less likely to succeed, you could see if a compressing,
> archiving tool would serve the same purpose as a filesystem directory
> (or nested set of directories) while also offering space and maybe
> even time advantages.
>
> The idea is, make a single compressed archive file which contains a
> bunch of "files" inside of it, where each file is (just like in the
> current hashed format), a patch whose filename is a hash and whose
> contents is the patch.
>
> If the compressing archive tool that you use can efficiently extract
> individual files from the archive, then this might be efficient
> enough, or even more efficient, in speed.
If you want to go this path, why not use a database to store the data?
Berkeley DB or SQLite should do just fine and they are most likely much
faster in accessing the data than any archive, while giving you the same
single file storage advantage (well maybe 3: pristine, patches and
inventories). It won't give you the compression advantage (though you
could store compressed data in them if that is really desired), but that
is less of an issue I believe, as disk space is cheap and some stuff is
already compressed (patches). Such a solution will definitely avoid the
limited number of files per directory issue and can even offer the
benefits of a hashed repository (no direct access to pristine to
accidentally modify files), but without the need to hash the files, since
they can be stored verbatim in the database. This would also make it
slightly faster as the need to hash the files will disappear.
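
To make the idea a bit more concrete, here is a minimal sketch using SQLite
through Python's standard sqlite3 module. The table layout, the "kind"
column and the function names are all assumptions made for illustration.
For brevity it keeps pristine, patches and inventories in one database file
rather than three, and compression is optional per entry.

```python
import sqlite3
import zlib

# Hypothetical schema: one row per stored object, keyed by kind and name.
SCHEMA = """
CREATE TABLE IF NOT EXISTS objects (
    kind       TEXT NOT NULL,     -- 'pristine', 'patches' or 'inventories'
    name       TEXT NOT NULL,     -- file path, or patch hash
    compressed INTEGER NOT NULL,  -- 1 if body is zlib-compressed
    body       BLOB NOT NULL,
    PRIMARY KEY (kind, name)
)
"""


def open_store(path=":memory:"):
    """Open (or create) the store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def put(conn, kind, name, data, compress=False):
    """Store bytes under (kind, name), replacing any existing entry."""
    body = zlib.compress(data) if compress else data
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO objects VALUES (?, ?, ?, ?)",
            (kind, name, int(compress), body),
        )


def get(conn, kind, name):
    """Return the original bytes stored under (kind, name)."""
    row = conn.execute(
        "SELECT compressed, body FROM objects WHERE kind = ? AND name = ?",
        (kind, name),
    ).fetchone()
    if row is None:
        raise KeyError((kind, name))
    compressed, body = row
    return zlib.decompress(body) if compressed else bytes(body)
```

Pristine files would go in verbatim under their path, with no hashing step,
e.g. put(conn, "pristine", "src/Main.hs", contents), while patches could
still be stored under their existing hash names.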
--
Dan