[darcs-users] breaking hashed files into multiple directories

Dan Pascu dan at ag-projects.com
Mon Sep 1 23:36:06 UTC 2008


On Tuesday 02 September 2008, zooko wrote:
> On Sep 1, 2008, at 16:23, David Roundy wrote:
> > I suppose this would be a good entry-level project for someone,
> > doing the standard thing: create 256 or so subdirectories based on
> > the first couple (or more?) of hex characters of the hash, and then
> > distribute the files into those directories.
>
> If somebody out there wants a trickier and more innovative project
> that is less likely to succeed, you could see if a compressing,
> archiving tool would serve the same purpose as a filesystem directory
> (or nested set of directories) while also offering space and maybe
> even time advantages.
>
> The idea is: make a single compressed archive file which contains a
> bunch of "files" inside of it, where each file is (just like in the
> current hashed format) a patch whose filename is a hash and whose
> contents are the patch.
>
> If the compressing archive tool that you use can efficiently extract
> individual files from the archive, then this might be efficient
> enough, or even faster.
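
For reference, the bucketing scheme David describes would look roughly
like the sketch below (a minimal illustration in Python; the
two-character prefix and the function names are assumptions, not darcs'
actual layout or code):

    import os
    import shutil

    def shard_path(store, name):
        # Bucket by the first two hex characters of the hashed
        # filename, giving 256 possible subdirectories.
        return os.path.join(store, name[:2], name)

    def migrate(store):
        # Move every file in a flat hashed store into its bucket.
        for name in os.listdir(store):
            src = os.path.join(store, name)
            if not os.path.isfile(src):
                continue
            dst = shard_path(store, name)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)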
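And to make zooko's suggestion concrete: any archive format with a
central index allows reading a single member without unpacking the
whole file. A sketch using ZIP purely as a stand-in (the format choice
and names here are assumptions, not a proposal):

    import zipfile

    def put_patch(archive, name, data):
        # Append one patch under its hash name; each member is
        # compressed individually.
        with zipfile.ZipFile(archive, "a", zipfile.ZIP_DEFLATED) as zf:
            zf.writestr(name, data)

    def get_patch(archive, name):
        # Random access via the archive's central directory; only
        # the requested member is decompressed.
        with zipfile.ZipFile(archive) as zf:
            return zf.read(name)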

If you want to go down this path, why not use a database to store the 
data? Berkeley DB or SQLite should do just fine, and they are most likely 
much faster at accessing the data than any archive, while giving you the 
same single-file storage advantage (well, maybe three files: pristine, 
patches and inventories). It won't give you the compression advantage 
(though you could store compressed data in them if that is really 
desired), but I believe that is less of an issue, as disk space is cheap 
and some of the data is already compressed (patches). Such a solution 
would definitely avoid the limit on the number of files per directory, 
and can even offer the benefits of a hashed repository (no direct access 
to pristine to accidentally modify files) without the need to hash the 
files, since they can be stored verbatim in the database; dropping the 
hashing step would also make it slightly faster.
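
A minimal sketch of what I mean, using SQLite (the table and function
names are just illustrative, not a proposal for darcs' actual schema):

    import sqlite3

    def open_store(path):
        # One file on disk; the name is the primary key and the
        # patch is stored verbatim as a blob.
        db = sqlite3.connect(path)
        db.execute("CREATE TABLE IF NOT EXISTS patches"
                   " (name TEXT PRIMARY KEY, data BLOB NOT NULL)")
        return db

    def put(db, name, data):
        db.execute("INSERT OR IGNORE INTO patches VALUES (?, ?)",
                   (name, data))
        db.commit()

    def get(db, name):
        row = db.execute("SELECT data FROM patches WHERE name = ?",
                         (name,)).fetchone()
        return row[0] if row else None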

-- 
Dan

