[darcs-users] breaking hashed files into multiple directories

David Roundy daveroundy at gmail.com
Mon Sep 1 22:23:18 UTC 2008


On Mon, Sep 1, 2008 at 1:25 PM, Dan Pascu <dan at ag-projects.com> wrote:
> On Monday 01 September 2008, Petr Rockai wrote:
>> I had thought about that approach before, but discarded it on the
>> paranoia argument. It still might be worthwhile. The downside is that
>> repair would pollute pristine.hashed with a very large number of
>> stale files: basically, every intermediate repository state that we
>> write out, and there are quite a few of those. So we would really
>> want to call clean_hashed every now and then while running repair (I
>> think ext3 might have issues with very large directories, too, and
>> also for space reasons...).
>
> ext3 cannot handle more than 32768 files in a directory, or more than
> 32768 hard links to a file. A directory with many files also becomes
> slow. I guess the 32K limit poses a problem for the patches directory
> as well, if the project accumulates a lot of recorded changesets.

Hmmmm.  I debated at one point whether to split the hashed files into
different directories, and just never got around to it (and wish I
had).  It's a bit tedious, because it needs to be staged so that older
versions of darcs will still be able to read the newer format.  :(

I suppose this would be a good entry-level project for someone: the
standard approach of creating 256 or so subdirectories named after the
first couple (or more?) hex characters of the hash, and then
distributing the files into those directories.  It'd require staging
(as I said), in that we'd want to add the reading code first and only
later activate the writing code, probably with a repository format to
control which is used.  It's tedious, but the code is quite isolated,
so it shouldn't be hard for a newbie.
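
Roughly, the sharding step could look something like the following (a
hypothetical sketch, not actual darcs code; the function names and the
two-character prefix are just assumptions):

import System.Directory (createDirectoryIfMissing, renameFile)
import System.FilePath ((</>))

-- Move a hash-named file into a subdirectory named after the first two
-- hex characters of its hash, e.g. "ab12..." -> "ab/ab12...".
shardHashedFile :: FilePath -> String -> IO ()
shardHashedFile hashedDir hash = do
  let subdir = hashedDir </> take 2 hash
  createDirectoryIfMissing True subdir
  renameFile (hashedDir </> hash) (subdir </> hash)

-- The path a reader would try first, falling back to the old flat
-- layout if the file isn't there, so both formats stay readable.
shardedPath :: FilePath -> String -> FilePath
shardedPath hashedDir hash = hashedDir </> take 2 hash </> hash

Two hex characters give 256 buckets, so you'd need on the order of
eight million hashed files before any single bucket approached the 32K
mark mentioned above.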

David

