[darcs-devel] Filesystem in DB, but data in filesystem

David Roundy droundy at darcs.net
Fri Aug 3 18:05:33 PDT 2007


On Tue, Jul 31, 2007 at 08:11:18AM +0200, Salvatore Insalaco wrote:
> - Everybody agrees that a "file index" solution, without using a
> relational db, is better.
> - We can use a plain text storage for file index, if there aren't
> excessive performance problems.
> - We would like to compute a checksum of the file (we are going to
> read it all anyway, and it will be in FS cache after the first read).
> - We like made-up filenames, so we can help keep the user from
> accidentally modifying them.
> - We like made-up filenames, so we can avoid the case-insensitivity
> and disallowed-filename problems in the pristine cache.
> - We want directory and file moves to be fast, so the made-up
> filenames should be independent of the path position (so we just
> update the index).
> - We can continue to use the hard-link trick for local copy of repositories.
> - The system has to check for the three corruption cases (file in FS
> but not in index, file in index but not in FS, file different in
> index and FS), complain about the last two and offer an "optimize"
> option for the first one.
> - Instead of "sequential" filenames, we could use filenames similar to
> the ones in the patches directory. This helps with remote copy, as there
> are no filename conflicts.
> - We could "partition" the files into multiple directories, with a hash
> algorithm, if there are too many (I think that most modern filesystems
> have no problems even with tens of thousands of files, but we'd better
> check).
> 
> Anything else?
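
A minimal Haskell sketch of the index the quoted list describes, assuming an in-memory `Data.Map` from working-tree path to a made-up store name plus checksum. All names here (`Entry`, `moveFile`, `moveDir`, `checkIndex`, `storePath`) are hypothetical, not existing darcs code. It shows why moves only touch the index, how the three corruption cases can be told apart, and one possible partitioning scheme, assuming the made-up names are hash-like.

```haskell
import Data.List (isPrefixOf)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical types: the index maps a working-tree path to the made-up
-- name of its pristine copy plus a checksum of that copy's contents.
type Path     = FilePath
type Store    = String
type Checksum = String

data Entry = Entry { entryStore :: Store, entryChecksum :: Checksum }
  deriving (Eq, Show)

type Index = Map.Map Path Entry

-- Moving a file rewrites only the index key; the stored file keeps its
-- made-up name, so nothing in the pristine cache is renamed.
moveFile :: Path -> Path -> Index -> Index
moveFile from to idx = case Map.lookup from idx of
  Nothing -> idx
  Just e  -> Map.insert to e (Map.delete from idx)

-- Moving a directory renames every key under it, again touching only the index.
moveDir :: Path -> Path -> Index -> Index
moveDir from to = Map.mapKeys rename
  where
    prefix = from ++ "/"
    rename p
      | p == from             = to
      | prefix `isPrefixOf` p = to ++ "/" ++ drop (length prefix) p
      | otherwise             = p

-- The three corruption cases from the list above.
data Problem
  = Orphan Store        -- in the store, not in the index: "optimize" can delete it
  | Missing Path Store  -- in the index, not in the store: an error
  | Mismatch Path Store -- checksums disagree: an error
  deriving (Eq, Show)

-- The store is modelled as a map from made-up name to the checksum of the
-- file actually on disk (computing that checksum is left out here).
checkIndex :: Index -> Map.Map Store Checksum -> [Problem]
checkIndex idx store = orphans ++ concatMap check (Map.toList idx)
  where
    referenced = Set.fromList (map entryStore (Map.elems idx))
    orphans    = [Orphan s | s <- Map.keys store, s `Set.notMember` referenced]
    check (p, Entry s c) = case Map.lookup s store of
      Nothing -> [Missing p s]
      Just c'
        | c' /= c   -> [Mismatch p s]
        | otherwise -> []

-- Assumed partitioning: if made-up names are hash-like, their first two
-- characters spread files evenly across subdirectories of the store.
storePath :: Store -> FilePath
storePath s = take 2 s ++ "/" ++ s
```

Loaded into GHCi, `checkIndex` returns one `Problem` per inconsistency; only `Missing` and `Mismatch` would be reported as errors, while `Orphan` entries are what an "optimize" pass could clean up.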

Here's something that should be pretty easy to do in combination with the
above, if you design it in: atomic updates.  Atomic updates are actually
pretty easy, since renameFile is atomic (except on Windows, where atomic
updates aren't possible, so far as I know), and all of darcs' file-writing
functions write a temp file first and then rename it to overwrite the
existing file.  So if you write the new index and all files it indexes in
the right order, the result is an atomic update, which will be very nice.
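
A sketch of the write-then-rename pattern and the ordering described above. `atomicWriteFile` and `commitUpdate` are hypothetical helpers, not darcs' own file-writing functions. The sketch assumes POSIX rename semantics (per the caveat above, Windows may not give the same guarantee) and does no fsync, so it gives atomicity but not durability across a power failure.

```haskell
import System.Directory (renameFile)
import System.FilePath (takeDirectory)
import System.IO (hClose, hPutStr, openTempFile)

-- Write to a temp file in the same directory, then rename it over the
-- target. On POSIX, rename replaces the target atomically, so readers see
-- either the old contents or the new ones, never a half-written file.
-- (If writing fails, the temp file is left behind as harmless debris.)
atomicWriteFile :: FilePath -> String -> IO ()
atomicWriteFile target contents = do
  (tmp, h) <- openTempFile (takeDirectory target) "tmp"
  hPutStr h contents
  hClose h
  renameFile tmp target

-- Hypothetical commit order: write every new stored file first (under fresh
-- made-up names, so nothing the old index references is touched), then
-- replace the index. The index rename is the single commit point: a crash
-- before it leaves the old index valid plus some unreferenced files.
commitUpdate :: FilePath -> [(FilePath, String)] -> String -> IO ()
commitUpdate indexPath newFiles newIndex = do
  mapM_ (uncurry atomicWriteFile) newFiles
  atomicWriteFile indexPath newIndex
```

Because the index is written last, the only possible leftovers after a crash are files present in the store but absent from the index, which is exactly the benign corruption case an "optimize" pass is meant to handle.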

In particular, it'll mean we won't need the "tentative_pristine" in order to
achieve atomic updates, which currently slows darcs optimize --reorder down
from 3.5 seconds to close to two minutes.  !!!
-- 
David Roundy
Department of Physics
Oregon State University

