[darcs-devel] Filesystem in DB, but data in filesystem

Salvatore Insalaco kirby81 at gmail.com
Mon Jul 30 23:11:18 PDT 2007


2007/7/30, Eric Y. Kow <eric.kow at gmail.com>:
> > What about adding a "case-insensitive name" field (in addition to
> > "canonical name", and "pristine name") to the pristine index?  So on
> > Unix you might have "Makefile" and "makefile" as the canonical names
> > (referred to pristine names like 1.dat or 233.dat) and then add in
> > case-insensitive names "makefile" and "Makefile1".
>
> A similar kind of scheme might help with the Windows tricky filename
> issue.

The biggest issue with case-insensitivity in pristine is that if there
was a filename "conflict" at any point in the project history, even one
introduced in patch 1 and corrected in patch 2, a user on a
case-insensitive filesystem cannot pull the repository (I think there's
a bug about this in the bug tracker, on the GHC tree).
Case-insensitivity in the working dir is much less of a problem: we
can just detect it and complain, or do something "smart" such as
renaming the file, and it concerns only the latest "revision" of the
source code.
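
The "detect and complain" step for the working dir could look roughly
like the sketch below. It is only an illustration in Python, not darcs
code: the function name case_collisions is made up, and it assumes that
Unicode case folding is a good enough stand-in for what a
case-insensitive filesystem does.

```python
from collections import defaultdict


def case_collisions(paths):
    """Group paths that would map to the same name on a
    case-insensitive filesystem (hypothetical helper)."""
    groups = defaultdict(list)
    for path in paths:
        # casefold() approximates the filesystem's case folding.
        groups[path.casefold()].append(path)
    # Keep only the folded names claimed by more than one path.
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}
```

Called on the file list of the latest revision only, a non-empty result
is the signal to complain or to rename one of the files.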

So, let's summarize a bit. I'm going to write a more detailed proposal
once we reach a good consensus, then post it here for "final review".

By the way, tell me if I'm too "chatty" :). I'm accustomed to group
development, so I like to share my thoughts a lot.

- Everybody agrees that a "file index" solution, without using a
relational DB, is better.
- We can use plain-text storage for the file index, if there aren't
excessive performance problems.
- We would like to compute a checksum of the file (we are going to
read it all anyway, and it will be in FS cache after the first read).
- We like made-up filenames, so we can help the user not to
accidentally modify them.
- We like made-up filenames, so we can avoid the case-insensitivity
and disallowed-filename problems in the pristine cache.
- We like to be fast on directory and file moves, so the made-up
filenames should be independent of the path position (so we just
update the index).
- We can continue to use the hard-link trick for local copies of
repositories.
- The system has to check for the three corruption cases (file in FS
but not in index, file in index but not in FS, file contents in FS
different from what the index records), complain about the last two
and offer an "optimize" option for the first one.
- Instead of "sequential" filenames, we could use filenames similar to
the ones in the patches directory. It helps with remote copy, as there
are no filename conflicts.
- We could "partition" the files across multiple directories, with a
hash algorithm, if there are too many (I think that most modern
filesystems have no problems even with tens of thousands of files, but
we had better check).
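
To make the points above concrete, here is a minimal sketch of such a
file index. It is written in Python purely for illustration and is not
a proposal for the real implementation. Everything named in it is an
assumption: the class FileIndex, the random (uuid-based) made-up names,
SHA-1 as the checksum, the tab-separated text layout, and the
two-hex-digit directory partitioning.

```python
import hashlib
import os
import uuid


def checksum(data: bytes) -> str:
    # SHA-1 is an arbitrary choice for this sketch.
    return hashlib.sha1(data).hexdigest()


def bucket(name: str) -> str:
    # Partition stored files over up to 256 sub-directories.
    return hashlib.sha1(name.encode()).hexdigest()[:2]


class FileIndex:
    def __init__(self, root: str):
        self.root = root
        self.entries = {}  # canonical path -> (made-up name, checksum)

    def _path(self, name: str) -> str:
        return os.path.join(self.root, bucket(name), name)

    def add(self, canonical: str, data: bytes) -> str:
        # The made-up name does not depend on the canonical path.
        name = uuid.uuid4().hex + ".dat"
        path = self._path(name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
        self.entries[canonical] = (name, checksum(data))
        return name

    def move(self, old: str, new: str) -> None:
        # Moving a file or directory only rewrites index keys;
        # nothing on disk is touched.
        prefix = old.rstrip("/") + "/"
        for path in list(self.entries):
            if path == old:
                self.entries[new] = self.entries.pop(path)
            elif path.startswith(prefix):
                target = new.rstrip("/") + "/" + path[len(prefix):]
                self.entries[target] = self.entries.pop(path)

    def dump(self) -> str:
        # Plain-text form: canonical path, made-up name, checksum.
        return "".join(f"{p}\t{n}\t{c}\n"
                       for p, (n, c) in sorted(self.entries.items()))

    def check(self):
        # Returns (in FS but not in index, in index but not in FS,
        # checksum mismatch), each as a sorted list of made-up names.
        indexed = dict(self.entries.values())
        names = set(indexed)
        on_disk = set()
        for _dir, _subdirs, files in os.walk(self.root):
            on_disk.update(files)
        orphans = sorted(on_disk - names)
        missing = sorted(names - on_disk)
        corrupt = []
        for name in sorted(names & on_disk):
            with open(self._path(name), "rb") as f:
                if checksum(f.read()) != indexed[name]:
                    corrupt.append(name)
        return orphans, missing, corrupt
```

With this layout "Makefile" and "makefile" get unrelated lowercase
names on disk, a directory move is a pure index update, and check()
gives the three lists needed to complain or to offer "optimize".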

Anything else?

Salvatore

