[darcs-users] hashed repository issue
Petr Rockai
me at mornfall.net
Tue Dec 9 09:21:12 UTC 2008
Hi,
Dan Pascu <dan at ag-projects.com> writes:
> Not to mention that a directory with 32k files makes things _very_ slow.
> This flattened pristine directory may cause a considerable slowness of
> darcs2 compared to darcs1 if the repository contains many files. The same
> can be said for the flattened patches directory, but that is a problem
> common to both of them. Still as the number of patches increases, things
> will get gradually slower.
This is the n-th time this argument has come up, so I figured it was time to
quantify the problem. Here are the results:
10:08:56 | morn at eri:~/dev/rh/lvm2 -> time cp -Rl tailor/_darcs test_tailor
cp -Rl tailor/_darcs test_tailor 0,07s user 23,51s system 95% cpu 24,807 total
10:09:30 | morn at eri:~/dev/rh/lvm2 -> for d in test_tailor/patches test_tailor/pristine.hashed; do (cd $d; time runghc ../../letterify.hs); done
runghc ../../letterify.hs 1,08s user 0,13s system 87% cpu 1,374 total
runghc ../../letterify.hs 5,60s user 2,37s system 96% cpu 8,286 total
10:10:27 | morn at eri:~/dev/rh/lvm2 -> time cp -Rl test_tailor test2_tailor
cp -Rl test_tailor test2_tailor 0,06s user 1,95s system 99% cpu 2,016 total
10:10:38 | morn at eri:~/dev/rh/lvm2 -> time cp -Rl tailor/_darcs test3_tailor
cp -Rl tailor/_darcs test3_tailor 0,07s user 24,91s system 98% cpu 25,287 total
The tailor repository is a conversion of LVM2's CVS tree made by tailor, taken
as is, without any manual intervention such as optimising. Note that I have not
used darcs at all here, just `cp -Rl`, which recursively hardlinks the
directory. The letterify.hs script is as follows:
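import Control.Monad (forM_)
import System.Directory (createDirectory, getDirectoryContents, renameFile)
import System.FilePath ((</>))

-- True if the first character after the first '-' in the file name is l,
-- i.e. the hash part of the name starts with l.
startsWith :: Char -> String -> Bool
startsWith l s = case dropWhile (/= '-') s of
  '-' : x : _ -> x == l
  _           -> False

-- For each hex digit, create a one-letter directory and move every file
-- in the current directory whose hash starts with that digit into it.
main :: IO ()
main = forM_ (['a'..'f'] ++ ['0'..'9']) $ \l -> do
  createDirectory [l]
  files <- filter (startsWith l) `fmap` getDirectoryContents "."
  mapM_ (\f -> renameFile f ([l] </> f)) files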
You could possibly try this for yourself on your favourite repository.
My reading of these numbers:
- flat patches + pristine costs about 25 seconds of system time just to
`cp -Rl` the _darcs dir, presumably with 99 % of that time spent in internal
directory lookup routines
- just chopping this up into 16 buckets, based on the first hex character of
the hash, gets us down to some 2 seconds, which is more than a ten-fold speedup
- the last run is just to try with a slightly hotter cache, although I believe
it was already pretty hot the first time; I wouldn't expect that to have a
significant impact anyway
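To make the bucketing rule concrete, here is a minimal Haskell sketch of how a
file name could be mapped to its bucketed path. The names bucketOf and
bucketedPath are hypothetical, not existing darcs functions, and the
"<prefix>-<hash>" file-name shape is only assumed from what letterify.hs
matches on.

```haskell
import System.FilePath ((</>))

-- Hypothetical helper (not part of darcs): pick the bucket for a file name
-- of the assumed form "<prefix>-<hash>", using the first character after
-- the first '-'. Names that do not have that shape get no bucket.
bucketOf :: FilePath -> Maybe Char
bucketOf name = case dropWhile (/= '-') name of
  '-' : c : _ | c `elem` "0123456789abcdef" -> Just c
  _                                         -> Nothing

-- Relative path of a file after bucketing; names without a bucket are
-- returned unchanged, so they would stay in the top-level directory.
bucketedPath :: FilePath -> FilePath
bucketedPath name = maybe name (\c -> [c] </> name) (bucketOf name)

main :: IO ()
main = mapM_ (putStrLn . bucketedPath)
  [ "0000000042-ab12cd"  -- made-up example names, not real darcs hashes
  , "0000000007-9f00"
  , "README"
  ]
```

A reader of a bucketed repository would apply the same function when looking a
file up, so the on-disk layout and the lookup path cannot drift apart.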
So who's with me that for 2.3 we should add a "bucketed" keyword to our
_darcs/format and start using it? Moreover, we need to do the same for our
global cache; mine currently looks like this:
10:19:06 | morn at eri:~ -> l ~/.darcs/cache/pristine.hashed | wc -l
40645
10:19:10 | morn at eri:~ -> l ~/.darcs/cache/patches | wc -l
86769
Btw., that also disproves the 32k hard limit, as this is ext3 here.
Yours,
Petr.
--
Peter Rockai | me()mornfall!net | prockai()redhat!com
http://blog.mornfall.net | http://web.mornfall.net
"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
-- Blair P. Houghton on the subject of C program indentation