[darcs-users] strace of darcs whatsnew on a biggish repo
Gwern Branwen
gwern0 at gmail.com
Fri Apr 25 02:15:46 UTC 2008
On 2008.04.25 00:54:32 +0200, Alexander Staubo <alex at purefiction.net> scribbled 1.2K characters:
> On Fri, Apr 25, 2008 at 12:35 AM, zooko <zooko at zooko.com> wrote:
> > A new user started to convert his company to darcs, but then had to
> > back out and go back to using SVN when it turned out that "darcs
> > whatsnew" took 17 seconds and his co-workers couldn't stand that.
> > (The equivalent call, "svn diff" takes around 1.7 seconds -- about
> > 10x as fast.)
>
> My experience is that Darcs performs rather poorly in the presence of
> large, untracked files in the working directory. It probably reads
> each file into memory, perhaps in order to determine whether it's
> binary or not?
>
> On OS X and probably other operating systems this forces the OS to
> swap out pretty much everything to disk, which makes the system
> virtually unusable while it's running and for a while afterwards, when
> the OS needs to swap everything back in again.
>
> I am intimately familiar with this issue because I tend to put 500MB
> database dumps in my working directory before accidentally running
> "darcs whatsnew".
>
> > It looks like there is probably quite a bit of room for optimization
> > in darcs-2's use of the filesystem.
>
> Probably. Git and Mercurial do not suffer from this problem, either.
>
> Alexander.
It's hard to tell what's causing the slowdown.
If you look at one of my profiling runs for 'whatsnew' <http://lists.osuosl.org/pipermail/darcs-devel/attachments/20080412/7854901d/attachment-0026.obj>, you see that
COST CENTRE        MODULE                  %time  %alloc
filetype_function  Darcs.Repository.Prefs   73.6    12.8
But when you walk down the actual trace for filetype_function, or look at its definition, none of the functions it calls seems to be a real time-waster (except maybe 'normalize'). It does call a locally defined 'isbin', but notice that darcs decides whether a file is binary by matching its path against regexes:
    filetype_function :: IO (FilePath -> FileType)
    filetype_function = do
        binsfile <- def_prefval "binariesfile" "_darcs/prefs/binaries"
        bins <- get_lines binsfile `catch`
                 (\e -> if isDoesNotExistError e then return [] else ioError e)
        gbs <- get_global "binaries"
        regexes <- return (map (\r -> mkRegex r) (bins ++ gbs))
        let isbin f = or $ map (\r -> isJust $ matchRegex r f) regexes
            ftf f = if isbin $ normalize f then BinaryFile else TextFile
            in
            return ftf
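So filetype_function defines a new function, ftf, that classifies a file by matching its normalized path against those regexes (i.e. based on the file name, such as its extension), AFAICT. There doesn't seem to be any loading of file contents involved, which leaves the slowdown a mystery.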
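
To make that concrete, here is a minimal sketch of the same shape of function. It is not darcs code: mkClassifier and the pattern list are hypothetical, the FileType type is a local stand-in, and plain suffix checks are swapped in for the regexes so that it runs with nothing but base. The point is only that the classifier is a pure function of the path string, so applying it never opens the file.

```haskell
import Data.List (isSuffixOf)

-- Local stand-in for darcs's FileType (assumption, not the real definition).
data FileType = BinaryFile | TextFile
    deriving (Eq, Show)

-- Hypothetical helper: build a classifier from a list of patterns.
-- Plain suffixes stand in for the regexes used by filetype_function.
mkClassifier :: [String] -> (FilePath -> FileType)
mkClassifier suffixes = ftf
  where
    -- A path counts as binary if any pattern matches it.
    isbin f = any (`isSuffixOf` f) suffixes
    ftf f   = if isbin f then BinaryFile else TextFile

main :: IO ()
main = do
    let classify = mkClassifier [".png", ".gz", ".o"]
    -- None of these files has to exist: only the names are inspected.
    mapM_ (print . classify) ["dump.sql", "logo.png", "src/Main.o"]
```

If the real function behaves like this, its cost should grow with the number of paths times the number of patterns, not with the size of any file, so a 500MB dump would be no more expensive to classify than an empty file.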
--
gwern
C3I Uzi unix B-1B ies joe security MI5 Dateline SC