[darcs-users] strace of darcs whatsnew on a biggish repo

Gwern Branwen gwern0 at gmail.com
Fri Apr 25 02:15:46 UTC 2008


On 2008.04.25 00:54:32 +0200, Alexander Staubo <alex at purefiction.net> scribbled 1.2K characters:
> On Fri, Apr 25, 2008 at 12:35 AM, zooko <zooko at zooko.com> wrote:
> >  A new user started to convert his company to darcs, but then had to
> >  back out and go back to using SVN when it turned out that "darcs
> >  whatsnew" took 17 seconds and his co-workers couldn't stand that.
> >  (The equivalent call, "svn diff" takes around 1.7 seconds -- about
> >  10x as fast.)
>
> My experience is that Darcs performs rather poorly in the presence of
> large, untracked files in the working directory. It probably reads
> each file into memory, perhaps in order to determine whether it's
> binary or not?
>
> On OS X and probably other operating systems this forces the OS to
> swap out pretty much everything to disk, which makes the system
> virtually unusable while it's running and for a while afterwards, when
> the OS needs to swap everything back in again.
>
> I am intimately familiar with this issue because I tend to put 500MB
> database dumps in my working directory before accidentally running
> "darcs whatsnew".
>
> > It looks like there is probably quite a bit of room for optimization
> > in darcs-2's use of the filesystem.
>
> Probably. Git and Mercurial do not suffer from this problem, either.
>
> Alexander.

It's hard to tell what's causing the slowdown.

If you look at one of my profiling runs for 'whatsnew' <http://lists.osuosl.org/pipermail/darcs-devel/attachments/20080412/7854901d/attachment-0026.obj>, you see that

 COST CENTRE                    MODULE                   %time %alloc

 filetype_function              Darcs.Repository.Prefs    73.6   12.8

But when you go down the actual trace for filetype_function, or look at its definition, none of the functions it calls seem to be real time-wasters (except maybe 'normalize'). It does call a locally defined 'isbin', but notice that darcs decides whether a file is binary by matching regexes against its path:

 filetype_function :: IO (FilePath -> FileType)
 filetype_function = do
     binsfile <- def_prefval "binariesfile" "_darcs/prefs/binaries"
     bins <- get_lines binsfile `catch`
              (\e -> if isDoesNotExistError e then return [] else ioError e)
     gbs <- get_global "binaries"
     regexes <- return (map (\r -> mkRegex r) (bins ++ gbs))
     let isbin f = or $ map (\r -> isJust $ matchRegex r f) regexes
         ftf f = if isbin $ normalize f then BinaryFile else TextFile
         in
         return ftf

So filetype_function defines a new function, ftf, that classifies files by matching on their names (extensions, mostly), AFAICT. There doesn't seem to be any loading of files involved, which leaves the slowdown a mystery.
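
To make the "no loading of files" point concrete: the function that filetype_function returns has type FilePath -> FileType, with no IO in it, so short of something like unsafePerformIO it can only look at the name it is given. Here is a minimal sketch of that shape. It is not darcs code: mkFiletypeFunction and binarySuffixes are hypothetical names, the suffix list is made up, and plain suffix matching stands in for darcs's regexes so that the example needs nothing beyond base.

```haskell
import Data.Char (toLower)
import Data.List (isSuffixOf)

data FileType = BinaryFile | TextFile
  deriving (Eq, Show)

-- Hypothetical stand-in for the patterns in _darcs/prefs/binaries.
-- darcs matches regexes; suffixes are an assumption made for this sketch.
binarySuffixes :: [String]
binarySuffixes = [".png", ".gz", ".pdf"]

-- Build the classifier from a list of patterns and hand back a pure
-- function. Nothing in the result's type allows it to open a file.
mkFiletypeFunction :: [String] -> (FilePath -> FileType)
mkFiletypeFunction suffixes = ftf
  where
    -- Case-insensitive check of the path against every suffix.
    isbin f = any (`isSuffixOf` map toLower f) suffixes
    ftf f = if isbin f then BinaryFile else TextFile
```

With this, mkFiletypeFunction binarySuffixes "logo.PNG" evaluates to BinaryFile and "dump.sql" to TextFile, and neither file has to exist. The per-file cost is one pass over the pattern list, independent of file size.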

--
gwern
C3I Uzi unix B-1B ies joe security MI5 Dateline SC