[darcs-users] darcs patch: switch Darcs.Patch.FileName to be ByteString.Char8 int...

Jason Dagit dagit at codersbase.com
Fri Sep 25 06:18:14 UTC 2009

On Thu, Sep 24, 2009 at 1:13 AM, Reinier Lamers <reinier.lamers at gmail.com> wrote:

> Hi Jason,
> 2009/9/24 Jason Dagit <dagit at codersbase.com>:
> > Doing the above test I have discovered that Darcs.Patch.FileName is a
> > very costly module.  It is costly mainly in terms of space usage.  The
> > space usage forces the garbage collector to run far too frequently and
> > this burns up CPU time, allocates a ton of virtual memory, and wastes
> > significant amounts of RAM.  On my test machine, the virtual memory
> > usage is just over 1GB when profiling, and uses 400-500 megs of
> > physical RAM.
> Great that you do this research! If we keep up the current pace of
> performance hacking, darcs will complete before you even hit the enter
> key in a few years :-)

Heh.  Thanks.  I haven't had any real luck improving things yet though.  In
fact, at Ganesh's request I think I'm giving up on optimizing the darcs.net
source any further.  I've moved on to working with darcs-hs.  I just sent
Petr a patch for hashed-storage that makes zipTrees significantly faster on
my test case and now darcs-hs can run my test in about 23 seconds (regular
darcs is about 29 seconds, so yay!).  zipTrees could probably be further
improved but at this point it's no longer on the radar as a slow point in
the code so I'm moving on to other functions.

System.FilePath is one of the big slowdowns now.  I wonder if we need a
System.FilePath.ByteString version?  I don't know if it would help.  The
real problem is that we do a lot of path munging that we should perhaps not
be doing.

Hashed.Storage.AnchoredPath.floatPath looks like this:

-- | Take a relative FilePath and turn it into an AnchoredPath. The
-- is unsafe and if you break it, you keep both pieces. More useful for
-- exploratory purposes (ghci) than for serious programming.
floatPath :: FilePath -> AnchoredPath
floatPath = AnchoredPath . map (Name . BS.pack) . splitDirectories
            . normalise . dropTrailingPathSeparator

The expensive parts are as follows (from most expensive to least):
1. normalise
2. BS.pack
3. splitDirectories

splitDirectories and normalise both come from System.FilePath.  Neil, do you
think a ByteString filepath would help?
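
To make the question concrete, here is a rough sketch of what a
ByteString-only floatPath might look like.  It is an illustration, not the
hashed-storage implementation: Name and AnchoredPath below are hypothetical
stand-ins for the real types, floatPathBS is a made-up name, and it assumes
relative paths with '/' as the only separator.  It only drops empty and "."
components, so it does not reproduce everything normalise does (no drive
letters, no backslashes), and I have not measured whether it is actually
faster.

```haskell
import qualified Data.ByteString.Char8 as BS

-- Hypothetical stand-ins for the hashed-storage types, for illustration only.
newtype Name = Name BS.ByteString deriving (Eq, Show)
newtype AnchoredPath = AnchoredPath [Name] deriving (Eq, Show)

-- | Sketch of a ByteString-based floatPath (assumed name).
-- Splits on '/' and discards empty components (from doubled or trailing
-- separators) and "." components, without going through String or
-- System.FilePath.
floatPathBS :: BS.ByteString -> AnchoredPath
floatPathBS = AnchoredPath . map Name . filter keep . BS.split '/'
  where
    dot = BS.pack "."
    keep c = not (BS.null c) && c /= dot
```

With this, a path that is already a ByteString never has to be unpacked,
normalised as a String, and packed again one component at a time.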

The other thing I don't understand here is the haddock for this function.
What does it split?  I don't understand what pieces it makes; it certainly
seems that it just returns an AnchoredPath.  If it's a joke then I don't
understand it (I also don't think a joke belongs in a haddock because it's
confusing).  The other odd thing is that it's used quite a lot in Darcs.IO.
If it's unsafe and meant for interactive use why do we rely on it so much?
Is that a bug waiting to happen?  Petr, comments please?
