[darcs-users] darcs patch: switch Darcs.Patch.FileName to be ByteString.Char8 int...

Neil Mitchell ndmitchell at gmail.com
Sun Nov 6 13:23:43 UTC 2011


Hi Jason,

Sorry for the ridiculous delay in replying, but perhaps the information is
still useful. I think having a filepath-bytestring package could be very
useful for path-heavy apps such as darcs. However, before doing that I
would suggest you profile inside the filepath library. It was written for
correctness, not speed, and there are plenty of places where I traverse a
string many more times than necessary. There are plenty of tests, so any
performance improvements should be easy to check.
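
As a minimal sketch of that kind of check: the names splitDirsBS and
agreesWithFilePath below are hypothetical (they are not part of filepath),
and the comparison is only meant for relative, ASCII-only, POSIX-style
paths. It compares a ByteString-based split against the existing
splitDirectories:

```haskell
import qualified Data.ByteString.Char8 as BS
import qualified System.FilePath.Posix as FP

-- Hypothetical ByteString analogue of splitDirectories.
-- Assumption: the input is a relative POSIX path, so '/' is the only
-- separator and there is no leading root component to preserve.
splitDirsBS :: BS.ByteString -> [BS.ByteString]
splitDirsBS = filter (not . BS.null) . BS.split '/'

-- Check the sketch against the String implementation in filepath.
-- Only valid for ASCII input: BS.pack truncates each Char to 8 bits.
agreesWithFilePath :: FilePath -> Bool
agreesWithFilePath p =
  map BS.unpack (splitDirsBS (BS.pack p)) == FP.splitDirectories p
```

Running agreesWithFilePath over a set of sample paths (or as a QuickCheck
property restricted to relative ASCII paths) would show where a faster
version diverges from the current behaviour. Absolute paths are
deliberately out of scope here, since splitDirectories keeps the root as
its own component.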

Thanks, Neil

On Friday, September 25, 2009, Jason Dagit <dagit at codersbase.com> wrote:
>
>
> On Thu, Sep 24, 2009 at 1:13 AM, Reinier Lamers <reinier.lamers at gmail.com>
> wrote:
>>
>> Hi Jason,
>>
>> 2009/9/24 Jason Dagit <dagit at codersbase.com>:
>> > Doing the above test I have discovered that Darcs.Patch.FileName is a
>> > very costly module.  It is costly mainly in terms of space usage.  The
>> > space usage forces the garbage collector to run far too frequently and
>> > this burns up CPU time, allocates a ton of virtual memory, and wastes
>> > significant amounts of ram.  On my test machine, the virtual memory
>> > usage is just over 1GB when profiling, and uses 400-500 megs of
>> > physical ram.
>>
>> Great that you do this research! If we keep up the current pace of
>> performance hacking, darcs will complete before you even hit the enter
>> key in a few years :-)
>
> Heh.  Thanks.  I haven't had any real luck improving things yet though.
> In fact, at Ganesh's request I think I'm giving up on optimizing the
> darcs.net source any further.  I've moved on to working with darcs-hs.  I
> just sent Petr a patch for hashed-storage that makes zipTrees significantly
> faster on my test case and now darcs-hs can run my test in about 23 seconds
> (regular darcs is about 29 seconds, so yay!).  zipTrees could probably be
> further improved but at this point it's no longer on the radar as a slow
> point in the code so I'm moving on to other functions.
>
> System.FilePath is one of the big slow downs now.  I wonder if we need a
> System.FilePath.ByteString version?  I don't know if it would help.  The
> real problem is that we do a lot of path munging that we should perhaps not
> be doing.
>
> Hashed.Storage.AnchoredPath.floatPath looks like this:
> -- | Take a relative FilePath and turn it into an AnchoredPath. The operation
> -- is unsafe and if you break it, you keep both pieces. More useful for
> -- exploratory purposes (ghci) than for serious programming.
> floatPath :: FilePath -> AnchoredPath
> floatPath = AnchoredPath . map (Name . BS.pack) . splitDirectories
>             . normalise . dropTrailingPathSeparator
>
> The expensive parts are as follows (from most expensive to least):
> 1. normalise
> 2. BS.pack
> 3. splitDirectories
>
> splitDirectories and normalise both come from System.FilePath.  Neil, do
> you think a ByteString filepath would help?
>
> The other thing I don't understand here is the haddock for this
> function.  What does it split?  I don't understand what pieces it makes; it
> certainly seems that it just returns an AnchoredPath.  If it's a joke then
> I don't understand it (I also don't think a joke belongs in a haddock
> because it's confusing).  The other odd thing is that it's used quite a lot
> in Darcs.IO.  If it's unsafe and meant for interactive use why do we rely
> on it so much?  Is that a bug waiting to happen?  Petr, comments please?
>
> Thanks,
> Jason
>