[darcs-devel] Profiling Darcs

Gwern Branwen gwern0 at gmail.com
Tue Apr 15 03:38:26 UTC 2008


On 2008.04.14 09:09:34 -0700, Jason Dagit <dagit at codersbase.com> scribbled 7.3K characters:
>    On Sat, Apr 12, 2008 at 10:31 AM, Gwern Branwen <gwern0 at gmail.com> wrote:
>
>      So I recently built a profiled Darcs binary (took forever and a day to
>      get all the dependencies right), and I've a small collection of output
>      from +RTS -p -RTS. They're interesting reading.
>
>    Yes, cabal really should give people a regular and profiled build by default.

That would be nice, but I understand why Cabal doesn't do it by default: not everything *can* be built with profiling yet, and if the default were switched, there would be massive breakage as people with 'old' non-profiled libraries tried to compile new stuff. It could only be done alongside a major breaking GHC release, and even if everything could be compiled with profiling, they might not do it anyway: it'd double compilation time and disk space to aid a small minority of developers.

> Or
>    alternatively, as a Haskell developer you should just get used to compiling everything you
>    install for profiling.  This saves a lot of time down the road in my experience.

Well, once I managed to rebuild everything, I did add -p to the configure step in my build scripts. This isn't perfect, as the build will still break when I try to compile anything that uses the GHC API (like, say, Yi). But it'll be useful in the future, as you say.
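
For anyone following along who hasn't profiled Haskell before, here is a minimal sketch of what a manually annotated cost centre looks like. The function name and the "sumSquares" label are made up for illustration; nothing here is taken from the Darcs source.

```haskell
-- Hypothetical example function; not from Darcs.
-- The SCC pragma names a cost centre, so time and allocation spent
-- evaluating the annotated expression are reported under "sumSquares"
-- in the .prof file of a profiled build.
sumSquares :: Int -> Int
sumSquares n = {-# SCC "sumSquares" #-} sum [x * x | x <- [1 .. n]]
```

Given a main that calls sumSquares, building with ghc -prof (plus -auto-all on GHCs of this vintage, or -fprof-auto on later ones, to get cost centres added automatically) and running the binary with +RTS -p -RTS should write a .prof report next to it.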

>    In my opinion, the lazy IO of darcs should be completely scrapped.  It allows the code to be
>    written more elegantly but it prevents us from really getting in to fine-tune things and figure
>    out what is going on.  Furthermore, at least with FPS, darcs would mmap entire files up front
>    meaning your file reading is limited by the amount of virtual memory your computer can mmap at
>    once.  This could be changed in ByteString.  And from what little I know of ByteString, it is
>    changed and probably entire files are no longer mmap'd but instead chunked into memory.  If
>    I'm right, this could help explain why pack/unpack are suddenly showing up a lot in the
>    charts.

Hm. So you figure sticking to the regular ByteString readFile/writeFile functions would help, since the mmap* functions are at fault?
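
To make the distinction concrete, here is a rough sketch of the two stock read paths in the bytestring package: a strict read, which pulls the whole file into one buffer, and a lazy read, which pulls it in on demand as a list of strict chunks. The helper names strictSize and lazyChunkSizes are hypothetical, and this says nothing about what Darcs's own FPS code or its mmap* functions do internally.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L

-- Strict read: the entire file ends up in one contiguous in-memory
-- buffer before the length is computed.
strictSize :: FilePath -> IO Int
strictSize path = S.length <$> S.readFile path

-- Lazy read: the file is read on demand as a sequence of strict chunks.
-- Returning the chunk sizes makes the chunking visible; the exact chunk
-- size is an implementation detail of the library.
lazyChunkSizes :: FilePath -> IO [Int]
lazyChunkSizes path = map S.length . L.toChunks <$> L.readFile path
```

Summing the result of lazyChunkSizes should give the same number as strictSize for the same file; the difference is only in how much of the file has to be resident at once.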

>    Another possibility could be that your darcs2 repository is using more hashes than your darcs1
>    repo tests?  I should mention, I haven't really read through your profiles.

Possible. The only darcs-2 repos I really have to work against are GHC and Darcs, right now.

>      # Some of the top time eaters are odd, to me. Why is parseDate so
>      often called/expensive? How come is_file sometimes is the top
>      function? And so on.
>      # Why does gcau_simple take 90% of the time when I pull from the
>      darcs-2 ghc repo, or when I revert, but rarely shows up elsewhere?
>
>    If you're like me, you have no idea what "gcau_simple" means.  It turns out that gcau means,
>    "get common and uncommon".  So, gcau_simple is a "simple" way to compute which patches are
>    common (shared) and which ones are uncommon (only appear in one repository) between two
>    repositories.
>
>    This is an essential step for many operations that work between repositories.  I think the
>    reason it can take so long is that patches can only be compared for equality if they are in
>    the same context.  Sometimes getting a patch into the right context takes a bit of commuting
>    so you can permute your patch sequences.
>
>    I hope that helps,
>    Jason

Yes, there are some things to think about, that's clear.
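
To check my understanding of "get common and uncommon", here is a deliberately naive sketch. It assumes each patch can be identified by a plain name (the PatchId type and the commonAndUncommon function are hypothetical, not Darcs's), and it ignores context and commutation entirely, which per Jason's explanation is exactly where the real cost would come from.

```haskell
import qualified Data.Set as Set

-- Hypothetical stand-in for a patch identifier; real Darcs patches
-- carry much more than a name.
type PatchId = String

-- Split two repositories' patch lists into
-- (patches in both, patches only in ours, patches only in theirs),
-- preserving the original order within each list.
commonAndUncommon :: [PatchId] -> [PatchId] -> ([PatchId], [PatchId], [PatchId])
commonAndUncommon ours theirs = (common, onlyOurs, onlyTheirs)
  where
    ourSet     = Set.fromList ours
    theirSet   = Set.fromList theirs
    common     = filter (`Set.member` theirSet) ours
    onlyOurs   = filter (`Set.notMember` theirSet) ours
    onlyTheirs = filter (`Set.notMember` ourSet) theirs
```

With plain names the comparison is just set membership. The expensive part described above would be comparing patches that first have to be commuted into the same context, which this sketch does not attempt.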

--
gwern
UOP SAW VIP GCHQ Freeh USACIL 26 picking Lanceros Branch