[darcs-devel] Profiling Darcs
Jason Dagit
dagit at codersbase.com
Mon Apr 14 16:09:34 UTC 2008
On Sat, Apr 12, 2008 at 10:31 AM, Gwern Branwen <gwern0 at gmail.com> wrote:
> So I recently built a profiled Darcs binary (took forever and a day to
> get all the dependencies right), and I've a small collection of output
> from +RTS -p -RTS. They're interesting reading.
Yes, cabal really should give people a regular and profiled build by
default. Or alternatively, as a Haskell developer you should just get used
to compiling everything you install for profiling. This saves a lot of time
down the road in my experience.
> I know not many people can build such a binary, so I thought I'd send
> them to the list for everyone to look at. The top-level summaries
> follow. I'd like to note that there are some surprises, for me at
> least:
> # For Darcs-2 repos, the ByteString packed/unpack routines seem to
> take a really distressing amount of time. One of the two seems to
> always be in the top 4, and sometimes eating up extremely large
> amounts of time, like 38%. I have a theory that this is because the
> curl/libwww functions return a String, which immediately has to be
> packed into a ByteString for all the other functions; if this is true,
> then we might be able to get a real performance improvement with
> functions that do network I/O in ByteString. I doubt this'll happen
> any time soon though (there is a network-bytestring on Hackage, but it
> only does sockets and I'm pretty sure that that's not what's needed).
In my opinion, the lazy IO in darcs should be completely scrapped. It
allows the code to be written more elegantly, but it prevents us from really
fine-tuning things and figuring out what is going on. Furthermore, at least
with FPS, darcs would mmap entire files up front, meaning your file reading
is limited by the amount of virtual memory your computer can mmap at once.
This could be changed in ByteString. From what little I know of ByteString,
it has been changed, and entire files are probably no longer mmap'd but
instead read into memory in chunks. If I'm right, this could help explain
why pack/unpack are suddenly showing up a lot in the charts.
Another possibility could be that your darcs2 repository is using more
hashes than your darcs1 repo tests? I should mention, I haven't really read
through your profiles.
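To make the pack/unpack cost discussed above concrete, here is a minimal sketch, not darcs code. It assumes a hypothetical consumer (countNewlines) that wants a ByteString, and compares feeding it from a String with feeding it from data that is already a ByteString. All function names are made up for illustration.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical consumer that needs a ByteString.
countNewlines :: B.ByteString -> Int
countNewlines = B.count '\n'

-- Path 1: the data arrives as a String (a linked list of Char), so it must
-- be walked and copied into a packed buffer before it can be used.
countViaString :: String -> Int
countViaString = countNewlines . B.pack

-- Path 2: the data is already a ByteString, so no conversion is needed.
countDirect :: B.ByteString -> Int
countDirect = countNewlines

main :: IO ()
main = do
  let asString = concat (replicate 1000 "hunk ./file 1\n")
      asBytes  = B.pack asString  -- built once here only to set up the demo
  print (countViaString asString)  -- pays for pack on every call
  print (countDirect asBytes)      -- no pack/unpack involved
```

Both paths give the same answer; the difference is that the first one has to traverse and copy the whole String each time, which is the kind of work that would show up under pack in a profile.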
> # Some of the top time eaters are odd, to me. Why is parseDate so
> often called/expensive? How come is_file sometimes is the top
> function? And so on.
> # Why does gcau_simple take 90% of the time when I pull from the
> darcs-2 ghc repo, or when I revert, but rarely shows up elsewhere?
If you're like me, you have no idea what "gcau_simple" means. It turns out
that gcau means "get common and uncommon". So, gcau_simple is a "simple"
way to compute which patches are common (shared) and which ones are uncommon
(only appear in one repository) between two repositories.
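As a rough illustration of the idea only (this is not the darcs implementation), the split into common and uncommon patches can be sketched with plain sets. It assumes patches can be identified by a simple name, here the hypothetical type PatchId, and it ignores patch context entirely.

```haskell
import qualified Data.Set as Set

-- Hypothetical patch identifier; real darcs patches carry far more than a name.
type PatchId = String

-- Split two repositories' patches into
-- (common to both, only in the first, only in the second).
commonAndUncommon :: [PatchId] -> [PatchId] -> ([PatchId], [PatchId], [PatchId])
commonAndUncommon ours theirs =
  ( filter (`Set.member` theirSet) ours
  , filter (`Set.notMember` theirSet) ours
  , filter (`Set.notMember` ourSet) theirs
  )
  where
    ourSet   = Set.fromList ours
    theirSet = Set.fromList theirs

main :: IO ()
main = print (commonAndUncommon ["a", "b", "c"] ["a", "c", "d"])
-- prints (["a","c"],["b"],["d"])
```

With names alone this is cheap. The sketch deliberately leaves out the part that makes the real computation slow.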
This is an essential step for many operations that work between
repositories. I think the reason it can take so long is that patches can
only be compared for equality if they are in the same context. Sometimes
getting a patch into the right context takes a bit of commuting so you can
permute your patch sequences.
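A toy sketch of why commuting adds up: moving one patch past a sequence takes one swap per patch it passes, and any swap may fail. The Patch type and the toyCommute rule below are hypothetical simplifications (two patches swap only if they touch different files); real darcs commutation is more involved.

```haskell
-- Toy patch: a name plus the single file it touches (hypothetical model).
data Patch = Patch { patchName :: String, patchFile :: FilePath }
  deriving (Eq, Show)

-- Toy rule: two adjacent patches can swap order only if they touch
-- different files. Real commutation also rewrites the patches themselves.
toyCommute :: (Patch, Patch) -> Maybe (Patch, Patch)
toyCommute (p, q)
  | patchFile p == patchFile q = Nothing
  | otherwise                  = Just (q, p)

-- Move patch p past every patch in the list, one swap at a time.
-- Fails with Nothing as soon as one swap is impossible.
commutePast :: Patch -> [Patch] -> Maybe ([Patch], Patch)
commutePast p []       = Just ([], p)
commutePast p (q : qs) = do
  (q', p')   <- toyCommute (p, q)
  (qs', p'') <- commutePast p' qs
  return (q' : qs', p'')

main :: IO ()
main = do
  let a = Patch "fix typo" "README"
      b = Patch "add test" "Test.hs"
      c = Patch "update docs" "README"
  print (commutePast a [b])     -- succeeds: different files
  print (commutePast a [b, c])  -- fails: a and c both touch README
```

In this model the work is linear in the number of patches passed, so doing it for many patches over long histories could plausibly account for a large share of the run time.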
I hope that helps,
Jason