[darcs-users] Iteratees, streams, and mmap
dagit at codersbase.com
Thu Dec 10 03:28:37 UTC 2009
I have an overall goal of making darcs more memory efficient. My ideal
scenario is for every darcs computation to be O(1) in space. Maybe I'll
never get there, but we can certainly improve on the current darcs memory
usage. So, I spent some time last weekend looking over iteratees in greater
depth than I have before. I also did a comparative benchmark between lazy
bytestrings and iteratees.
My test example was to generate a 100MB file (from /dev/random) and then,
in various languages (C and Haskell), open the file and process its
contents in several ways.
In the C programs I compared fread vs. mmap'ing. I didn't want to assume
that I could mmap the entire file, so I used a sliding window of mmap'd
regions of the file. The fread buffer and the mmap window were
getpagesize() bytes in length; I believe that's 4kB on my computer, but
honestly I didn't check.
With the mmap code, I found that if I didn't read the data from the mmap'd
buffer, then the performance was around 0.5 seconds. I modified both the
fread and mmap programs to calculate a 32-bit checksum of the bytes, and they
had the same performance. I also did timings with and without purging the
OS caches before timing. There was no discernible difference in their
performance (that is, again assuming I look at each byte). This makes me
think that mmap is really only a win when you're doing random access on the
mmap'd data. Otherwise, it seems that fread is easier to implement, easier
to understand, and just as performant.
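For reference, the fread-style loop translates naturally to Haskell with
strict ByteString chunks. This is only a sketch of the benchmark described
above; the 4kB chunk size and the simple additive checksum are assumptions:

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import Data.Word (Word32)
import System.IO (withBinaryFile, IOMode (ReadMode), Handle)

-- Sum every byte into a 32-bit accumulator, reading 4kB at a time,
-- so memory use stays constant regardless of file size.
checksumFile :: FilePath -> IO Word32
checksumFile path = withBinaryFile path ReadMode (go 0)
  where
    go :: Word32 -> Handle -> IO Word32
    go !acc h = do
      chunk <- B.hGet h 4096
      if B.null chunk
        then return acc
        else go (B.foldl' (\a w -> a + fromIntegral w) acc chunk) h
```

Because the accumulator is forced on every chunk and each chunk is dropped
as soon as it is folded, this runs in constant space like the fread loop.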
Next I started looking at iteratees versus lazy bytestrings.
My test was essentially the same. I used the lazy bytestring readFile to
get the contents of the file. Then I used the lazy bytestring version of
foldl1' to compute a checksum of the bytes, and I used the lazy bytestring
version of mapM_ to echo the bytes to /dev/null. The interesting thing
here is that either of these operations in isolation ran in a constant
memory overhead of about 2MB for the max RSS. If I did both computations on
the same stream returned from readFile, then the garbage collector kept the
entire contents of the stream in memory, and the max RSS was about 230MB.
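A sketch of that experiment (the use of foldl' with an explicit zero rather
than foldl1' is an assumption, and the echo is modelled as a write to
/dev/null):

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)

-- Two traversals of the same lazy stream: each alone is constant-space,
-- but the second use of `s` keeps the garbage collector from freeing the
-- chunks the first traversal has already consumed, so the whole file
-- ends up resident.
checksumAndEcho :: FilePath -> IO Word32
checksumAndEcho path = do
  s <- BL.readFile path
  let !c = BL.foldl' (\a w -> a + fromIntegral w) (0 :: Word32) s
  BL.writeFile "/dev/null" s   -- second traversal retains all of s
  return c
```

Dropping either the fold or the write restores the ~2MB constant footprint,
because then no live reference pins the head of the stream.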
Next I turned to iteratees. I feel like I understand iteratees fairly
well when I read Oleg's slides. What I found is that, in practice,
iteratees are very difficult to work with. You program in a
continuation-passing style, and your streams have multiple states they can
be in; every function that works on streams needs to handle approximately
three cases for the stream. The next hurdle was learning how to run an
iteratee over a stream. I was actually surprised by how difficult this
latter bit turned out to be, and I'm still not confident that what I ended
up with is the correct or idiomatic approach.
Next I tried composing my echo iteratee with my checksum iteratee (so I
could mimic the bytestring example). What I found was that I didn't know
how to combine iteratees that process the stream at the same level (neither
is lifted into the other) and neither consumes the stream produced by the
other. The iteratee library allows for horizontal composition (monadic
bind) and vertical composition (lift), but the only parallel composition I
see is enumPair. enumPair probably does what I want, but by that point I
had realized I'm displeased with the iteratee implementation for other
reasons.
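To make the problem concrete, here is a minimal from-scratch sketch of the
iteratee idea. This is not the API of the actual iteratee package;
checksumI, countI, pairI, and run are names invented for illustration.
pairI plays the role of enumPair-style parallel composition: both consumers
see every chunk, so the stream is traversed once and never retained:

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import Data.Word (Word32)

-- What a consumer sees at each step: a chunk of data, or end of input.
-- (The real library adds an error case, hence the ~3 cases every
-- stream function has to handle.)
data Stream = Chunk B.ByteString | EOF

-- An iteratee is either finished or a continuation awaiting more stream.
data Iteratee a = Done a | Cont (Stream -> Iteratee a)

-- A checksum consumer written in continuation-passing style.
checksumI :: Iteratee Word32
checksumI = go 0
  where
    go !acc = Cont $ \s -> case s of
      Chunk bs -> go (B.foldl' (\a w -> a + fromIntegral w) acc bs)
      EOF      -> Done acc

-- A byte-count consumer, standing in for the echo example.
countI :: Iteratee Int
countI = go 0
  where
    go !n = Cont $ \s -> case s of
      Chunk bs -> go (n + B.length bs)
      EOF      -> Done n

-- Parallel composition: feed each chunk to both iteratees in lockstep.
pairI :: Iteratee a -> Iteratee b -> Iteratee (a, b)
pairI (Done a) (Done b) = Done (a, b)
pairI ia ib = Cont $ \s -> pairI (step ia s) (step ib s)
  where
    step (Cont k) s = k s
    step done     _ = done

-- Drive an iteratee over a list of chunks, then signal EOF.
run :: [B.ByteString] -> Iteratee a -> a
run chunks it = finish (foldl feed it chunks)
  where
    feed (Cont k) c = k (Chunk c)
    feed i        _ = i
    finish (Done a) = a
    finish (Cont k) = case k EOF of
      Done a -> a
      Cont _ -> error "iteratee demanded input past EOF"
```

With this, run [chunk1, chunk2] (pairI checksumI countI) computes both
results in one pass over the stream, which is exactly the combination the
bytestring version could not do safely.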
What I like about iteratees is that they capture the observation that my
bytestring example demonstrates. That is, if you process the same lazy
stream twice with different traversals it's very easy to end up with the
whole stream in memory. Because this is so easy to do by accident, Oleg
thought about forcing resource usage bounds. He realized that strict
left-folds encapsulated most stream processing and then derived his iteratee
library from that concept. I think he has the right idea.
Another criticism that Ganesh rightfully pointed out is that iteratees
invert the normal Haskell control flow, but I suspect that is unavoidable if
we want to be serious about enforcing memory bounded stream processing (this
is something I care about greatly).
My next experiment will be to find ways to express "take this operation and
apply it to a stream without letting the stream leak". One implication is
that gzReadFilePS should not be used outside of a core set of modules which
have been audited to be resource conscious. Another implication is that we
need to be really careful about whether or not we allow returning
sequences of patches. Possibly, we need several foldl-like functions that
open the stream internally. For example, to process the pending we might
provide:
withPending :: (a -> Patch -> a) -> a -> IO a
And withPending would start the streaming and make sure that the stream
cannot be visible as a data dependency outside of withPending. I could also
imagine trying to make it more general:
withPatches :: (a -> Patch -> a) -> a -> IO (FL Patch) -> IO a
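A sketch of how withPending might keep the stream internal; Patch and
readPendingPatches are hypothetical placeholders here, and the signature
takes an explicit seed for the fold:

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical placeholders: a real Patch type and a real reader of the
-- pending patches would come from darcs itself.
data Patch = Patch String

readPendingPatches :: IO [Patch]
readPendingPatches = return [Patch "a", Patch "b"]   -- stand-in

-- The stream of patches is opened, folded strictly, and consumed entirely
-- inside withPending; only the accumulator escapes, so no data dependency
-- on the stream can leak to the caller.
withPending :: (a -> Patch -> a) -> a -> IO a
withPending step z = do
  ps <- readPendingPatches
  let go !acc []     = acc
      go !acc (p:qs) = go (step acc p) qs
  return (go z ps)
```

For example, withPending (\n _ -> n + 1) 0 would count the pending patches
without the caller ever holding the patch list itself.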
Either way, I need to think about this more but I wanted to post this in
case others wonder what I've been up to or have ideas.