Hello,<br><br>I have an overall goal of making darcs more memory efficient.  My ideal scenario is for every darcs computation to be O(1) in space.  Maybe I&#39;ll never get there, but we can certainly improve on the current darcs memory usage.  So, I spent some time last weekend looking over iteratees in greater depth than I have before.  I also did a comparative benchmark between lazy bytestrings and iteratees.<br>

<br>My test example was to generate a 100mb file (from /dev/random), and then in various languages (C and Haskell) I tried opening the file and doing various things.<br>

<br>In the C programs I compared fread vs. mmaping.  I didn&#39;t want to assume that I could mmap the entire file so I used a sliding window of mmap&#39;d regions of the file.  The fread buffer and the mmap buffer were getpagesize() bytes is length, I think that&#39;s 4kb on my computer, but honestly I didn&#39;t check.<br>


<br>With the mmap code, I found that if I didn&#39;t read the data from the mmap&#39;d buffer, then the performance was around 0.5 seconds.  I modified both the fread and mmap programs to calculate a 32bit checksum of the bytes and they had the same performance.  I also did timings with and without purging the OS caches before timing.  There was no discernible difference in their performance (that is, again assuming I look at each byte).  This makes me think that mmap is really only a win when you&#39;re doing random access on the mmap&#39;d data.  Otherwise, it seems that fread is easier to implement, easier to understand, and just as performant.<br>


<br>Next I started looking at iteratees versus lazy bytestrings.<br><br>My test was essentially the same.  I used the lazy bytestring readFile to get the contents of the file.  Then I used the lazy bytestring version of foldl1&#39; to compute a checksum of the bytes, and I used the lazy bytestring version of mapM_ to echo the bytes to /dev/null.  The interesting thing here, is that either of these operations in isolation resulted in a constant memory overhead of about 2MB for the max RSS.  If I did both computations on the same stream returned from readFile then the garbage collector kept the entire contents of the stream in memory.  This resulted in about 230MB for the max RSS.<br>


<br>Next I turned to iteratees.  I feel like I understand the iteratees fairly well when I read Oleg&#39;s slides.  What I found is that, in fact, the iteratees are very difficult to work with.  You program in a continuation passing style and your streams have multiple states they can be in.  Every function that works on streams need to handle approximately 3 cases for the stream.  The next hurdle was learning how to run an iteratee over a stream.  I was actually surprised by how difficult this latter bit turned out to be, and I&#39;m still not confident that what I ended up with is the correct or simplest way.<br>


<br>Next I tried composing my echo iteratee with my checksum iteratee (so I could mimic the bytestring example).  What I found was that I didn&#39;t know how to combine iteratees that process the stream at the same level (neither is lifted into the other) and neither consumes the stream produced by the other.  The iteratee library allows for horizontal composition (monadic bind) and vertical composition (lift), but the only parallel composition I see is enumPair.  enumPair probably does what I want, but by that point I had realized I&#39;m displeased with the iteratee implementation for other reasons.<br>


<br>What I like about iteratees is that they capture the observation that my bytestring example demonstrates.  That is, if you process the same lazy stream twice with different traversals it&#39;s very easy to end up with the whole stream in memory.  Because this is so easy to do &quot;on accident&quot;, Oleg thought about forcing resource usage bounds.  He realized that strict left-folds encapsulated most stream processing and then derived his iteratee library from that concept.  I think he has the right idea.<br>

<br>Another criticism that Ganesh rightfully pointed out is that iteratees invert the normal Haskell control flow, but I suspect that is unavoidable if we want to be serious about enforcing memory bounded stream processing (this is something I care about greatly).<br>

<br>My next experiment will be to find ways to express &quot;take this operation and apply it to a stream without letting the stream leak&quot;. One implication is that gzReadFilePS should not be used outside of a core set of modules which have been auideted to be resource concious.  Another implication is that we need to be really careful about wether or not we allow returning of sequences of patches.  Possibly, we need several foldl-like functions that open the stream internally.  For example, to process the pending maybe we should have:<br>

withPending :: (a -&gt; Patch -&gt; a) -&gt;  IO a<br><br>And withPending would start the streaming and make sure that the stream cannot be visible as a data dependency outside of withPending.  I could also imagine trying to make it more general:<br>

withPatches :: (a -&gt; Patch -&gt; a) -&gt; IO (FL Patch) -&gt; IO a<br><br>Either way, I need to think about this more but I wanted to post this in case others wonder what I&#39;ve been up to or have ideas.<br><br>Thanks,<br>

Jason<br>