[darcs-devel] Help with alternate current formats: shall we slurp?

Tue Jan 25 05:37:01 PST 2005

On Tue, Jan 25, 2005 at 02:14:47PM +0100, Juliusz Chroboczek wrote:
> > The first slurp actually isn't used...
> 
> I understand why you need a Slurpy at each of those points.  What I'm
> unable to work out is in which cases you actually need a fresh Slurpy,
> and in which cases reusing the previous Slurpy is good enough.

I think the question is how much of the existing slurpy has been "used".
When we call slurp_write, it's clear that the entire thing has been read,
since it's all been written, so at that stage we want a fresh one.

If we've just applied some patches to the slurpy, we can use
slurp_write_and_read_dirty, which writes the changed files, and then
reslurps just those files, which is faster, but still has the effect of
keeping memory usage down.  Normally the use of slurp_write_and_read_dirty
is encapsulated within a call to apply_patches.
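
To make the "write the changed files, then reslurp just those" idea concrete,
here is a minimal toy sketch. Everything in it is hypothetical: File, Disk,
writeFileTo, rereadFrom and writeAndReadDirty are illustrative names and not
darcs' actual Slurpy API, and the "disk" is only an association list held in
an IORef. It shows the control flow only, not the memory behaviour.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical toy model, not darcs' Slurpy: a file is a path, its
-- contents, and a flag saying whether it was modified in memory.
data File = File
  { filePath :: FilePath
  , contents :: String
  , dirty    :: Bool
  } deriving (Eq, Show)

-- The "disk" is simulated with an association list in an IORef.
type Disk = IORef [(FilePath, String)]

-- Write one file to the simulated disk, replacing any older copy.
writeFileTo :: Disk -> File -> IO ()
writeFileTo disk f =
  modifyIORef' disk
    (\fs -> (filePath f, contents f) : filter ((/= filePath f) . fst) fs)

-- Re-read one file from the simulated disk and mark it clean.
rereadFrom :: Disk -> File -> IO File
rereadFrom disk f = do
  fs <- readIORef disk
  case lookup (filePath f) fs of
    Just c  -> return f { contents = c, dirty = False }
    Nothing -> ioError (userError ("missing file: " ++ filePath f))

-- Write only the dirty files, then refresh only those entries;
-- clean entries are passed through untouched.
writeAndReadDirty :: Disk -> [File] -> IO [File]
writeAndReadDirty disk = mapM step
  where
    step f
      | dirty f   = writeFileTo disk f >> rereadFrom disk f
      | otherwise = return f
```

A Disk for experimenting can be created with newIORef; clean files never
touch it, which is the point of only reslurping what changed.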

> Now unless I can work that out (which is unlikely, since I've got next
> to no intuition of how lazy evaluation works, especially in the
> presence of the IO monad), I'll need to find a not-too-clumsy way to
> keep the current behaviour when slurping is cheap, and reuse previous
> Slurpies when it isn't.
> 
> Hmm...
> 
>   maybe_reslurpCurrent :: Slurpy -> IO Slurpy

Or perhaps we could do something even fancier.  What if we implemented a

slurpInfiniteCurrents :: IO [Slurpy]

which creates an infinite list of slurpies, each one of which should be
used exactly once.  This could be made even more efficient than slurping
twice for the "fast slurping" case, since the strict part of slurping
(which involves reading the directory listings) need only be done once, but
the file reading could be separated.  Using it would be pretty easy:

(s:s':_) <- slurpInfiniteCurrents

Then as long as you only used each s,s' etc once, you'd be efficient in
terms of memory usage, and in fact the "slow" current.none slurp wouldn't
need to hold the whole slurpy in memory, since it could just co_slurp the
same delayedTempDir again and again for each new slurpy that is desired.

A sketch of the "hard" part would be (assuming d is an absolute path):

slurpInfinite d = do s <- slurp "."
                     ss <- more_slurps s
                     return (s : ss)
    where more_slurps x = unsafeInterleaveIO $ do a <- co_slurp d x
                                                  as <- more_slurps a
                                                  return (a : as)
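
The same lazy pattern can be tried out in isolation with a self-contained toy
version. This is only a sketch under stated assumptions: Snapshot,
takeSnapshot and snapshotsInfinite are hypothetical stand-ins for Slurpy,
slurp/co_slurp and slurpInfinite, and a counter in an IORef records how many
"slurps" have actually run.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical stand-in for a Slurpy: just a generation number.
newtype Snapshot = Snapshot Int deriving (Eq, Show)

-- Hypothetical stand-in for slurp/co_slurp: bumps the counter so we
-- can observe how many snapshots have really been taken.
takeSnapshot :: IORef Int -> IO Snapshot
takeSnapshot counter = do
  modifyIORef' counter (+ 1)
  Snapshot <$> readIORef counter

-- Produce an endless list of snapshots. The first is taken eagerly;
-- each later one is taken only when the consumer forces that cell.
snapshotsInfinite :: IORef Int -> IO [Snapshot]
snapshotsInfinite counter = do
  s  <- takeSnapshot counter
  ss <- more
  return (s : ss)
  where
    more = unsafeInterleaveIO $ do
      a  <- takeSnapshot counter
      as <- more
      return (a : as)
```

With a counter made by newIORef 0, binding the whole list runs takeSnapshot
once; matching it against (s:s':_) forces the second cell and runs it a
second time, and nothing further is computed until more of the list is
demanded.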

co_slurp is considerably faster than slurp, although we'd want to make sure
that there is an mmap version of co_slurp available, so we wouldn't be
giving up the mmap.  And of course we'd want to use the trick to check if
it's okay to use mmap at all here.

With the slurpInfinite function, you should be able to call slurpInfinite
on your temporary directory, so the changes in the actual commands should
*mostly* be limited to using slurpInfiniteCurrents whenever possible.  :)

(Plus, of course it's always fun to use an infinite list, and even more fun
when each element of the infinite list is so large that you really don't
want to hold it in memory!)  ;)

I think I'll go ahead and implement a slurpInfinite, and perhaps also a
mmapSlurpInfinite.

> Okay, it looks like I need to rethink the whole scheme of things.
> Thanks for your input.

I'm hoping you can get the final version of the interface in and any code
needed just to make things work soon, so I can release 1.0.2 with the new
interface.  Then we could work on the efficiency issues at our leisure.
-- 
David Roundy
http://www.darcs.net