[darcs-devel] darcs patch: Change a can't... (optimization discussion)

David Roundy droundy at darcs.net
Thu Apr 14 06:05:55 PDT 2005


On Thu, Apr 14, 2005 at 12:26:54AM +0100, Ian Lynagh wrote:
> On Wed, Apr 13, 2005 at 09:00:22AM -0400, David Roundy wrote:
> > > [Read the patch back as a string when recording to get laziness
> > > Ian Lynagh <igloo at earth.li>**20050413034632] {
> > > hunk ./Record.lhs 61
> > > - -import FastPackedString ( gzReadFilePS )
> > > +import FastPackedString ( gzReadFile )
> > > hunk ./Record.lhs 191
> > > - -                 ps <- gzReadFilePS fp
> > > +                 ps <- gzReadFile fp
> > > }
> > 
> > How much of a difference does this actually make? My guess would be that
> > the cost of using String processing would slow this down enough to more
> > than make up for the memory benefits of not having to hold the entire file
> > in memory at once.
> 
> I'd feared it would be high, but then in this mail you say:
> 
> > > I've also made the application of the patch when recording read the
> > > patch as a String so we get laziness. I'm not sure what effect this has
> > > on runtime.
> > 
> > It doesn't seem (I just ran a test) to have much effect on either memory or
> > speed.  Probably just because that portion of the code isn't a bottleneck.
> 
> Ah, that is good news indeed! I had been thinking we might need to make
> [PackedString], with each PackedString being a 1M chunk or something, a
> Stringalike to get both laziness and performance.
> 
> When profiling, the lazy string way is much slower, but that could be
> an artifact of the profiling changing the optimisations done.

Okay.  Indeed, I was surprised not to find that it was slower.  I had also
thought about using a [PackedString] rather than a String, and I'd still
tend to lean in that direction, mostly because I really dislike
Strings... :) But yes, if there's no performance cost here, this might be
okay.
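
For concreteness, here is roughly what I imagine the chunked approach
looking like.  This is just a sketch, spelled with Data.ByteString standing
in for our PackedString; readFileChunks and chunkSize are names invented
here:

import qualified Data.ByteString as B
import System.IO (IOMode(ReadMode), openBinaryFile, hClose)
import System.IO.Unsafe (unsafeInterleaveIO)

chunkSize :: Int
chunkSize = 1024 * 1024        -- 1M chunks, as you suggest

-- Read a file as a lazy list of strict chunks, so only the chunks
-- currently being consumed need to be in memory at once.
readFileChunks :: FilePath -> IO [B.ByteString]
readFileChunks f = do
    h <- openBinaryFile f ReadMode
    let go = unsafeInterleaveIO $ do
               c <- B.hGet h chunkSize
               if B.null c
                 then hClose h >> return []
                 else (c :) `fmap` go
    go

Note that this has exactly the open-file-handle problem I complain about
below: the handle stays open until the entire list has been consumed.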

> > Perhaps it's worth pointing out here that if the file isn't compressed
> > (and we're running on a posixy system), then gzReadFilePS uses mmap,
> > which means that the file doesn't use up virtual memory, and is read
> > "on demand"--almost like lazy IO, only it's seriously efficient.
> 
> Sure - I was going to have the uncompressed case call mmapfile, but then
> I remembered that that fell back to strict reading when the mmap failed,
> so we'd need to copy the mmap code. And as I knew what code path was
> going to be used anyway I just left it for now :-)
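
Right.  For reference, the mmap-or-fall-back shape needn't be much code.  A
sketch only, using the mmap package's mmapFileByteString in place of our own
mmap code; tryMmapRead is a made-up name, and I'm assuming the mmap call
throws an IOException when the mapping fails:

import qualified Data.ByteString as B
import Control.Exception (IOException, catch)
import System.IO.MMap (mmapFileByteString)

-- Map the whole file if we can; otherwise fall back to a strict read.
tryMmapRead :: FilePath -> IO B.ByteString
tryMmapRead f = mmapFileByteString f Nothing `catch` fallback
  where
    fallback :: IOException -> IO B.ByteString
    fallback _ = B.readFile f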
> 
> > I presume that if this is helpful here, you'd want to do the same thing
> > (reading the patch file lazily) in other parts of the code?
> 
> Yes.
> 
> > > +gzReadFile :: FilePath -> IO String
> > > +gzReadFile f = do
> [lazily read file]
> > 
> > I don't care for this function, as it violates a policy of not leaving
> > files open in a lazy manner.  The problem is that if we're running on
> > windows (or a SMB share) we won't be able to delete or overwrite this
> > file.  I know that in your use down below this isn't an issue, but I'm
> > much more comfortable when we leave that up to the user--it's too
> > painful tracking down bugs caused by trying to delete a file before we
> > finish reading it.  We went to the trouble of getting rid of all
> > instances of readFile, and I'd rather not add back more such potential
> > bugs.
> 
> I understand that, but the memory gain would be worthwhile IMO. I'm not a
> fan of this "only 500M? Pah!" point of view :-)

Okay, that's a valid point.  On the other hand, if we are going to require
500M anyway, perhaps it's a non-issue.

But I'd rather at least encapsulate the lazy-reading IO as much as
possible.  We could call gzReadFile something like gzUnsafelyLazilyReadFile
(or perhaps something somewhat less alarmist), and then implement a

readPatchFileLazily :: FilePath -> IO Patch

which is what would actually be used in the code.  Then the understanding
would have to be that once you read such a patch, you couldn't modify that
patch file.
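
Something along these lines, say.  This is just a sketch, where readPatch
stands in for whatever parser we end up calling; I'm assuming the
signatures noted in the comments:

-- Assumed for the sketch:
--   gzUnsafelyLazilyReadFile :: FilePath -> IO String   (the renamed gzReadFile)
--   readPatch                :: String -> Maybe Patch   (whatever parser we use)
readPatchFileLazily :: FilePath -> IO Patch
readPatchFileLazily f = do
    s <- gzUnsafelyLazilyReadFile f
    -- Contract: the file behind f must not be modified or deleted until
    -- the returned Patch has been completely consumed.
    case readPatch s of
      Just p  -> return p
      Nothing -> fail ("unable to parse patch file " ++ f)

That way the alarming name is confined to one or two call sites, and the
rest of the code only ever sees readPatchFileLazily.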

> I've put some graphs of biographical profiling up here:
> 
> http://urchin.earth.li/~ian/dp/dp.html
> 
> (this is with the nasty hack)
> 
> The first is with gzReadFile. You can see the three main phases:
> * Writing the patch (red bit up to 400s)
> * Applying to current (blue bit up to almost the end)
> * Syncing timestamps (spike at the end)
> 
> Peak memory usage is about 50M; 25M if you assume we'll fix the
> timestamp syncing.
> 
> In the second, with gzReadFilePS, the first and third phases are the
> same (well, the colours are different due to the way they're assigned),
> but the middle is now 200M as the whole patch is read into memory.

Hmmm.  That is pretty compelling.  I guess I usually tend to figure that
big blocks of memory aren't so harmful--and it's true that they're much less
harmful than an equivalent amount of small chunks of memory, which would
cost us in GC time.

How much of the benefit comes from the lazy patch reading, and how much
from the ugly hack on the patch selection?

> > > The timestamp synching seems to be using a silly amount of memory. I
> > > wonder if it's accidentally reading all the files or something.
> > 
> > Oh, no.  Actually you're right, it *is* reading all the files.  :( It's
> > doing that to make sure that they are the same as what's in the working
> > directory.
> 
> Ah, OK, that makes sense. But something slightly odd must be going on.
> It should need only about twice the largest file size (<1M) or something
> to do the comparison, and even then only in a very brief spike, so it
> looks like it's holding on to everything. But it can't be holding on to
> /everything/ or it would be up around 200M. It seems a little high to be
> just a directory tree of 20k files. Anyway, it certainly points at
> something being too keen to hold on to things.

Hmmm.  Perhaps the Slurpy traversal code in the sync is holding onto some
of the file contents due to an improper lack of laziness?
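
Whether it's reading everything up front or leaving the comparisons
unevaluated, the effect in the profile would be much the same.  Just to
illustrate the sort of thing I mean -- this is not the actual Slurpy code;
differingFiles and differs are names made up for the sketch, with
Data.ByteString standing in for our packed strings:

import qualified Data.ByteString as B
import Control.Monad (filterM)

-- Compare working files against their pristine counterparts one at a time,
-- forcing each Bool so both buffers can be collected before the next file.
differingFiles :: [(FilePath, FilePath)] -> IO [FilePath]
differingFiles pairs = map fst `fmap` filterM differs pairs
  where
    differs (working, pristine) = do
        w <- B.readFile working
        p <- B.readFile pristine
        let d = w /= p
        d `seq` return d

Without that seq, each unevaluated (w /= p) thunk keeps both buffers alive
until the whole result list is examined, which would look exactly like
holding on to everything.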

> For example, we should be able to make "darcs record" followed by "a"
> have the same performance, though, if we make the patch count calculated
> on demand only. i.e. a display like
> 
> Shall I record this patch? (1/<c to calc>) [ynWsfqadjkc], or ? for help:

I guess this brings up the question of whether the benefit in memory use
would balance the cost in interface complexity...
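
To be concrete, here is a toy sketch of what I understand you to mean -- not
the real patch-selection code; promptLoop is a made-up name -- where length
is only ever applied to the lazily-produced list if the user actually asks
for the count:

import System.IO (hFlush, stdout)

-- n is the 1-based index of the current patch; ps is the current patch plus
-- whatever follows it, possibly still being produced lazily.
promptLoop :: Int -> [p] -> IO ()
promptLoop n ps = do
    putStr ("Shall I record this patch? (" ++ show n
            ++ "/<c to calc>) [ynWsfqadjkc], or ? for help: ")
    hFlush stdout
    ans <- getLine
    if ans == "c"
      then do putStrLn ("Total patches: " ++ show (n - 1 + length ps))
              promptLoop n ps
      else return ()   -- the real loop would dispatch on y/n/w/s/... here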

> > > > Actually, my timing test just finished running, and my "kernel add
> > > > everything" test now runs in about 435m max RAM, and takes another 20%
> > > > less wallclock time.  This is about as good as the improved "get"
> > > > performance, so it looks like you've already gotten "record" down to
> > > > the point where it's no longer a memory bottleneck.
> > > 
> > > What exactly are you "get"ting here? A kernel repo with a single initial
> > > patch?
> > 
> > Yeah.
> 
> Then IMO that's a sign that get is also broken, not that record is
> sorted  :-)

:) On the other hand, the assumption that patches can be held in memory is
made all through darcs.  We shouldn't be carefully microoptimizing the memory
use of a particular command, since it's the maximum memory use in the
required command set (which itself is a user-defined concept) that
determines whether or not darcs is usable for a given repository.

> > I'd say that the "initial record" is pretty well optimized for now.  It's
> > not a very fundamentally important case, and we've got it down pretty
> > cheap.  I'd say we should now focus on making the "fast" cases *not* be
> > proportional in time to the size of the repository.  In particular, I have:
> > 
> > $ time darcs whatsnew                           
> > No changes!
> > 
> > real    0m5.526s
> > user    0m4.330s
> > sys     0m0.980s
> > 
> > We're spending an incredible amount of time checking timestamps and reading
> > directory contents.  Linus' git does this in 0.15s elapsed
> > time.  Similarly a record of a single small change in a kernel repository
> > takes 22s.  Git's equivalent of record takes 0.8s.
> 
> Do you know why the record takes 4 times as long? Shouldn't the record
> be essentially whatsnew + tiny amount of work to do the actual record?
> 
> Oh, actually, record will then walk over the tree again looking for
> changed files, so that's a factor of 2. Still another factor of 2 to
> find, though.

Yeah, it's interesting.  darcs whatsnew takes 5.5s, but whatsnew -l takes
11s.  Which suggests that perhaps it's the readdirs that are costing us,
since the -l gives us twice as many readdirs.  (slurp vs co_slurp)

Record takes the same time as whatsnew, when there aren't any changes present
(with or without -l).  If there *are* (minimal) changes present, record
takes 22s when run without -l, and 26s when run with -l.

So slurp is taking something like 5s.  sync_repo does one slurp, and thus
ought to take something like 5s, which leaves the remaining ten seconds or
so a mystery.
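
For reference, the check we'd like each slurp to boil down to, per file, is
just a stat.  A sketch only -- unchangedSince is a made-up name, the
recorded time would come from darcs's timestamp cache, and its type is
whatever getModificationTime happens to return:

import System.Directory (getModificationTime)

-- If the mtime still matches what we recorded at the last sync, we can skip
-- reading the file's contents entirely.  No type signature, since the time
-- type depends on which directory library version is in play.
unchangedSince recorded f = do
    t <- getModificationTime f
    return (t == recorded)

With ~20k files, even cheap per-file stats plus the readdirs add up.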
-- 
David Roundy
http://www.darcs.net



