[darcs-users] darcs patch: remove quadratic blowups from mapPrimFL
Ganesh Sittampalam
ganesh at earth.li
Sun Nov 2 01:58:09 UTC 2008
On Sat, 1 Nov 2008, David Roundy wrote:
> On Thu, Oct 30, 2008 at 06:01:26AM +0000, Ganesh Sittampalam wrote:
>> On Wed, 29 Oct 2008, Jason Dagit wrote:
>>> Have you retimed things with the full set of patches you submitted?
>>> Do you know what the overall improvement would be?
>>
>> Nope - without some really good "fire and forget" infrastructure and a
>> dedicated machine that can be guaranteed quiescent, benchmarking is quite
>> fiddly and time-consuming, so I've only been doing it for things where it
>> seemed particularly warranted.
>
> I've just reviewed this one, and it looks correct, but I couldn't predict
> whether its performance behavior. So I'd rather not apply it, unless
> either you can explain it to me in such a way that I can understand the
> improvement is, or you have benchmarks demonstrating the improvement.
I've timed it on whatsnew -sl on a directory containing 1000 new files,
and it's definitely a major speedup (I forget the timing numbers, but from
the profile it was clearly quadratic to linear as I claim).
> I can see that you replace (+>+) with (:>:) using some clever tricks (which
> is definitely always a bonus), but that only affects the scaling when many,
> many changes are made to a single file, in which case this is almost
> certainly not a bottleneck (since we're diffing said file, which is a slow
> operation).
> The other change (and I think these two changes are separable?) is a switch
> from foldl' to foldr, and I must admit that these fold functions almost
> always confuse the heck out of me...
They are separable, and I didn't check which of them was responsible for
the actual speedup in the test case I made, but they both seemed sensible
as a general principle. The basic goal is that if we ever use (+>+), we
should make sure it associates to the right, since it's linear in its
*left* argument but constant time in its right argument, like (++) is.
That's the effect of switching to foldr in this case; the key point is
that in this particular case I wouldn't expect any semantic difference
between the folds, as (+>+) is associative; it's just a performance
change. In addition the fact that we're building a lazy data structure
rather than a strict value means that foldr is almost certain to be the
more natural thing to use here.
Ganesh
More information about the darcs-users
mailing list