[darcs-users] recent darcs performance progress

Eric Kow kowey at darcs.net
Mon Apr 19 06:35:26 UTC 2010

On Sun, Apr 18, 2010 at 16:04:53 -0600, Zooko Wilcox-O'Hearn wrote:
> >I visited http://wiki.darcs.net/Benchmarks and looked at the Tahoe
> >graphs for machines quasar, apricot, vs2 (those that we have
> >graphs for).
> It is very cool that darcs has these benchmarks organized. This
> gives me increased hope for future versions of darcs. As always with
> benchmarks, it is hard to summarize the results in a readable way.
> Perhaps the software that produced this site could help:
> http://speed.pypy.org .

Wow, that's a very nice looking performance page.  I like the timeline
and the display of faster/slower visual indicators that Ian has been
asking for since we first started working on benchmarking in 2008.

At the very least, that page ought to be inspiration for future work
if it turns out we can't just directly apply the tools.

One thing that may help us in the latter is that the raw timings for
a darcs-benchmark run are stored in ~/.darcs-benchmark and written as
a simple tab-delimited file.  So you can (i) always produce new
visualisations of old data and (ii) accumulate benchmarks
over time in case of insufficient data.
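
As a rough illustration of point (ii), here is a minimal Python sketch
that pools timings from several runs of a tab-delimited file.  It is not
part of darcs-benchmark, and the three-column layout (benchmark name,
darcs version, seconds) is an assumption made for the example; the real
file may well be laid out differently.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def load_timings(stream):
    """Read tab-delimited rows of (benchmark, darcs_version, seconds).

    The three-column layout is an assumption made for this sketch; check
    the real file for its actual columns before relying on it.
    """
    timings = defaultdict(list)
    for row in csv.reader(stream, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        name, version, seconds = row[0], row[1], row[2]
        try:
            timings[(name, version)].append(float(seconds))
        except ValueError:
            continue  # skip header rows or non-numeric timings
    return timings


def summarise(timings):
    """Return the mean time per (benchmark, version), pooled across runs."""
    return {key: mean(values) for key, values in timings.items()}


# Made-up rows standing in for two accumulated benchmark runs.
sample = io.StringIO(
    "whatsnew\t2.4.0\t1.0\n"
    "whatsnew\t2.4.0\t3.0\n"
    "record\t2.4.0\t0.5\n"
)
summary = summarise(load_timings(sample))
```

Because the raw rows are kept rather than only the averages, the same
file can be re-summarised or re-plotted later in any way you like.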

> Anyway, I see that darcs-2.4.0 with optimized repo *does* have
> significantly better performance for some operations on some
> machines. I guess if any of the Tahoe-LAFS contributors complain to
> me about darcs performance I will suggest to them that they try
> upgrading to darcs-2.4.1 and running optimize and see if that helps.

optimize --pristine

In the future, I'd like to sweep up all optimisations of this sort
(one-time-only actions) into a single optimize --upgrade command to
get the latest and greatest repo variant.  For now, optimize --upgrade
only goes from old-fashioned to hashed repositories.  But it really
ought to do this optimize --pristine thing as well and in the future
build Benedikt's patch index for you.

> >Petr has work in orbit <http://bugs.darcs.net/patch156> which will
> >complete the arc started by his summer of code project.  The
> >hashed-storage stuff was a big chunk of work and it looks like we
> >still need this finishing blow to fully benefit from it.
> Can you give some indication of what effect this is likely to have
> from the perspective of an end-user?

I hope for these to make repository-local operations faster still.  This
is completing the work started in Darcs 2.3 (whatsnew), fleshed out in
Darcs 2.4 (whatsnew, record, revert, unrevert, diff).

If we're right about this, it should go from "we think it's a little bit
faster, but sort of hit and miss" to something more confident, hopefully
plain old "it's faster".

If we're wrong about this, it still means we get cleaner code (more
separation of concerns as a lot of our functionality is now shipped
off to Petr's generic hashed-storage module), which means saner
Darcs developers, which means nicer Darcs... from the end user's
perspective.

> >He also has another patch <http://bugs.darcs.net/patch196>
> >(porting David's work over to HEAD) which fixes an issue scaling
> >with respect to the number of patches in your history.
> And this one -- can you explain what is the effect of this issue? Is
> it something that one can see in the Tahoe-LAFS repository or does
> it need more patches in your history to have a noticeable effect?

For the end user this should reduce the time needed to do a darcs
record, apply, pull, obliterate (anything that updates the set of
patches) from O(number of patches in history)
           to O(number of patches since the last tag)
  effectively O(1) if you have a policy of tagging regularly
                   (a nice time to tag would be on each release)

The bug in question is http://bugs.darcs.net/issue1106 and it affects
commands that write out the hashed inventory in repositories that have
a large number of patches.

As I understand it, Darcs is trying to compute the hash of the *entire*
patch inventory (because you've changed it).  This is actually a lot of
needless work because we've broken up the inventory file such that there
is one (hashed) inventory per tag.  Since by definition you can't
commute things in and out of tags, why do all that hashing?  We should
instead only hash the patches since the last tag.
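
The idea can be sketched with a toy model in Python.  This is not
Darcs's actual inventory format or code: the Inventory class, its
fields and the SHA-256 choice are all hypothetical, and the only point
is to count how many patch entries get rehashed when older, tagged
sections are reached through a stored hash instead of being hashed
again.

```python
import hashlib


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Inventory:
    """Toy model: sections closed by a tag are hashed once, never again."""

    def __init__(self):
        self.parent_hash = ""    # hash of the section closed by the last tag
        self.open_patches = []   # patches recorded since the last tag
        self.patches_hashed = 0  # work counter, for illustration only

    def current_hash(self) -> str:
        # Only the open section is rehashed; older history is represented
        # by parent_hash alone.
        self.patches_hashed += len(self.open_patches)
        body = self.parent_hash + "\n" + "\n".join(self.open_patches)
        return sha(body.encode())

    def record(self, patch_name: str) -> str:
        self.open_patches.append(patch_name)
        return self.current_hash()

    def tag(self, tag_name: str) -> str:
        # Close the open section: its hash becomes the parent of the next.
        self.open_patches.append(tag_name)
        self.parent_hash = self.current_hash()
        self.open_patches = []
        return self.parent_hash


def total_work(num_patches, tag_every=None):
    """Count patch entries hashed while recording num_patches patches."""
    inv = Inventory()
    for i in range(num_patches):
        inv.record(f"patch-{i}")
        if tag_every and (i + 1) % tag_every == 0:
            inv.tag(f"tag-{i}")
    return inv.patches_hashed


untagged = total_work(100)             # every record rehashes all history
tagged = total_work(100, tag_every=10) # every record rehashes <= 10 entries
```

In this model the untagged case grows quadratically overall (each record
costs as much as the whole history), while the tagged case stays
bounded by the distance to the last tag.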

So Tahoe-LAFS has a pretty small history (4000 patches).  I would indeed
be interested to see how much this work helps you guys.  The symptom
that finally made us notice this was Simon Marlow pointing out that
"darcs obliterate --last=1" takes 7 seconds in the GHC repository.

I hope that the two optimisations will make your every day hacking on a
local darcs repository a lot snappier.  A lot of different problems to
solve in Darcs optimisation, as you can see.

Anyway, for things that affect *you* as a trac-darcs user, when we can
wheel our attention back to it, I want us to implement your suggestion
of adding a darcs show contents xxx --match 'hash f00' test to
darcs-benchmark (we'll probably need to introduce a concept of timing
out on benchmarks).  Then we can see if Benedikt's patch index
optimisation has the desired effect.  I'm optimistic.
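
For what "timing out on benchmarks" might look like, here is a small
Python sketch.  It is only an illustration of the concept, not how
darcs-benchmark works or would implement it; the time_command helper is
hypothetical, and the commands being timed are stand-ins so the example
runs without darcs installed.

```python
import subprocess
import sys
import time


def time_command(argv, timeout_seconds):
    """Run argv once; return elapsed seconds, or None if it timed out."""
    start = time.perf_counter()
    try:
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,  # child is killed when this expires
            check=False,
        )
    except subprocess.TimeoutExpired:
        return None  # report "too slow" instead of blocking the whole run
    return time.perf_counter() - start


# Stand-in commands: one quick, one that sleeps past its timeout.
fast = time_command([sys.executable, "-c", "pass"], timeout_seconds=30)
slow = time_command(
    [sys.executable, "-c", "import time; time.sleep(5)"],
    timeout_seconds=0.5,
)
```

With darcs available, the argv would be something like
["darcs", "show", "contents", "xxx", "--match", "hash f00"], where xxx
and f00 are the placeholders used above, not real values.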

Benedikt, if you've been following this thread at all, I think it'd be a
nice informal benchmark to try locally.  See
for details.

Thanks, all!  The progress is just barely starting to be tangible.
Maybe in one or two releases we can finally say that we've started
to turn this thing around :-)

Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9