[darcs-users] report on using darcs-2.2.98.2 (performance of "darcs query contents")

Sun Jul 12 06:46:37 UTC 2009

Zooko,

(I am adding Lele to CC, since he may know how feasible this is...)

Zooko Wilcox-O'Hearn <zooko at zooko.com> writes:
> Of course, this hardly matters anyway to my tracdarcs use case; what I really
> need for tracdarcs is for "darcs query contents" to return the same answers in
> about 1/100 of the time it currently takes (i.e. about 30 milliseconds would
> be an improvement), or perhaps to allow queries on multiple files in a single
> call so that the tracdarcs plugin doesn't need to invoke it dozens of times in
> order to render a directory full of dozens of files. See
> http://bugs.darcs.net/issue1477 for details.

The performance issue you are talking about is hard: fixing it will need the
filecache, and possibly some performance improvements in patch matching. Darcs
has to parse inventories to be able to match patches, and for now this comes at
a non-trivial cost.

However, if most of your queries are about HEAD (i.e. the current latest
revision), you may get your (almost) factor 100 improvement by just omitting
the "--match 'hash ...'" option (timings from ghc-hashed):

darcs show contents --match  README  0,50s user 0,14s system 98% cpu 0,651 total
darcs show contents README  0,01s user 0,00s system 84% cpu 0,009 total

So if you know that your hash is the head hash, this should save you a *lot* of
processing time, and it shouldn't complicate your tracdarcs code too much.
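
A minimal sketch of that shortcut in Python, not actual tracdarcs code. The
function names and the head_hash parameter are hypothetical; the sketch assumes
the caller already knows the hash of the latest patch and only builds the
"darcs show contents" command line, with or without "--match":

```python
import subprocess


def contents_command(path, patch_hash=None, head_hash=None):
    """Build the argv for `darcs show contents` on a single file.

    `--match` is dropped when no hash is requested, or when the requested
    hash equals `head_hash` (assumed to be known to the caller already).
    """
    argv = ["darcs", "show", "contents"]
    if patch_hash is not None and patch_hash != head_hash:
        # Older revision: darcs has to parse inventories to match the patch.
        argv += ["--match", "hash " + patch_hash]
    argv.append(path)
    return argv


def show_contents(repo_dir, path, patch_hash=None, head_hash=None):
    """Run darcs inside `repo_dir` and return the file contents as bytes."""
    argv = contents_command(path, patch_hash, head_hash)
    result = subprocess.run(argv, cwd=repo_dir, check=True,
                            stdout=subprocess.PIPE)
    return result.stdout
```

Only queries for older revisions pay the matching cost here; anything that
asks for the head revision takes the fast path.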

(And for 2.4, I guess I could optimise the non-match code further, since from a
quick look at the code it is currently quite inefficient. The match code is
somewhat arcane, so I cannot promise much about it.)

Yours,
   Petr.