[darcs-devel] New performance regression?
Simon Marlow
simonmarhaskell at gmail.com
Mon Feb 11 12:33:14 UTC 2008
David Roundy wrote:
> On Thu, Feb 07, 2008 at 03:26:09PM +0000, Simon Marlow wrote:
>> Incidentally, I'm doing a get on this repository right now, and darcs2 has
>> been silent (no progress info at all) for a couple of minutes, even though
>> I can see it downloading stuff... oh, now it says "Identifying repository".
>
> This is rather a tricky scenario: the hashed_inventory of this repository
> takes 36s to download on my computer using wget. Our current assumption in
> the progress-reporting code is that any given download will be fast, and
> the result is that this particular operation is going to take a minimum of
> 36 seconds with no change in progress output. As it turns out, we download
> hashed_inventory twice when we only need to download it once, so that slows
> things down dramatically.
>
> I can easily get rid of one of these downloads (which is just checking the
> format of the repository, something that can more easily be done with a
> check of _darcs/format, which always exists for hashed and darcs-2
> repositories). I'd actually also like to cache the contents of
> hashed_inventory, in a way that would enable us to guarantee that it's only
> downloaded once, which is good both for efficiency and for atomicity (to
> make sure we don't use two versions of the remote repository but assume
> they're the same).
>
> I'm testing (and planning to push) a patch now that'll cut 36 seconds on
> this get for me, by avoiding one download of hashed_inventory, but I don't
> see how to avoid a 36s wait with no new progress report without very
> dramatically rewriting our libwww/libcurl bindings.
Ok, sounds reasonable. But perhaps there should be a progress message
before the download starts, something like "getting
http://darcs.haskell.org/ghc-darcs2/_darcs/hashed_inventory..."?
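A minimal sketch of that idea, written in Python purely for illustration (darcs itself is Haskell, and none of this is darcs code). The class name AnnouncingFetcher and the injected download callable are hypothetical. The sketch prints a "getting <url>..." line before the slow download starts, and it also remembers what it fetched so that a second request for the same URL does not trigger a second download, along the lines of the caching described in the quoted message.

```python
import sys
from typing import Callable, Dict, TextIO


class AnnouncingFetcher:
    """Hypothetical helper: announce each download before it starts,
    and cache the result so each URL is downloaded at most once."""

    def __init__(self, download: Callable[[str], bytes],
                 out: TextIO = sys.stderr) -> None:
        # `download` is an assumed function that does the real transfer;
        # it is injected so the sketch does not depend on any HTTP library.
        self._download = download
        self._out = out
        self._cache: Dict[str, bytes] = {}

    def fetch(self, url: str) -> bytes:
        # Serve repeated requests from the cache: no message, no download.
        if url in self._cache:
            return self._cache[url]
        # Tell the user what is about to happen *before* the slow part,
        # and flush so the message is visible while the transfer runs.
        self._out.write(f"getting {url}...\n")
        self._out.flush()
        data = self._download(url)
        self._cache[url] = data
        return data
```

With something like this, the user sees which file is being fetched for the whole duration of the transfer, even if no finer-grained progress is available, and both callers that need hashed_inventory share one copy of it.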
Cheers,
Simon