[darcs-devel] announcing darcs 2.0.0pre1, the first prerelease for darcs 2

Simon Marlow simonmarhaskell at gmail.com
Fri Dec 14 13:33:33 UTC 2007


David Roundy wrote:
> On Fri, Dec 14, 2007 at 10:04:57AM +0000, Simon Marlow wrote:
>> David Roundy wrote:
>>> Okay, it turns out that it was indeed bad strictness causing the trouble.
>>> For some reason, I had made the PatchInfoAnd data type strict in both its
>>> components, which meant that every time we read a patch ID, we also needed
>>> to parse the patch itself.  Very foolish.  There may be some further
>>> regressions (I'm still running an optimize with profiling enabled).  But
>>> darcs changes --last 10 (with profiling running) now takes me just a bit
>>> over a minute, and not too much memory (I don't quite recall).
>> Ok, that is certainly an improvement:
>>
>>   $ time darcs2 cha --last=10
>>   ...
>>   60.60s real   59.83s user   0.21s system   99% darcs2 cha --last=10
>>
>> But this is still 1000 times slower than darcs1 for the same operation. 
>> Doesn't darcs changes just dump the contents of the inventory?
> 
> If you run darcs optimize first, this drops to 1s for me.  Still a bit
> slow, but not so bad (and that's most of why darcs1 is faster).

Ok, confirmed.

However, I never use optimize, and only use tag when I need to.  This is 
mainly because I'm paranoid and I don't fully understand what optimize 
does, and perhaps also because I'd like to understand what goes wrong if 
you don't use it.

I guess I don't understand why optimize is exposed to the user at all.  If
there's an optimal state for the repository, why can't it be maintained in 
that state?

> The problem is that --last isn't at all tuned for efficiency, and instead
> uses the same code that can handle --from-tag, and this could require
> reordering (--from-tag could), so there are O(N^2) operations going on,
> where N is the number of patches since the last known-to-be-in-order tag.
> 
> This has never been a problem (that I'm aware of), and simplifies the code
> since we only have to deal with one case.  Reusing the same code also
> ensures that performance improvements for one command are leveraged for
> other commands.  Which comes down to: I'd rather not optimize changes
> --last for the case of 17k patches and no tags (or not running optimize).
> But I could certainly be convinced, because we are indeed taking a very
> roundabout approach.  But then again, darcs1 uses exactly the same
> approach, so if we could gain another factor of ten without losing this
> abstraction, I'd rather know how--particularly as the improvement is likely
> to benefit all other darcs commands.

Sure, code re-use is definitely a good thing, and I agree that optimising 
this operation in ways that darcs1 does not would be premature, given that 
there is still a factor of 20 difference between darcs1 and darcs2 
unaccounted for.
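
As an aside on the strictness fix described at the top of the thread, here
is a minimal sketch of how a strict field can force work that the caller
never asked for.  The types and names below (PatchId, Patch, StrictPair,
LazyPair, expensiveParse) are hypothetical stand-ins of mine, not the real
PatchInfoAnd definition from darcs, so treat it as an illustration of the
general effect rather than of the actual code.

```haskell
import Control.Exception (ErrorCall (..), evaluate, try)

-- Hypothetical stand-ins; these are NOT the real darcs types.
newtype PatchId = PatchId String deriving (Eq, Show)
newtype Patch = Patch String deriving (Eq, Show)

-- Strict in both fields: constructing the pair forces the patch as well.
data StrictPair = StrictPair !PatchId !Patch

-- Lazy in the patch field: the patch is only evaluated if it is demanded.
data LazyPair = LazyPair !PatchId Patch

strictId :: StrictPair -> PatchId
strictId (StrictPair i _) = i

lazyId :: LazyPair -> PatchId
lazyId (LazyPair i _) = i

-- Stand-in for an expensive parse (assumption: it fails loudly so that we
-- can observe whether it was ever evaluated).
expensiveParse :: Patch
expensiveParse = error "patch body was parsed"

main :: IO ()
main = do
  -- Lazy field: reading the ID never touches the patch body.
  print (lazyId (LazyPair (PatchId "p1") expensiveParse))
  -- Strict field: reading the ID forces the patch body first.
  r <- try (evaluate (strictId (StrictPair (PatchId "p1") expensiveParse)))
  case r of
    Left (ErrorCall msg) -> putStrLn ("forced: " ++ msg)
    Right i -> print i
```

With the lazy field the ID comes back without the parse ever running; with
the strict field, asking only for the ID still triggers the parse, which is
the kind of cost that would show up in something like changes --last.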

Thanks for the quick response to my feedback so far... things are 
definitely heading in the right direction!

Cheers,
	Simon

