[darcs-devel] announcing darcs 2.0.0pre1, the first prerelease for darcs 2

David Roundy droundy at darcs.net
Fri Dec 14 15:52:01 UTC 2007


On Fri, Dec 14, 2007 at 01:33:33PM +0000, Simon Marlow wrote:
> David Roundy wrote:
> > On Fri, Dec 14, 2007 at 10:04:57AM +0000, Simon Marlow wrote:
> >> David Roundy wrote:
> >>> Okay, it turns out that it was indeed bad strictness causing the trouble.
> >>> For some reason, I had made the PatchInfoAnd data type strict in both its
> >>> components, which meant that every time we read a patch ID, we also needed
> >>> to parse the patch itself.  Very foolish.  There may be some further
> >>> regressions (I'm still running an optimize with profiling enabled.  But
> >>> darcs changes --last 10 (with profiling running) now takes me just a bit
> >>> over a minute, and not too much memory (I don't quite recall).
> >> Ok, that is certainly an improvement:
> >>
> >>   $ time darcs2 cha --last=10
> >>   ...
> >>   60.60s real   59.83s user   0.21s system   99% darcs2 cha --last=10
> >>
> >> But this is still 1000 times slower than darcs1 for the same operation. 
> >> Doesn't darcs changes just dump the contents of the inventory?
> > 
> > If you run darcs optimize first, this drops to 1s for me.  Still a bit
> > slow, but not so bad (and that's most of why darcs1 is faster).
> 
> Ok, confirmed.
> 
> However, I never use optimize, and only use tag when I need to.  This is 
> mainly because I'm paranoid and I don't fully understand what optimize 
> does, and perhaps also because I'd like to understand what goes wrong if 
> you don't use it.

That's reasonable, I suppose.  And it'd also make sense for convert to
generate an optimized repository--I wrote convert in the simplest possible
manner, figuring that the one thing that mattered was that it was bug-free.
Throwing an optimizeInventory onto the end of it would definitely make
things look better.  On the other hand, it would also just hide the fact
that folks who never tag will get bad performance.

If you give it no extra flags optimize just splits up the inventory at the
most recent tag that is in-order (meaning all patches prior to that tag are
in the tag).  This has a couple of benefits.  One is that someone pulling
from the repository doesn't have to download a list of all the patches in
the repository, even the ones he/she already has.  But generically, it
allows us to optimize any comparison of two repositories, since once we
know that they both have 

> I guess I don't understand why optimize is exposed to the user at all. if 
> there's an optimal state for the repository, why can't it be maintained in 
> that state?

It's because it could cost O(N*T) (where N is the number of patches since
the last already-identified in-order-tag repository and T is the number of
out-of-order tags in this set) to find out if there is a more optimal state
than the current state.  We *could* make every darcs operation O(N) in the
hope of making N smaller (where many of them are now O(1)), but that
doesn't seem like a good direction to go.  On the other hand, maybe the
additional simplicity would be worth the performance penalty.  Perhaps we
should optimize whenever we perform an O(N) command.  As it is, this
optimization is only performed when actually creating a tag.

In many use scenarios optimize --reorder is a major gain, but it has a
user-visible affect, it reorders the patches in the repository, affecting
for instance the order of the output of darcs changes.  On the other hand,
it might be easy to tell users: when you pull from a repository, patches in
your repository are shuffled around to match the order of the one you're
pulling from.  In the old format optimize --reorder was "dangerous"
(e.g. if someone pulled from you while you were doing this, they could get
a corrupt repository), but that's no longer the case, so we could be a bit
more agressive in automatically optimizing the repository, which would be
nice.

> > The problem is that --last isn't at all tuned for efficiency, and instead
> > uses the same code that can handle --from-tag, and this could require
> > reordering (--from-tag could), so there are O(N^2) operations going on,
> > where N is the number of patches since the last known-to-be-in-order tag.
> > 
> > This has never been a problem (that I'm aware of), and simplifies the code
> > since we only have to deal with one case.  Reusing the same code also
> > ensures that performance improvements for one command are leveraged for
> > other commands.  Which comes down to: I'd rather not optimize changes
> > --last for the case of 17k patches and no tags (or not running optimize).
> > But I could certainly be convinced, because we are indeed taking a very
> > roundabout approach.  But then again, darcs1 uses exactly the same
> > approach, so if we could gain another factor of ten without losing this
> > abstraction, I'd rather know how--particularly as the improvement is likely
> > to benefit all other darcs commands.
> 
> Sure, code re-use is definitely a good thing, and I agree that optimising 
> this operation in ways that darcs1 does not would be premature, given that 
> there is still a factor of 20 difference between darcs1 and darcs2 
> unaccounted for.
> 
> Thanks for the quick response to my feedback so far... things are 
> definitely heading in the right direction!

You're welcome, thanks for all the feedback!

And I've fixed the failing test and am pushing my latest right now--give it
an hour to get to the hashed darcs unstable repository and maybe another
hour to migrate to the old-fashioned one.  (Unless there's yet another
test failure... which is the whole point of the delay.)
-- 
David Roundy
Department of Physics
Oregon State University


More information about the darcs-devel mailing list