[darcs-devel] [patch1375] removed special handling of --to-match from cloneRepos...

Fri Oct 16 19:14:15 UTC 2015

Guillaume Hoffmann <guillaumh at gmail.com> added the comment:

# Overview of current code and one benchmark

When cloning without the optimization, we:

* copy the pristine tree
* copy the patches (allowing CTRL+C)
* unapply unneeded patches (if some matcher is given)
  to go back to the version we want.

When cloning with the optimization, we:

* apply the patches to the empty state until reaching the state we want,
  getting needed patches on demand
  (this is the same as old-fashioned repository cloning!)
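
To make the difference between the two paths concrete, here is a toy model in Python. It is not darcs code: `Patch`, `clone_pristine_first` and `clone_patches_first` are names invented for this sketch, and a "patch" is reduced to an invertible change of one file's content. It only illustrates that both paths reach the same state from opposite ends of the history.

```python
from typing import Dict, List, NamedTuple, Optional


class Patch(NamedTuple):
    """Toy invertible patch: changes one file and remembers the old content."""
    name: str
    path: str
    old: Optional[str]  # None means the file did not exist before the patch
    new: Optional[str]  # None means the patch deletes the file


State = Dict[str, str]


def apply(state: State, p: Patch) -> State:
    """Return a new state with the patch applied."""
    out = dict(state)
    if p.new is None:
        out.pop(p.path, None)
    else:
        out[p.path] = p.new
    return out


def unapply(state: State, p: Patch) -> State:
    """Unapplying is applying the inverse patch (old and new swapped)."""
    return apply(state, Patch(p.name, p.path, p.new, p.old))


def clone_pristine_first(pristine: State, history: List[Patch], target: str) -> State:
    """Without the optimization: copy pristine, then unapply what comes after target."""
    idx = next(i for i, p in enumerate(history) if p.name == target)
    state = dict(pristine)
    for p in reversed(history[idx + 1:]):
        state = unapply(state, p)
    return state


def clone_patches_first(history: List[Patch], target: str) -> State:
    """With the optimization: apply patches to the empty state until target is reached."""
    state: State = {}
    for p in history:
        state = apply(state, p)
        if p.name == target:
            break
    return state


# A three-patch history and the pristine tree it produces.
history = [
    Patch("p1", "README", None, "hello"),
    Patch("p2", "README", "hello", "hello world"),
    Patch("p3", "LICENSE", None, "BSD"),
]
pristine = {"README": "hello world", "LICENSE": "BSD"}
```

In this model the cost of the first path grows with the number of patches after the target, and the cost of the second with the number of patches up to it. The toy ignores network transfer entirely.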

Now, let's see the effect of the optimization on one real world repo.

Cloning darcsden's repo (1100 patches) and matching on a
patch from the middle of the history (patch number ~700):

    time darcs clone http://hub.darcs.net/simon/darcsden --no-cache --to-match "hash 31c8ee8"

With the optimization, cloning takes 20s, without it takes 50s!

(This reminds me of the benchmarks we ran on cloning with packs.)

Even when matching on the last patch, the optimization wins:

    time darcs clone http://hub.darcs.net/simon/darcsden --no-cache --to-match "hash ebbeb01"

With the optimization: 30s; without it: 50s!
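
For reference, the speedup implied by these two runs can be worked out as below. The timings are the ones reported above (single runs, so only indicative); the `speedup` helper is just a name for this sketch.

```python
# Wall-clock timings in seconds, copied from the two runs above.
runs = {
    "hash 31c8ee8 (middle of history)": {"with": 20, "without": 50},
    "hash ebbeb01 (last patch)": {"with": 30, "without": 50},
}


def speedup(with_opt: float, without_opt: float) -> float:
    """How many times faster the clone is with the optimization."""
    return without_opt / with_opt


for label, t in runs.items():
    factor = speedup(t["with"], t["without"])
    print(f"{label}: {factor:.2f}x faster with the optimization")
```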

# Decision

As I said in my previous comment, repo cloning code is awful and
complicated. So I'm accepting Ben's patch anyway but not without
some discussion/roadmap for the cloning code.

# Proposal

I think repo cloning code should be split into two cases named after
*what they do*:

(A) get pristine and possibly unapply patches:

    For the default case (lazy or not)

(B) get patches to apply to an empty state:

    For old fashioned cloning
    For --to-match; could also apply to --tag (currently not the case).

As long as we want to support old-fashioned (OF) repositories we will
have code for (B), so it's probably worth rewriting it into a good
abstraction that would work for all the (B) cases.

I'd suggest:

* reimplement cloning code into two cases (A) and (B) described above
* also put an end to this nonsense of calling "identifyRepository"
  several times during cloning

What bothers me is that you could always create repositories with "hard"
patches at the beginning and "easy" patches at the end, or the other way
around, which can make either (A) or (B) more attractive than the other.
So there is no "always better" solution.

Maybe we should provide good enough defaults and a manual switch?

Like:

* when cloning without matcher -> A by default
* when cloning with matcher (including --tag) or OF repo -> B by default

And then (one further step, needs discussion):

* if user provides --pristine-first / --patches-first flag, default
  behaviour is overridden?
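
The defaults plus the manual switch could look roughly like this. It is a Python sketch of the decision only, not darcs code: `Strategy` and `choose_strategy` are made-up names, and `override` stands for the proposed --pristine-first / --patches-first flags, which do not exist today. Letting the override win even for an OF repo is an assumption of the sketch; that point is part of what needs discussion.

```python
from enum import Enum
from typing import Optional


class Strategy(Enum):
    PRISTINE_FIRST = "A"  # get pristine, possibly unapply patches
    PATCHES_FIRST = "B"   # get patches, apply them to an empty state


def choose_strategy(
    has_matcher: bool,
    old_fashioned: bool,
    override: Optional[Strategy] = None,
) -> Strategy:
    """Pick a cloning strategy from the proposed defaults.

    has_matcher covers --to-match and, per the proposal, --tag as well.
    override models the proposed manual switch (assumption: it always wins).
    """
    if override is not None:
        return override
    if has_matcher or old_fashioned:
        return Strategy.PATCHES_FIRST
    return Strategy.PRISTINE_FIRST
```

For example, `choose_strategy(has_matcher=True, old_fashioned=False)` picks B, and passing `override=Strategy.PRISTINE_FIRST` to the same call picks A.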

And then (maybe this is too crazy, but we kind of do this already with
packs):

* run A and B in parallel, keep the one that finishes first? :-]
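
As a rough idea of what "keep the one that finishes first" means, here is a Python sketch using threads. Nothing in it is darcs code: `race` and `fake_clone` are invented for the illustration, and the two clones are stand-ins that just sleep. A real implementation would also have to stop the loser and clean up its half-built repository, which this sketch does not attempt.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Dict


def race(strategies: Dict[str, Callable[[], str]]) -> str:
    """Run all strategies concurrently; return the name of the first to finish."""
    with ThreadPoolExecutor(max_workers=len(strategies)) as pool:
        futures = {pool.submit(fn): name for name, fn in strategies.items()}
        done, pending = wait(futures, return_when=FIRST_COMPLETED)
        for f in pending:
            # cancel() only stops work that has not started yet; a thread
            # that is already running keeps going until it returns.
            f.cancel()
        winner = next(iter(done))
        winner.result()  # re-raise if the fastest strategy actually failed
        return futures[winner]
    # Note: leaving the "with" block waits for the slower thread to end.


def fake_clone(seconds: float) -> Callable[[], str]:
    """Stand-in for a clone that takes the given number of seconds."""
    def run() -> str:
        time.sleep(seconds)
        return "done"
    return run


if __name__ == "__main__":
    print(race({"A": fake_clone(0.3), "B": fake_clone(0.05)}))
```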

__________________________________
Darcs bug tracker <bugs at darcs.net>
<http://bugs.darcs.net/patch1375>
__________________________________