[darcs-users] GSoC: network optimisation vs cache vs library?
me at worldmaker.net
Thu Apr 15 18:33:38 UTC 2010
Lele Gaifax wrote:
> On Wed, 14 Apr 2010 20:18:21 -0400
> Max Battcher <me at worldmaker.net> wrote:
>> On 4/14/2010 19:23, Zooko Wilcox-O'Hearn wrote:
>>> Our project web site was just down for about an hour and a half a
>>> couple of hours ago. The reason turned out to be that there were
>>> about a dozen darcs processes running trying to answer queries like
>>> darcs query contents --quiet --match "hash
>>> This is the query that the trac-darcs plugin issues when you hit
>>> this web page:
>> All of which goes to show that Trac+darcs still isn't well optimized
>> for caching darcs queries or dealing gracefully with long-running
>> command invocations...
> As I'm working on the forthcoming version 0.8, where I have already
> gained a good improvement on some poorly written queries, I'm
> interested in that. At a minimum, trac+darcs could avoid spawning
> multiple identical processes whose results, all but one, are going to
> be discarded anyway.
Well, my big suggestion remains to use a processing queue (even if it's
just Python's built-in queue module to begin with) and to tune the
number of darcs subprocesses you spawn to best fit the server (you could
even check CPU/memory utilization before spawning a subprocess). Even
without AJAX, if a request takes more than, say, 50 ms to clear the
queue, you can simply present a message along the lines of "The cache is
being populated; please try again in a few minutes." (Of course, a
little AJAX to watch the progress of the operation would go a long way
here.)
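To make the queue idea concrete, here is a minimal sketch of a
deduplicating work queue: identical queries issued by concurrent page
hits share one in-flight computation instead of each spawning their own
darcs process, and a fixed worker pool bounds the number of
subprocesses. All names here (DedupQueue, MAX_WORKERS, the worker
callable) are illustrative assumptions, not part of Trac or darcs.

```python
# Hypothetical sketch of a deduplicating, bounded work queue.
# The worker callable would, in the real plugin, spawn a darcs
# subprocess; here it is just any function from key to result.
import queue
import threading

MAX_WORKERS = 4  # tune to the server's CPU/memory budget


class DedupQueue:
    def __init__(self, worker, num_workers=MAX_WORKERS):
        self._worker = worker        # function: key -> result
        self._tasks = queue.Queue()
        self._lock = threading.Lock()
        self._pending = {}           # key -> threading.Event for in-flight work
        self._results = {}           # key -> finished result
        for _ in range(num_workers):
            threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            key = self._tasks.get()
            result = self._worker(key)
            with self._lock:
                self._results[key] = result
                self._pending.pop(key).set()  # wake all waiters
            self._tasks.task_done()

    def get(self, key, timeout=None):
        """Return the result for key, computing it at most once."""
        with self._lock:
            if key in self._results:
                return self._results[key]
            event = self._pending.get(key)
            if event is None:        # first request for this key: enqueue it
                event = self._pending[key] = threading.Event()
                self._tasks.put(key)
        if not event.wait(timeout):
            raise TimeoutError("still computing; try again later")
        return self._results[key]
```

A web request handler would call `q.get(query_key, timeout=0.05)` and,
on `TimeoutError`, render the "cache is being populated" message instead
of blocking the page.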
>> (Which ends up being quite possibly not a "real" historic
>> version at all, and which does quite a bit of work to be so easily
>> susceptible to crawlers/DDoS/accidental DDoS...)
> I don't know what you mean by a "real" historic version. In my
> experience, the trac+darcs view has always given me the expected
> thing, I mean, the one that corresponds both to my memory of the
> change and to the practical sense of the changesets' neighbours.
> The fact that darcs commutes so easily is a well-known
> pr^H^Hincredible feature we all more or less unconsciously love. We
> discussed the matter, and even David gave his opinion, wrt trac+darcs:
> AFAICT, for a given single repository, not subject to "darcs optimize"
> or other "back-history" reordering/change, the output of a "darcs
> query contents" or even "darcs diff" is the same.
> AFAIK, also Alberto's darcsweb uses a similar approach to examine
> historical contents.
So does darcsit.
Darcs' patch order is often a surprisingly good approximation of
repository history, but it will never be a 1:1 equivalent (unlike the
DAG-based DVCSes, which maintain much stricter histories). I'm
certainly not advocating that darcs become a DAG like git/hg. I'm just
questioning whether the different style of history deserves different
approaches to presenting history. Darcs patch history is certainly not
equivalent to svn/git/hg revision history.
To my knowledge it's entirely possible for darcs to end up commuting a
new patch nearly anywhere in the repository's patch order, even on just
a pull/apply. Usually it's quite unlikely, but theoretically the merge
of long-divergent branches can still result in unusual repository
orders that don't reflect the history of either branch very well (other
than the obvious dependencies, of course)...
Even if Trac+Darcs revision numbers don't contort alongside patch order,
it's still possible for ``darcs show contents --match "hash
patch-hash"`` to produce subtly different results after just a
pull/apply, due to commutation. That is why I'm not certain that the
state of the repository at every given patch is always necessarily
meaningful. If it's a close enough approximation to reality that you see
fit to use it, by all means continue. I just can't help but wonder
whether there are more meaningful choices to be made that need less
caching.
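One way to keep the existing per-patch approach while staying safe
against commutation is to key the cache on the current repository state
as well as the patch hash, so any pull/apply that might have reordered
patches simply invalidates the old answers. The sketch below assumes
the repository state can be summarized by hashing
``_darcs/hashed_inventory`` (any change to patch order rewrites it);
the class and function names are hypothetical, not Trac API.

```python
# Hypothetical sketch: cache `darcs show contents` output keyed on
# (repo-state token, patch hash, path). The state token must change
# whenever patch order can have changed, so stale answers are dropped.
import functools
import hashlib
import subprocess


def darcs_show_contents(repo_dir, patch_hash, path):
    """Run the real darcs query (requires darcs on PATH)."""
    return subprocess.run(
        ["darcs", "show", "contents", "--match",
         "hash " + patch_hash, path],
        cwd=repo_dir, check=True, capture_output=True).stdout


def inventory_key(repo_dir):
    """Cheap staleness token: hash of the repository inventory.

    Assumption: this is a darcs-2 hashed repository, where pulls,
    applies, and obliterates all rewrite _darcs/hashed_inventory.
    """
    with open(repo_dir + "/_darcs/hashed_inventory", "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


class ContentsCache:
    def __init__(self, query, state_key):
        self._query = query          # (patch_hash, path) -> bytes
        self._state_key = state_key  # () -> hashable repo-state token
        self._cache = {}

    def contents(self, patch_hash, path):
        key = (self._state_key(), patch_hash, path)
        if key not in self._cache:
            self._cache[key] = self._query(patch_hash, path)
        return self._cache[key]
```

Wiring it to a real repository would look like
``ContentsCache(functools.partial(darcs_show_contents, repo),
functools.partial(inventory_key, repo))``; old entries could be pruned
whenever the state token changes.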