[darcs-users] GSoC: network optimisation vs cache vs library?

Thu Apr 15 18:33:38 UTC 2010

Lele Gaifax wrote:
> On Wed, 14 Apr 2010 20:18:21 -0400
> Max Battcher <me at worldmaker.net> wrote:
> 
>> On 4/14/2010 19:23, Zooko Wilcox-O'Hearn wrote:
>>> Our project web site was just down for about an hour and a half a
>>> couple of hours ago. The reason turned out to be that there were
>>> about a dozen darcs processes running trying to answer queries like
>>> this:
>>>
>>> darcs query contents --quiet --match "hash
>>> 20080103234853-92b7f-966e01e6a40dbe94209229f459988e9dea37013a.gz"
>>> "docs/running.html"
>>>
>>> This is the query that the trac-darcs plugin issues when you hit
>>> this web page:
>>>
>>> http://tahoe-lafs.org/trac/tahoe-lafs/changeset/1782/docs/running.html
>> All of which goes to show that Trac+darcs still isn't well optimized
>> for caching darcs queries or dealing gracefully with long
>> running command invocations...
> 
> As I'm working on the forthcoming version 0.8, where I have already
> gained a good improvement on some poorly written queries, I'm
> interested in that. At a minimum, trac+darcs could avoid spawning
> multiple identical processes, all but one of whose results are going
> to be discarded anyway.
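Lele's point about identical processes can be sketched as a "single-flight" wrapper. While a given darcs command line is already running, further requests for the same command wait for that process and share its output instead of spawning a duplicate. This is a minimal Python sketch, not trac+darcs code. It assumes that identical argv means identical output. `SingleFlight` and its `runner` parameter are hypothetical names.

```python
import subprocess
import threading
from concurrent.futures import Future


class SingleFlight:
    """Run at most one subprocess per distinct argv at a time.

    Hypothetical helper (not part of trac+darcs): callers asking for a
    command that is still running wait for, and share, the first
    caller's result instead of spawning a duplicate process.
    """

    def __init__(self, runner=None):
        self._lock = threading.Lock()
        self._inflight = {}  # argv tuple -> Future of the running call
        # Default runner: execute the command and return its stdout bytes.
        self._runner = runner or (lambda argv: subprocess.run(
            argv, capture_output=True, check=True).stdout)

    def run(self, argv):
        key = tuple(argv)
        with self._lock:
            fut = self._inflight.get(key)
            is_leader = fut is None
            if is_leader:
                fut = Future()
                self._inflight[key] = fut
        if is_leader:
            try:
                fut.set_result(self._runner(list(key)))
            except BaseException as exc:
                # Propagate failures to every waiting caller too.
                fut.set_exception(exc)
            finally:
                # Forget the entry so later requests run a fresh process.
                with self._lock:
                    del self._inflight[key]
        return fut.result()
```

With this wrapper, a dozen crawler hits on the same changeset page would share one `darcs query contents ...` process rather than starting twelve.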

Well, my big suggestion remains to use a processing queue (even if it's
just Python's built-in queue module to begin with) and to tune the number
of darcs subprocesses you spawn to best fit the server (you could even
try checking CPU/memory utilization before spawning a subprocess). Even
without AJAX, if a subprocess takes, say, >50ms to clear from the queue, 
you can just present a message along the lines of "The cache is being 
populated, please try again in a few minutes." (Of course, a little AJAX 
to watch the progress of the operation would go a long way here.)
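Here is a minimal sketch of the queue suggestion above using Python's `queue` module. A fixed pool of worker threads drains a job queue, so no more than `num_workers` darcs processes ever run at once. Tune that number to the server. A request that isn't answered within about 50 ms gets the "cache is being populated" message instead of blocking. `DarcsQueue`, `request` and `PENDING_MESSAGE` are illustrative names, not trac+darcs code.

```python
import queue
import subprocess
import threading
from concurrent.futures import Future, TimeoutError as FutureTimeout

PENDING_MESSAGE = ("The cache is being populated, "
                   "please try again in a few minutes.")


class DarcsQueue:
    """Hypothetical bounded queue of darcs invocations (not trac+darcs code)."""

    def __init__(self, num_workers=2, runner=None):
        self._jobs = queue.Queue()
        self._cache = {}  # argv tuple -> Future (pending or finished)
        self._lock = threading.Lock()
        # Default runner: execute the command and return its stdout bytes.
        self._runner = runner or (lambda argv: subprocess.run(
            argv, capture_output=True, check=True).stdout)
        # Only num_workers threads ever call the runner, which caps the
        # number of concurrent darcs processes.
        for _ in range(num_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            argv, fut = self._jobs.get()
            try:
                fut.set_result(self._runner(list(argv)))
            except Exception as exc:
                fut.set_exception(exc)
            finally:
                self._jobs.task_done()

    def request(self, argv, wait=0.05):
        """Return the output if ready within `wait` seconds, else PENDING_MESSAGE."""
        key = tuple(argv)
        with self._lock:
            fut = self._cache.get(key)
            if fut is None:  # first request for this command: enqueue it once
                fut = Future()
                self._cache[key] = fut
                self._jobs.put((key, fut))
        try:
            return fut.result(timeout=wait)
        except FutureTimeout:
            return PENDING_MESSAGE
```

For simplicity this keeps results in memory indefinitely, failures included. A real version would evict failed entries and bound the cache, or persist it to disk.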

>> (Which ends up being quite possibly not a "real" historic
>> version at all, and which does quite a bit of work to be so easily
>> susceptible to crawlers/DDoS/accidental DDoS...)
> 
> I don't know what you mean by "real" historic version. In my
> experience, the trac+darcs view has always given me the expected
> result, I mean the one that corresponds both to my memory of the
> change and to the practical sense of the neighbouring changesets.
> 
> The fact that darcs commutes so easily is a well-known
> pr^H^Hincredible feature we all, more or less unconsciously, love. We
> discussed the matter with respect to trac+darcs (even David gave his
> opinion): AFAICT, for a given single repository, not subject to
> "darcs optimize" or other "back-history" reordering/change, the output
> of "darcs query contents" or even "darcs diff" is always the same.
> 
> AFAIK, Alberto's darcsweb also uses a similar approach to examine
> historical contents.

So does darcsit.

Darcs' patch order is often a surprisingly good approximation of
repository history, but it will never be a 1:1 equivalent (unlike the
DAG-based DVCSes, which maintain much stricter histories). I'm
certainly not advocating that darcs become a DAG like git/hg. I'm just
questioning whether a different style of history deserves a different
approach to presenting it. Darcs patch history is certainly not
equivalent to svn/git/hg revision history.

To my knowledge it's certainly possible for darcs to end up commuting a
new patch nearly anywhere in the repository's patch order, even on a
simple pull/apply. Usually it's quite unlikely, but it is still
theoretically possible for a merge of long-divergent branches to produce
an unusual repository order that doesn't reflect the history of either
branch very well (other than obvious dependencies, of course)...

Even if Trac+Darcs revision numbers don't shift along with patch order,
it's still possible for ``darcs show contents --match "hash
patch-hash"`` to produce subtly different results after only a
pull/apply, due to commutation. That is why I'm not certain that the
state of a repository at every given patch is always meaningful. If
it's a close enough approximation to reality that you see fit to use
it, by all means continue. I just can't help but wonder if there are
more meaningful choices to be made, ones that need less caching
overall...
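One way to keep such a cache honest is a minimal sketch built on two assumptions. First, the state shown for a patch is the tree produced by the patches up to it in the current order. Second, as darcs intends, the same set of patches yields the same tree whatever their order. Under those assumptions, key cached `darcs show contents --match "hash ..."` output on the *set* of patches at or before the target, not on the patch hash alone. A pull that commutes a new patch in ahead of the target then changes the key and forces a recompute. Pulls that only add patches after it leave old entries valid. `contents_cache_key` is a hypothetical name. The ordered list of patch hashes (oldest first) is assumed to come from elsewhere, e.g. parsed from `darcs changes --xml-output` and reversed, since that lists newest first.

```python
import hashlib


def contents_cache_key(patch_order, target_hash, path):
    """Cache key for `darcs show contents --match "hash <target_hash>" <path>`.

    Hypothetical helper. `patch_order` is the repository's current patch
    order, oldest first. Only the set of patches up to and including the
    target is assumed to matter, so that prefix is sorted before hashing:
    reorderings within it keep the key, while a patch commuted in ahead
    of the target changes it.
    """
    try:
        idx = patch_order.index(target_hash)
    except ValueError:
        raise KeyError(f"patch {target_hash!r} is not in this repository") from None
    digest = hashlib.sha1()
    for patch_hash in sorted(patch_order[: idx + 1]):
        digest.update(patch_hash.encode("utf-8"))
        digest.update(b"\n")  # separator so ["ab", "c"] != ["a", "bc"]
    return digest.hexdigest(), path
```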

--
--Max Battcher--
http://worldmaker.net