[darcs-users] GSoC: network optimisation vs cache vs library?

Max Battcher me at worldmaker.net
Thu Apr 15 00:18:21 UTC 2010


On 4/14/2010 19:23, Zooko Wilcox-O'Hearn wrote:
> Our project web site was just down for about an hour and a half a couple
> of hours ago. The reason turned out to be that there were about a dozen
> darcs processes running trying to answer queries like this:
>
> darcs query contents --quiet --match "hash
> 20080103234853-92b7f-966e01e6a40dbe94209229f459988e9dea37013a.gz"
> "docs/running.html"
>
> This is the query that the trac-darcs plugin issues when you hit this
> web page:
>
> http://tahoe-lafs.org/trac/tahoe-lafs/changeset/1782/docs/running.html
>
> That particular query when run in isolation (i.e. not concurrently with
> dozens of other queries) takes at least 20 seconds, and about 59 MB of RAM.
>
> Enough of these outstanding queries had piled up that the server ran out
> of RAM and stopped serving our trac instance or allowing ssh access for
> about an hour and a half.

All of which goes to show that Trac+darcs still isn't well optimized for 
caching darcs queries or dealing gracefully with long-running 
command invocations... I still say that Trac's reliance on CVS/SVN-style 
revision numbers means Trac is absolutely not well adapted to 
serving darcs repositories. It may be "revision 1782" to Trac, but 'show 
contents --match "hash 2008..."' means "commute this file to how it would 
appear if only the patches preceding or equal to this one with a 
timestamp from two years ago were applied" to darcs. (Which quite 
possibly isn't a "real" historic version at all, and which is far too 
much work to leave so easily exposed to crawlers/DDoS/accidental 
DDoS...)
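
To make the caching point concrete: I don't know what trac-darcs 
caches today, but since a (patch hash, path) pair fully determines the 
answer, even a dumb on-disk memoisation of the exact query Zooko quoted 
would help, and it never needs invalidating. A minimal sketch only -- 
the cache location and function name here are made up:

    import hashlib
    import os
    import subprocess

    CACHE_DIR = "/var/cache/trac-darcs"   # hypothetical location

    def cached_file_contents(repo, patch_hash, path):
        # (patch hash, path) fully determines the output, so a cache
        # entry never goes stale.
        key = hashlib.sha1(
            (patch_hash + "\0" + path).encode("utf-8")).hexdigest()
        cache_file = os.path.join(CACHE_DIR, key)
        if os.path.exists(cache_file):
            with open(cache_file, "rb") as f:
                return f.read()
        # The same invocation Zooko quoted, run once and then cached.
        out = subprocess.check_output(
            ["darcs", "query", "contents", "--quiet",
             "--match", "hash " + patch_hash, path],
            cwd=repo)
        if not os.path.isdir(CACHE_DIR):
            os.makedirs(CACHE_DIR)
        with open(cache_file, "wb") as f:
            f.write(out)
        return out

That doesn't make the first hit any cheaper, of course; it just keeps 
the crawlers from paying the 20-second, 59 MB price more than once per 
(patch, file).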

20secs doesn't sound unreasonable from the point of view that you are 
asking darcs to create an entirely new "version" of a file. While I 
expect there is plenty of performance left to squeeze out of this, I 
don't think a query like this one will ever approach git/svn/... 
historic revision lookup speed, because it is an entirely different 
beast. It doesn't make sense to me for Trac to rely on it for common 
queries.

Maybe you should sponsor someone to work on "web scalability" for you. 
For instance, a bit of AJAXy "long-running process" support ("Please 
wait while this ahistoric version is fetched...") and a basic task queue 
(RabbitMQ, Amazon SQS, whatever) to keep the server from biting off more 
than it can chew at any given point... (Or even spreading the 
cache-generation misery across more than one server. Queues are very 
useful that way.)
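
I'm not wedded to any particular queue product; the point is just to 
bound how many darcs processes run at once and let the web side poll 
for the result instead of blocking. A toy sketch of the bounding half 
(plain Python, no broker; the worker count and callback shape are 
purely illustrative):

    import queue        # "Queue" on Python 2
    import subprocess
    import threading

    MAX_DARCS_WORKERS = 4   # assumption: tune to available RAM
    tasks = queue.Queue()

    def darcs_worker():
        while True:
            repo, patch_hash, path, deliver = tasks.get()
            try:
                out = subprocess.check_output(
                    ["darcs", "query", "contents", "--quiet",
                     "--match", "hash " + patch_hash, path],
                    cwd=repo)
                deliver(out)
            finally:
                tasks.task_done()

    for _ in range(MAX_DARCS_WORKERS):
        threading.Thread(target=darcs_worker, daemon=True).start()

    # The request handler enqueues the work and returns a "please
    # wait" page at once; `deliver` stashes the result somewhere a
    # later AJAX poll (or a second server) can pick it up.

With something like that in front of darcs, a pile-up of a dozen slow 
queries degrades into a queue of waiting pages rather than an 
out-of-RAM server.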

Forgive my petulance, but it seems fairly odd to me that someone 
working on a project for decentralized, scalable data storage seems 
fairly blind to web scalability issues when it comes to Trac+Darcs...

--
--Max Battcher--
http://worldmaker.net
