[darcs-users] GSoC: network optimisation vs cache vs library?
me at worldmaker.net
Thu Apr 15 00:18:21 UTC 2010
On 4/14/2010 19:23, Zooko Wilcox-O'Hearn wrote:
> Our project web site was just down for about an hour and a half a couple
> of hours ago. The reason turned out to be that there were about a dozen
> darcs processes running trying to answer queries like this:
> darcs query contents --quiet --match "hash
> This is the query that the trac-darcs plugin issues when you hit this
> web page:
> That particular query when run in isolation (i.e. not concurrently with
> dozens of other queries) takes at least 20 seconds, and about 59 MB of RAM.
> Enough of these outstanding queries had piled up that the server ran out
> of RAM and stopped serving our trac instance or allowing ssh access for
> about an hour and a half.
All of which goes to show that Trac+darcs still isn't well optimized for
caching darcs queries or dealing gracefully with long-running
command invocations... I still say the Trac reliance on CVS/SVN-style
revision numbers means that Trac is absolutely not well-adapted for
serving darcs repositories. It may be "revision 1782" to Trac, but 'show
contents --match "hash 2008..."' is "commute this file to how it would
appear if only the patches preceding or equal to this one with a
timestamp from two years ago were applied" to darcs. (Which quite
possibly ends up not being a "real" historic version at all, and which
takes quite a bit of work to construct, making it easy for such queries
to pile up.)
20 seconds doesn't sound unreasonable from the point of view that you are
asking darcs to create an entirely new "version" of a file. While I expect
there is plenty of performance left to squeeze from this, I don't think
a query like this one will ever approach the speed of a git/svn/... historic
revision lookup, because it is an entirely different beast. It doesn't make
sense to me for Trac to rely on it for common queries.
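Since a darcs patch hash names an immutable state of the repository, the
answer to one of these queries never goes stale and can be cached forever.
Here's a minimal Python sketch of that idea; `expensive_query` and
`contents_at` are hypothetical names, and the real call it stands in for
(shown in the comment) is the darcs invocation from the log above:

```python
import functools

def expensive_query(match_hash, path):
    # Stand-in for the real ~20-second call, e.g.:
    # subprocess.run(["darcs", "query", "contents", "--quiet",
    #                 "--match", f"hash {match_hash}", path], ...)
    expensive_query.calls += 1
    return f"{path}@{match_hash}"
expensive_query.calls = 0

@functools.lru_cache(maxsize=256)
def contents_at(match_hash, path):
    # The hash identifies an immutable patch, so a cached answer never
    # needs invalidating; only the first hit pays the full darcs cost.
    return expensive_query(match_hash, path)
```

After the first lookup, repeated hits for the same (hash, path) pair come
straight from the cache instead of re-running darcs.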
Maybe you should sponsor someone to work on "web scalability" for you.
For instance, a bit of AJAXy "long-running process" support ("Please
wait while this ahistoric version is fetched...") and a basic task queue
(RabbitMQ, Amazon SQS, whatever) to keep the server from biting off more
than it can chew at any given point... (Or even spreading the cache-generation
misery across more than one server; queues are very useful for that.)
Forgive my petulance, but it seems fairly odd to me that
someone working on a project for decentralized, scalable data storage
seems fairly blind to web scalability issues when it comes to the
project's own web site.