[darcs-users] GSoC: network optimisation vs cache vs library?

Max Battcher me at worldmaker.net
Fri Apr 16 06:02:11 UTC 2010

Alberto Bertogli wrote:
> It may be for very very large sites (like a darcs' equivalent of github), but
> not for people running darcsweb or trac (or whatever they may run) for
> personal or organizational use.

Scalability is for everyone. A multi-processing queue is a simple 
concept (your basic FIFO data structure) that applications of any size 
can and should take advantage of for "long-running" background work. 
("Long-running" in the web serving world can mean on the order of half 
a second or less.) Python has a built-in queue for the purpose, which 
is fine for small usage, and with a tiny bit more code you can easily 
add support for bigger, more capable queue systems when a person or 
organization has them installed/available.
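As a minimal sketch of what I mean, here is Python's built-in queue feeding a small pool of background worker threads (the job strings are hypothetical stand-ins for long-running darcs calls):

```python
import queue
import threading

# Built-in FIFO queue feeding a fixed pool of background workers.
tasks = queue.Queue()
results = []

def worker():
    while True:
        job = tasks.get()
        if job is None:                 # sentinel: shut this worker down
            tasks.task_done()
            return
        results.append("done: " + job)  # stand-in for the real work
        tasks.task_done()

# A fixed pool means at most this many jobs ever run at once.
threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()

for job in ("annotate r100", "annotate r101", "annotate r102"):
    tasks.put(job)

tasks.join()            # block until every queued job is finished
for _ in threads:
    tasks.put(None)     # one sentinel per worker
for t in threads:
    t.join()
```

Swapping in a bigger queue system later mostly means replacing the `tasks.put`/`tasks.get` calls; the worker structure stays the same.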

That's just one possible tool for the job...

> As I mentioned above, I don't see how this is a web scalability issue. A gui
> frontend would show the same behaviour, because it's the one displayed by
> darcs itself.

A GUI frontend would have progress indicators and probably also the 
ability to cancel long-running tasks (just as darcs itself has progress 
indicators and responds to Ctrl+C). A well-built GUI would be unlikely 
to accidentally spawn too many darcs processes and disrupt an entire 
system, whereas a web server might see random, accidental, or 
belligerent/disruptive surfing and spawn too many processes. (That was 
the issue I was primarily responding to: Trac+Darcs spawning a "large" 
number of simultaneous darcs processes that ate up all the CPU/memory 
of the web server.)

> Zooko's post was, as I understand it, about the fact that his team _does_ want
> this operation to work well, and I think that's interesting and valuable
> information to have in mind when deciding what to optimize first.

I'm certainly for performance enhancements and using real-world use 
cases to prioritize them. However, in this particular case there are 
"obvious" (to me) remedies that Zooko and his team could look into to 
get more immediate results, which is what I was trying to get across. I 
hope I made that clearer in my response to Lele. To summarize/reiterate:

Trac+Darcs already tries to avoid the long darcs process calls that 
Zooko pointed out by caching their results in its SQL database. When 
the cache isn't fully populated, however, it provides no way to tell a 
user that a request may take a while, nor any way to keep many 
simultaneous requests from all trying to populate the cache at once. I 
suggested a queue, plus tuning the application to allow only so many 
simultaneous calls to long-running darcs processes. (There's also the 
simple idea of providing a command to fill the entire cache ahead of 
time, which can be run during useful downtime... and it's even easier 
to write a nice multi-threaded version of that once the queuing 
infrastructure is in place.)
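The "only so many simultaneous calls" part can be as small as a semaphore around the subprocess call. A sketch, with the names `MAX_PROCS` and `run_limited` being my own hypothetical placeholders:

```python
import subprocess
import threading

# Cap the number of simultaneous long-running subprocesses (e.g. darcs
# annotate) so a burst of web requests cannot fork an unbounded number
# of processes and eat all the CPU/memory.
MAX_PROCS = 2
slots = threading.BoundedSemaphore(MAX_PROCS)

def run_limited(cmd):
    # Blocks when MAX_PROCS commands are already in flight; the extra
    # requests simply wait their turn instead of piling on.
    with slots:
        return subprocess.run(cmd, capture_output=True, text=True).stdout
```

Each request handler would call `run_limited(["darcs", ...])` instead of spawning darcs directly.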

--Max Battcher--
