[darcs-users] GSoC: network optimisation vs cache vs library?
Max Battcher
me at worldmaker.net
Fri Apr 16 06:02:11 UTC 2010
Alberto Bertogli wrote:
> It may be for very very large sites (like a darcs' equivalent of github), but
> not for people running darcsweb or trac (or whatever they may run) for
> personal or organizational use.
Scalability is for everyone. A multi-processing queue is a simple
concept (your basic FIFO data structure) that an application of any
size can/should take advantage of for "long-running" background
processes. ("Long-running" in the web serving world can mean something
on the order of half a second or less, even.) Python has a built-in
queue for the purpose, which is fine for small usage. With a tiny bit
more code you can easily add support for bigger, more capable queue
systems when a person or organization has them installed/available.
That's just one possible tool for the job...
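To make the idea concrete, here is a minimal sketch using Python's
built-in queue module: a fixed number of worker threads pull jobs off a
FIFO queue, so no more than num_workers slow calls ever run at once.
The names start_workers and fake_darcs_call are made up for
illustration; fake_darcs_call merely stands in for whatever slow darcs
invocation the application would really make.

```python
import queue
import threading


def start_workers(jobs, handler, results, num_workers=2):
    """Start num_workers threads that process jobs from a FIFO queue."""

    def worker():
        while True:
            job = jobs.get()
            if job is None:
                # Sentinel value: this worker should shut down.
                jobs.task_done()
                break
            try:
                results[job] = handler(job)
            finally:
                # Always mark the job done so jobs.join() can return.
                jobs.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    return threads


def fake_darcs_call(repo_path):
    # Hypothetical stand-in for a slow darcs invocation.
    return "changes for " + repo_path


jobs = queue.Queue()
results = {}
threads = start_workers(jobs, fake_darcs_call, results, num_workers=2)

for path in ["repo/a", "repo/b", "repo/c"]:
    jobs.put(path)

jobs.join()              # block until every queued job is processed
for _ in threads:
    jobs.put(None)       # one shutdown sentinel per worker
for t in threads:
    t.join()
```

Swapping queue.Queue for a bigger queue system would mostly mean
replacing the put/get calls; the worker loop keeps the same shape.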
> As I mentioned above, I don't see how this is a web scalability issue. A gui
> frontend would show the same behaviour, because it's the one displayed by
> darcs itself.
A GUI frontend would have progress indicators and probably also the
ability to cancel long-running tasks (just as darcs itself has progress
indicators and responds to Ctrl+C). It would be unlikely for a
well-built GUI to accidentally spawn too many darcs processes and
entirely disrupt a system. A web server, on the other hand, might see
random, accidental, or belligerent/disruptive surfing and end up
spawning too many processes. (Which was the issue I was
primarily responding to: Trac+Darcs spawning a "large" number of
simultaneous darcs processes that ate up all the CPU/memory of the web
server.)
> Zooko's post was, as I understand it, about the fact that his team _does_ want
> this operation to work well, and I think that's interesting and valuable
> information to have in mind when deciding what to optimize first.
I'm certainly for performance enhancements and using real-world use
cases to prioritize them. However, in this particular case there are
"obvious" (to me) remedies that Zooko and his team could look into to
get more immediate results, which is what I was trying to get across. I
hope I made that clearer in my response to Lele. To summarize/reiterate:
Trac+Darcs already tries to avoid the long darcs process calls that
Zooko pointed out by caching them in the SQL database at hand. However,
when the cache isn't fully populated, it doesn't provide any means to
notify a user that the process may take a while, or to prevent too many
requests from trying to populate the cache simultaneously. I suggested a
queue and tuning the application to allow only so many simultaneous
calls to long-running darcs processes. (There's also the simple idea of
providing a command to fill the entire cache ahead of time, which can be
called during useful downtime... and a nice multi-threaded version of
that command is even easier to write once you've got the queuing
infrastructure in place.)
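A rough sketch of both ideas together, in Python, might look like the
following. This is not Trac+Darcs code: the dict named cache stands in
for the SQL cache, slow_darcs_query is a hypothetical placeholder for
the real darcs call, and MAX_DARCS_CALLS is an assumed tuning knob. A
semaphore caps how many slow calls run at once, and prefill_cache is the
"fill the cache ahead of time" command built on the same function.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_DARCS_CALLS = 2  # assumed tuning knob: max simultaneous slow calls
_slots = threading.BoundedSemaphore(MAX_DARCS_CALLS)
_lock = threading.Lock()

cache = {}    # stand-in for the SQL cache
running = 0   # slow calls currently in flight
peak = 0      # highest number of slow calls seen in flight at once


def slow_darcs_query(key):
    # Hypothetical stand-in for a long-running darcs process.
    time.sleep(0.01)
    return "result for " + key


def cached_query(key):
    """Return the cached value, computing it under the concurrency cap."""
    global running, peak
    with _lock:
        if key in cache:
            return cache[key]
    # Cache miss: wait for a free slot before doing the slow work.
    # (Two requests for the same key may both compute it; the first
    # result stored wins.)
    with _slots:
        with _lock:
            running += 1
            peak = max(peak, running)
        try:
            value = slow_darcs_query(key)
        finally:
            with _lock:
                running -= 1
    with _lock:
        return cache.setdefault(key, value)


def prefill_cache(keys, threads=8):
    """Populate the cache ahead of time, e.g. during downtime."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(cached_query, keys))


prefill_cache(["patch-%d" % i for i in range(10)])
```

Even with eight threads asking, at most MAX_DARCS_CALLS slow calls run
at any moment; the rest simply wait their turn.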
--
--Max Battcher--
http://worldmaker.net