[darcs-users] GSoC: network optimisation vs cache vs library?

Lele Gaifax lele at nautilus.homeip.net
Thu Apr 15 08:24:50 UTC 2010


On Wed, 14 Apr 2010 20:18:21 -0400
Max Battcher <me at worldmaker.net> wrote:

> On 4/14/2010 19:23, Zooko Wilcox-O'Hearn wrote:
> > Our project web site was just down for about an hour and a half a
> > couple of hours ago. The reason turned out to be that there were
> > about a dozen darcs processes running trying to answer queries like
> > this:
> >
> > darcs query contents --quiet --match "hash
> > 20080103234853-92b7f-966e01e6a40dbe94209229f459988e9dea37013a.gz"
> > "docs/running.html"
> >
> > This is the query that the trac-darcs plugin issues when you hit
> > this web page:
> >
> > http://tahoe-lafs.org/trac/tahoe-lafs/changeset/1782/docs/running.html
>
> All of which goes to show that Trac+darcs still isn't well optimized
> for caching darcs queries or dealing gracefully with long
> running command invocations...

As I'm working on the forthcoming version 0.8, where I have already
gained a good improvement in some poorly written queries, I'm
interested in that. At a minimum, trac+darcs could avoid spawning
multiple identical processes, all but one of which are going to be
discarded anyway.
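To illustrate the idea of not spawning identical processes, here is a
minimal Python sketch of request coalescing (often called "single
flight"): concurrent requests with the same key, say the (patch hash,
file path) pair, share one darcs invocation instead of each starting
their own. `SingleFlight` and its `do()` method are hypothetical names,
not part of trac+darcs.

```python
import threading
from typing import Callable, Dict, Hashable, Optional


class _Call:
    """Book-keeping for one in-flight computation."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: object = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Coalesce concurrent calls sharing a key into a single execution.

    The first caller for a key (the "leader") runs ``fn``; callers that
    arrive while it is still running block and receive the same result,
    or the same exception.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._calls: Dict[Hashable, _Call] = {}

    def do(self, key: Hashable, fn: Callable[[], object]) -> object:
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call
        if leader:
            try:
                call.result = fn()
            except BaseException as exc:  # hand the failure to every waiter
                call.error = exc
            finally:
                # Forget the key first so later requests start a fresh run,
                # then wake up everybody waiting on this one.
                with self._lock:
                    del self._calls[key]
                call.done.set()
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result
```

A request handler would then wrap the expensive call, e.g.
`flight.do((patch_hash, path), lambda: fetch(patch_hash, path))`, where
`fetch` is whatever actually runs darcs. This only helps within a single
process; separate Trac worker processes would need a lock shared between
them (a lock file, for instance), which this sketch does not cover.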

> I still say the Trac reliance on
> CVS/SVN-style revision numbers means that Trac is absolutely not
> well-adapted for serving darcs repositories. It may be "revision
> 1782" to Trac, but 'show contents --match "hash 2008..."' is "commute
> this file to how it would appear if only the patches preceding or
> equal to this one with a timestamp from two years ago were applied"
> to darcs.

This is completely wrong: in any trac+darcs instance, there is a
one-to-one correspondence between the svn-like, monotonically
increasing integer revision and the complete hash of the darcs
patch. Whether you ask for revision 1782 or "20081027HHMMSS...." you
are going to obtain the very same changeset, every time. This, of
course, assumes that the underlying repository didn't change in ways
other than "darcs apply". If you really care about that, you can
always use the full hash when you reference a particular changeset and
simply ignore the svn-like surrogate key: your wiki/ticket TracLinks
will then survive even a darcs1->darcs2 conversion.
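A minimal sketch of such a one-to-one mapping, assuming revision
numbers are handed out strictly in the order patches arrive and are
never reused. `RevisionMap` is a hypothetical in-memory illustration;
how trac+darcs actually stores the mapping is not shown here.

```python
from typing import Dict, List


class RevisionMap:
    """Bidirectional map between surrogate integer revisions and patch hashes.

    Revisions start at 1 and are assigned in the order patches are added;
    they are never reassigned, so the mapping stays one-to-one.
    """

    def __init__(self) -> None:
        self._hashes: List[str] = []     # index i holds the hash of revision i + 1
        self._revs: Dict[str, int] = {}  # patch hash -> revision

    def add(self, patch_hash: str) -> int:
        """Register a newly applied patch; re-adding a known hash is a no-op."""
        if patch_hash in self._revs:
            return self._revs[patch_hash]
        self._hashes.append(patch_hash)
        rev = len(self._hashes)
        self._revs[patch_hash] = rev
        return rev

    def to_hash(self, rev: int) -> str:
        if not 1 <= rev <= len(self._hashes):
            raise KeyError(rev)
        return self._hashes[rev - 1]

    def to_rev(self, patch_hash: str) -> int:
        return self._revs[patch_hash]
```

Since `add()` ignores hashes it already knows, replaying the same patches
never shifts existing numbers; only a rewrite of history (the case
excluded above) would.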

The --match you quote above has nothing to do with dates; it's just
that the darcs hash begins with the "commit-time" date of the patch.
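To make the point concrete, here is a small Python sketch that splits
the hash from Zooko's query into its parts. The layout assumed here (a
YYYYMMDDHHMMSS timestamp, a short hex field and a 40-digit hex digest,
with a .gz suffix) is inferred from that single example, and
`parse_patch_hash` is a hypothetical helper. The --match selects the
patch by the whole string; the date is merely its leading component.

```python
from datetime import datetime
from typing import NamedTuple


class PatchHash(NamedTuple):
    timestamp: datetime  # the "commit-time" prefix, as a naive datetime
    short: str           # second field; its meaning is not relied upon here
    digest: str          # the long hexadecimal part


def parse_patch_hash(h: str) -> PatchHash:
    """Split a darcs patch hash of the (assumed) form DATE-SHORT-DIGEST[.gz]."""
    name = h[:-3] if h.endswith(".gz") else h
    stamp, short, digest = name.split("-")
    return PatchHash(datetime.strptime(stamp, "%Y%m%d%H%M%S"), short, digest)
```

For example,
`parse_patch_hash("20080103234853-92b7f-966e01e6a40dbe94209229f459988e9dea37013a.gz").timestamp`
is `datetime(2008, 1, 3, 23, 48, 53)`: a timestamp can be read off the
hash, but the match itself never compares dates.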

> (Which ends up being quite possibly not a "real" historic
> version at all, and which does quite a bit of work to be so easily
> susceptible to crawlers/DDoS/accidental DDoS...)

I don't know what you mean by a "real" historic version. In my
experience, the trac+darcs view has always given me the expected
result, that is, the one that matches both my memory of the change and
the practical sense of the neighbouring changesets.

The fact that darcs commutes so easily is a well-known
pr^H^Hincredible feature we all more or less unconsciously love. We
discussed the matter with respect to trac+darcs, and even David gave
his opinion: AFAICT, for a given single repository, not subject to
"darcs optimize" or other "back-history" reordering/changes, the
output of "darcs query contents" or even "darcs diff" for a given
patch is always the same.

AFAIK, Alberto's darcsweb also uses a similar approach to examine
historical contents.

> 20secs doesn't sound unreasonable from the point of view that you are 
> asking darcs to create an entire new "version" of a file. While I
> expect there is plenty of performance left to squeeze from this, I
> don't think a query like this one will ever near git/svn/... historic
> revision lookup, because this is an entirely different beast. It
> doesn't make sense for me for Trac to rely on it for common queries.

trac+darcs keeps a cache of that, so once it has computed the content
of a particular file at a given revision, subsequent requests for it
are pretty fast.
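Because, as argued above, the contents of a given file at a given patch
do not change as long as history isn't reordered, they can be memoised
by (patch hash, path). The following Python sketch assumes a plain
in-memory dict; `ContentCache` and `darcs_contents` are hypothetical
names, and the darcs command line is the one quoted at the top of the
thread. A real deployment would want a persistent store and an
invalidation step after "darcs optimize" or similar operations.

```python
import subprocess
from typing import Callable, Dict, Tuple

Fetch = Callable[[str, str], bytes]


def darcs_contents(patch_hash: str, path: str, repo: str = ".") -> bytes:
    """Hypothetical wrapper around the query quoted in this thread."""
    return subprocess.run(
        ["darcs", "query", "contents", "--quiet",
         "--match", "hash " + patch_hash, path],
        cwd=repo, check=True, capture_output=True,
    ).stdout


class ContentCache:
    """Memoise file contents keyed by (patch hash, path)."""

    def __init__(self, fetch: Fetch = darcs_contents) -> None:
        self._fetch = fetch  # injectable, so it can be faked or wrapped
        self._store: Dict[Tuple[str, str], bytes] = {}

    def get(self, patch_hash: str, path: str) -> bytes:
        key = (patch_hash, path)
        if key not in self._store:
            # Only a miss pays for a darcs invocation.
            self._store[key] = self._fetch(patch_hash, path)
        return self._store[key]
```

Only the first request for a given (hash, path) pair runs darcs; every
later one is a dictionary lookup.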

> Forgive my petulance, but it seems to me fairly odd to me that for 
> someone working on a project for decentralized, scalable data storage 
> you seem fairly blind to web scalability issues when it comes to 
> Trac+Darcs...

Unpleasant words, I'd say. Anyway, while I'm sorry if trac+darcs
casts such a bad light on darcs itself, I'd be interested in hearing
about alternative/better approaches. I'll keep improving trac+darcs as
time permits, because darcs is beautiful and trac+darcs is useful.

ciao, lele.
-- 
nickname: Lele Gaifax    | When I live off what I thought yesterday,
real: Emanuele Gaifas    | I'll start to fear those who copy me.
lele at nautilus.homeip.net |                 -- Fortunato Depero, 1929.

