[darcs-users] GSoC Project: Improving darcs' network performance
dmitry.kurochkin at gmail.com
Fri Apr 16 00:45:17 UTC 2010
On Fri, Apr 9, 2010 at 9:55 PM, Eric Kow <kowey at darcs.net> wrote:
> On Fri, Apr 09, 2010 at 10:25:52 -0700, Jason Dagit wrote:
>> > * What happened to HTTP pipelining? Why wasn't that successful? Is the
>> > problem in darcs, or in libs, or somewhere else? (Or nobody knows?) It
>> > would be worth writing about, as it was intended to solve the same
>> > problem as my project.
>> Darcs supports HTTP pipelining. If you're not seeing the pipelining
>> happening, then I'm not sure what is wrong.
> Perhaps a better question to ask is how this project relates to HTTP
> pipelining. Why wasn't Darcs's use of pipelining successful enough
> that it would be a good idea to implement a smart server?
> Some of our experiences with pipelining:
> - flakiness (cf. the spate of curl_multi_perform bugs)
> - seemingly non-universal support (but then, how would a smart
> server help things?)
I did some pipelining testing and have some results to share.
1. I was able to reproduce the "no running handles" bug, reported by
many people recently, using an HTTP server on localhost. It looks like
curl_multi_perform() can complete a download or fail with an error on the
first call, so having zero running handles is fine. I will send a
patch to remove this error completely and handle this case as
expected. I cannot promise this resolves the root cause of all
related bugs; perhaps we will get other error messages. But one thing is
for sure: you will not see this error message ever again :) BTW, it is
not related to pipelining.
2. Pipelining works only if darcs makes several parallel requests. I
added a few debug prints to understand why no pipelining is done for the
http://darcs.net repo. copyFileUsingCache tries to download patches to the
/home/darcs-unstable/darcs/_darcs/pristine.hashed/ directory. There is
no such directory on my system, so createDirectoryIfMissing fails, and
speculateFileOrUrl is never called.
3. If I create this directory on my system, some pipelining starts to
work. After several requests that reuse one connection, curl starts to
open parallel connections to darcs.net. This seems like a bug in curl to
me, but it needs more investigation. Later, darcs hangs in waitUrl.
This is fixed in another patch that I will send shortly.
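
To illustrate point 1, here is a toy Python model of a transfer loop that
treats "zero running handles" as a normal state rather than an error. It is
only a sketch: FakeMulti, perform(), read_results() and run_all() are
hypothetical stand-ins invented for this example, not the libcurl API and
not the darcs code. The idea is that outcomes are taken from the completion
results, so a transfer that finishes or fails on the very first call is
still handled.

```python
class FakeMulti:
    """Hypothetical stand-in for a multi-transfer driver (not libcurl)."""

    def __init__(self):
        self._pending = {}  # url -> (remaining steps, error or None)
        self._done = []     # completed (url, error) pairs not yet read

    def add(self, url, steps=1, error=None):
        self._pending[url] = (steps, error)

    def perform(self):
        """Advance every transfer one step; return how many still run."""
        for url, (steps, error) in list(self._pending.items()):
            if steps <= 1:
                del self._pending[url]
                self._done.append((url, error))
            else:
                self._pending[url] = (steps - 1, error)
        return len(self._pending)

    def read_results(self):
        """Return and clear the transfers completed so far."""
        done, self._done = self._done, []
        return done


def run_all(multi):
    """Drive all transfers to completion and map each URL to its error.

    Zero running handles is not an error, even right after the first
    perform() call; success or failure comes from the read results.
    """
    results = {}
    while True:
        running = multi.perform()
        for url, error in multi.read_results():
            results[url] = error
        if running == 0:
            break
    return results
```

With this shape, a download that completes (or fails) during the first
perform() call simply shows up in the results with or without an error.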
Summary: pipelining is tricky. There are problems on the darcs side (both
in the network code and in the patch fetching code in general) and, perhaps,
on the curl side. So it may not be trivial to investigate and fix. But I think
this is one of the things that should be done to improve darcs network
performance. AFAIK no proper pipelining benchmarks were ever
done, but during my first tests with libwww and darcs-1 I got very
I believe that pipelining combined with patch packs will resolve all
darcs network performance problems. A smart server, IMHO, is not the way
to go: too complex (from both the user and development POV) for little
benefit (compared to patch packs and proper pipelining).
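
Returning to point 2 above, the following Python sketch shows one possible
way to keep speculative requests flowing when a cache directory cannot be
created. It is an assumption-laden illustration, not how darcs behaves or
how it was fixed: usable_cache_dirs and fetch_with_speculation are
hypothetical names, and the speculate callback merely stands in for
something like speculateFileOrUrl.

```python
import os


def usable_cache_dirs(candidates):
    """Return the candidate cache directories that exist or can be created.

    Hypothetical helper: a directory that cannot be created is skipped
    instead of aborting the whole fetch.
    """
    usable = []
    for path in candidates:
        try:
            os.makedirs(path, exist_ok=True)
        except OSError:
            continue  # e.g. a path that only exists on another machine
        usable.append(path)
    return usable


def fetch_with_speculation(urls, cache_candidates, speculate):
    """Queue speculative downloads even if some cache directories fail.

    `speculate` is a hypothetical callback; it is called once per URL
    regardless of how many cache directories turned out to be usable.
    """
    caches = usable_cache_dirs(cache_candidates)
    for url in urls:
        speculate(url)
    return caches
```

The design choice here is simply that a failure to prepare one cache
location is treated as local to that location, so it does not prevent the
parallel requests that pipelining depends on.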
>> The benchmarks on the wiki are really hard to find because they seem to be
>> strewn across many small pages, but a few links might get you started.
>> Pages about benchmarks:
> The second page you linked <http://wiki.darcs.net/Benchmarks> is the
> one-stop shop for benchmark results. The first page is mostly for
> developing the benchmark infrastructure. Hmm, any ways we could improve
> things so that the results are easier to find?
>> And then there is a slew of individual runs on machines like this one:
> Yes, this is by design (partly to avoid having one giant page).
> Note that each of the machine-specific pages is expected to follow
> a general template, although there are bits and bobs scattered
> Now the bad news: we still don't have any benchmarking for the network
> stuff, partly because we haven't yet wrapped our heads around how to
> go about it in principle.
>> > * In the description of the 'darcs optimize --http' command, I'm saying that
>> > patches before a tag don't get reordered. Actually, I'm not 100% sure
>> > about that. The idea I got while looking at the sources is that a tag is a
>> > regular patch (to some degree) that doesn't commute with anything.
>> > However, I didn't find proof of it in the sources, so I may be wrong. A
>> > technical explanation of what's really going on with tags would be
>> > really helpful.
>> I think for most cases this is accurate. A tag depends on all the patches
>> before it so most repository interactions stop at tags. It's possible
>> some things need to look beyond tags, but I don't know the details.
> We want to be careful here, since (as I understand it) a tag is just an
> empty patch with explicit deps on a set of patches (including prior
> tags). So this means that you can commute things behind tags, and you
> can have tags on top of patches...
> Say you have (where T1 depends on TP1 TP2)
> TP1 TP2 T1 O3
> While it's true that T1 will never get pushed past TP2 (and vice-versa),
> you could always have situations that look like
> TP1' O3 TP2' T1
> OK, so that stuff we all know, but surely we can still say useful things
> about the sort of guarantees Darcs offers on how tagged stuff is stored.
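
The ordering constraint in the example above can be checked with a toy
Python model in which a patch is just a name plus the names it explicitly
depends on. This is a sketch under stated assumptions, not darcs's patch
representation: valid_order is a hypothetical helper, O3 is assumed to be
independent of the other patches, and the primed forms (TP1', TP2') that
result from commutation are not modelled, only the dependency order is.

```python
def valid_order(order, deps):
    """Return True if every patch appears after all of its dependencies."""
    seen = set()
    for name in order:
        if not deps.get(name, set()) <= seen:
            return False
        seen.add(name)
    return True


# Tag T1 explicitly depends on TP1 and TP2; O3 is assumed independent.
deps = {"T1": {"TP1", "TP2"}}

original = ["TP1", "TP2", "T1", "O3"]
reordered = ["TP1", "O3", "TP2", "T1"]   # O3 moved behind the tag
tag_too_early = ["TP1", "T1", "TP2", "O3"]  # T1 pushed past TP2
```

In this model, valid_order accepts both original and reordered but rejects
tag_too_early, which matches the point that a tag pins its dependencies
without fixing the position of unrelated patches.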
> Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
> PGP Key ID: 08AC04F9