[darcs-users] GSoC Project: Improving darcs' network performance

Dmitry Kurochkin dmitry.kurochkin at gmail.com
Fri Apr 16 11:57:43 UTC 2010

On Fri, Apr 16, 2010 at 4:45 AM, Dmitry Kurochkin
<dmitry.kurochkin at gmail.com> wrote:
> Hi all.
> On Fri, Apr 9, 2010 at 9:55 PM, Eric Kow <kowey at darcs.net> wrote:
>> On Fri, Apr 09, 2010 at 10:25:52 -0700, Jason Dagit wrote:
>>> >  * What happened to HTTP pipelining? Why wasn't that successful? Is the
>>> > problem in darcs, or in libs, or somewhere else? (Or nobody knows?) It
>>> > would be worth writing about it, as it was intended to solve the same
>>> > problem as my project.
>>> >
>>> Darcs supports HTTP pipelining.  If you're not seeing the pipelining
>>> happening, then I'm not sure what is wrong.
>> Perhaps a better question to ask is how this project relates to HTTP
>> pipelining.  Why wasn't Darcs's use of pipelining successful enough
>> that it would be a good idea to implement a smart server?
>> Some of our experiences with pipelining:
>>  - flakiness (cf. spate of curl_multi_perform bugs)
>>  - seemingly non-universal support (but then, how would a smart
>>   server help things?)
> I did some pipelining testing and have some results to share.
> 1. I was able to reproduce the "no running handles" bug reported by
> many people recently, using an HTTP server on localhost. It looks like
> curl_multi_perform() can complete the download or fail with an error on
> the first call, so having zero running handles is fine. I will send a
> patch to remove this error completely and handle this case as
> expected. I cannot promise this resolves the root cause for all
> related bugs; perhaps we will get other error messages. But one thing is
> for sure: you will not see this error message ever again :) BTW it is
> not related to pipelining.
> 2. Pipelining works only if darcs makes several parallel requests. I
> added a few debug prints to understand why no pipelining is done for
> the http://darcs.net repo. copyFileUsingCache tries to download patches
> to the /home/darcs-unstable/darcs/_darcs/pristine.hashed/ directory.
> There is no such directory on my system, so createDirectoryIfMissing
> fails, and speculateFileOrUrl is never called.
> 3. If I create this directory on my system, some pipelining starts to
> work. After several requests that reuse one connection, curl starts to
> open parallel connections to darcs.net. This looks like a bug in curl to
> me, but it needs more investigation.

It turns out that when darcs is built with cabal, the -DCURL_PIPELINING
flag is never passed to the C compiler. This is fixed by patch208. It looks
like pipelining has always been broken in cabal builds because of this.
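
As a rough illustration of why a missing -D flag can go unnoticed: if
the pipelining code is wrapped in an "#ifdef CURL_PIPELINING" guard (an
assumption made here for the sketch, the real hscurl.c may be organized
differently), the build still succeeds without the flag and the guarded
code is simply compiled out. The function names below are hypothetical
and not taken from darcs.

```c
#include <stdio.h>

/* Hypothetical helper, not part of darcs: reports whether this file was
 * compiled with -DCURL_PIPELINING. Assumes the pipelining code sits
 * behind an #ifdef guard like the one below. */
int pipelining_compiled_in(void)
{
#ifdef CURL_PIPELINING
    /* A call such as
     *   curl_multi_setopt(multi, CURLMOPT_PIPELINING, 1L);
     * would presumably live in a branch like this one. */
    return 1;
#else
    /* Without the flag nothing fails to compile: the guarded code is
     * just absent from the binary. */
    return 0;
#endif
}

/* Hypothetical helper: prints the state so both builds can be compared. */
void report_pipelining(void)
{
    printf("pipelining support: %s\n",
           pipelining_compiled_in() ? "compiled in" : "compiled out");
}
```

Building the same file once with "gcc -c" and once with
"gcc -DCURL_PIPELINING -c" gives two binaries that differ only in which
branch was kept, which matches the symptom described above.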

Attached are two simple tests to measure the potential performance
gain curl pipelining can give. The tests just download the
http://darcs.net page 500 and 1000 times. Note that the network code is
copied from hscurl.c with no changes, so this is what darcs can expect if
it requests files smartly enough.

./no_pipelining_test 500  0,12s user 0,14s system 0% cpu 2:03,20 total
./no_pipelining_test 1000  0,23s user 0,26s system 0% cpu 4:06,60 total

./pipelining_test 500  0,21s user 0,13s system 0% cpu 55,450 total
./pipelining_test 1000  0,67s user 0,19s system 0% cpu 1:51,94 total

More than 2x faster with pipelining. The darcs.net index page is 4004
bytes; I expect pipelining to show even better results for smaller files.
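
For reference, the ratios behind the "more than 2x" figure can be
recomputed from the "total" columns above. This is only an illustrative
sketch: the helper names (total_to_seconds, speedup, print_speedups) are
made up here, and the parser assumes the format shown in the timings
(optional minutes, decimal comma), nothing more general.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Convert a total such as "2:03,20" or "55,450" (optional minutes,
 * decimal comma) into seconds. Returns -1.0 for empty or overlong input.
 * Hours ("h:mm:ss") are deliberately not handled in this sketch. */
double total_to_seconds(const char *total)
{
    char buf[32];
    size_t n = strlen(total);
    if (n == 0 || n >= sizeof buf)
        return -1.0;
    memcpy(buf, total, n + 1);

    /* strtod expects a decimal point in the default "C" locale. */
    for (char *p = buf; *p; ++p)
        if (*p == ',')
            *p = '.';

    double minutes = 0.0;
    char *seconds = buf;
    char *colon = strchr(buf, ':');
    if (colon) {
        *colon = '\0';
        minutes = strtod(buf, NULL);
        seconds = colon + 1;
    }
    return minutes * 60.0 + strtod(seconds, NULL);
}

/* Ratio of wall-clock time without pipelining to time with pipelining. */
double speedup(const char *baseline, const char *pipelined)
{
    double b = total_to_seconds(baseline);
    double p = total_to_seconds(pipelined);
    assert(b > 0.0 && p > 0.0);
    return b / p;
}

/* Print the speedups for the two runs quoted in this mail. */
void print_speedups(void)
{
    printf("500 requests:  %.2fx\n", speedup("2:03,20", "55,450"));
    printf("1000 requests: %.2fx\n", speedup("4:06,60", "1:51,94"));
}
```

Calling print_speedups() from a main() should report roughly 2.22x for
500 requests and 2.20x for 1000 requests, i.e. about 0.25 s per request
without pipelining versus about 0.11 s with it.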

To compile, run:

  gcc pipelining_test.c -o pipelining_test -lcurl


> Later darcs hangs in waitUrl.
> This is fixed in another patch that I will send shortly.
> Summary: pipelining is tricky. There are problems on the darcs side (both
> in the network code and in the patch fetching code in general) and,
> perhaps, on the curl side. So it may not be trivial to investigate and
> fix. But I think this is one of the things that should be done to improve
> darcs network performance. AFAIK no proper pipelining benchmarks were
> ever done, but during my first tests with libwww and darcs-1 I got very
> nice results.
> I believe that pipelining combined with patch packs will resolve all
> darcs network performance problems. A smart server, IMHO, is not the way
> to go - too complex (from both user and development POV) for little
> benefit (compared to patch packs and proper pipelining).
> Regards,
>  Dmitry
>>> The benchmarks on the wiki are really hard to find because they seem to be
>>> strewn across many small pages, but a few links might get you started.
>>> Pages about benchmarks:
>>> http://wiki.darcs.net/Development/Benchmarks
>>> http://wiki.darcs.net/Benchmarks
>> The second page you linked <http://wiki.darcs.net/Benchmarks> is the
>> one-stop shop for benchmark results.  The first page is mostly for
>> developing the benchmark infrastructure.  Hmm, any ways we could improve
>> things so that the results are easier to find?
>>> And then there is a slew of individual runs on machines like this one:
>>> http://wiki.darcs.net/Benchmarks/Quasar
>> Yes, this is by design (partly to avoid having one giant page).
>> Note that each of the machine-specific pages is expected to follow
>> a general template, although there are bits and bobs scattered
>> throughout.
>> Now the bad news: we still don't have any benchmarking for the network
>> stuff, partly because we haven't yet wrapped our heads around how to
>> go about it in principle.
>>> >  * In the description of the 'darcs optimize --http' command, I'm saying
>>> > that patches before a tag don't get reordered. Actually, I'm not 100%
>>> > sure about that. The idea I got while looking at the sources is that a
>>> > tag is a regular patch (to some degree) that doesn't commute with
>>> > anything. However, I didn't find proof of it in the sources, so I may
>>> > be wrong. A technical explanation of what's really going on with tags
>>> > would be really helpful.
>>> >
>>> I think for most cases this is accurate.  A tag depends on all the patches
>>> before it so most repository interactions stop at tags.  It's possible
>>> some things need to look beyond tags, but I don't know the details.
>> We want to be careful here, since (as I understand it) a tag is just an
>> empty patch with explicit deps on a set of patches (including prior
>> tags).  So this means that you can commute things behind tags, and you
>> can have tags on top of patches...
>> Say you have (where T1 depends on TP1 TP2)
>>  TP1 TP2 T1 O3
>> While it's true that T1 will never get pushed past TP2 (and vice-versa),
>> you could always have situations that look like
>>  TP1' O3 TP2' T1
>> OK, so that stuff we all know, but surely we can still say useful things
>> about the sort of guarantees Darcs offers on how tagged stuff is stored.
>> --
>> Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
>> PGP Key ID: 08AC04F9
>> _______________________________________________
>> darcs-users mailing list
>> darcs-users at darcs.net
>> http://lists.osuosl.org/mailman/listinfo/darcs-users
-------------- next part --------------
A non-text attachment was scrubbed...
Name: no_pipelining_test.c
Type: text/x-csrc
Size: 8776 bytes
Desc: not available
URL: <http://lists.osuosl.org/pipermail/darcs-users/attachments/20100416/ff74bdb1/attachment-0002.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pipelining_test.c
Type: text/x-csrc
Size: 8938 bytes
Desc: not available
URL: <http://lists.osuosl.org/pipermail/darcs-users/attachments/20100416/ff74bdb1/attachment-0003.c>
