[darcs-users] GSoC Project: Improving darcs' network performance

Dmitry Kurochkin dmitry.kurochkin at gmail.com
Fri Apr 16 12:07:38 UTC 2010


On Fri, Apr 16, 2010 at 3:57 PM, Dmitry Kurochkin
<dmitry.kurochkin at gmail.com> wrote:
> On Fri, Apr 16, 2010 at 4:45 AM, Dmitry Kurochkin
> <dmitry.kurochkin at gmail.com> wrote:
>> Hi all.
>>
>> On Fri, Apr 9, 2010 at 9:55 PM, Eric Kow <kowey at darcs.net> wrote:
>>> On Fri, Apr 09, 2010 at 10:25:52 -0700, Jason Dagit wrote:
>>>> >  * What happened to HTTP pipelining? Why wasn't that successful? Is
>>>> > the problem in darcs, or in the libs, or somewhere else? (Or does
>>>> > nobody know?) It would be worth writing about, as it was intended to
>>>> > solve the same problem as my project.
>>>> >
>>>>
>>>> Darcs supports HTTP pipelining.  If you're not seeing the pipelining
>>>> happening, then I'm not sure what is wrong.
>>>
>>> Perhaps a better question to ask is how this project relates to HTTP
>>> pipelining.  Why wasn't Darcs's use of pipelining successful enough
>>> that it would be a good idea to implement a smart server?
>>>
>>> Some of our experiences with pipelining:
>>>
>>>  - flakiness (cf. the spate of curl_multi_perform bugs)
>>>  - seemingly non-universal support (but then, how would a smart
>>>   server help things?)
>>>
>>
>> I did some pipelining testing and have some results to share.
>>
>> 1. I was able to reproduce the "no running handles" bug, reported by
>> many people recently, using an HTTP server at localhost. It looks like
>> curl_multi_perform() can complete a download or fail with an error on
>> the first call, so having zero running handles is fine. I will send a
>> patch to remove this error completely and handle this case as
>> expected. I cannot promise this resolves the root cause of all
>> related bugs; perhaps we will get other error messages. But one thing
>> is for sure: you will not see this error message ever again :) BTW it
>> is not related to pipelining.
>>
>> 2. Pipelining works only if darcs makes several parallel requests. I
>> added a few debug prints to understand why no pipelining is done for
>> the http://darcs.net repo. copyFileUsingCache tries to download patches
>> to the /home/darcs-unstable/darcs/_darcs/pristine.hashed/ directory.
>> There is no such directory on my system, so createDirectoryIfMissing
>> fails and speculateFileOrUrl is never called.
>>
>> 3. If I create this directory on my system, some pipelining starts to
>> work. After several requests that reuse one connection, curl starts to
>> open parallel connections to darcs.net. This looks like a bug in curl
>> to me, but it needs more investigation.
>
> It turns out that when darcs is built with cabal, the -DCURL_PIPELINING
> flag is never passed to the C compiler. Fixed by patch208. It looks like
> pipelining was always broken in cabal builds because of this.
>
> Attached are two simple tests to measure the potential performance
> gain curl pipelining can give.

Sorry, I sent the wrong version of the tests, one that sets the
CURLOPT_NOBODY option.

The proper tests are attached.

Regards,
  Dmitry

> The tests just download the http://darcs.net page 500 and 1000 times.
> Note that the network code is copied from hscurl.c with no changes, so
> this is what darcs can expect if it requests files smartly enough.
>
> ./no_pipelining_test 500  0,12s user 0,14s system 0% cpu 2:03,20 total
> ./no_pipelining_test 1000  0,23s user 0,26s system 0% cpu 4:06,60 total
>
> ./pipelining_test 500  0,21s user 0,13s system 0% cpu 55,450 total
> ./pipelining_test 1000  0,67s user 0,19s system 0% cpu 1:51,94 total
>
> More than 2x faster with pipelining. The darcs.net index page is 4004
> bytes; I expect pipelining to show better results for smaller files.
>
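
To make the "more than 2x" figure concrete, here is a small sketch that
converts the wall-clock totals quoted above into seconds and computes the
ratio for each run. The helper names (to_seconds, speedups) are made up
for this illustration and are not part of darcs or of the attached
tests. The arithmetic works out to roughly 2.2x for both the 500 and the
1000 request runs.

```python
def to_seconds(stamp: str) -> float:
    """Convert an elapsed time such as '2:03,20' or '55,450' to seconds."""
    stamp = stamp.replace(",", ".")  # the quoted output uses a decimal comma
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Wall-clock totals copied from the timings quoted above:
# request count -> (without pipelining, with pipelining)
TOTALS = {
    500: ("2:03,20", "55,450"),
    1000: ("4:06,60", "1:51,94"),
}


def speedups(totals=TOTALS):
    """Return {request_count: plain_time / pipelined_time}."""
    return {
        n: to_seconds(plain) / to_seconds(piped)
        for n, (plain, piped) in totals.items()
    }


if __name__ == "__main__":
    for n, ratio in speedups().items():
        plain, piped = TOTALS[n]
        print(
            f"{n} requests: "
            f"{to_seconds(plain) / n * 1000:.0f} ms vs "
            f"{to_seconds(piped) / n * 1000:.0f} ms per request, "
            f"{ratio:.2f}x faster"
        )
```

The per-request numbers are just the totals divided by the request
count, so they include connection setup and everything else the test
programs do.
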
> To compile, run:
>
>> gcc pipelining_test.c -o pipelining_test -lcurl
>
> Regards,
>  Dmitry
>
>> Later darcs hangs in waitUrl.
>> This is fixed in another patch that I will send shortly.
>>
>> Summary: pipelining is tricky. There are problems on the darcs side
>> (both in the network code and in the patch fetching code in general)
>> and, perhaps, on the curl side. So it may not be trivial to investigate
>> and fix. But I think this is one of the things that should be done to
>> improve darcs network performance. AFAIK no proper pipelining
>> benchmarks were ever done, but during my first tests with libwww and
>> darcs-1 I got very nice results.
>>
>> I believe that pipelining combined with patch packs will resolve all
>> darcs network performance problems. A smart server, IMHO, is not the
>> way to go: too complex (from both the user and development POV) for
>> little benefit (compared to patch packs and proper pipelining).
>>
>> Regards,
>>  Dmitry
>>
>>>> The benchmarks on the wiki are really hard to find because they seem to be
>>>> strewn across many small pages, but a few links might get you started.
>>>>
>>>> Pages about benchmarks:
>>>> http://wiki.darcs.net/Development/Benchmarks
>>>> http://wiki.darcs.net/Benchmarks
>>>
>>> The second page you linked <http://wiki.darcs.net/Benchmarks> is the
>>> one-stop shop for benchmark results.  The first page is mostly for
>>> developing the benchmark infrastructure.  Hmm, any ways we could improve
>>> things so that the results are easier to find?
>>>
>>>> And then there is a slew of individual runs on machines like this one:
>>>> http://wiki.darcs.net/Benchmarks/Quasar
>>>
>>> Yes, this is by design (partly to avoid having one giant page).
>>>
>>> Note that each of the machine-specific pages is expected to follow
>>> a general template, although there are bits and bobs scattered
>>> throughout.
>>>
>>> Now the bad news: we still don't have any benchmarking for the network
>>> stuff, partly because we haven't yet wrapped our heads around how to
>>> go about it in principle.
>>>
>>>> >  * In the description of the 'darcs optimize --http' command, I say
>>>> > that patches before a tag don't get reordered. Actually, I'm not 100%
>>>> > sure about that. The idea I got while looking at the sources is that a
>>>> > tag is a regular patch (to some degree) that doesn't commute with
>>>> > anything. However, I didn't find proof of it in the sources, so I may
>>>> > be wrong. A technical explanation of what's really going on with tags
>>>> > would be really helpful.
>>>> >
>>>>
>>>> I think for most cases this is accurate.  A tag depends on all the patches
>>>> before it so most repository interactions stop at tags.  It's possible
>>>> some things need to look beyond tags, but I don't know the details.
>>>
>>> We want to be careful here, since (as I understand it) a tag is just an
>>> empty patch with explicit deps on a set of patches (including prior
>>> tags).  So this means that you can commute things behind tags, and you
>>> can have tags on top of patches...
>>>
>>> Say you have (where T1 depends on TP1 TP2)
>>>
>>>  TP1 TP2 T1 O3
>>>
>>> While it's true that T1 will never get pushed past TP2 (and vice-versa),
>>> you could always have situations that look like
>>>
>>>  TP1' O3 TP2' T1
>>>
>>> OK, so that stuff we all know, but surely we can still say useful things
>>> about the sort of guarantees Darcs offers on how tagged stuff is stored.
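
The two orderings in the example above can be checked mechanically. The
sketch below is a toy model, not how darcs stores or commutes patches:
it treats patches as plain names, ignores the fact that commuted patches
(the primed TP1' and TP2') have a different representation, and assumes
O3 has no dependencies of its own. The function names are hypothetical.

```python
# Explicit dependencies from the example: tag T1 depends on TP1 and TP2.
# Assumption: O3 depends on nothing and commutes freely with the others.
DEPS = {"T1": {"TP1", "TP2"}}


def respects_deps(order, deps=DEPS):
    """True if every patch appears after all of the patches it depends on."""
    seen = set()
    for patch in order:
        if not deps.get(patch, set()) <= seen:
            return False
        seen.add(patch)
    return True


def patches_behind(tag, order):
    """Patches stored before `tag` in this particular ordering."""
    return order[: order.index(tag)]


if __name__ == "__main__":
    orders = (
        ["TP1", "TP2", "T1", "O3"],  # first ordering from the example
        ["TP1", "O3", "TP2", "T1"],  # second ordering, O3 commuted back
        ["TP1", "T1", "TP2", "O3"],  # T1 ahead of a patch it depends on
    )
    for order in orders:
        print(order, respects_deps(order), patches_behind("T1", order))
```

In this model both orderings from the example are valid, but in the
second one O3 sits behind T1 even though T1 does not depend on it. So
"the patches stored before a tag" and "the patches a tag depends on" are
not the same set in general.
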
>>>
>>> --
>>> Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
>>> PGP Key ID: 08AC04F9
>>>
>>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pipelining_test.c
Type: text/x-csrc
Size: 8893 bytes
Desc: not available
URL: <http://lists.osuosl.org/pipermail/darcs-users/attachments/20100416/4daea668/attachment-0002.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: no_pipelining_test.c
Type: text/x-csrc
Size: 8731 bytes
Desc: not available
URL: <http://lists.osuosl.org/pipermail/darcs-users/attachments/20100416/4daea668/attachment-0003.c>

