[darcs-users] state of HTTP pipelining in Darcs?

Dmitry Kurochkin dmitry.kurochkin at gmail.com
Thu Jun 24 14:21:46 UTC 2010


Hi Eric.

On Thu, Jun 24, 2010 at 5:33 PM, Eric Kow <kowey at darcs.net> wrote:
> Hi Dmitry,
>
> I just read a note from myself asking me to follow-up on the HTTP pipelining
> work.  Since we're coming up to a release, I thought I'd better do it now :-)
>
> Could I ask what is the state of HTTP pipelining in Darcs?  Are there any
> outstanding issues you think should still be fixed?
>

Pipelining itself should be in pretty good shape now. It is
autodetected and requires no special build options. And darcs get
works reliably for me now.

But I believe there is room for improvement in the patch fetching
(any file fetching, actually) code. If we know the list of files to
get from the remote server, we should speculate on all of them first
and wait for the downloads to complete later. Downloading them one by
one does not use pipelining. I made this change in the
fetch_patches_if_necessary function, but I am sure there are more
places like it.
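
To illustrate the speculate-then-wait idea, here is a minimal Python
sketch. It is not darcs code (darcs is written in Haskell and drives
libcurl), and every name in it (fetch, Fetcher, speculate, wait,
get_one_by_one, get_speculated) is hypothetical. A thread pool stands in
for the transport, so it only models "several requests in flight at
once", not real HTTP pipelining on a single connection.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(name, delay=0.05):
    """Stand-in for a network download: sleeps instead of doing I/O."""
    time.sleep(delay)
    return f"contents of {name}"


class Fetcher:
    """Toy model: speculate() queues a download, wait() blocks for it."""

    def __init__(self, max_in_flight=4):
        self._pool = ThreadPoolExecutor(max_workers=max_in_flight)
        self._pending = {}

    def speculate(self, name):
        # Start the download in the background unless already queued.
        if name not in self._pending:
            self._pending[name] = self._pool.submit(fetch, name)

    def wait(self, name):
        # Ensure the download was requested, then block until it is done.
        self.speculate(name)
        return self._pending.pop(name).result()


def get_one_by_one(fetcher, names):
    # Each request is issued only after the previous one has finished,
    # so there is never more than one request in flight.
    return [fetcher.wait(n) for n in names]


def get_speculated(fetcher, names):
    # Queue every known file first, then collect the results in order.
    for n in names:
        fetcher.speculate(n)
    return [fetcher.wait(n) for n in names]
```

Both functions return the same data; the difference is only that
get_speculated gives the transport the whole list up front, which is
what lets requests overlap.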

> If I understand correctly, all your recent pipelining patches were applied:
>
> Sun Apr 18 23:10:13 BST 2010  Dmitry Kurochkin <dmitry.kurochkin at gmail.com>
>  * Simplify libcurl pipelining configuration.
>
> Fri Apr 16 12:21:05 BST 2010  Dmitry Kurochkin <dmitry.kurochkin at gmail.com>
>  * Pass -DCURL_PIPELINING to C compiler when HTTP pipelining is enabled.
>
> Sun Apr 18 16:03:02 BST 2010  Dmitry Kurochkin <dmitry.kurochkin at gmail.com>
>  * Darcs.Repository: use pipelining when copying patches.
>

There were other patches from me more or less related to pipelining.
All of them are applied.

> In 2010-04-16 (GSoC Project: Improving darcs' network performance),
> you said:
>
>> 1. I was able to reproduce the "no running handles" bug reported by
>> many people recently using HTTP server at localhost.
> [snip]
>
> That's http://bugs.darcs.net/issue1770 fixed by your
> 'Simplify libcurl pipelining configuration' patch
>

That one is fixed by the "Fix hscurl.c when URL is downloaded during
the first call to curl_multi_perform." patch. As I said before, this
error message was removed, so if you get new reports about this error,
just ask the reporter to upgrade.

>> 2. Pipelining works only if darcs makes several parallel requests. I
>> added a few debug prints to understand why no pipelining is done for
>> http://darcs.net repo. copyFileUsingCache tries to download patches to
>> /home/darcs-unstable/darcs/_darcs/pristine.hashed/ directory. There is
>> no such directory on my system, so createDirectoryIfMissing fails. And
>> speculateFileOrUrl is never called.
>
>> 3. If I create this directory on my system, some pipelining starts to
>> work. After several requests that reuse one connection, curl starts to
>> open parallel connections to darcs.net. This seems a bug in curl to
>> me, but it needs more investigation. Later darcs hangs in waitUrl.
>> This is fixed in another patch that I would send shortly.
>
> In a reply you added:
>> Turns out that when built with cabal -DCURL_PIPELINING flag is never
>> passed to C compiler. Fixed by patch208. Looks like in cabal builds
>> pipelining was always broken because of this.
>
> ...and patch268 was applied.  But does this address all of #2 and #3
> above or is there more work to do?
>

Yes, both are addressed.

#2 is fixed by "Resolve issue1159: smart caches union." It is not
related to pipelining itself; the problem was non-existent entries in
the cache.

#3 is fixed by "Pass -DCURL_PIPELINING to C compiler when HTTP
pipelining is enabled.", and "Simplify libcurl pipelining
configuration." should prevent similar bugs in the future.

>> Summary: pipelining is tricky. There are problems on the darcs side
>> (both in network code and patch fetching code in general) and,
>> perhaps, the curl side. So it may not be trivial to investigate and
>> fix. But I think this is one of the things that should be done to
>> improve darcs network performance. AFAIK no proper pipelining
>> benchmarks were ever done, but during my first tests with libwww and
>> darcs-1 I got very nice results.
>
> Maybe a summary of what work (if any) needs doing?

Changes similar to the "Darcs.Repository: use pipelining when copying
patches." patch. As I said above, wherever darcs fetches a list of
known files, it should speculate on them first.

>  Also benchmarking the http
> stuff would be great.  We have this darcs-benchmark tool now... not sure if we
> can apply it effectively for this.
>

Networking benchmarks would be nice, but they may not be trivial: good
network benchmarks should not use external servers like darcs.net.

For me the fetch_patches_if_necessary changes brought the darcs get
time for the http://darcs.net repository down from 37:24 to 21:25.
See the patch description.
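
As a quick check of what those two timings mean in relative terms, here
is a small Python sketch. The helper name to_units is hypothetical, and
the message does not say whether the figures are minutes:seconds or
hours:minutes; the ratio is the same either way, since both fields are
base 60.

```python
def to_units(stamp):
    """Convert 'A:B' (base-60 fields) into a count of the smaller unit."""
    major, minor = map(int, stamp.split(":"))
    return major * 60 + minor


before = to_units("37:24")  # time before the change
after = to_units("21:25")   # time after the change

speedup = before / after    # how many times faster the new run is
saved = 1 - after / before  # fraction of the original time saved

print(f"{speedup:.2f}x faster, {saved:.0%} less time")
```

This works out to roughly 1.75 times faster, or about 43% less time,
for this single measurement.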

>> I believe that pipelining combined with patch packs will resolve all
>> darcs network performance problems. Smart server, IMHO, is not the way
>> to go - too complex (from both user and development POV) for little
>> benefit (compared to patch packs and proper pipelining).
>
> Oh, by the way... Alexey's packing work is going to be applied quite
> soon.  Exciting times!
>

Very nice indeed. IMHO creating such packs during tagging would pretty
much resolve the slow fetch issues with no user-visible changes.

As I understand it, the code is already there and working. Has anyone
compared darcs get times?

Regards,
  Dmitry

> --
> Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
> PGP Key ID: 08AC04F9
>

