[darcs-users] Darcs and the HTTP library

Gwern Branwen gwern0 at gmail.com
Sun Dec 21 05:28:01 UTC 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

So, a while ago I mentioned that it'd be nice to scrap the
libwww/curl/other web bindings, since they made configuration,
building, and documentation more complex, led to not a few end-user
problems, and left us with no control or influence over them. The
obvious alternative was Haskell's HTTP library, but aside from issues
with proxies and SSH, one major objection was performance. As everyone
knows, HTTP was [Char]-based and consequently horribly slow and
memory-inefficient, and it would've badly hurt Darcs's performance,
which really doesn't need any hurting.

So I'm happy to announce that raw performance, at least, no longer
seems to be a problem! Just a few days ago HTTP-4000.0.0 was released
to Hackage
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/HTTP and
its summary reads:

"A library for client-side HTTP, version 2. Rewrite of existing HTTP
package to allow overloaded representation of HTTP request bodies and
responses. Provides three such instances: lazy and strict ByteString,
along with the good old String. Inspired in part by Jonas Aadahl et
al's work on ByteString'ifying HTTP a couple of years ago. Git
repository available at http://code.galois.com/HTTPbis.git"

(It doesn't seem to've been announced, but dons mentioned it to me on Reddit.)
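
To make the "overloaded representation" part concrete, here is a
minimal sketch of a downloader that asks for the response body as a
lazy ByteString instead of a String. It is my own illustration, not
code shipped with the package: 'lazyGet' is a hypothetical helper
name, and I am assuming the HTTP package's Network.HTTP module and
Network.URI are installed. The body type is picked purely by the type
annotation on the request.

```haskell
module Main (main) where

import qualified Data.ByteString.Lazy as L
import Network.HTTP (Request (..), RequestMethod (GET), Response (..), simpleHTTP)
import Network.URI (URI, parseURI)
import System.Environment (getArgs)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- Hypothetical helper: a bare GET request whose body type is a lazy
-- ByteString. Changing this annotation to String or a strict
-- ByteString would select one of the other two representations.
lazyGet :: URI -> Request L.ByteString
lazyGet uri =
  Request { rqURI = uri, rqMethod = GET, rqHeaders = [], rqBody = L.empty }

main :: IO ()
main = do
  args <- getArgs
  case args of
    [addr] -> case parseURI addr of
      Nothing  -> bail "could not parse URI"
      Just uri -> do
        result <- simpleHTTP (lazyGet uri)
        case result of
          Left err   -> bail (show err)           -- connection-level failure
          Right resp -> L.putStr (rspBody resp)   -- write the body to stdout
    _ -> bail "usage: fetch <url>"
  where
    -- Report a problem on stderr and exit with a failure code.
    bail msg = hPutStrLn stderr msg >> exitFailure
```

Compiled as, say, 'fetch', it would be run as ./fetch URL > outfile.
It does no redirect or proxy handling, so it says nothing about
whether HTTP covers Darcs's needs there.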

Naturally, one wonders how fast this rewrite is. Fortunately, the
homepage http://www.haskell.org/http/ provides an example 'get.hs'. I
installed the new HTTP, compiled get.hs with it, and ran a bulk
download with it:

gwern at craft:33333~>time wget -q
http://www.haskell.org/ghc/dist/current/dist/ghc-6.7.20070401-i386-unknown-linux.tar.bz2
&& time ./get
http://www.haskell.org/ghc/dist/current/dist/ghc-6.7.20070401-i386-unknown-linux.tar.bz2
> ghc.bz2 && diff ghc-6.7.20070401-i386-unknown-linux.tar.bz2 ghc.bz2
&& du -h ghc-6.7.20070401-i386-unknown-linux.tar.bz2 ghc.bz2 && rm
ghc-6.7.20070401-i386-unknown-linux.tar.bz2 ghc.bz2
=wget -q   0.06s user 0.43s system 2% cpu 23.032 total
./get  > ghc.bz2  3.10s user 0.67s system 15% cpu 24.518 total
22M	ghc-6.7.20070401-i386-unknown-linux.tar.bz2
22M	ghc.bz2

Note that in this iteration 'get' is only about 1.5 seconds slower
(24.5s vs. 23.0s wall-clock). The files are identical, as diff
reports, and they aren't empty files either: they are the right size,
as du reports. On some runs get is faster and on other runs wget is
faster, though there seems to be a weak trend of get being a few
seconds slower. I will note in its defense that it's not writing to a
file like wget, but printing to stdout; this could be slowing it down.
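
For anyone who wants to redo the arithmetic on their own runs, here is
a small Haskell sketch of the comparison. The 'Timing' type and its
field names are made up for illustration, and the numbers are simply
the ones from the single run above, so treat it as a worked example
rather than a benchmark.

```haskell
module Main (main) where

import Text.Printf (printf)

-- One line of `time` output: user CPU, system CPU and wall-clock
-- seconds. (Illustrative type, not part of any library.)
data Timing = Timing
  { userSec :: Double
  , sysSec  :: Double
  , wallSec :: Double
  }

-- Figures copied from the run quoted above.
wgetRun, getRun :: Timing
wgetRun = Timing 0.06 0.43 23.032
getRun  = Timing 3.10 0.67 24.518

-- Total CPU time is user plus system time.
cpuSec :: Timing -> Double
cpuSec t = userSec t + sysSec t

-- How much longer the second run took in wall-clock terms.
wallDelta :: Timing -> Timing -> Double
wallDelta a b = wallSec b - wallSec a

main :: IO ()
main = do
  putStrLn (printf "wall-clock difference: %.3f s" (wallDelta wgetRun getRun))
  putStrLn (printf "CPU time: wget %.2f s, get %.2f s" (cpuSec wgetRun) (cpuSec getRun))
  putStrLn (printf "get CPU share of wall-clock: %.0f%%"
                   (100 * cpuSec getRun / wallSec getRun))
```

On these figures get burns several times more CPU than wget (3.77s
against 0.49s), but both spend the great majority of the wall-clock
time waiting on the network, which is why the totals end up so close.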

I don't recommend trying to switch to HTTP right now, because as I
said, I have no idea whether HTTP can handle Darcs's SSH and proxy
needs. But this is worth noting for the future.

- --
gwern
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEAREKAAYFAklN090ACgkQvpDo5Pfl1oIX6QCeK14nKb9yB4Rx2f5xx84JcxIw
YP4An3mi90h/fVizvjDHm00FDBVqj3Dg
=Ram2
-----END PGP SIGNATURE-----

