[darcs-devel] announcing darcs 2.0.0pre1, the first prerelease for darcs 2

Dmitry Kurochkin dmitry.kurochkin at gmail.com
Wed Dec 19 14:53:44 UTC 2007


2007/12/19, David Roundy <daveroundy at gmail.com>:
> On Dec 18, 2007 10:00 PM, Dmitry Kurochkin <dmitry.kurochkin at gmail.com> wrote:
> > I tried to get pipelining working with cURL, but no luck. It looks to me
> > like the cURL multi API is overcomplicated and not too well documented...
> >
> > So I have taken a look at libwww and it works great! I created a simple
> > program (copied from a sample) to load a given URL many times, first using
> > only a persistent connection and then using pipelining. And the results are
> > much better than I expected: loading http://nnov.ru 1000 times takes
> > 1:29.49 with a persistent connection and only 22.946 s with pipelining!
> >
> > I will experiment with replacing the current cURL implementation with libwww.
> > I am not familiar with the Darcs sources, so advice is welcome. After a quick
> > look at External.hs it looks to me like we can provide a function which takes
> > a list of URLs instead of one and fetches them using pipelining. I think
> > this will require minimal changes to the Darcs sources. Am I correct about this?
>
> Actually, there's already a copyRemotes function (and related
> functions) that grabs multiple files at a time from a list of URLs.
> Replacing this function with one that uses pipelining will gain us
> something when using old repositories.  The catch is that we rarely
> call this function (only for get, I believe), and never call it when
> using the new hashed format.  The trouble is that it requires that we
> know in advance which files we will need, which doesn't interact well
> with programmer-friendly lazy downloading.

I have created a Libwww.hs module and hslibwww.c. Libwww.hs provides
getUrl and getUrls functions, and I have changed copyRemotesNormal to use
getUrls, so it is ready for testing.
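
Roughly, the Haskell side is shaped like this (a sketch rather than the exact
code in my patch: hslibwww_get_url stands in for whatever the C glue function
ends up being called, error handling is minimal, and the real getUrls will push
everything onto one pipelined connection instead of looping):

{-# LANGUAGE ForeignFunctionInterface #-}
module Libwww ( getUrl, getUrls ) where

import Foreign.C.String ( CString, withCString )
import Foreign.C.Types ( CInt )

-- C glue in hslibwww.c: fetch one URL into a local file over a
-- persistent connection, returning 0 on success.
foreign import ccall "hslibwww_get_url"
    c_get_url :: CString -> CString -> IO CInt

getUrl :: String -> FilePath -> IO ()
getUrl url path =
    withCString url $ \curl ->
    withCString path $ \cpath -> do
        rc <- c_get_url curl cpath
        if rc /= 0 then fail ("getUrl failed for " ++ url)
                   else return ()

-- getUrls is meant to push all requests onto one pipelined connection;
-- this sketch only shows the intended type and falls back to fetching
-- the files sequentially.
getUrls :: [(String, FilePath)] -> IO ()
getUrls = mapM_ (uncurry getUrl)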
But I get compilation errors on hslibwww.c. It is compiled with GHC, not GCC,
and if I run GCC by hand it works fine. The errors say that there are
redefined symbols in the libwww header files, like:

In file included from /usr/include/w3c-libwww/WWWLib.h:50,
                 from src/hslibwww.c:5:0:
/usr/include/w3c-libwww/wwwsys.h:1099:1: warning: "strchr" redefined
Any ideas? Why is GHC used for C sources?

>
> So a much nicer feature would be something that can work with lazy
> downloading, and somehow add URLs to the pipelined queue as we go.  I
> don't know if libwww will work with this approach, but I suspect it'd
> require the least reworking of darcs' code.

If I understand correctly, the only way to implement this is a background
thread, or some kind of event loop inside darcs... Multithreading with the FFI
is a tricky point.
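
For the background-thread variant I picture something like the following (a
very rough, untested sketch: it ignores error handling and the FFI threading
issues, and reuses the getUrls sketch from above):

import Control.Concurrent ( forkIO )
import Control.Concurrent.Chan ( Chan, newChan, readChan )

import Libwww ( getUrls )

-- One worker thread pulls (url, local file) jobs off a queue and hands
-- them to the pipelined fetcher; the rest of darcs just writeChan's new
-- jobs as it discovers which files it needs.
downloadLoop :: Chan (String, FilePath) -> IO ()
downloadLoop q = do
    job <- readChan q
    getUrls [job]          -- a real version would batch up several jobs
    downloadLoop q

startDownloader :: IO (Chan (String, FilePath))
startDownloader = do
    q <- newChan
    _ <- forkIO (downloadLoop q)
    return q

The places in darcs that currently fetch a file lazily would then just write a
job into the Chan and wait for the file to appear.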

What is the problem with getting all filenames before starting a download?
Don't we know in advance which patches we need?

If which patches we need to download depends somehow on the content of a patch
we have just downloaded, we can provide a callback for darcs: when libwww
completes another transfer, it calls the callback, and darcs, based on the
content of that patch, adds new downloads to the event queue.
The question here is: can we call Haskell functions from C? I have no
experience here...
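
From skimming the FFI docs, it looks like a "wrapper" foreign import can turn a
Haskell closure into a C function pointer that the glue code could hand to
libwww as a completion handler. A completely untested sketch
(hslibwww_set_done_callback is made up here and would have to be written in
hslibwww.c):

{-# LANGUAGE ForeignFunctionInterface #-}
module LibwwwCallback where

import Foreign.C.String ( CString, peekCString )
import Foreign.Ptr ( FunPtr )

-- The type of callback the C glue would invoke when a transfer finishes.
type TransferDone = CString -> IO ()

-- A "wrapper" import turns a Haskell closure into a C function pointer,
-- so that C code can call back into Haskell.
foreign import ccall "wrapper"
    mkTransferDone :: TransferDone -> IO (FunPtr TransferDone)

-- Hypothetical registration function in hslibwww.c that would store the
-- pointer and call it from libwww's completion handler.
foreign import ccall "hslibwww_set_done_callback"
    setDoneCallback :: FunPtr TransferDone -> IO ()

-- Register an action that gets the URL of each finished transfer, so
-- darcs could inspect the patch and queue further downloads.
registerDoneCallback :: (String -> IO ()) -> IO ()
registerDoneCallback act = do
    cb <- mkTransferDone (\curl -> peekCString curl >>= act)
    setDoneCallback cb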
If that does not work, I guess we can create a custom event loop for libwww
instead. That will need some more reading of the libwww docs, but from what I
have learned so far, it looks doable.
>
> But definitely rewriting copyRemotes is a good starting point.
> Ideally we wouldn't remove the libcurl code, but would enable
> configure checks to use libwww if we can, otherwise use libcurl if
> it's present, and finally fall back on wget, etc. if no libraries are
> present.

Yes, this should be the first step. I hope to resolve the compilation issues
soon. After that some more changes are needed (like printing progress).
I am not familiar with configure stuff, so help is welcome here.

Regards,
  Dmitry
>
> David
>

