[darcs-devel] announcing darcs 2.0.0pre1, the first prerelease for darcs 2

Dmitry Kurochkin dmitry.kurochkin at gmail.com
Fri Jan 4 23:58:35 UTC 2008


2008/1/4, David Roundy <droundy at darcs.net>:
> On Fri, Dec 21, 2007 at 04:12:49AM +0300, Dmitry Kurochkin wrote:
> > I have completed initial work on libwww pipelining. Output of darcs whatsnew
> > is attached (sorry for that, I will try to make a proper patch tomorrow).
> > What is done:
> > - libcurl functionality is implemented using libwww. Now pipelining works.
> > - New Libcurl module provides 3 functions:
> >   * copyUrl - same as copyUrl from Curl.hs. It uses copyUrls and waitNextUrl.
> >   * copyUrls - takes (filename, url) list, creates requests and adds
> > them to libwww. Does not load anything.
> >   * waitNextUrl - starts the libwww event loop and blocks until the first url
> > loads (or an error happens). After it returns it should be possible to
> > add more urls to the queue using copyUrls again. waitNextUrl should be
> > called as many times as there are urls in the queue.
>
> Thanks for this contribution! I've finally gotten around to writing the
> promised configure support for this, and it looks pretty nice, particularly
> as a starting point for an internal API that we can use (and which hopefully
> can also be supported through the curl multi API).
>
> I've got a couple of suggestions/questions, now that I've had time to look
> at the actual code.
>
> How hard would it be to make a function
>
> waitForURL :: String -> IO ()
>
> which ensures that we've already got the given URL.  This would allow us to
> speculatively call copyURLs to grab stuff we expect to use later, without
> keeping track of the order in which they were queued (so as to call
> waitNextUrl the proper number of times).  I think this would be a real
> improvement.
>
> Related to this would be a feature to ignore duplicate calls to copyURLs.
> This may not be supported by libwww itself, but it'd be really handy, again
> for speculative triggering of downloads.

I think we can keep track of started downloads in a list, and then check
it for duplicate URLs. Implementing waitForURL should not be hard either:
we can put completed URLs into a list, and when waitForURL is called we
search that list for the given URL. If it is found, we remove it from the
list and return; otherwise we wait until the download completes... So
waitForURL can be called for URLs in any order, but if it is never called
for some URL, that URL will stay in the list...
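
Roughly, I imagine something like this on top of the current Libcurl
interface. This is only a sketch: waitNextUrl' is a made-up variant of
waitNextUrl that returns the URL it just finished (the current one does
not, so the C glue would need a small change), and the bookkeeping lives
in module-level IORefs.

import Data.IORef ( IORef, newIORef, readIORef, modifyIORef )
import Data.List ( delete )
import System.IO.Unsafe ( unsafePerformIO )
import Libcurl ( copyUrls )

-- Hypothetical: like waitNextUrl, but returns the URL it just finished.
waitNextUrl' :: IO String
waitNextUrl' = undefined

-- URLs we have queued, and URLs already finished but not yet waited for.
startedUrls, doneUrls :: IORef [String]
startedUrls = unsafePerformIO $ newIORef []
doneUrls    = unsafePerformIO $ newIORef []
{-# NOINLINE startedUrls #-}
{-# NOINLINE doneUrls #-}

-- copyUrls wrapper that silently drops URLs that were already queued.
copyUrlsDedup :: [(FilePath, String)] -> IO ()
copyUrlsDedup pairs =
    do started <- readIORef startedUrls
       let new = [ p | p@(_, u) <- pairs, u `notElem` started ]
       modifyIORef startedUrls (map snd new ++)
       copyUrls new

-- Block until the given URL has been downloaded, in whatever order the
-- downloads actually finish.
waitForUrl :: String -> IO ()
waitForUrl url =
    do done <- readIORef doneUrls
       if url `elem` done
          then modifyIORef doneUrls (delete url)
          else do finished <- waitNextUrl'
                  if finished == url
                     then return ()
                     else do modifyIORef doneUrls (finished:)
                             waitForUrl url

This way both functions are simple wrappers and the libwww side does not
need to know about any of it.
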
>
> Also related to this idea:  can we adjust the order of downloads in the
> queue? e.g. maybe I'd like to add a file towards the front of the queue
> because I need it right now.  This might be doable if waitForURL could bump
> up the priority of that URL, in case it hasn't yet been requested from the
> server.
Implementing priority is more difficult. There is no priority support
in curl AFAIK. Maybe libwww has it, but I do not think that is a good
solution in the long term. I am working on better pipelining in curl,
and when it is ready I think we should remove libwww support from darcs
and implement pipelining for new versions of libcurl.

Regarding priority: maybe we can get something like this with a second
connection to the server? One connection for high-priority URLs and
another for all other URLs. For darcs the interface would stay the same,
with the addition of a function for adding high-priority URLs, as in the
sketch below.
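
The extended interface could then look roughly like this (names are just
placeholders, nothing here is implemented yet):

copyUrls         :: [(FilePath, String)] -> IO ()  -- existing: queue on the normal connection
copyUrlsPriority :: [(FilePath, String)] -> IO ()  -- new: queue on the second, high-priority connection
waitForUrl       :: String -> IO ()                -- new: block until the given URL has been fetched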

I think I will look at implementing waitForURL first. But we are still
celebrating New Year in Russia, so I am not sure when I will get to this :)

Regards,
  Dmitry

>
> I'm thinking of situations like this:
>
> We're doing a darcs get.  This involves grabbing all the inventory files
> and all the patch files from the server.  Each inventory file has pointers
> to many patch files and the next inventory file.  We don't know how many
> patches there are in the repository until we've downloaded (and read) all
> the inventory files.
>
> We could get the inventory files sequentially with no pipelining, count the
> patch files, and then grab the patch files with pipelining and providing
> nice feedback.  But this is a bit ugly:  we waste all that time while
> grabbing inventory files and waiting for the entire latency, when we
> already know where a whole bunch of patch files are that we could be
> grabbing.
>
> So a faster alternative would be once we have the first inventory file to
> queue up the second inventory file and also all the patch files listed in
> that inventory.  Then when we get the second inventory, we queue up the
> third inventory and all the patch files in the second inventory, etc.  This
> is ugly (with the current API) because we won't get the last inventory
> until we've already downloaded almost all the patch files.  It's very fast
> (everything is pipelined), but because we've got a FIFO queue, the third
> inventory can't be grabbed until we've already gotten all the patch files
> from the first inventory, so we can't give nice feedback counting the
> number of patch files we've got versus the total number.
>
> Which is why it'd be nice to be able to prioritize the inventory files that
> we're waiting on, so that we queue up the second inventory followed by all
> the patches listed in the first inventory, but then when we get the second
> inventory, we slip the third inventory in at the head of the queue.  So we
> get all the inventories pretty quickly (although probably not as quickly as
> if we took the first approach) and we're also interleaving the downloading
> of patch files, keeping the pipe full (in theory, anyhow).
>
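
(Just to check that I follow the idea: with a priority function, the get
loop could look roughly like the sketch below. readInventory,
copyUrlsPriority and waitForUrl are all made up here; only copyUrls
exists in the patch so far.)

-- readInventory parses an already-downloaded inventory into its patch
-- (filename, url) pairs and the next inventory, if any.
readInventory :: FilePath -> IO ([(FilePath, String)], Maybe (FilePath, String))
readInventory = undefined   -- placeholder

fetchInventories :: (FilePath, String) -> IO [(FilePath, String)]
fetchInventories inv =
    do copyUrlsPriority [inv]             -- inventories jump to the head of the queue
       waitForUrl (snd inv)
       (patches, next) <- readInventory (fst inv)
       copyUrls patches                   -- this inventory's patches, normal priority
       rest <- maybe (return []) fetchInventories next
       return (patches ++ rest)

getRepo :: (FilePath, String) -> IO ()
getRepo inv0 =
    do patches <- fetchInventories inv0
       mapM_ (waitForUrl . snd) patches   -- everything is already pipelined; just wait
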
> > At the moment the only place where copyUrls is used is the get command,
> > but I hope this interface is enough for Darcs. If not, we need to think
> > of something more complex. Waiting for comments here.
>
> Hmmm.  I think comments are above.  It's actually not a bad interface as
> is, but waitNextUrl seems a bit awkward to use.  Actually, it has now
> occurred to me that we could implement waitForURL as a wrapper around
> waitNextURL, if we kept tabs on what had been shoved in the queue.  It
> seems a bit ugly, but we could live with that sort of solution, if libwww
> doesn't have this functionality.
>
> > What is missing:
> > - DARCS_PROXYUSERPWD is not used (but http_proxy works).
> > - Proper error handling.
> > - Not tested.
> > - ???
>
> These issues are somewhat less critical now that this can coexist with the
> libcurl code.  Only interested users are likely to use the new code, so
> it'll have a bit of time to mature.
>
> I haven't yet done any performance testing myself.  That comes next (and
> requires using my laptop, since the network of my work computer is too fast
> for this to have a noticeable effect, as far as I can tell).
>
> I expect I'll be applying this soon to the unstable repository.
>
> David
> _______________________________________________
> darcs-devel mailing list
> darcs-devel at darcs.net
> http://lists.osuosl.org/mailman/listinfo/darcs-devel
>

