[darcs-devel] pipelining downloads

David Roundy droundy at darcs.net
Sat Jan 5 16:19:17 UTC 2008


On Sat, Jan 05, 2008 at 02:58:35AM +0300, Dmitry Kurochkin wrote:
> I think we can keep track of started downloads in a list. Then we can
> check for duplicate URLs. Implementing waitForURL should not be hard.
> We can put completed URLs into a list. When waitForURL is called, we
> search the list for given URL. If found we remove it from the list and
> return. Otherwise we wait until download is complete... So waitForURL
> can be called for URLs in any order, but if it is not called for some
> URL that URL will stay in the list...

Yes, this sounds good.  Perhaps we might prefer to use a Data.Set.Set to
get better scaling.  I normally prefer to just use lists (because the code
is simpler and we don't have to worry about weird laziness issues we don't
understand, as can happen so easily with Data.Map).  But we might be asking
for large numbers of files.
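
To make the Set idea concrete, here is a minimal sketch of the bookkeeping,
not code from darcs.  The names (Downloads, startDownload, markCompleted,
claimCompleted) are all hypothetical, and I'm assuming a URL is just a
String.  It only covers the pure part; the blocking half of waitForURL
would sit on top of this.

```haskell
import qualified Data.Set as Set

-- Hypothetical types and names, chosen for illustration only.
type URL = String

data Downloads = Downloads
  { started   :: Set.Set URL  -- requested, not yet finished
  , completed :: Set.Set URL  -- finished, not yet claimed by waitForURL
  } deriving (Eq, Show)

emptyDownloads :: Downloads
emptyDownloads = Downloads Set.empty Set.empty

-- Register a new request; Nothing means the URL is a duplicate.
startDownload :: URL -> Downloads -> Maybe Downloads
startDownload u d
  | u `Set.member` started d || u `Set.member` completed d = Nothing
  | otherwise = Just d { started = Set.insert u (started d) }

-- Move a URL from the started set to the completed set.
markCompleted :: URL -> Downloads -> Downloads
markCompleted u d = d { started   = Set.delete u (started d)
                      , completed = Set.insert u (completed d) }

-- The lookup half of waitForURL: if the URL has completed, remove it
-- and return the new state; Nothing means the caller still has to wait.
claimCompleted :: URL -> Downloads -> Maybe Downloads
claimCompleted u d
  | u `Set.member` completed d = Just d { completed = Set.delete u (completed d) }
  | otherwise                  = Nothing
```

Membership tests and deletions are O(log n) here rather than O(n) for a
list, which is the scaling benefit.  As in the list version, a URL that
nobody ever waits for stays in the completed set.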

Hmmmm.  Actually, maybe we don't need to keep track of downloaded files
except to remove them from the not-yet-completed list, since the file ought
to already be present in its destination?  We have two sets of files:
those already requested but not completely downloaded, and those that have
been completely downloaded.  The latter set can be identified because the
file is present and the URL is not in the to-be-downloaded list of URLs.

But this would mean that when we're asked to download a file, we'd have to
check both whether it's already on the to-be-downloaded list, and also
whether the file already exists at its destination.
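
A rough sketch of that two-part check, under the same caveats: Status,
classify and requestStatus are made-up names, and I'm assuming the caller
knows the destination path for each URL.  doesFileExist is the real
function from System.Directory.

```haskell
import qualified Data.Set as Set
import System.Directory (doesFileExist)

-- Hypothetical names, for illustration only.
type URL = String

data Status
  = InFlight           -- requested, download not yet complete
  | AlreadyDownloaded  -- file present and URL not pending
  | NotRequested       -- neither pending nor present
  deriving (Eq, Show)

-- Pure decision.  The pending set is consulted first, on the assumption
-- that a partially written file might already exist at the destination.
classify :: Set.Set URL -> URL -> Bool -> Status
classify pending url fileExists
  | url `Set.member` pending = InFlight
  | fileExists               = AlreadyDownloaded
  | otherwise                = NotRequested

-- IO wrapper that looks at the destination on disk.
requestStatus :: Set.Set URL -> URL -> FilePath -> IO Status
requestStatus pending url dest = classify pending url <$> doesFileExist dest
```

With this scheme only one set has to be maintained, at the cost of a
filesystem check on every request.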

> > Also related to this idea:  can we adjust the order of downloads in the
> > queue? e.g. maybe I'd like to add a file towards the front of the queue
> > because I need it right now.  This might be doable if waitForURL could bump
> > up the priority of that URL, in case it hasn't yet been requested from the
> > server.
>
> Implementing priority is more difficult. There is no priority support
> in curl AFAIK. Maybe there is priority in libwww but I do not think
> this is a good solution in long term. I am working on better
> pipelining in curl. And when it is ready I think we should remove
> libwww support from darcs and implement pipelining for new versions of
> libcurl.

I'd rather keep libwww support in darcs (unless you don't want to implement
waitForURL for it), at least for a little while, since the version of
libcurl with pipelining support is so new.

> I think I will look at implementing waitForURL first. But we are still
> celebrating New Year in Russia, so not sure when I will get to this :)

Yes, waitForURL is definitely the more important feature by far.  And have
a happy new year!  :)
-- 
David Roundy
Department of Physics
Oregon State University

