[darcs-users] darcs push hangs

Ben Franksen ben.franksen at online.de
Wed Feb 4 16:43:31 UTC 2015


Ben Franksen wrote:
> Manoj Gudi wrote:
>> I am trying to commit a new log file (size 20 MB); `darcs record` works
>> fine, but `darcs push` hangs after asking for credentials.
>> 
>> Any ideas?
> 
> You hit a long-standing weak spot of Darcs: transmitting large files can
> take a long time.

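tl;dr: What we need to do in Darcs is compress the bundle before sending it
over the wire. That's what git seems to do, and it cuts the transfer time
considerably: in my tests below, compressing a 20 MB file brought a plain
ssh pipe transfer down from about 2:43 to about 46 seconds.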

Here is how I arrived at the conclusion.

After adding a few debug messages, I quickly found out that the culprit is
the way we transfer the bundle. This is done with a pipe: the remote process
(ssh user@host:path apply) reads the data from stdin. You can emulate this
method with the following one-liner:

ben@sarun[1]: /tmp/large > time cat american-english-times21 | (ssh franksen@tiber.acc.bessy.de /tmp/large/readit /tmp/large/american-english-times21)
cat american-english-times21  0,00s user 0,05s system 0% cpu 2:25,97 total
( ssh franksen@tiber.acc.bessy.de /tmp/large/readit ; )  1,06s user 0,09s system 0% cpu 2:42,86 total

Here, "american-english-times21" is a file of about 20 MB, and the script
"readit" on the remote side is just

franksen@tiber: /tmp/large > cat readit
cat - > $1

OK, so the raw transfer takes about 2:40 minutes. Darcs does a bit more work,
so a constant overhead factor of 2 to 3 is not surprising.
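To put the raw timing in perspective, here is the arithmetic as a small Python sketch. The time output uses a German locale, so "2:42,86 total" means 2 minutes 42.86 seconds. The sketch assumes the file is exactly 20 MB (it is only "about 20MB") and treats the ssh side's total as the transfer time; the overhead factor of 2 to 3 is the estimate stated above, not a measurement.

```python
"""Back-of-the-envelope numbers for the uncompressed `cat | ssh` transfer.

Assumptions: the file is exactly 20 MB, and the 2:42.86 wall-clock total
reported for the ssh side is the transfer time.
"""


def to_seconds(minutes: int, seconds: float) -> float:
    """Convert an m:ss.ss wall-clock reading into seconds."""
    return minutes * 60 + seconds


def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Average transfer rate in MB per second."""
    return size_mb / seconds


SIZE_MB = 20.0                      # "about 20MB", assumed exact here
RAW_SECONDS = to_seconds(2, 42.86)  # ssh side of the plain pipe

rate = throughput_mb_per_s(SIZE_MB, RAW_SECONDS)
print(f"raw pipe: {RAW_SECONDS:.2f} s, about {rate:.3f} MB/s")

# Estimated duration of a darcs push with a constant overhead factor of 2..3
for factor in (2, 3):
    print(f"overhead x{factor}: about {RAW_SECONDS * factor / 60:.1f} minutes")
```

Under these assumptions the link moves roughly 0.12 MB/s, so a 20 MB push could sit silently for about 5 to 8 minutes, which easily looks like a hang.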

Note that even rsync over ssh is no faster, at least when the file does not
yet exist on the remote side (so rsync's delta transfer has nothing to work
against):

ben@sarun[1]: /tmp/large > time rsync american-english-times21 franksen@tiber.acc.bessy.de:/tmp/large/american-english-times21
rsync american-english-times21   1,24s user 0,14s system 0% cpu 2:43,35 total

On the other hand, git seems to be specially optimized for this purpose:

ben@sarun[1]: /tmp/large > time git push --all franksen@tiber.acc.bessy.de:/tmp/large
[...]
git push --all franksen@tiber.acc.bessy.de:/tmp/large  1,96s user 0,04s system 4% cpu 46,982 total

(That was after I had studied the error message from the first attempt and
configured the remote git repo accordingly.)

That made me think "well, how do they do that?" and the next thought was 
"compression?", so I tried it:

ben@sarun[1]: /tmp/large > time cat american-english-times21 | gzip - | (ssh franksen@tiber.acc.bessy.de /tmp/large/readit /tmp/large/american-english-times21)
cat american-english-times21  0,00s user 0,04s system 0% cpu 27,593 total
gzip -  1,52s user 0,02s system 5% cpu 27,598 total
( ssh franksen@tiber.acc.bessy.de /tmp/large/readit ; )  0,28s user 0,03s system 0% cpu 45,558 total

Bingo! That's almost exactly the time 'git push' needs (45.6 s versus 47.0 s).
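The same arithmetic for the gzip run, again as a hedged Python sketch: it compares the ssh-side totals of the plain pipe (2:42.86) and the gzip pipe (45.558 s), and checks them against git push (46.982 s). It also estimates how much data actually crossed the wire, under the added assumption that the link throughput was the same in both runs; that figure is an inference, not a measured compressed size.

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert an m:ss.ss wall-clock reading into seconds."""
    return minutes * 60 + seconds


SIZE_MB = 20.0                   # "about 20MB", assumed exact here
PLAIN_S = to_seconds(2, 42.86)   # cat | ssh
GZIP_S = to_seconds(0, 45.558)   # cat | gzip | ssh
GIT_S = to_seconds(0, 46.982)    # git push --all

speedup = PLAIN_S / GZIP_S

# Assumption: same link throughput in both runs, so the number of bytes
# transferred scales with the transfer time.
implied_mb = SIZE_MB * GZIP_S / PLAIN_S

print(f"speedup from gzip: about {speedup:.1f}x")
print(f"implied compressed size: about {implied_mb:.1f} MB")
print(f"gzip pipe vs git push: {GZIP_S:.1f} s vs {GIT_S:.1f} s")
```

If the assumption holds, gzip shrank the word list to a bit over a quarter of its size, which is what a roughly 3.6x faster transfer over the same link implies.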

It looks as if we already have the infrastructure for this (zip/unzip) in 
place, so the next thing I will try is to patch Darcs to use compression and 
see where that gets us.
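For illustration, the idea can be sketched in Python (Darcs itself is written in Haskell, so this is not its code): compress the bundle with gzip on the sending side, write it to the stdin of the receiving process, and decompress it there. The helper `send_bundle` and the `RECEIVER` command are hypothetical; in a real push the receiver would be the ssh invocation that runs the remote apply, which would then also have to decompress its input.

```python
import gzip
import subprocess
import sys


def send_bundle(bundle: bytes, command: list[str], compress: bool = True) -> bytes:
    """Pipe `bundle` into `command` via stdin and return the command's stdout.

    Hypothetical helper: with compress=True the data is gzip-compressed
    before it goes into the pipe, so fewer bytes cross the connection.
    """
    payload = gzip.compress(bundle) if compress else bundle
    result = subprocess.run(command, input=payload, capture_output=True, check=True)
    return result.stdout


# Hypothetical stand-in for the remote side: a local Python process that
# reads gzip data from stdin and writes the decompressed bytes to stdout.
# A real setup would run something like `ssh user@host ...` instead.
RECEIVER = [
    sys.executable,
    "-c",
    "import gzip, sys; sys.stdout.buffer.write(gzip.decompress(sys.stdin.buffer.read()))",
]

if __name__ == "__main__":
    # Repetitive text, like a word list, compresses well.
    bundle = b"".join(f"word{i % 1000}\n".encode() for i in range(50_000))
    received = send_bundle(bundle, RECEIVER)
    assert received == bundle
    print(f"{len(bundle)} bytes sent as {len(gzip.compress(bundle))} compressed bytes")
```

Setting `compress=False` gives the current, uncompressed behaviour, which makes it easy to time both variants against the same receiver.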

Cheers
Ben
-- 
"Make it so they have to reboot after every typo." -- Scott Adams