[darcs-users] darcs push hangs
Ben Franksen
ben.franksen at online.de
Wed Feb 4 16:43:31 UTC 2015
Ben Franksen wrote:
> Manoj Gudi wrote:
>> I am trying to commit a new log file (size 20 MB). `darcs record` works
>> fine, but `darcs push` hangs after asking for credentials.
>>
>> Any ideas?
>
> You hit a long-standing weak spot of Darcs: transmitting large files can
> take a long time.
tl;dr: What we need to do in Darcs is compress the bundle before sending it
over the wire. That seems to be what git does, and it cuts the transfer time
considerably (to less than a third in the measurements below).
Here is how I arrived at the conclusion.
Adding a few debug messages, I quickly found out that the culprit is the way
we transfer the bundle. This is done through a pipe: the remote process (ssh
user at host:path apply) reads the data from stdin. You can emulate this
method with the following one-liner:
  ben at sarun[1]: /tmp/large > time cat american-english-times21 | (ssh franksen at tiber.acc.bessy.de /tmp/large/readit /tmp/large/american-english-times21)
  cat american-english-times21 0,00s user 0,05s system 0% cpu 2:25,97 total
  ( ssh franksen at tiber.acc.bessy.de /tmp/large/readit ; ) 1,06s user 0,09s system 0% cpu 2:42,86 total
Here, "american-english-times21" is a file of about 20 MB, and the script
"readit" on the remote side is just:
  franksen at tiber: /tmp/large > cat readit
  cat - > $1
OK, so the raw transfer takes about 2:40 minutes. Darcs does a bit more work
on top of that, so a constant overhead factor of 2 to 3 is not surprising.
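For anyone who wants to look at the mechanism without a network in between, here is a minimal Python sketch of the same data path (an illustration only, not Darcs code): the parent pushes a file through a pipe into a child process, and the child copies its stdin into the path given on its command line, just as "readit" does. The ssh hop is deliberately left out, so this shows the plumbing, not the 2:40 minutes. `READIT` and `pipe_transfer` are hypothetical names.

```python
import shutil
import subprocess
import sys
import time

# Hypothetical stand-in for the remote "readit" script (cat - > $1):
# copy everything arriving on stdin into the file named by argv[1].
READIT = (
    "import shutil, sys\n"
    "with open(sys.argv[1], 'wb') as out:\n"
    "    shutil.copyfileobj(sys.stdin.buffer, out)\n"
)


def pipe_transfer(src: str, dest: str) -> float:
    """Push src through a pipe into a child process that writes it to dest.

    Returns the elapsed wall-clock time in seconds. Local only: no ssh.
    """
    start = time.perf_counter()
    cmd = [sys.executable, "-c", READIT, dest]
    with subprocess.Popen(cmd, stdin=subprocess.PIPE) as proc:
        with open(src, "rb") as f:
            # Same role as `cat file |`: copy the file into the pipe in chunks.
            shutil.copyfileobj(f, proc.stdin)
        proc.stdin.close()  # EOF tells the reader that the data is complete
    # Leaving the with-block waited for the child, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return time.perf_counter() - start
```

Putting `ssh user@host` in front of the child command would turn this into the real setup, at which point the link bandwidth, not the pipe, dominates the time.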
Note that even rsync over ssh is no faster, at least when the file does not
yet exist on the remote side:
  ben at sarun[1]: /tmp/large > time rsync american-english-times21 franksen at tiber.acc.bessy.de:/tmp/large/american-english-times21
  rsync american-english-times21 1,24s user 0,14s system 0% cpu 2:43,35 total
git, on the other hand, seems to be specifically optimized for this:
  ben at sarun[1]: /tmp/large > time git push --all franksen at tiber.acc.bessy.de:/tmp/large
  [...]
  git push --all franksen at tiber.acc.bessy.de:/tmp/large 1,96s user 0,04s system 4% cpu 46,982 total
(That was after I had studied the error message and configured the remote
git repo accordingly.)
That made me think "well, how do they do that?", and my next thought was
"compression?", so I tried it:
  ben at sarun[1]: /tmp/large > time cat american-english-times21 | gzip - | (ssh franksen at tiber.acc.bessy.de /tmp/large/readit /tmp/large/american-english-times21)
  cat american-english-times21 0,00s user 0,04s system 0% cpu 27,593 total
  gzip - 1,52s user 0,02s system 5% cpu 27,598 total
  ( ssh franksen at tiber.acc.bessy.de /tmp/large/readit ; ) 0,28s user 0,03s system 0% cpu 45,558 total
Bingo! That's almost exactly the time 'git push' needs.
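To put numbers on the comparison, the sketch below converts the wall-clock totals from the transcripts in this mail into seconds, an approximate rate, and a speedup over the plain pipe. It is only arithmetic on the quoted figures; the 20 MB size is approximate, and `parse_total`, `TOTALS` and `SIZE_MB` are hypothetical helper names. The compressed pipe moves fewer bytes, so its MB/s figure is an effective rate in terms of the uncompressed file.

```python
def parse_total(stamp: str) -> float:
    """Turn a zsh `time` total such as '2:42,86' or '46,982' into seconds.

    The transcripts use a decimal comma, and a minutes field appears only
    when the run took longer than a minute.
    """
    minutes, _, seconds = stamp.replace(",", ".").rpartition(":")
    return 60 * float(minutes or 0) + float(seconds)


# Wall-clock totals of the slowest stage of each pipeline, from the transcripts.
TOTALS = {
    "cat | ssh readit": "2:42,86",
    "rsync over ssh": "2:43,35",
    "git push --all": "46,982",
    "cat | gzip | ssh readit": "45,558",
}

SIZE_MB = 20  # "about 20MB", so every rate below is approximate

baseline = parse_total(TOTALS["cat | ssh readit"])
for name, stamp in TOTALS.items():
    secs = parse_total(stamp)
    rate = SIZE_MB / secs  # effective rate, measured against the uncompressed size
    print(f"{name:<24} {secs:7.2f} s {rate:6.3f} MB/s {baseline / secs:5.2f}x")
```

With these figures the gzip pipe comes out roughly 3.6 times faster than the plain pipe, and git about 3.5 times faster. Applying the overhead factor of 2 to 3 mentioned above to the 162.86 s raw transfer suggests a darcs push of roughly 5.4 to 8.1 minutes for this file.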
It looks as if we already have the infrastructure for this (zip/unzip) in
place, so the next thing I will try is to patch Darcs to use compression and
see where that gets us.
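One detail to keep in mind: in the gzip experiment above, "readit" stores the still-compressed bytes, so that run measures only the transfer. A real change has to decompress on the receiving side as well. The following Python sketch shows that round trip locally; it is not how Darcs would do it (Darcs is written in Haskell and the zip/unzip helpers are not shown here), and `RECEIVER` and `send_compressed` are hypothetical names. It returns the number of bytes that actually went through the pipe, which is what counts on a slow link.

```python
import gzip
import subprocess
import sys

# Hypothetical receiver: gunzip stdin into the file named by argv[1],
# i.e. what the remote side would do with `gunzip -c > $1`.
RECEIVER = (
    "import gzip, shutil, sys\n"
    "with gzip.open(sys.stdin.buffer, 'rb') as src, open(sys.argv[1], 'wb') as dst:\n"
    "    shutil.copyfileobj(src, dst)\n"
)


def send_compressed(src: str, dest: str, level: int = 6) -> int:
    """Compress src, pipe it to the receiver, and return the bytes sent.

    level=6 matches gzip(1)'s default; higher levels trade CPU for size.
    """
    with open(src, "rb") as f:
        # Whole-file compression in memory keeps the sketch short.
        payload = gzip.compress(f.read(), compresslevel=level)
    subprocess.run([sys.executable, "-c", RECEIVER, dest], input=payload, check=True)
    return len(payload)
```

For large bundles, streaming compression on both ends (as `gzip -c file | ssh host 'gunzip -c > file'` gives you) would avoid holding the whole payload in memory at once.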
Cheers
Ben
--
"Make it so they have to reboot after every typo." -- Scott Adams