[darcs-devel] Darcs-git pulling from the Linux repo: a Linux VM question

Juliusz Chroboczek Juliusz.Chroboczek at pps.jussieu.fr
Wed Apr 27 06:10:25 PDT 2005


Hi,

If you are one of the few initiated who can tune the Linux VM, please
skip to the end of this mail and give me some advice.  If you are one
of the even fewer initiated who understand Darcs' memory usage, read
the whole of this message and send me a patch.  Otherwise, press D.

Now that I've got a Darcs that groks Git repos, I can play with a
fairly large tree -- the Linux 2.6 one.  All the experiments described
below were done on a 1.4 GHz Pentium-M with 640 MB of memory, running
Linux 2.6.9 (Debian branded) over Reiserfs.

All the commands that don't need to actually read the underlying blobs
are instantaneous; for example, ``darcs changes'' takes 0.4s.
Commands that need to read the blobs but can discard them straight
away are reasonable enough -- ``darcs changes -s'' on all but the
initial import takes a very reasonable 15s, while ``darcs changes -s''
including the initial import takes 2m30s real time (50s CPU time).

The trouble, of course, is with commands that need to read a full tree
and keep it in memory.  This is, unfortunately, the case when pulling
the initial commit, which is over 200 MB in size.  Darcs' behaviour
when pulling this initial commit is as follows.

As I'm currently reading the git repository eagerly, Darcs starts by
reading the whole of the initial tree into memory; this takes roughly
2 minutes of real time (at less than 10% CPU) and reads 18987 Git
files (blobs and trees), of which 18512 are unique (meaning that fewer
than 500 were read more than once -- yes, I should be keeping track of
the blobs I've already read).  When that is done, Darcs' VMEM usage is
just under 300 MB.
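
For what it's worth, by ``keeping track of the blobs'' I mean nothing
fancier than a memo table keyed on the object hash.  This is only a
sketch, not the code in darcs-git, and readGitObject and the SHA1 type
are made-up names for whatever actually fetches an object from disk:

    import qualified Data.ByteString.Lazy as B
    import qualified Data.Map as Map
    import Data.IORef (IORef, newIORef, readIORef, modifyIORef)

    type SHA1 = String                       -- hex object name (hypothetical)
    type BlobCache = IORef (Map.Map SHA1 B.ByteString)

    newBlobCache :: IO BlobCache
    newBlobCache = newIORef Map.empty

    -- Hit the filesystem only the first time a given hash is asked for;
    -- later requests for the same object are served from the cache.
    readObjectCached :: BlobCache -> (SHA1 -> IO B.ByteString) -> SHA1
                     -> IO B.ByteString
    readObjectCached cache readGitObject hash = do
      seen <- readIORef cache
      case Map.lookup hash seen of
        Just contents -> return contents
        Nothing       -> do
          contents <- readGitObject hash
          modifyIORef cache (Map.insert hash contents)
          return contents

Of course the cache itself eats memory, so it should probably remember
only the hashes of objects already walked rather than whole contents;
but even that would have saved the 475-odd re-reads above.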

At that point, Darcs stops doing I/O and starts trying to interpret
the data.  It runs at between 80% and 100% CPU, and its VMEM grows
steadily until it reaches 550 MB.  At that point the system starts
swapping very lightly (no more than 200 kB/s or so), and Darcs' VMEM
usage grows to 720 MB after 5 minutes of CPU time (8 minutes real
time).

When Darcs has grokked the fullness of the Linux kernel, it decides to
write out a patch.  So it starts touching all of its memory while
simultaneously writing out data to a patch file at a fairly sustained
rate.  It gets pretty close to the end -- over 200 MB of the patch are
written -- when the system suddenly appears to freeze for a second,
then the OOM killer triggers and kills the Darcs process.

Now obviously there is a problem with Darcs -- it shouldn't need
720 MB of virtual memory just to grok a 250 MB import -- but there's
also a problem with the VM.  A 720 MB process should be workable on a
machine with 640 MB of RAM, and there's no apparent reason why the
kernel couldn't go more heavily into swap.  My completely uninformed guess
couldn't go more heavily into swap.  My completely uninformed guess
would be that the heavy I/O activity generated by Darcs in the final
stage causes a shortage of some resource (probably buffers) that is
essential for the VM to perform the swapping, and that the only way
the kernel sees to get itself out of the tight spot is to invoke the
OOM killer on the process that's causing the I/O activity.

So yes, in the longer term we need to fix Darcs.  For now, does anyone
know how I can tune the Linux VM to get a 720 MB process to run
reliably in 640 MB of main memory?  Obviously, adding swap or tuning
the overcommit policy doesn't help (the issue is precisely that the VM
refuses to dig into swap early enough).  I don't understand what
``swappiness'' is, but it doesn't appear to help.  The
``min_free_kbytes'' and ``dirty_*'' knobs look promising, but nobody
seems to know what they mean.
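
For the record, the knobs I've been poking all live under
/proc/sys/vm.  Here is a sketch of the kind of blind experiment I've
been running (as a trivial Haskell script, since that's what I have to
hand); the values are arbitrary guesses, not recommendations, and you
need to be root:

    -- Poke a handful of VM knobs under /proc/sys/vm.
    -- The values below are arbitrary guesses, not recommendations.
    main :: IO ()
    main = mapM_ setKnob
      [ ("swappiness",             "100")    -- how eagerly to swap anonymous pages
      , ("min_free_kbytes",        "16384")  -- free-memory reserve kept by the kernel
      , ("dirty_background_ratio", "5")      -- start background writeback earlier
      , ("dirty_ratio",            "10")     -- throttle heavy writers sooner
      ]
      where
        setKnob (knob, value) =
          writeFile ("/proc/sys/vm/" ++ knob) (value ++ "\n")

If someone knows sensible values for a 640 MB machine that needs to
push 200 MB of idle anonymous memory out to swap, I'm all ears.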

So what was it you said about self-tuning VM systems?

                                        Juliusz



