[darcs-devel] darcs patch: add test that triggers "too many open fi... (and 2 more)

David Roundy droundy at darcs.net
Sat Aug 27 12:37:48 PDT 2005


Hi,

I've come across a bug (which I didn't report, since I fixed it
instead...), which shows up when I grab over a thousand patches with a
single pull.  This causes an error about "too many open files".

The problem is in save_patches in Pull, which calls writePatch once per
patch.  writePatch itself reads back the patch file it has just written,
using the (relatively) new gzReadPatchFileLazily.  Unfortunately, the
memory-efficient gzReadPatchFileLazily code turns out not to be
file-descriptor-efficient: it opens, and leaves open, a Handle for each
patch written.  :(
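To make the leak concrete, here's a minimal sketch of a lazy reader in the
style of gzReadPatchFileLazily (the name and the use of ByteString in place
of PackedString are my illustration, not the actual darcs code): the Handle
is opened eagerly, but only closed once the consumer forces the chunk that
hits EOF, so an unconsumed result pins a descriptor open.

```haskell
import System.IO
import System.IO.Unsafe (unsafeInterleaveIO)
import qualified Data.ByteString as B

-- Hypothetical sketch: open eagerly, read lazily.  The Handle stays
-- open until the consumer demands the chunk at end-of-file.
readFileLazily :: FilePath -> IO [B.ByteString]
readFileLazily path = do
    h <- openFile path ReadMode   -- descriptor held from here on
    let go = unsafeInterleaveIO $ do
          chunk <- B.hGet h 1024
          if B.null chunk
            then hClose h >> return []       -- only closed at EOF...
            else (chunk :) <$> go            -- ...which may never be forced
    go
```

Call this a thousand times without consuming the results, and a thousand
descriptors stay open.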

I've implemented two fixes for this problem.  I suspect that either alone
would "fix" the problem... I know that the unsafeInterleaveIO fix by itself
is sufficient.  And the other fix (which is helpful even with the
unsafeInterleaveIO fix) may not cover similar bugs that may exist in other
commands, so I'd prefer to apply both fixes, provided no one can see a
problem with either one.

The first patch avoids the file-descriptor leak by using an
unsafeInterleaveIO call to avoid opening the file until we actually want to
read from it.  This is potentially unsafe if the file is modified before
that time, but it isn't *very* unsafe.  This also has the benefit that the
file doesn't need to be opened at all if we don't need the patch later.
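The idea of the first fix can be sketched like this (hypothetical code, not
the actual Repository.writePatch change): wrap the whole open-and-read in
unsafeInterleaveIO, so no descriptor is consumed until the result is first
demanded; forcing the contents at that point also lets the Handle be closed
promptly.

```haskell
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical sketch of the first fix: the file is not opened until
-- the returned String is actually forced.  If the file is modified
-- before then, we read the modified version -- the "not *very*
-- unsafe" caveat above.
readFileDelayed :: FilePath -> IO String
readFileDelayed path = unsafeInterleaveIO $ do
    s <- readFile path
    length s `seq` return s   -- force the whole file so the Handle
                              -- is closed as soon as it is demanded
```

If the result is never forced, the file is never opened at all, which is
the extra benefit mentioned above.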

I've also implemented a separate patch, which *also* should solve the
problem, in a somewhat safer but less comprehensive manner.  The second
patch makes save_patches in Pull grab the patch ID from the patches before
writing them, rather than reading each patch file back afterwards to find
the ID (which was silly).  This means we never use the patch files we read,
so as long as the GC runs fast enough, this patch on its own should be
sufficient to avoid running out of file descriptors when pulling many
patches.
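In outline the second fix looks like this (types and names are my
illustration, not the darcs source): take the ID from the in-memory patch
*before* writing it out, so there is no read-back and no Handle left open.

```haskell
-- Hypothetical sketch of the second fix.  The Patch type and savePatch
-- are stand-ins for the real save_patches logic in Pull.
data Patch = Patch { patchId :: String, patchBody :: String }

savePatch :: FilePath -> Patch -> IO String
savePatch dir p = do
    let ident = patchId p            -- grab the ID first, in memory
    writeFile (dir ++ "/" ++ ident) (patchBody p)
    return ident                     -- no read-back, no open Handle
```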


Alternative solutions involve *not* reading the files lazily.  We could
also alleviate the problem by increasing the block size when lazily reading
files.  It looks like we never realloc our buffer, so increasing the block
size should probably be combined with a change that shrinks the string to
the size actually needed.

Another improvement that would go a long way towards reducing
file-descriptor leakage would be for gzReadFileLazily to check whether the
amount read was less than a block size; in that case we know the file is
finished and can close it immediately, rather than closing it only when the
user asks for the second PackedString.  Actually, now that I look at the
code, this would be a *very* good thing to do.  It would mean we'd only
leave files open if they are bigger than a block size, which would be nice,
especially combined with increasing the block size to 4k or 8k.
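That short-read check could look roughly like this (again a hypothetical
sketch with ByteString in place of PackedString, not the actual
gzReadFileLazily): a read that returns fewer bytes than the block size
means EOF for a regular file, so we close the Handle right away.

```haskell
import System.IO
import System.IO.Unsafe (unsafeInterleaveIO)
import qualified Data.ByteString as B

blockSize :: Int
blockSize = 4096   -- the larger 4k block suggested above

-- Hypothetical sketch of the short-read improvement: any file no
-- bigger than one block is read and closed immediately; only larger
-- files keep a Handle open between lazily-demanded chunks.
readFileLazily' :: FilePath -> IO [B.ByteString]
readFileLazily' path = do
    h <- openFile path ReadMode
    go h
  where
    go h = do
      chunk <- B.hGet h blockSize
      if B.length chunk < blockSize
        then do hClose h                  -- short read => EOF: close now
                return (if B.null chunk then [] else [chunk])
        else (chunk :) <$> unsafeInterleaveIO (go h)
```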

Sat Aug 27 15:22:15 EDT 2005  David Roundy <droundy at darcs.net>
  * add test that triggers "too many open files" bug.
  We just need to pull over 1024 patches at once to trigger this bug on my
  linux system.

Sat Aug 27 15:23:49 EDT 2005  David Roundy <droundy at darcs.net>
  * avoid reading patches we just wrote when pulling.

Sat Aug 27 15:24:15 EDT 2005  David Roundy <droundy at darcs.net>
  * use unsafeInterleaveIO in Repository.writePatch to delay opening files.
  This patch solves the "too many open files" bug when pulling many patches.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/x-darcs-patch
Size: 8786 bytes
Desc: A darcs patch for your repository!
Url : http://lists.osuosl.org/pipermail/darcs-devel/attachments/20050827/51c45fc9/attachment.bin
