[darcs-users] bug reports for darcs diff

David Roundy droundy at abridgegame.org
Fri Sep 26 15:16:25 UTC 2003


On Fri, Sep 26, 2003 at 05:52:23PM +0300, Aggelos Economopoulos wrote:
> > I'm still not sure it gives any noticeable improvement.  It probably
> > would be most helpful when reading in the patch files because they can
> > be pretty large and get held in memory for quite a while, but because
> > they are large I support compressing them, and haven't implemented code
> > to check if they are compressed and mmap them if they haven't been
> > compressed.
> 
> Well, there are other advantages too. In modern operating systems,
> mapping a file does not start i/o until you actually take the page fault
> (i.e. until you actually try to read the data) and even then you only get
> the pages you access (but I think darcs needs to access all the file data
> anyway, so no big gain there). If you were mapping many files and _then_
> trying to access the pages (that is, if the pattern were 'map A, B, C,
> ...; access A, B, C, ...'  instead of 'map A; access A; map B; access B;
> ...;') you could probably save more than a few disk seeks by calling
> madvise(MADV_WILLNEED) which will start i/o asynchronously (Linux does
> this, I'm not sure about the BSDs), thus giving the kernel a chance to
> batch the page transfers. Have you tried to measure darcs performance
> (e.g. on a 'get') with and without mmap? Obviously, that's the only way
> to see if it matters performance-wise (and I should definitely append
> this to my TODO file).

I've measured a few operations with and without mmap, and seen no
difference.  I doubt there will be much of a difference unless the files
are very large.
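
For reference, the batched pattern quoted above ('map A, B, C, ...; access
A, B, C, ...') could look roughly like the sketch below.  It is Python
rather than Haskell purely for brevity, it is not darcs code, and the names
map_all and total_bytes are made up for illustration.  mmap.madvise needs
Python 3.8 or later and a platform that defines MADV_WILLNEED; since the
call is only a hint, the sketch skips it when it is unavailable or fails.

```python
import mmap


def map_all(paths):
    """Map phase: map every file read-only and hint the kernel up front.

    Assumes every file is non-empty (mmap rejects empty files).
    """
    maps = []
    for path in paths:
        with open(path, "rb") as f:
            # Length 0 maps the whole file; the mapping keeps its own
            # duplicate of the descriptor, so closing f is fine.
            m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        if hasattr(m, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
            try:
                # Ask the kernel to start reading the pages in now.
                m.madvise(mmap.MADV_WILLNEED)
            except OSError:
                pass  # only a hint, so a failure is not fatal
        maps.append(m)
    return maps


def total_bytes(maps):
    """Access phase: touch every page of each mapping, then unmap it."""
    total = 0
    for m in maps:
        total += len(m[:])  # slicing copies the data, faulting pages in
        m.close()
    return total
```

Whether this saves any seeks over 'map A; access A; map B; access B; ...'
would still have to be measured on a real repository.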

> Another advantage of using mmap is that you're being friendly to the
> operating system when there is memory pressure in the system. If you used
> read(2), every page would have to be swapped out (should the OS decide to
> evict one of your pages). However, clean memory-mapped pages (pages that
> have not been written to) can just be thrown away. Dirty pages can be
> written back to the filesystem instead of the swap space. So, even if
> mmap usage doesn't noticeably improve performance, I'd still prefer it to
> read() calls.

This is actually the main reason I'm interested in mmap in the first place,
and there does seem to be some improvement in the memory behavior of my
test case.  Alas, I've got a memory leak that I haven't been able to track
down that makes measuring things a bit tough on my large test repository.
So the first thing to track down is that memory leak.  Unfortunately, GHC
segfaults when I try to profile memory usage.  The GHC bug is fixed in CVS,
but I haven't gotten around to compiling GHC from CVS.
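
To make the memory-pressure point concrete, here is a small sketch of the
two ways of loading a file that are being compared.  Again this is Python
for brevity and not darcs code; load_with_read and load_with_mmap are
hypothetical names.  The first copies the file into an ordinary heap
buffer, which can only be evicted to swap; the second maps it read-only, so
its pages stay clean and file-backed and the kernel can simply drop them.

```python
import mmap


def load_with_read(path):
    """Copy the whole file into an anonymous (swap-backed) heap buffer."""
    with open(path, "rb") as f:
        return f.read()


def load_with_mmap(path):
    """Map the file read-only; the pages are backed by the file itself.

    Assumes the file is non-empty (mmap rejects empty files).
    """
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

Both give the same bytes.  The mapped version rejects writes (Python
raises TypeError on assignment), so nothing can dirty its pages through
that mapping.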
-- 
David Roundy
http://civet.berkeley.edu/droundy/



