[darcs-devel] Re: Darcs-git pulling from the Linux repo: a Linux VM question

Linus Torvalds torvalds at osdl.org
Wed Apr 27 08:31:37 PDT 2005



On Wed, 27 Apr 2005, Juliusz Chroboczek wrote:
> 
> So yes, in the longer term we need to fix Darcs.  For now, does anyone
> know how I can tune the Linux VM to get a 720 MB process to run
> reliably in 640 MB of main memory?

I really think you're screwed. The only way you have even a _chance_ of
getting it to work well is if you have very nice access patterns to
that 720MB, but my guess is that that simply isn't the case. You probably
read most of it in once (and write out changes once, but I hope you at
least notice the case of "nothing changed" so that probably is the smaller
of your problems), and the fact is, you're going to have absolutely
_horrible_ access patterns, since you'll end up not just with a 720MB
process that doesn't have much locality, you'll end up with another 720MB
that you needed to have in the page cache for the IO.

The only way I can see to fix it short-term is to try to use "mmap()"  
instead of "read()" to read the file data, and then try to avoid touching
the mapping unless you _have_ to. In other words: if you actually need to
_compare_ the data (which obviously reads from the mapping), you're
screwed.

Using mmap() will at least mean that the system can re-use the page cache 
pages, though, so it should ease memory pressure a bit.
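
A minimal sketch of the mmap() idea, in Python for brevity. The helper name
peek() is hypothetical and is not taken from darcs or git. It assumes a
non-empty regular file, and relies on the usual behaviour that mapping a file
does not read its data until the pages are actually touched.

```python
import mmap


def peek(path, offset, length):
    """Return `length` bytes at `offset`, touching only the pages that hold them.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file. Setting up the mapping does not
        # normally read any file data. (mmap raises ValueError for an
        # empty file.)
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing copies only this range out of the mapping; the
            # rest of the file is never faulted in by this call.
            return m[offset:offset + length]
```

Something like peek("some-large-file", 0, 64) looks at a header without
pulling the whole file into the process, whereas a plain read() of the file
would copy every byte into a private buffer on top of the page cache copy.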

> So what was it you said about self-tuning VM systems?

The kernel tries to tune itself in the sense that it automatically 
allocates the memory to user processes vs caching (page cache, directory 
caching etc) and tunes itself quite well that way.

But there's no way to tune for crappy access patterns and working sets
bigger than the amount of RAM. Sorry. You really need to fix darcs.

You _really_ shouldn't read in files that you don't absolutely need.  
That's really the biggest point of git: using the sha1 for naming the
objects is really all about "describe the contents using 20 bytes instead
of by reading the contents". Because reading the content _will_ be
expensive. Even if you have 2GB of memory and you can keep it all cached,
it will be horribly expensive.
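
A rough sketch of that idea, in Python for brevity. The names object_id() and
changed_paths() are hypothetical, and this hashes the raw bytes only, which is
a simplification and not git's exact object format. It assumes each snapshot
already records a 20-byte id per path, so contents are hashed once when they
are stored and never re-read just to compare.

```python
import hashlib


def object_id(content: bytes) -> bytes:
    """Name a blob by the 20-byte SHA-1 digest of its contents."""
    return hashlib.sha1(content).digest()


def changed_paths(old, new):
    """Compare two snapshots given as {path: 20-byte id} dicts.

    Returns the sorted paths that were added, removed, or whose id
    differs. Only the ids are compared; no file contents are read.
    """
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

Two snapshots that share most of their files then cost one 20-byte comparison
per path to diff, regardless of how large the files themselves are.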

		Linus



