[darcs-users] Current Pull Attempting to Overallocate
droundy at abridgegame.org
Sat Jan 31 13:37:17 UTC 2004
On Thu, Jan 29, 2004 at 05:08:36PM +0200, Aggelos Economopoulos wrote:
> On Thu, 29 Jan 2004 07:54:38 -0500
> David Roundy <droundy at jdj5.mit.edu> wrote:
> > Well, the ghc allocation involves a contiguous heap, and when it runs
> > out of space it increases the ghc heap size. So I don't want to get
> > my memory stuck inside the ghc heap, or the ghc heap will get bigger
> > than needed,
> OK, I took a peek inside MBlock.c and I think I see what you mean. ghc
> allocates memory in 1MB blocks that must be 1MB aligned (this limitation
> only seems to exist because of the way HEAP_ALLOCED() and
> MARK_HEAP_ALLOCED() are implemented, correct me if I'm wrong. Also, does
> anyone know why it doesn't use a bitmap so it could do allocations in
> 128K blocks?) so when you do a minimal allocation (4K on x86), ghc can't
> use the (1M - 4K) that's left. If this is the case, I don't think the
> ghc allocation space gets much bigger than needed and of course darcs
> can still make use of the remaining space. Or were you referring to
> something entirely different?
I don't know. I've not looked at the ghc code (and don't have a copy of it
handy to look at). My knowledge of what it does is all empirical (not the
best way to obtain knowledge, I'll admit).
> > which can have very unpleasant results, since it doesn't trigger a gc
> > (normally) until it fills up its heap,
> Where can I find this logic in the ghc source?
I don't know (see above). My knowledge of ghc and its GC is based
primarily on watching how darcs behaves. Haskell is certainly constantly
allocating memory, and has to run GCs regularly to get rid of that memory.
The advantage of having a large heap is that you can run GCs less often,
which makes for faster code.  My impression is that the RTS decides when a
GC is needed based on what fraction of the heap is filled, the result being
that whatever the size of the heap is, that much memory gets used, and the
heap certainly doesn't ever shrink (I've asked about that).
> > so if the heap is bigger than your physical mem, you've got very
> > serious swapping problems that don't go away until darcs exits,
> > regardless of your memory usage.
> I don't see how this would lead to very serious swapping problems
> regardless of your memory usage. If you've mapped 1G (and touched) but
> currently only use 100M, the operating system can swap out some of your
> not-so-recently-referenced pages without slowing you (or the system)
> down much.
The problem is that ghc always fills up the heap (or at least seems to do
so), so you can't swap out the 1G heap and leave it swapped out, since
ghc will fill it up again and then GC it.
> Now, if you want to actually use more memory than the system has you
> have *real* problems and no amount of malloc() tricks can get you out
> of them (unless of course you're in pathological cases: too much
> internal fragmentation, OS mistakenly swaps out your pages etc)
> > At least, this is my understanding of how it works. And the ugly
> > tricks *do* give massive performance benefits when running darcs check
> > on a large repository (i.e. one that is big enough that the peak memory
> > usage involves swapping).
> Again, if you need more memory than what's available, how can
> allocating your memory at a different address help you?
It helps you when you don't need more memory than what's available, but
your peak usage (which is only used for a short time) requires swapping.
In my example case (which is a darcs check), the memory usage goes to over
400M (on my 384M machine) at the beginning, due to a large patch that has
to be read and parsed. After that, with the nice trick, memory usage drops
to 260M and everything is beautiful (i.e. no swapping). Without the nasty
trick, memory usage stays at 400M and CPU usage never goes above 10% due to
swapping until the program exits a few weeks later (although I've never let
it run to completion when swapping).