[darcs-users] Re: Developer machine spec for Linux kernel w/ darcs

David Roundy droundy at abridgegame.org
Thu Dec 2 12:15:16 UTC 2004


On Wed, Dec 01, 2004 at 02:25:11PM +0000, Catalin Marinas wrote:
> I'm not familiar with Haskell or darcs internals. I just cannot find
> an explanation for the huge amount of memory used - for a 300MB source
> tree, the memory usage is more than 600MB. I don't know how much of
> this amount is uncollected garbage. It would be normal for something
> like 450MB due to the linked lists etc. but it is actually more than
> double.

It depends on the average length of lines.  For each line we store a
pointer to the original string and a beginning and ending index (12 bytes
so far).  The pointer is actually a "ForeignPtr", which has (I believe) an
additional layer of indirection to allow us to "finalize" it (unmapping the
file, or whatever), which probably costs us at least a couple of extra
pointers (one for the pointer itself and one for the finalizer).  We also
need a pointer to the next element of the list.  So we're up to 24 bytes.
Thanks to the laziness of Haskell (which we *do* take advantage of, by the
way), I think we may need another couple of pointers to account for the
possibility that this line may not have been computed yet, and that the
next element of the list may not have been computed either.  So we're up to
32 bytes per line.  And that's assuming there's no overhead in the Haskell
RTS memory allocation... I imagine it might store something like the size
of each chunk of memory, or GC flags, alongside the data.

So if an average line is around 40 characters, we should expect the
per-line bookkeeping to take roughly as much memory as the contents of the
lines themselves.  Of course, it's quite likely that I've miscounted the
memory usage somewhere, but I think I'm about as likely to have undercounted
as to have overcounted.
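A back-of-the-envelope sketch of how this scales for the 300MB tree
mentioned above.  It assumes the 32-byte per-line estimate, treats every
line as exactly the average length, and ignores RTS and GC overhead, so it
is a lower-bound style estimate rather than a prediction of what darcs
actually allocates.

```python
MiB = 2 ** 20


def estimate_memory(tree_bytes, avg_line_len, per_line_overhead=32):
    """Return (total_bytes, overhead_bytes) for a tree held as a list of lines.

    Assumes each line carries avg_line_len bytes of content plus a fixed
    per_line_overhead of bookkeeping; RTS/GC overhead is ignored.
    """
    lines = tree_bytes // avg_line_len
    overhead = lines * per_line_overhead
    return tree_bytes + overhead, overhead


# Show how sensitive the total is to the average line length.
for avg in (20, 40, 80):
    total, overhead = estimate_memory(300 * MiB, avg)
    print(f"avg line {avg:2d} chars: overhead {overhead // MiB} MiB, "
          f"total {total // MiB} MiB")
```

With 40-character lines this gives about 240 MiB of overhead and 540 MiB in
total, somewhat below the 600MB+ reported, which would be consistent with
the count above being an undercount.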
-- 
David Roundy
http://www.darcs.net
