[darcs-users] Re: [BK] upgrade will be needed
jch at pps.jussieu.fr
Wed Mar 2 15:06:01 UTC 2005
> > It's reasonable if the garbage collector hasn't run recently. Darcs
> > only unmaps files on GC, and when a file gets modified, darcs creates
> > a new inode and then (when it wants to read the modified version)
> > mmaps the file again.
That's very interesting. Ralph, thanks for the analysis.
> Without wishing to offend anyone, but coming from a background of being
> hired to examine performance issues, I find the lack of control over
> resources for the programmer, coupled with my lack of understanding
> about what RTS will do and when, frustrating.
I can sympathise with that feeling.
Try to think about it as follows. (David, ideas for fixing the
problem in Darcs are at the end.)
In languages with explicit allocation (Pascal, C, etc.), all resources
are allocated and freed manually. So unless the memory allocation is
buggy, the memory usage is reasonable.
In garbage-collected languages (Lisp, ML, Java, Haskell, etc.), there
are two sorts of resources: garbage-collected memory and other
resources.
When you allocate old-fashioned memory (the kind that's your private
business) you go through the GC. A well-designed GC will consider the
recent history of the program's behaviour, and typically make better
decisions about when to free data than the programmer will do by hand.
The trouble is with resources that the GC is oblivious to -- file
descriptors, mmapped memory, and, in some implementations, malloced
memory. Such resources are allocated outside the GCd area, and risk
causing a memory leak.
The basic way of dealing with such resources is to allocate a small
GCd data structure, called a handle, and to put a pointer to the
alien resource in the handle. Examples of handles in Darcs are file
handles (analogous to C's FILE pointers) and, the ones that concern
us, FastPackedStrings, which contain pointers to malloced or mmapped
memory.
When you've got GCd handles that point to alien data, you've got a
problem: if the GC decides to discard the handle, you've got an alien
memory leak. There are two and a half ways of dealing with that problem.
One is to put the burden on the programmer: the programmer needs to
explicitly ``close'' or ``destroy'' a handle before it becomes
eligible for garbage collection. After closing, the handle structure
still exists, but it is unusable (typically, it contains a null
pointer or the fake file descriptor ``-1'').
The second one is to kindly ask the garbage collector to destroy the
alien resource before it discards the handle. This is done by associating
with the handle a ``finaliser'', a function that does The Right Thing.
As the GC does not necessarily destroy garbage in a timely manner,
this might cause temporary memory leaks, which is what we're seeing.
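A minimal sketch of the finaliser approach, using the standard
ForeignPtr machinery. The name newBuffer is made up for illustration
and is not part of Darcs; the memory is malloced outside the GCd heap,
and free() is attached as the finaliser, so it only runs at some point
after the GC notices that the handle is dead.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree, mallocBytes)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Hypothetical helper: allocate n bytes of alien (malloced) memory and
-- wrap the pointer in a GCd handle whose finaliser calls free().
newBuffer :: Int -> IO (ForeignPtr Word8)
newBuffer n = do
  p <- mallocBytes n
  newForeignPtr finalizerFree p

main :: IO ()
main = do
  buf <- newBuffer 16
  -- withForeignPtr keeps the handle alive while the raw pointer is used.
  b <- withForeignPtr buf $ \p -> do
    pokeByteOff p 0 (42 :: Word8)
    peekByteOff p 0 :: IO Word8
  print b
  -- No explicit free: the memory is released whenever the GC gets
  -- around to running the finaliser, which may be much later.
```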
The third solution is to use both methods: provide a manual ``destroy''
operation, and *also* use finalisers in case the programmer forgets to
close the handle. This is the solution used for file handles.
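For file handles, the manual half of this looks roughly as follows.
This is only a sketch; the file name example.txt is made up. The
handle is closed explicitly with hClose (via bracket, so that it is
closed even if an exception is raised), and GHC's finaliser remains as
a safety net for handles that are simply dropped.

```haskell
import Control.Exception (bracket)
import System.IO (IOMode (ReadMode), hClose, hGetLine, openFile)

main :: IO ()
main = do
  -- Create a small file to read back (hypothetical file name).
  writeFile "example.txt" "hello\n"
  -- bracket acquires the handle, runs the action, and always calls
  -- hClose afterwards, so the file descriptor is released promptly
  -- rather than whenever the GC happens to run.
  line <- bracket (openFile "example.txt" ReadMode) hClose hGetLine
  putStrLn line
```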
Unless I'm mistaken, Darcs' FastPackedStrings use solution 2: they
rely on the garbage collector to unmap the mapped data and provide no
manual ``destroy'' interface. If we add a destroyPackedString
function while keeping the finalisers, we could then incrementally add
calls to destroyPackedString where they are found to be safe.
Okay, so how do we implement destroyPackedString? FastPackedStrings
are declared as follows:
data PackedString = PS !(ForeignPtr Word8) !Int !Int
A PackedString is of the form (PS fp s l), where fp is a pointer to
alien memory (associated with a finaliser), s the offset to the start
of the string, and l the length of the string. The trouble with that
is that there is no way to find out whether the ForeignPtr was
malloced or mmapped -- only the finaliser knows that. And as far as I
know, there's no way to force the finaliser to run.
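One possible sketch, assuming the GHC in use provides
finalizeForeignPtr in Foreign.ForeignPtr (current versions of the base
library do; whether a given older release does would need checking).
That function runs the finalisers attached to a ForeignPtr immediately,
so destroyPackedString would not need to know whether the memory was
malloced or mmapped. The body of destroyPackedString and the setup in
main are illustrative and are not taken from Darcs.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr
  (ForeignPtr, finalizeForeignPtr, newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree, mallocBytes)
import Foreign.Storable (pokeByteOff)

data PackedString = PS !(ForeignPtr Word8) !Int !Int

-- Hypothetical implementation: run whatever finaliser is attached
-- (free, munmap, ...) right now instead of waiting for the GC.
-- The caller must guarantee that the string is never used again, and
-- that no other PackedString (for example a substring) still refers
-- to the same ForeignPtr.
destroyPackedString :: PackedString -> IO ()
destroyPackedString (PS fp _ _) = finalizeForeignPtr fp

main :: IO ()
main = do
  -- Build a 4-byte malloced string by hand, for illustration only.
  p <- mallocBytes 4
  fp <- newForeignPtr finalizerFree p
  withForeignPtr fp $ \q -> pokeByteOff q 0 (104 :: Word8)
  let ps = PS fp 0 4
  destroyPackedString ps  -- memory is freed here, not at the next GC
  putStrLn "destroyed"
```

The finalisers stay in place as a fallback, so calls to
destroyPackedString can be added one at a time.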
Anyone familiar with the internals of GHC's ForeignPtrs?