[darcs-users] Re: Patch dependency question

David Roundy droundy at abridgegame.org
Thu Nov 4 11:44:22 UTC 2004

On Wed, Nov 03, 2004 at 02:06:15PM +0000, Catalin Marinas wrote:
> This is a hypothetical situation. Even with makefiles, adding a new
> file to be built, for example, might mean extending an existing line
> with that file name, which means modifying the line of an existing
> patch and hence creating a dependency (character-oriented patches
> would allow this).

Yeah, my thought is that people who are seriously concerned about avoiding
dependencies due to makefiles can have a policy of always putting new
files on a line by themselves, but that's about as far as they can go.

> Anyway, darcs' patch dependency tracking works much better than
> generating diff's which do not allow any commuting. Unfortunately, I
> cannot use it with the Linux kernel (that's what I mainly do) because it
> is very slow (3 hours for a big patch merging) and uses a lot of memory
> (over 600MB).

Have you tried using --enable-antimemoize? This should reduce memory usage
dramatically and, if you're swapping, should therefore also reduce run time.
It's not very well profiled, partly because my machine is wimpy enough that
when you add on the profiling overhead, I can only test moderately small
repositories.  Also, because I got tired of working on it, to be honest.

> I don't know if the current structure allows this (and I don't know
> Haskell either, I should learn it) but patch commuting and dependency
> tracking could probably be done by only looking at the line numbers a
> hunk modifies, without loading the hunks text into memory. The hunks text
> should be loaded from disk only when applying them or when saving the
> commuted patch. This might reduce the RAM usage.

You may benefit by turning off patch compression in your repository.  Then
the actual hunk text will be mmapped, and so should be swap-friendly.  In
fact, it's never copied after the patch itself is read in, even when the
patch is parsed, so it remains mmapped memory.  The place where the big
savings can come in is in the parsed structure that stores where each line
begins and ends.  This is dealt with by the antimemoize trick, which allows
the garbage collector to throw away the result of this parsing and then
reparse when the hunk contents are actually needed.  However, since we
don't store the number of lines in the patch, the commutation isn't as
efficient as it could be.  In particular, if we're unlucky, when running
with antimemoize we may have to parse the hunk several times during
multiple commutations, whereas if we stored the "old" and "new" hunk sizes we
could commute without doing this.  :(
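
To make the idea concrete, here is a rough sketch of commuting two hunks
using only a start line and the stored "old" and "new" sizes, never touching
the hunk text.  It is in Python rather than Haskell to keep it short, and it
is not darcs's actual code: the names HunkHeader and commute are made up for
illustration, and the rule is deliberately conservative (hunks that touch or
overlap are simply treated as dependent).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HunkHeader:
    """Hypothetical hunk summary: position and sizes only, no text."""
    line: int      # 1-based line where the hunk starts
    old_len: int   # number of lines the hunk removes
    new_len: int   # number of lines the hunk adds


def commute(first: HunkHeader,
            second: HunkHeader) -> Optional[Tuple[HunkHeader, HunkHeader]]:
    """Given hunks applied as `first` then `second`, return the pair
    (second', first') to apply in the opposite order, or None if the
    hunks are too close to be reordered under this simplified rule."""
    # `second` starts strictly below the lines `first` wrote:
    # undo the line shift that `first` introduced.
    if first.line + first.new_len < second.line:
        shift = first.new_len - first.old_len
        moved = HunkHeader(second.line - shift, second.old_len, second.new_len)
        return (moved, first)
    # `second` ends strictly above `first`:
    # `first` must now account for the shift `second` introduces.
    if second.line + second.old_len < first.line:
        shift = second.new_len - second.old_len
        moved = HunkHeader(first.line + shift, first.old_len, first.new_len)
        return (second, moved)
    # Touching or overlapping: treat as a dependency.
    return None
```

For example, commute(HunkHeader(10, 2, 5), HunkHeader(20, 1, 1)) moves the
second hunk up to line 17 and leaves the first one alone.  The hunk text
would then only have to be read when a patch is applied or written out.
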
David Roundy
