[darcs-users] Re: Patch dependency question

Catalin Marinas catalin.marinas at arm.com
Thu Nov 4 14:40:06 UTC 2004


David Roundy <droundy at abridgegame.org> writes:

> Yeah, my thought is that people who are seriously concerned about
> avoiding dependencies due to makefiles can have a policy of always
> putting new files on a line by themselves, but that's about as far
> as they can go.

But adding some lines at the end (or beginning) of an existing patch
might also mean extending it, and thus creating a dependency that
darcs does not take into account. The user can probably set this
dependency manually if s/he really wants to.
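To make the makefile case concrete, here is a toy Python model. The
Edit type and independent() are hypothetical, and treating touching
hunks as dependent is a simplifying assumption, not necessarily
darcs's exact rule. Two patches made against the same base Makefile
stay independent only if at least one untouched line separates them.
Appending to a backslash-continued list also has to rewrite the
previous last line to add the trailing backslash, so that hunk touches
the same line any other append would touch.

```python
from typing import NamedTuple


class Edit(NamedTuple):
    """Hypothetical model of one hunk made against a shared base file.

    It replaces base lines [start, start + old_len) (1-based); old_len == 0
    means a pure insertion just before line `start`.
    """
    start: int
    old_len: int


def independent(a: Edit, b: Edit) -> bool:
    """True if at least one untouched base line separates the two edits.

    Touching or overlapping edits count as dependent (strict <); this is a
    simplifying assumption, not a statement of darcs's exact rule.
    """
    a_end = a.start + a.old_len
    b_end = b.start + b.old_len
    return a_end < b.start or b_end < a.start


# Base Makefile, one file per line:
#   1  SOURCES = \
#   2      a.c \
#   3      b.c \
#   4      c.c
alice = Edit(start=3, old_len=0)  # inserts "    ab.c \" before b.c
bob = Edit(start=4, old_len=0)    # inserts "    bc.c \" before c.c
carol = Edit(start=4, old_len=1)  # appends: rewrites "c.c" as "c.c \", adds "d.c"

print(independent(alice, bob))    # True: line 3 (b.c) separates them
print(independent(bob, carol))    # False: both touch line 4
```

With one file per line, inserting new entries in the middle of the
list keeps patches apart; appending at the end is where they collide.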

> Have you tried using --enable-antimemoize? This should reduce memory
> usage dramatically, and if you're swapping should also therefore
> reduce time.  It's not very well profiled, partly because my machine
> is wimpy enough that when you add on the profiling overhead, I can
> only test moderately small repositories.

I tried --enable-antimemoize, but it still used a lot of memory
(maybe half of the previous usage) and was still slow.

> Also, because I got tired of working on it, to be honest.

That's sad, since darcs looks like one of the most advanced SCM tools.

> You may benefit by turning off patch compression in your repository.
> Then the actual hunk text will be mmapped, and so should be
> swap-friendly.

This was the first thing I tried.

> In fact, it's never copied after the patch itself is read in, even
> when the patch is parsed, so it remains mmapped memory.

If the whole patch file is read in (i.e. at least one location in
each 4KB page is accessed), it still occupies the equivalent amount
of RAM, which the system might want to free when it is low on memory.
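A small Python sketch of this point (the function names are made up
for illustration): building a line index over a read-only mmap scans
every byte, so every page of the file gets faulted in, even though
only the one line finally returned is copied into the process's own
memory.

```python
import mmap
from typing import List, Tuple


def line_spans(buf) -> List[Tuple[int, int]]:
    """(start, end) byte offsets of every line in buf (bytes or an mmap).

    Only offsets are stored; the text itself is never copied.
    """
    spans, start, size = [], 0, len(buf)
    while start < size:
        nl = buf.find(b"\n", start)
        end = size if nl == -1 else nl + 1
        spans.append((start, end))
        start = end
    return spans


def mmapped_line(path: str, n: int) -> bytes:
    """Return line n (0-based) of a non-empty file via a read-only mmap.

    Note that mmap raises ValueError for an empty file.
    """
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # The newline scan reads every byte, so every page of the file is
        # faulted in; those pages live in the page cache, not on our heap.
        start, end = line_spans(mm)[n]
        return mm[start:end]  # the only copy: one line of text
```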

Mmapping doesn't seem to bring many advantages on Linux (when the
whole mmapped file is accessed), since its virtual memory system tends
to send evicted pages to swap when it is low on memory, rather than
simply dropping them and re-reading them from the mmapped file when
needed (assuming the file was opened read-only). Only when it is also
low on swap space does it begin dropping the mmapped pages. This might
sound inefficient, but benchmarks have shown it to be better, because
there is an in-RAM buffer in front of the swap space where pages
reside until they are written to disk.

> The place where the big savings can come in is in the parsed
> structure that stores where each line begins and ends.  This is
> dealt with by the antimemoize trick, which allows the garbage
> collector to throw away the result of this parsing and then reparse
> when the hunk contents are actually needed.
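The antimemoize idea can be imitated in Python with the standard
weakref module; this is only an analogy, not how darcs implements it,
and Hunk and LineIndex are hypothetical names. The raw text is kept,
while the parsed line index is held only through a weak reference, so
the runtime is free to discard it and the next access re-parses.

```python
import weakref


class LineIndex:
    """Parsed structure: byte offset where each line of a hunk starts."""

    def __init__(self, raw: bytes):
        self.starts = [0]
        pos = raw.find(b"\n")
        while pos != -1 and pos + 1 < len(raw):
            self.starts.append(pos + 1)
            pos = raw.find(b"\n", pos + 1)


class Hunk:
    """Keeps the raw hunk text; its LineIndex is cached only weakly.

    When nothing else holds the index, the runtime may free it, and the
    next access re-parses the raw text: less memory, more repeated work.
    """

    def __init__(self, raw: bytes):
        self.raw = raw
        self.parses = 0      # how many times the text has been parsed
        self._cached = None  # weakref.ref to a LineIndex, or None

    def index(self) -> LineIndex:
        idx = self._cached() if self._cached is not None else None
        if idx is None:  # never parsed, or already collected
            self.parses += 1
            idx = LineIndex(self.raw)
            self._cached = weakref.ref(idx)
        return idx

    def line(self, n: int) -> bytes:
        starts = self.index().starts
        end = starts[n + 1] if n + 1 < len(starts) else len(self.raw)
        return self.raw[starts[n]:end]


h = Hunk(b"-old line\n+new line\n")
keep = h.index()            # a strong reference keeps the parse alive
print(h.line(1), h.parses)  # b'+new line\n' 1
del keep                    # CPython frees the index right away
print(h.line(0), h.parses)  # b'-old line\n' 2
```

In CPython, reference counting frees the index as soon as the last
strong reference goes away, so the second parse happens right after
`del keep`; with a tracing garbage collector the index would only be
dropped at the next collection.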

Does darcs need to parse all the patches for dependency checking? It
would be useful to keep a separate list of the patches touching each
file and so avoid looking through the whole repository (which gets
quite big for trees larger than 300MB).
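A rough Python sketch of such a per-file index (the repository
representation and function names are hypothetical): only patches
that share a file with the new change need to be commuted past it, so
the rest of the history can be skipped. A real implementation would
also have to follow file renames, which this ignores.

```python
from __future__ import annotations

from collections import defaultdict


def build_file_index(patches: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each file path to the patches that touch it, in history order.

    `patches` (hypothetical) maps patch name -> files it touches, listed
    oldest first; dicts keep insertion order, so the lists do too.
    """
    index: dict[str, list[str]] = defaultdict(list)
    for name, files in patches.items():
        for path in files:
            index[path].append(name)
    return dict(index)


def candidate_dependencies(index: dict[str, list[str]],
                           touched: set[str]) -> list[str]:
    """Patches sharing at least one file with a new change.

    Only these need a hunk-level commute check; all others are skipped.
    """
    found: dict[str, None] = {}  # used as an ordered set
    for path in sorted(touched):
        for name in index.get(path, []):
            found.setdefault(name, None)
    return list(found)


history = {
    "p1": {"Makefile"},
    "p2": {"mm/slab.c"},
    "p3": {"Makefile", "fs/ext3/inode.c"},
}
idx = build_file_index(history)
print(candidate_dependencies(idx, {"Makefile"}))   # ['p1', 'p3']
print(candidate_dependencies(idx, {"mm/slab.c"}))  # ['p2']
```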

> However, since we don't store the number of lines in the patch, the
> commutation isn't as efficient as it could be.  In particular, if
> we're unlucky, when running with antimemoize we could have to parse
> the hunk several times during multiple commutations, where if we
> stored the "old" and "new" hunk sizes we could commute without doing
> this.  :(

Ah, OK, it's clearer now: darcs needs to read the whole patch to find
out the number of lines (and Linux has huge patches: the diff between
2.6.8 and 2.6.9 is 18MB).
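To illustrate why stored sizes would help, here is a simplified hunk
commute in Python. HunkHeader and its fields are hypothetical, and the
rule (overlapping or touching hunks are treated as dependent) is the
textbook hunk-commute rule rather than darcs's actual code. Given the
stored "old" and "new" sizes, commuting two hunks needs only a few
integer comparisons and never looks at the hunk text.

```python
from typing import NamedTuple, Optional, Tuple


class HunkHeader(NamedTuple):
    """Hypothetical hunk header that stores its sizes next to the text.

    line: 1-based line where the hunk applies
    old_len / new_len: number of lines removed / added
    """
    line: int
    old_len: int
    new_len: int


def commute(p1: HunkHeader,
            p2: HunkHeader) -> Optional[Tuple[HunkHeader, HunkHeader]]:
    """Turn "p1 then p2" (same file) into "p2' then p1'", or None if dependent.

    Only the stored sizes are read; the hunk text is never parsed.
    """
    if p1.line + p1.new_len < p2.line:
        # p2 lies beyond p1's output: undo p1's net change in p2's position.
        return p2._replace(line=p2.line - p1.new_len + p1.old_len), p1
    if p2.line + p2.old_len < p1.line:
        # p2 lies before p1: p1 moves by p2's net change.
        return p2, p1._replace(line=p1.line + p2.new_len - p2.old_len)
    return None  # overlapping or touching: treated as dependent


# p1 turns line 2 into three lines; p2 then deletes (new) line 8,
# which was line 6 before p1, so p2' applies at line 6.
print(commute(HunkHeader(2, 1, 3), HunkHeader(8, 1, 0)))
print(commute(HunkHeader(2, 1, 3), HunkHeader(4, 1, 1)))  # None
```

With sizes stored in the header, commuting past even a very large hunk
costs only these comparisons, whereas without them the hunk text has
to be parsed to count its lines first.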

Do you have any plans to improve this in the future (after 1/2.0.0),
or would you rather not modify the current repository structure?

Thanks for this interesting discussion,

Catalin
