[darcs-users] Memory usage of Record

Reinier Lamers tux_rocker at reinier.de
Mon Aug 31 16:23:44 UTC 2009


Hi all,

On Sunday 30 August 2009 05:14:26 Jason Dagit wrote:
> Hello,
>
> Recently I noticed that if you take a large code base and try to
> import it into darcs, the performance is pretty dismal.  I was
> thinking this was once optimized, perhaps behind a special case
> like --all or something.

That optimization was removed because with it you could record patches too 
big for the rest of the darcs machinery to handle.

> My setup worked like this:
> 0) build darcs for profiling as darcs-prof
> 1) Get a copy of the Linux source tree (takes up 382MB of disk space)
> 2) darcs init; darcs add --recursive *
> 3) echo q | darcs-prof +RTS -p -hc -sstderr -RTS record -m "import" 2>
> darcs-prof.summary
> 4) hp2ps -c darcs-prof.hp
> 5) examine the various artifacts created by the profiler
>
> The main discovery I made today was that we're sucking the whole Linux
> tree into memory and holding it there.  This creates so much pressure
> on the garbage collector that darcs spends only 30% of the time doing
> real work.  In a way, this is a strictness issue, so I played with the
> strictness annotations used in the code.  In particular, the Prim data
> type is defined like this:
> data Prim C(x y) where
>     Move :: !FileName -> !FileName -> Prim C(x y)
>     DP :: !FileName -> !(DirPatchType C(x y)) -> Prim C(x y)
>     FP :: !FileName -> !(FilePatchType C(x y)) -> Prim C(x y)
>     Split :: FL Prim C(x y) -> Prim C(x y)
>     Identity :: Prim C(x x)
>     ChangePref :: !String -> !String -> !String -> Prim C(x y)
>
> data FilePatchType C(x y) = RmFile | AddFile
>                           | Hunk !Int [B.ByteString] [B.ByteString]
>                           | TokReplace !String !String !String
>                           | Binary B.ByteString B.ByteString
>                             deriving (Eq,Ord)
>
> After removing just the bangs, the profiler says that darcs is now
> doing real work 40% of the time.  That's a 10 percentage point
> increase.

Did the maximum memory use drop or rise because of this modification?
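
For reference, I take "removing the bangs" to mean dropping the strictness 
annotations, so that e.g. FilePatchType becomes fully lazy in the hunk 
contents. Correct me if you did something different:

    data FilePatchType C(x y) = RmFile | AddFile
                              | Hunk Int [B.ByteString] [B.ByteString]
                              | TokReplace String String String
                              | Binary B.ByteString B.ByteString
                                deriving (Eq,Ord)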

> Next I started poking around in pending.  I noticed that in the
> pending we don't store diffs, just the list of directories and files
> to add.  So then I disabled the construction of diffs during record
> that happens when an addfile is in pending.  This resulted in darcs
> needing less than 50 megs of RAM (it needed about 500 megs to hold the
> Linux source in memory).  Of course that would be a buggy version of
> darcs, but the test shows that the real space problem comes from
> eagerly loading all the hunk patches into memory.

So this was a version of darcs that would record addfile patches without the 
corresponding hunks?

> I'm not sure where to go next with this information.  One idea I had
> was to not calculate/create hunk patches until they are absolutely
> needed.  For example, I don't think the hunk patches are needed until
> we display them on the screen or write them to files.  However, this
> doesn't address what to do with the data once the user has viewed it
> but before they have agreed to all the patches in the list.
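
One cheap way to get that kind of laziness, without restructuring the patch 
selection code, might be unsafeInterleaveIO: defer the file read until the 
hunk is actually demanded.  A toy sketch (my invention, not darcs code), with 
the usual caveat that the file may have changed on disk by the time the 
deferred read fires:

    import System.IO.Unsafe (unsafeInterleaveIO)
    import qualified Data.ByteString.Char8 as B

    -- Toy sketch: the contents backing a hunk are only read from disk
    -- when the patch is actually forced (displayed or written out).
    lazyHunkLines :: FilePath -> IO [B.ByteString]
    lazyHunkLines path = unsafeInterleaveIO (B.lines `fmap` B.readFile path)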

Would it be possible to write the pending patch incrementally, flushing hunks 
to disk as soon as the user has agreed to them? Or would that require a major 
overhaul of the whole patch selection mechanism?
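
If the selection loop could be restructured that way, the shape I have in 
mind is something like this: stream over the candidate changes and append 
each accepted one to the output handle right away, instead of holding them 
all in memory until the end.  (All names here are my invention, not the real 
darcs patch selection code.)

    import System.IO

    -- Toy sketch: ask about each change and write accepted ones out
    -- immediately, instead of accumulating them in a list first.
    selectAndWrite :: Handle -> [String] -> IO ()
    selectAndWrite h []     = hClose h
    selectAndWrite h (c:cs) = do
      putStrLn c
      putStr "Record this change? [ynq] "
      hFlush stdout
      answer <- getLine
      case answer of
        "y" -> hPutStrLn h c >> selectAndWrite h cs
        "q" -> hClose h
        _   -> selectAndWrite h cs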

> Does anyone else have suggestions?  Does anyone know if Petr's work
> will make a difference here?  I heard rumors of a darcs-hs that uses
> hashed-storage more aggressively.  But I doubt it impacts the reading
> of pending?

AFAIK Petr has a zoo of repositories that perform poorly. They may be useful 
for benchmarking.

Regards,
Reinier
