[darcs-devel] Why is darcs get so slow?

David Roundy droundy at abridgegame.org
Fri Dec 31 07:47:51 PST 2004


Hi aj,

Your timings are interesting, and suggest that we could definitely get a
significant improvement in our performance.

Can you try running your get --partial timings with the following change?
This'll make darcs exit after parsing and then writing the checkpoint patch
to file.  That eliminates the writing of the actual files (and of course
also makes the get not work).

$ darcs whatsnew -u
{
hunk ./Get.lhs 150
                    needed_patches = dropWhile ((/= pi_ch).fst) $
                                     reverse $ concat local_patches
                    in do write_checkpoint_patch p_ch
+                         fail "debugging"
                          case apply_to_slurpy p_ch empty_slurpy of
                              Just s -> slurp_write_dirty opts s
                              Nothing -> fail "Bad checkpoint!"
}

If this is still inordinately slow, it means the problem is either in the
patch parsing or the patch writing code.  If necessary, we can eliminate the
patch writing--since we just downloaded it, we could just copy the patch.
On the other hand, we do need to be able to write patches pretty often, so
we could view this as a nice opportunity to optimize the patch writing
code (if the writing is the problem after all).

If this is still slow, you could (I think) put the fail before the
write_checkpoint_patch, and you'd be timing just the parsing.  This
wouldn't be true if the parsing is too lazy, but I think it's strict enough
that this timing will work.
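
As a rough illustration of the laziness caveat: a timing only means something
if the parsed value has actually been forced before the clock is read again.
The sketch below is not darcs code; timeWhnf is a hypothetical helper that
forces its argument to weak head normal form with evaluate and reports the
CPU time used. For a parser returning Maybe, that is enough to decide between
Just and Nothing, but a lazier parser could still leave work unevaluated.

```haskell
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- Hypothetical helper, not part of darcs: force a value to weak head
-- normal form and report the CPU time (in picoseconds) that took.
timeWhnf :: a -> IO (a, Integer)
timeWhnf x = do
  start <- getCPUTime
  y <- evaluate x          -- forces x here, inside the timed region
  end <- getCPUTime
  return (y, end - start)
```

If the parser turned out to be lazier than that, the value would need to be
forced more deeply (for instance by walking the whole patch) before stopping
the clock.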

On Fri, Dec 31, 2004 at 02:28:30PM +1000, Anthony Towns wrote:
> David Roundy wrote:
> >The advantage (which isn't trivial) of parsing the patch strictly
> >(i.e. before using it) is that we know immediately whether a patch is
> >malformed, so we can create nice error messages, and won't end up partially
> >applying a patch before finding out that it's been messed up somehow in
> >transit.  We could create a "lazy" patch parser, which would allow us to
> >interleave the patch reading with the use of the patch, [...]
> 
> Shouldn't it be possible to have a dummy slurpy, that just /dev/nulls 
> everything? Then you can say "do Parse once to /dev/null to check for 
> errors; Parse again to _darcs/current and ."; but each of those should 
> only take a few minutes (that's how long patch seems to take), rather 
> than doing both at once which takes 20 minutes?
> 
> Having, umm,
> 
> data Slurpy = SlurpDir FileName (Maybe (IO ())) [Slurpy]
>             | SlurpFile FileName Bool (EpochTime,FileOffset)
>             | SlurpDummy
> 
> apply_to_slurpy _ s@SlurpDummy = s
> 
> or something, might work, maybe.

I think you're confusing the parsing with the applying... parsing doesn't
involve slurpies.  However, your suggestion could be used on the applying
side also to increase laziness at the cost of doing so twice.

My idea would be to have three functions:

readPatchPossible :: PackedString -> Maybe PackedString
readPatchLazily :: PackedString -> Patch
readPatch :: PackedString -> Maybe (Patch, PackedString)

In all cases, the PackedString output is the rest of the PackedString after
the patch.  It may be that we'd need

readPatchLazily :: PackedString -> (Patch, PackedString)

but I'm not sure.  Obviously readPatchLazily would need to throw an error
(with "bug") if the PackedString can't be parsed as a patch.
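
To make the three signatures concrete, here is a minimal sketch over a toy
patch format. Everything in it is an assumption made for illustration:
PackedString is stood in for by String, Patch is a one-field toy type, and a
"patch" is a single line of the form "patch <name>". It uses the tuple-returning
variant of readPatchLazily and plain error where darcs would use "bug".

```haskell
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

-- Stand-in for darcs' PackedString (assumption for this sketch).
type PackedString = String

-- Toy patch type (hypothetical): one line of the form "patch <name>\n".
data Patch = Patch String deriving (Eq, Show)

-- Strict reader: fails up front with Nothing on malformed input.
readPatch :: PackedString -> Maybe (Patch, PackedString)
readPatch s = do
  body <- stripPrefix "patch " s
  case break (== '\n') body of
    (name, '\n' : rest) | not (null name) -> Just (Patch name, rest)
    _                                     -> Nothing

-- Validation only: returns the remaining input if a patch can be read.
-- Defined via readPatch here for brevity; a real version would presumably
-- avoid building the patch at all.
readPatchPossible :: PackedString -> Maybe PackedString
readPatchPossible = fmap snd . readPatch

-- Lazy reader: returns the pair immediately and only fails when the patch
-- or the remaining input is actually inspected.
readPatchLazily :: PackedString -> (Patch, PackedString)
readPatchLazily s = (Patch name, drop 1 rest)
  where
    body = fromMaybe (error "bug: readPatchLazily on unparseable input")
                     (stripPrefix "patch " s)
    (name, rest) = break (== '\n') body
```

The intended use would be to call readPatchPossible first to get the nice
error message, and only then hand the same input to readPatchLazily, so the
error in the lazy reader should never actually fire.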
-- 
David Roundy
http://www.darcs.net



