[darcs-users] darcs patch: Optimize clean_hashdir's use of cleanCaches.

David Roundy droundy at darcs.net
Wed Oct 8 14:21:30 UTC 2008


On Tue, Oct 07, 2008 at 09:18:58PM +0200, Petr Rockai wrote:
> I am attaching a first shot at indirectly optimizing hashed repair: the patch
> should drastically limit the amount of work wasted on cleaning the caches by
> passing cleanCaches a list of files that might need to be removed, so that it
> only looks at those instead of skimming through everything in the global
> cache. This should save some cycles, although I have no numbers to back that
> up. Moreover, the patch as it stands is largely untested (just a make check
> and a quick test that the idea helps), but I am in a hurry (again), so please
> review carefully. I will try to revisit this soon.
> 
> (Next comes trimming down memory usage; we discussed that at length with
> Kowey on IRC today. I'll investigate, hopefully in the course of this week.)
> 
> Yours,
>    Petr.
> 
> Tue Oct  7 21:12:37 CEST 2008  Petr Rockai <me at mornfall.net>
>   * Optimize clean_hashdir's use of cleanCaches.
>   
>   We now ask cleanCaches to look only at the files we have unlinked, since
>   only those could have caused their cached equivalents' link count to drop
>   to 1. This limits the number of stats to O(n), where n is the number of
>   cleaned-out files.

Applied, thanks! It does look correct.

> hunk ./src/Darcs/Repository/HashedIO.lhs 406
> -      cleanCaches c dir_
> +      cleanCachesWithHint c dir_ (fs \\ hs)
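
For context, here is a minimal sketch of the idea behind the hint (purely
illustrative; the cacheDir and hint names and the flat directory layout are
assumptions, not darcs's actual cleanCachesWithHint): only the files we just
unlinked can have cached counterparts whose link count dropped to 1, so only
those are worth a stat.

    import Control.Monad (forM_, when)
    import System.FilePath ((</>))
    import System.Posix.Files (getFileStatus, linkCount, removeLink)

    -- Sketch: stat only the hinted files and remove any cache entry
    -- whose link count has fallen to 1, i.e. nothing else references it.
    cleanCachesWithHintSketch :: FilePath -> [FilePath] -> IO ()
    cleanCachesWithHintSketch cacheDir hint =
      forM_ hint $ \f -> do
        let path = cacheDir </> f
        st <- getFileStatus path
        when (linkCount st == 1) $ removeLink path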

I think a better optimization to do next would be to remove the two
calls to \\, which cause clean_hashdir to be O(N^2), where N is the
total number of files in the hashdir.  Particularly as your patch
doubles the prefactor!  An implementation using Data.Set would be
reasonable.  This may not be too bad in practice, since the prefactor
on a call to stat is much greater than that of a Haskell string
comparison, but eliminating something that scales as O(N^2) is still
well worth the effort.
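
As a rough illustration of that suggestion (a sketch under assumptions, not
a tested replacement: the arguments stand in for the fs and hs lists from
clean_hashdir), the quadratic list difference can be swapped for a set-based
one:

    import qualified Data.Set as Set

    -- Build sets once and subtract: O(N log N) overall, versus the
    -- O(N^2) behaviour of Data.List's (\\) on lists of similar length.
    setDiff :: Ord a => [a] -> [a] -> [a]
    setDiff fs hs = Set.toList (Set.fromList fs `Set.difference` Set.fromList hs)

The call in the hunk above would then read something like
cleanCachesWithHint c dir_ (setDiff fs hs).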

David

