[darcs-devel] Re: optimizing darcs diff to be 4000 times faster

Mark Stosberg mark at summersault.com
Mon Dec 27 17:04:24 PST 2004


On 2004-12-27, David Roundy <droundy at abridgegame.org> wrote:
>
> The problem is that this only works if the arguments given are all files.
> If they are directories, we need to either recurse the directories
> ourselves, or create temporary directories.  

If recursing the directories directly is faster than using temporary
directories, I'd like to advocate for that. :) 
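For illustration, here is a rough sketch of recursing the directory ourselves and keeping only tracked files, so untracked files like foo.o are never handed to diff. It assumes a hypothetical set of tracked paths is already available. It also only finds files that still exist in the working directory, so files deleted there would need separate handling. None of these names come from the darcs source.

```haskell
import Control.Monad (forM)
import qualified Data.Set as Set
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>))

-- | Recursively list every non-directory entry below a directory.
-- Note: doesDirectoryExist follows symbolic links, so a symlink cycle
-- would make this loop; a real implementation would need to guard that.
listFilesRecursive :: FilePath -> IO [FilePath]
listFilesRecursive dir = do
  entries <- listDirectory dir          -- excludes "." and ".."
  nested <- forM entries $ \name -> do
    let path = dir </> name
    isDir <- doesDirectoryExist path
    if isDir then listFilesRecursive path else pure [path]
  pure (concat nested)

-- | Keep only paths in the (hypothetical) tracked set, so build
-- products such as foo.o never produce "Only in .: foo.o" lines.
keepTracked :: Set.Set FilePath -> [FilePath] -> [FilePath]
keepTracked tracked = filter (`Set.member` tracked)

-- | Tracked files that exist under a working-directory path.
trackedFilesUnder :: Set.Set FilePath -> FilePath -> IO [FilePath]
trackedFilesUnder tracked dir = keepTracked tracked <$> listFilesRecursive dir
```

Each surviving path could then be diffed against its pristine copy one file at a time, which avoids building temporary directories.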

> Otherwise you could end up diffing a bunch of files that exist only in
> the working directories, and you'd have a bunch of
>
> Only in .: foo.o
>
> which you don't want.
>
> The other issue is that this is a special case for when diff isn't
> being run on old versions of the repository, and I haven't written any
> code for that special case.

Well, certainly getting the right answer later is better than getting the
wrong answer sooner. :)  My view is that this particular case is common
enough, and the potential optimization is great enough, that the special
case is worthwhile.

I think I found one fairly easy related optimization, but I don't have
the Haskell skills to follow through.

Around line 158 in DiffCommand.lhs, there is this code:

    morepatches <- read_repo formerdir
    putDocLn $ changelog (get_diff_info opts morepatches)
            $$ thediff

If I understand correctly, the changelog is printed above the diff here,
which requires calling 'read_repo'. However, printing the changelog (and
thus calling 'read_repo') is unnecessary if you just want a
'whatsnew'-style diff.
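A minimal sketch of skipping that work, assuming plain Strings in place of darcs's Doc type. Here readChangelog is a hypothetical action standing in for 'read_repo formerdir' followed by 'changelog (get_diff_info ...)'. The expensive action only runs on the branch that actually prints the changelog.

```haskell
-- | Render the diff, running the expensive changelog action only when
-- a changelog header is wanted. All names here are hypothetical.
renderDiff :: Bool        -- ^ should the changelog be printed?
           -> IO String   -- ^ expensive action producing the changelog
           -> String      -- ^ the diff itself
           -> IO String
renderDiff wantChangelog readChangelog theDiff
  | wantChangelog = do
      header <- readChangelog      -- the repository is read only here
      pure (header ++ theDiff)
  | otherwise = pure theDiff       -- read_repo is skipped entirely
```

Because the action is only run on the first branch, passing an action that would fail is harmless when no changelog is requested.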

I think the logic should be:

"Only print the changelog if any options have been passed besides
--diff-opts or --unified."
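That rule could be written as a predicate over the options. DiffFlag below is a hypothetical type with one catch-all constructor for every other option, not darcs's actual option type.

```haskell
-- | Hypothetical option type; darcs's real flag type differs.
data DiffFlag
  = DiffOpts String   -- ^ --diff-opts=...
  | Unified           -- ^ --unified
  | Other String      -- ^ any other option
  deriving (Eq, Show)

-- | True when some option other than --diff-opts or --unified is
-- present, i.e. when this is not a plain whatsnew-style diff.
needsChangelog :: [DiffFlag] -> Bool
needsChangelog = any (not . formattingOnly)
  where
    formattingOnly (DiffOpts _) = True
    formattingOnly Unified      = True
    formattingOnly (Other _)    = False
```

With this predicate, 'darcs diff --unified' would never call read_repo, while any other option would still get the changelog.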

    Mark

-- 
http://mark.stosberg.com/ 