[darcs-devel] darcs patch: Implement gzReadFile (and 19 more)

David Roundy droundy at abridgegame.org
Sat Apr 16 15:14:31 PDT 2005


Thanks, Ian!

On Sat, Apr 16, 2005 at 04:31:39PM -0400, Ian Lynagh wrote:
> 
> I think this should all be safe.
> 
> The getSymbolicLinkStatus stuff might need a bit of tweaking on Windows,
> but as we were already doing getFileStatus it should be easy enough
> (just getSymbolicLinkStatus = getFileStatus and isSymbolicLink = const
> False I imagine).

I agree.  That sounds like a fine plan to me.

> Some graphs and whatnot here:
> http://urchin.earth.li/~ian/darcs/res/res.html

I get bad timing results, and for some reason I don't get nearly as nice
results on the memory usage as what your profiling suggests.

Your latest patchset:

$ time ../darcs_ian record -a -m foo -l
real    11m59.270s
user    11m9.980s
sys     0m40.010s

(Took 492m memory)

$ time ../darcs_ian whatsnew
No changes!

real    0m1.181s
user    0m0.780s
sys     0m0.340s
$ time ../darcs_ian whatsnew -l
No changes!

real    0m10.404s
user    0m8.250s
sys     0m0.920s

(a later call to whatsnew -l took over 470m memory and 2.5 minutes before I
interrupted it--I suspect this is due to the lack of sorting.  See below.)

Current darcs-unstable:

$ time ../darcs_darcs record -a -m foo -l
real    6m3.650s
user    5m19.900s
sys     0m35.880s

(Took 700m memory)
$ time ../darcs_darcs whatsnew
No changes!

real    0m5.687s
user    0m4.290s
sys     0m0.950s
$ time ../darcs_darcs whatsnew -l
No changes!

real    0m19.197s
user    0m15.140s
sys     0m2.460s

To summarize, your record takes twice as long as the current
darcs-unstable, which is definitely not good.  It *seems* (just from
staring at it from time to time) to be spending its extra time in "Appying
to current...", which suggests to me that perhaps the lazy IO patch reading
was the culprit after all.  It may be that I ran the wrong version when I
tested that stuff before and pronounced it fast...

I tried doing a record using your code with compression turned off, and got
the following result (with peak memory usage of 700m or so):

$ echo ALL --dont-compress >> _darcs/prefs/defaults
$ time ../darcs_ian record -a -m foo -l
real    7m17.628s
user    5m39.600s
sys     0m30.780s

Your record does take less memory, partly because the darcs-unstable
version takes more than I remembered.  I may have missed the sync spike
previously, using my sophisticated "stare at top" profiling strategem.

Your whatsnew without the -l is wonderfully fast--I'm curious as to what
caused this.  For some reason, it is still much slower when -l is given.
Whatsnew -l isn't a particularly speed-critical command, but this may give
us a hint to an optimization that would help other commands also.


I'm taking a few of your changes, but am concerned about others of them, so
I'm not accepting them right now.  I'm still leery of the gzReadFile and
the bytewise patch parsing code, and am afraid it slowed down the record
significantly, so I've not accepted those patches.

You can do a send (once I've actually gotten this into darcs-unstable) to
see which patches I didn't accept, so I'll just leave here the ones I have
accepted.

> Fri Apr 15 18:39:22 BST 2005  Ian Lynagh <igloo at earth.li>
>   * Use strings for newlines as hPutBuf doesn't do the right thing when printing them
> 
> Sat Apr 16 00:12:50 BST 2005  Ian Lynagh <igloo at earth.li>
>   * Tweak Diff.lhs
> 
> Sat Apr 16 00:13:36 BST 2005  Ian Lynagh <igloo at earth.li>
>   * Tweak SlurpDirectory
> 
> Sat Apr 16 00:56:48 BST 2005  Ian Lynagh <igloo at earth.li>
>   * Use getSymbolicLinkStatus to save on stat calls
> 
> Sat Apr 16 01:25:12 BST 2005  Ian Lynagh <igloo at earth.li>
>   * Use getSymbolicLinkStatus more
> 
> Sat Apr 16 02:17:42 BST 2005  Ian Lynagh <igloo at earth.li>
>   * Keep filenames reversed until we need them
> 
> Sat Apr 16 03:58:02 BST 2005  Ian Lynagh <igloo at earth.li>
>   * More tweaks
> 
> Sat Apr 16 17:06:51 BST 2005  Ian Lynagh <igloo at earth.li>
>   * More tweaking
> 
> Sat Apr 16 18:04:19 BST 2005  Ian Lynagh <igloo at earth.li>
>   * Avoid submerge_in_dir when diffing
> 
> Sat Apr 16 18:05:00 BST 2005  Ian Lynagh <igloo at earth.li>
>   * Avoid repeated work with norm_path/drop_dotdot
> 
> Sat Apr 16 19:03:06 BST 2005  Ian Lynagh <igloo at earth.li>
>   * Avoid nested list appends when diffing

(Now for specific change comments... which were written before the above
discussion...)

> [Use strings for newlines as hPutBuf doesn't do the right thing when printing them
> Ian Lynagh <igloo at earth.li>**20050415173922] {
> hunk ./Printer.lhs 62
> - -newline_p = Both "\n" (packString "\n")
> +newline_p = S "\n"
> hunk ./Printer.lhs 66
> - -newline   = unsafeBoth "\n" (packString "\n")
> +newline   = unsafeChar '\n'
> }

What is it that hPutBuf is doing wrong when we try use it to print a
newline?

> [Tweak Diff.lhs
> Ian Lynagh <igloo at earth.li>**20050415231250] {

It looks here like this is just a cleanup, switching to return a list
rather than a Maybe list.  Is there more to it than that? I suspect there
is a deep laziness reason that I haven't seen...

> [Keep filenames reversed until we need them
> Ian Lynagh <igloo at earth.li>**20050416011742] {

Clever! Did it make much difference?

> [Don't sort directory contents
> Ian Lynagh <igloo at earth.li>**20050416030619] {
> hunk ./SlurpDirectory.lhs 57
> - -import List ( sort, tails, isPrefixOf )
> +import List ( tails, isPrefixOf )
> hunk ./SlurpDirectory.lhs 232
> - -get_dircontents (SlurpDir _ _ c) = sort c
> +get_dircontents (SlurpDir _ _ c) = c
> }

I'm concerned here...  Don't we assume that the directories are sorted in
recur_diff? It may be that getDirectoryContents returns a sorted list, but
I don't think there's a guarantee of this anywhere.  Also, when we modify a
slurpy there's no guarantee that the modified Slurpy will be sorted.

We could move the sort into Diff, but since diff is the only module that
calls get_dircontents, I don't think we'll have gained anything.

So it seems to me that if we keep this patch, we need to add some code to
ensure that Slurpies are always sorted.  :(

> [Don't put SCCs in Record or SelectChanges
> Ian Lynagh <igloo at earth.li>**20050416160628] {
> hunk ./GNUmakefile 15
> +GHCFLAGS_Record.o = $(filter-out -auto-all,$(GHCFLAGS))
> +GHCFLAGS_SelectChanges.o = $(filter-out -auto-all,$(GHCFLAGS))
> }

Why is this helpful? Just curious...

> hunk ./SlurpDirectory.lhs 640
> - -  do isdir <- doesDirectoryReallyExist $ fn2fp d
> - -     unless isdir (createDirectory $ fn2fp d)
> - -     withCurrentDirectory (fn2fp d) $
> - -       do runpatch p
> - -          sequence_ $ map (slurp_write opts) ss
> +  do let d' = fn2fp d
> +     efs <- try $ getSymbolicLinkStatus d'
> +     case efs of
> +         Left _ -> createDirectory d'
> +         Right fs
> +          | isDirectory fs || (isSymbolicLink fs && d' == ".") -> return ()
> +          | otherwise -> fail ("Want to create directory " ++ d')
> +     withCurrentDirectory d' $
> +         do runpatch p
> +            mapM_ (slurp_write opts) ss

This error message "Want to create directory foo" looks a bit vague.

Also, I'm not quite sure why you return without failure when the directory
we're trying to create already exists, but is a symbolic link, provided it
is "."... what does that mean?  How can "." be a symbolic link?

> hunk ./SlurpDirectory.lhs 652
> - -       when (not dirt) $ setModTime f mt
> +       unless dirt $ setModTime f mt

Ah, those long passed days when I was familiar with 'when', but not 'unless'...
-- 
David Roundy
http://www.darcs.net




More information about the darcs-devel mailing list