[darcs-devel] [issue1731] performance regression in check/repair (2.4.x)

Jason Dagit bugs at darcs.net
Sat Jan 16 19:08:30 UTC 2010


Jason Dagit <dagitj at gmail.com> added the comment:

For the failed check, when I run that case manually I get this output:
The repository is consistent!                                                   
Hash mismatch(es)!                                                              
bindisttest/a/b
    index: 0000000000000000000000000000000000000000000000000000000000000000
  working: 5847c8b50a3191775d91cbaf8029fd1a75725517b0c70dde5bff759fc67ec1f8
bindisttest/a
    index: 0000000000000000000000000000000000000000000000000000000000000000
  working: e9f80a49790bc926a3de5542274ae979753a3aedaef1b9fab0fcb74a21bc7c60
bindisttest
    index: 3c99d780272456ac9e4ab10056ee3e145700260cefee61921bf89017a15a6535
  working: 5a72b14295811905378a648e98da91922bbc69f21c1a209357834fc2e92da3f8

Bad index.


The repository was fine, but the index has issues.

As for numbers related to the performance, I ran the above test with profiling enabled and collected these numbers:
        Sat Jan 16 01:52 2010 Time and Allocation Profiling Report  (Final)

           darcs-2.4-beta-prof +RTS -prof -RTS check

        total time  =      859.54 secs   (42977 ticks @ 20 ms)
        total alloc = 58,360,459,684 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

sha256                         Bundled.SHA256        24.4    0.1
isGZFile                       ByteStringUtils       20.2    0.1
fsCreateHashedFile             Storage.Hashed.Darcs  12.3    0.2
hashedTreeIO                   Storage.Hashed.Darcs   9.8    4.7
break_before_nth_newline       ByteStringUtils        8.5    5.7
floatPath                      Storage.Hashed.AnchoredPath   4.4   16.8
removeFileMayNotExist          Darcs.Lock             3.6    0.0
base16                         Storage.Hashed.Hash    3.1   17.0
markChanged                    Storage.Hashed.Monad   1.1    2.0
modifyTree                     Storage.Hashed.Tree    0.8    2.6
applyHunkLines                 Darcs.Patch.Apply      0.7   20.9
break_after_nth_newline        ByteStringUtils        0.6    3.5
clean_hashdir                  Darcs.Repository.HashedIO   0.4    2.0
unlinesPS                      ByteStringUtils        0.3    1.9
darcsFormatDir                 Storage.Hashed.Darcs   0.3    1.5
ps2fn                          Darcs.Patch.FileName   0.3    2.4
sha256                         Storage.Hashed.Hash    0.2    1.5
breakup                        Darcs.Patch.FileName   0.2    1.1
breakFirstPS                   ByteStringUtils        0.2    1.2
firstspace                     ByteStringUtils        0.1    1.6

The expensive sha256 appears to be coming from here:
          mWriteFilePS_a4ntC Darcs.IO                                            1624      366704   0.1    0.0    28.8   14.5
           writeFile_aB9u Storage.Hashed.Monad                                1627     1466813   0.2    0.3    26.6    6.2
            maybeFlush   Storage.Hashed.Monad                                1648      733406   0.2    0.3     0.2    0.3
             flush       Storage.Hashed.Monad                                2079         226   0.0    0.0     0.0    0.0
            sha256       Storage.Hashed.Hash                                 1645      366703   0.0    0.0    23.9    0.1
             sha256      Bundled.SHA256                                      1646      733406  23.9    0.1    23.9    0.1

If I'm reading the profile correctly, that is called from here:
         applyAndFix     Darcs.Repository.Repair                             1392      105474   0.2    0.4    42.7   57.4

Which is in turn called from here:
        replayRepository' Darcs.Repository.Repair                             1236          28   0.0    0.0    95.1   96.5

Called from here:
       replayRepository  Darcs.Repository.Repair                             1235           2   0.0    0.0    99.3   98.9

And I think the entry point starts here:
 main                    Main                                                 936           5   0.0    0.0    99.8   99.3
  assertions             Main                                                3030           1   0.0    0.0     0.0    0.0
  run_the_command        Darcs.RunCommand                                     942           1   0.0    0.0    99.8   99.3
   run_command           Darcs.RunCommand                                    1004           1   0.0    0.0    99.8   99.3
    consider_running     Darcs.RunCommand                                    1006           3   0.0    0.0    99.8   99.3
     doCRCWarnings       Darcs.Commands.GZCRCs                               3032           1   0.0    0.0     0.0    0.0
      resetCRCWarnings   Darcs.Global                                        3035           1   0.0    0.0     0.0    0.0
      getCRCWarnings     Darcs.Global                                        3033           1   0.0    0.0     0.0    0.0
     checkCmd            Darcs.Commands.Check                                1153           1   0.0    0.0    99.8   99.3
      check'             Darcs.Commands.Check                                1234          13   0.0    0.0    99.8   99.3


Compiling those lists, snipping out the unrelated bits, and doctoring the whitespace:
                                                                                               individual    inherited
COST CENTRE              MODULE                                               no.    entries  %time %alloc   %time %alloc
 main                    Main                                                 936           5   0.0    0.0    99.8   99.3
  run_the_command        Darcs.RunCommand                                     942           1   0.0    0.0    99.8   99.3
   run_command           Darcs.RunCommand                                    1004           1   0.0    0.0    99.8   99.3
    consider_running     Darcs.RunCommand                                    1006           3   0.0    0.0    99.8   99.3
     checkCmd            Darcs.Commands.Check                                1153           1   0.0    0.0    99.8   99.3
      check'             Darcs.Commands.Check                                1234          13   0.0    0.0    99.8   99.3
       replayRepository  Darcs.Repository.Repair                             1235           2   0.0    0.0    99.3   98.9
        replayRepository' Darcs.Repository.Repair                            1236          28   0.0    0.0    95.1   96.5
         applyAndFix     Darcs.Repository.Repair                             1392      105474   0.2    0.4    42.7   57.4
          mWriteFilePS_a4ntC Darcs.IO                                        1624      366704   0.1    0.0    28.8   14.5
           writeFile_aB9u Storage.Hashed.Monad                               1627     1466813   0.2    0.3    26.6    6.2
            sha256       Storage.Hashed.Hash                                 1645      366703   0.0    0.0    23.9    0.1
             sha256      Bundled.SHA256                                      1646      733406  23.9    0.1    23.9    0.1

It seems like sha256 is incriminated simply because it's called so many times.  Which means it's probably worthwhile to look elsewhere for 
performance gains.

Another interesting bit, is that everything I snipped out between replayRepository' and applyAndFix contributes 0.0% to the the individual and 
inherited times, yet when we get to applyAndFix only half the time is spent at or below applyAndFix.  I think that means the rest of the time 
is spent mainly in isGZFile, fsCreateHashedFile, and hashedTreeIO (glancing at the profiling output also makes me think this).

I have no idea how to optimize this, but hopefully these numbers are insightful to someone else.  I'm also attaching the full profiling output 
in case someone wants to see it in its entirety.

Jason

----------
nosy: +dagit

__________________________________
Darcs bug tracker <bugs at darcs.net>
<http://bugs.darcs.net/issue1731>
__________________________________
-------------- next part --------------
A non-text attachment was scrubbed...
Name: darcs-2.4-beta-prof.prof
Type: application/octet-stream
Size: 273613 bytes
Desc: not available
URL: <http://lists.osuosl.org/pipermail/darcs-devel/attachments/20100116/81c70feb/attachment-0001.obj>


More information about the darcs-devel mailing list