[darcs-devel] [issue1731] performance regression in check/repair (2.4.x)
Jason Dagit
bugs at darcs.net
Sat Jan 16 19:08:30 UTC 2010
Jason Dagit <dagitj at gmail.com> added the comment:
For the failed check, when I run that case manually I get this output:
The repository is consistent!
Hash mismatch(es)!
bindisttest/a/b
index: 0000000000000000000000000000000000000000000000000000000000000000
working: 5847c8b50a3191775d91cbaf8029fd1a75725517b0c70dde5bff759fc67ec1f8
bindisttest/a
index: 0000000000000000000000000000000000000000000000000000000000000000
working: e9f80a49790bc926a3de5542274ae979753a3aedaef1b9fab0fcb74a21bc7c60
bindisttest
index: 3c99d780272456ac9e4ab10056ee3e145700260cefee61921bf89017a15a6535
working: 5a72b14295811905378a648e98da91922bbc69f21c1a209357834fc2e92da3f8
Bad index.
The repository was fine, but the index has issues.
As for numbers related to the performance, I ran the above test with profiling enabled and collected these numbers:
Sat Jan 16 01:52 2010 Time and Allocation Profiling Report (Final)
darcs-2.4-beta-prof +RTS -prof -RTS check
total time = 859.54 secs (42977 ticks @ 20 ms)
total alloc = 58,360,459,684 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
sha256 Bundled.SHA256 24.4 0.1
isGZFile ByteStringUtils 20.2 0.1
fsCreateHashedFile Storage.Hashed.Darcs 12.3 0.2
hashedTreeIO Storage.Hashed.Darcs 9.8 4.7
break_before_nth_newline ByteStringUtils 8.5 5.7
floatPath Storage.Hashed.AnchoredPath 4.4 16.8
removeFileMayNotExist Darcs.Lock 3.6 0.0
base16 Storage.Hashed.Hash 3.1 17.0
markChanged Storage.Hashed.Monad 1.1 2.0
modifyTree Storage.Hashed.Tree 0.8 2.6
applyHunkLines Darcs.Patch.Apply 0.7 20.9
break_after_nth_newline ByteStringUtils 0.6 3.5
clean_hashdir Darcs.Repository.HashedIO 0.4 2.0
unlinesPS ByteStringUtils 0.3 1.9
darcsFormatDir Storage.Hashed.Darcs 0.3 1.5
ps2fn Darcs.Patch.FileName 0.3 2.4
sha256 Storage.Hashed.Hash 0.2 1.5
breakup Darcs.Patch.FileName 0.2 1.1
breakFirstPS ByteStringUtils 0.2 1.2
firstspace ByteStringUtils 0.1 1.6
The expensive sha256 appears to be coming from here:
mWriteFilePS_a4ntC Darcs.IO 1624 366704 0.1 0.0 28.8 14.5
writeFile_aB9u Storage.Hashed.Monad 1627 1466813 0.2 0.3 26.6 6.2
maybeFlush Storage.Hashed.Monad 1648 733406 0.2 0.3 0.2 0.3
flush Storage.Hashed.Monad 2079 226 0.0 0.0 0.0 0.0
sha256 Storage.Hashed.Hash 1645 366703 0.0 0.0 23.9 0.1
sha256 Bundled.SHA256 1646 733406 23.9 0.1 23.9 0.1
If I'm reading the profile correctly, that is called from here:
applyAndFix Darcs.Repository.Repair 1392 105474 0.2 0.4 42.7 57.4
Which is in turn called from here:
replayRepository' Darcs.Repository.Repair 1236 28 0.0 0.0 95.1 96.5
Called from here:
replayRepository Darcs.Repository.Repair 1235 2 0.0 0.0 99.3 98.9
And I think the entry point starts here:
main Main 936 5 0.0 0.0 99.8 99.3
assertions Main 3030 1 0.0 0.0 0.0 0.0
run_the_command Darcs.RunCommand 942 1 0.0 0.0 99.8 99.3
run_command Darcs.RunCommand 1004 1 0.0 0.0 99.8 99.3
consider_running Darcs.RunCommand 1006 3 0.0 0.0 99.8 99.3
doCRCWarnings Darcs.Commands.GZCRCs 3032 1 0.0 0.0 0.0 0.0
resetCRCWarnings Darcs.Global 3035 1 0.0 0.0 0.0 0.0
getCRCWarnings Darcs.Global 3033 1 0.0 0.0 0.0 0.0
checkCmd Darcs.Commands.Check 1153 1 0.0 0.0 99.8 99.3
check' Darcs.Commands.Check 1234 13 0.0 0.0 99.8 99.3
Compiling those lists, snipping out the unrelated bits, and doctoring the whitespace:
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
main Main 936 5 0.0 0.0 99.8 99.3
run_the_command Darcs.RunCommand 942 1 0.0 0.0 99.8 99.3
run_command Darcs.RunCommand 1004 1 0.0 0.0 99.8 99.3
consider_running Darcs.RunCommand 1006 3 0.0 0.0 99.8 99.3
checkCmd Darcs.Commands.Check 1153 1 0.0 0.0 99.8 99.3
check' Darcs.Commands.Check 1234 13 0.0 0.0 99.8 99.3
replayRepository Darcs.Repository.Repair 1235 2 0.0 0.0 99.3 98.9
replayRepository' Darcs.Repository.Repair 1236 28 0.0 0.0 95.1 96.5
applyAndFix Darcs.Repository.Repair 1392 105474 0.2 0.4 42.7 57.4
mWriteFilePS_a4ntC Darcs.IO 1624 366704 0.1 0.0 28.8 14.5
writeFile_aB9u Storage.Hashed.Monad 1627 1466813 0.2 0.3 26.6 6.2
sha256 Storage.Hashed.Hash 1645 366703 0.0 0.0 23.9 0.1
sha256 Bundled.SHA256 1646 733406 23.9 0.1 23.9 0.1
It seems like sha256 is incriminated simply because it's called so many times. Which means it's probably worthwhile to look elsewhere for
performance gains.
Another interesting bit, is that everything I snipped out between replayRepository' and applyAndFix contributes 0.0% to the the individual and
inherited times, yet when we get to applyAndFix only half the time is spent at or below applyAndFix. I think that means the rest of the time
is spent mainly in isGZFile, fsCreateHashedFile, and hashedTreeIO (glancing at the profiling output also makes me think this).
I have no idea how to optimize this, but hopefully these numbers are insightful to someone else. I'm also attaching the full profiling output
in case someone wants to see it in its entirety.
Jason
----------
nosy: +dagit
__________________________________
Darcs bug tracker <bugs at darcs.net>
<http://bugs.darcs.net/issue1731>
__________________________________
-------------- next part --------------
A non-text attachment was scrubbed...
Name: darcs-2.4-beta-prof.prof
Type: application/octet-stream
Size: 273613 bytes
Desc: not available
URL: <http://lists.osuosl.org/pipermail/darcs-devel/attachments/20100116/81c70feb/attachment-0001.obj>
More information about the darcs-devel
mailing list