[darcs-devel] Use System.Directory.copyFile for file copying

Kevin Quick quick at sparq.org
Wed Aug 1 06:19:51 PDT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Thanks!  I had actually made the original change based on the preference
for system libraries over custom code (lacking any comments justifying the
preference for the latter), but it's nice to see it comes with a potential
performance boost for certain configurations.

Out of curiosity, and at the risk of boring folks, I added a third method
for comparison, which performs a
System.Process.runCommand("cp -r fromdir todir") >>= waitForProcess,
essentially testing against standard cp plus the overhead of the
subprocess creation.
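
A minimal sketch of what that third method might look like is below. This
is not the code from copytest.hs; the names timed and copyViaCp are
hypothetical, and the paths are passed to the shell unquoted, so it assumes
directory names without spaces or shell metacharacters.

```haskell
import Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)
import System.Exit (ExitCode (..))
import System.Process (runCommand, waitForProcess)

-- | Run an IO action and return its result together with the elapsed
-- wall-clock time.  (Hypothetical helper, not taken from copytest.hs.)
timed :: IO a -> IO (a, NominalDiffTime)
timed act = do
  start <- getCurrentTime
  result <- act
  end <- getCurrentTime
  return (result, diffUTCTime end start)

-- | Copy by spawning "cp -r" through the shell and waiting for it to
-- finish.  Assumes both paths are free of spaces and shell metacharacters.
copyViaCp :: FilePath -> FilePath -> IO ExitCode
copyViaCp from to =
  runCommand ("cp -r " ++ from ++ " " ++ to) >>= waitForProcess
```

From GHCi, something like timed (copyViaCp "fromdir" "todir") would then
return the exit code of cp along with the elapsed time, which includes the
cost of creating the subprocess.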

I also ran this on Mac OS X Tiger x86.

Updated results for Linux x86 reiserfs:

$  for X in 1 2 3 4 5 6 ; do rm -r copytestdir/small{2,3,4}/*; ./copytest copytestdir/small?; done
Copy of 998 files:
    via System.Directory.copyFile: 0.172032s
    via readFilePS >= writeFilePS: 0.164614s
    via   System.Process("cp -r"): 0.115983s
Copy of 998 files:
    via System.Directory.copyFile: 0.180447s
    via readFilePS >= writeFilePS: 0.176695s
    via   System.Process("cp -r"): 0.120259s
Copy of 998 files:
    via System.Directory.copyFile: 0.176865s
    via readFilePS >= writeFilePS: 0.165542s
    via   System.Process("cp -r"): 0.128305s
Copy of 998 files:
    via System.Directory.copyFile: 0.183365s
    via readFilePS >= writeFilePS: 0.176855s
    via   System.Process("cp -r"): 0.126601s
Copy of 998 files:
    via System.Directory.copyFile: 0.183976s
    via readFilePS >= writeFilePS: 0.17487s
    via   System.Process("cp -r"): 0.125522s
Copy of 998 files:
    via System.Directory.copyFile: 0.178875s
    via readFilePS >= writeFilePS: 0.171737s
    via   System.Process("cp -r"): 0.127577s

$  for X in 1 2 3 4 5 6 ; do rm -r copytestdir/big{2,3,4}/*; /kwq/unstable/copytest copytestdir/big?; done
Copy of 4 files:
    via System.Directory.copyFile: 8.204959s
    via readFilePS >= writeFilePS: 11.083989s
    via   System.Process("cp -r"): 13.116506s
Copy of 4 files:
    via System.Directory.copyFile: 7.864635s
    via readFilePS >= writeFilePS: 9.829027s
    via   System.Process("cp -r"): 11.634171s
Copy of 4 files:
    via System.Directory.copyFile: 7.08999s
    via readFilePS >= writeFilePS: 10.456167s
    via   System.Process("cp -r"): 15.665767s
Copy of 4 files:
    via System.Directory.copyFile: 8.318283s
    via readFilePS >= writeFilePS: 10.496049s
    via   System.Process("cp -r"): 10.557133s
Copy of 4 files:
    via System.Directory.copyFile: 5.262135s
    via readFilePS >= writeFilePS: 13.083001s
    via   System.Process("cp -r"): 11.247626s
Copy of 4 files:
    via System.Directory.copyFile: 7.754436s
    via readFilePS >= writeFilePS: 10.117483s
    via   System.Process("cp -r"): 13.9466s

The results for Mac OS X Tiger x86:

$ for X in 1 2 3 4 5 6 ; do rm -r copytestdir/small{2,3,4}/*; ./copytest copytestdir/small?; done
Copy of 998 files:
    via System.Directory.copyFile: 0.455989s
    via readFilePS >= writeFilePS: 0.435122s
    via   System.Process("cp -r"): 0.25074s
Copy of 998 files:
    via System.Directory.copyFile: 0.353156s
    via readFilePS >= writeFilePS: 0.367436s
    via   System.Process("cp -r"): 0.250192s
Copy of 998 files:
    via System.Directory.copyFile: 0.36179s
    via readFilePS >= writeFilePS: 0.450544s
    via   System.Process("cp -r"): 0.249152s
Copy of 998 files:
    via System.Directory.copyFile: 0.353648s
    via readFilePS >= writeFilePS: 0.366946s
    via   System.Process("cp -r"): 0.250617s
Copy of 998 files:
    via System.Directory.copyFile: 0.354281s
    via readFilePS >= writeFilePS: 0.417799s
    via   System.Process("cp -r"): 0.256529s
Copy of 998 files:
    via System.Directory.copyFile: 0.390028s
    via readFilePS >= writeFilePS: 0.363963s
    via   System.Process("cp -r"): 0.248796s

$ for X in 1 2 3 4 5 6 ; do rm -r copytestdir/big{2,3,4}/*; ./copytest copytestdir/big?; done
Copy of 4 files:
    via System.Directory.copyFile: 4.419766s
    via readFilePS >= writeFilePS: 1.27034s
    via   System.Process("cp -r"): 1.649713s
Copy of 4 files:
    via System.Directory.copyFile: 1.429918s
    via readFilePS >= writeFilePS: 1.40437s
    via   System.Process("cp -r"): 1.456039s
Copy of 4 files:
    via System.Directory.copyFile: 1.552916s
    via readFilePS >= writeFilePS: 1.219946s
    via   System.Process("cp -r"): 1.465697s
Copy of 4 files:
    via System.Directory.copyFile: 1.426121s
    via readFilePS >= writeFilePS: 1.32494s
    via   System.Process("cp -r"): 0.985613s
Copy of 4 files:
    via System.Directory.copyFile: 1.429138s
    via readFilePS >= writeFilePS: 1.319917s
    via   System.Process("cp -r"): 1.40475s
Copy of 4 files:
    via System.Directory.copyFile: 1.428278s
    via readFilePS >= writeFilePS: 1.471474s
    via   System.Process("cp -r"): 1.425157s


There's clearly some variability in the results that could be
investigated, but I'm not sure it's worth the effort of fine-tuning the
benchmark since we're after "typical" results.  copyFile never seems to
be significantly slower, and can be significantly faster.
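
If someone did want to damp the run-to-run variability without changing the
benchmark itself, one option is to summarize each method by its median over
the six runs.  The sketch below is only an illustration: median is a
hypothetical helper, and the two lists are the Linux big-file timings
copied from the output above.

```haskell
import Data.List (sort)

-- | Median of a list of timings; Nothing for an empty list.
-- (Hypothetical helper, not part of copytest.hs.)
median :: [Double] -> Maybe Double
median [] = Nothing
median xs
  | odd n = Just (sorted !! mid)
  | otherwise = Just ((sorted !! (mid - 1) + sorted !! mid) / 2)
  where
    sorted = sort xs
    n = length xs
    mid = n `div` 2

-- Linux x86 reiserfs, 4 big files, seconds per run (from the output above).
copyFileBig :: [Double]
copyFileBig = [8.204959, 7.864635, 7.08999, 8.318283, 5.262135, 7.754436]

readWritePSBig :: [Double]
readWritePSBig = [11.083989, 9.829027, 10.456167, 10.496049, 13.083001, 10.117483]
```

Evaluating median copyFileBig and median readWritePSBig in GHCi gives one
number per method that is less sensitive to a single outlier run than the
mean would be.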

- -KQ


On 31 Jul 2007, at 10:46 AM, Jason Dagit wrote:

> Fantastic analysis.  Exactly what I was looking for.  I'm now convinced :)
>
> The icing on the cake would be to check this on various platforms but
> I'm convinced enough already.
>
> Thanks!
> Jason
>
> On 7/30/07, Kevin Quick <quick at sparq.org> wrote:
>> On Sat, 28 Jul 2007 17:23:13 -0700, "Jason Dagit"
>> <dagit at codersbase.com> wrote:
>>> I think it would be nice to see a benchmark between the old and new
>>> where there are thousands of tiny little files.  Is there a noticeable
>>> performance difference?  The reason for lots of little files is
>>> because of the permission copying.  I don't know that it would affect
>>> the performance, but if the permission changes require a lot of
>>> function calls then it could have a big impact.
>>>
>>> Jason
>>
>> Attached is copytest.hs.  It is given three directory names and
>> copies from the first to the second using System.Directory.copyFile,
>> then copies from the first to the third using readFilePS >>=
>> writeFilePS.  It then reports the elapsed time for each test.
>>
>> To build (assuming it is located in the top-level of the darcs source
>> tree):
>>
>> $ ghc --make -isrc copytest src/fpstring.c -lz
>>
>> I used two input sets.  The first directory set had 998 small files (50
>> bytes to 500 bytes) in the source directory.  The second set had 4 big
>> files:
>>
>> $ ls -lh copytestdir/big1
>> total 83M
>> -rw-rw-r-- 1 kquick kquick 16M Jul 30 22:15 bigfile1
>> -rw-rw-r-- 1 kquick kquick 14M Jul 30 22:15 bigfile2
>> -rw-rw-r-- 1 kquick kquick 30M Jul 30 22:15 bigfile3
>> -rw-rw-r-- 1 kquick kquick 25M Jul 30 22:16 bigfile4
>>
>> Test runs:
>>
>> $ for X in 1 2 3 4 5 6 ; do rm copytestdir/small{2,3}/*; ./copytest
>> copytestdir/small?; done
>> Copy of 998 files:
>>    via System.Directory.copyFile: 0.160895s
>>    via readFilePS >= writeFilePS: 0.153626s
>> Copy of 998 files:
>>    via System.Directory.copyFile: 0.161436s
>>    via readFilePS >= writeFilePS: 0.153718s
>> Copy of 998 files:
>>    via System.Directory.copyFile: 0.163191s
>>    via readFilePS >= writeFilePS: 0.155526s
>> Copy of 998 files:
>>    via System.Directory.copyFile: 0.16112s
>>    via readFilePS >= writeFilePS: 0.156255s
>> Copy of 998 files:
>>    via System.Directory.copyFile: 0.162132s
>>    via readFilePS >= writeFilePS: 0.157913s
>> Copy of 998 files:
>>    via System.Directory.copyFile: 0.163213s
>>    via readFilePS >= writeFilePS: 0.157451s
>> $
>>
>>
>> $ for X in 1 2 3 4 5 6 ; do rm copytestdir/big{2,3}/*; ./copytest
>> copytestdir/big?; done
>> Copy of 4 files:
>>    via System.Directory.copyFile: 10.418745s
>>    via readFilePS >= writeFilePS: 11.420843s
>> Copy of 4 files:
>>    via System.Directory.copyFile: 8.318079s
>>    via readFilePS >= writeFilePS: 16.533595s
>> Copy of 4 files:
>>    via System.Directory.copyFile: 8.384256s
>>    via readFilePS >= writeFilePS: 11.35574s
>> Copy of 4 files:
>>    via System.Directory.copyFile: 7.752898s
>>    via readFilePS >= writeFilePS: 14.43615s
>> Copy of 4 files:
>>    via System.Directory.copyFile: 8.029765s
>>    via readFilePS >= writeFilePS: 14.187116s
>> Copy of 4 files:
>>    via System.Directory.copyFile: 7.85273s
>>    via readFilePS >= writeFilePS: 12.406907s
>>
>>
>> From the above, I conclude that System.Directory.copyFile is better at
>> actually copying the file data, and that the overhead of copying
>> permissions (visible from copying small files) is quite small (about 8
>> us/file).
>>
>>
>> --
>> Kevin Quick
>> quick at org after sparq
>>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iD8DBQFGsIh4t76lKrRL0ewRAgtPAJ9l+dByfhH3gy+xazwAVX9n887z3QCaAjHa
CAdzHVaUoihXE4Csd/g9QaI=
=2f27
-----END PGP SIGNATURE-----

