[darcs-devel] Use System.Directory.copyFile for file copying

Jason Dagit dagit at codersbase.com
Fri Jul 20 01:19:42 PDT 2007


On 7/20/07, Salvatore Insalaco <kirby81 at gmail.com> wrote:
> 2007/7/19, Jason Dagit <dagit at codersbase.com>:
> > I'm trying to understand why System.Directory.copyFile would be
> > preferred.  Isn't it implemented in Haskell too?  Isn't bytestring (or
> > rather PackedString) IO heavily optimized and on par with C?
>
> Even if a performance comparison is the best thing to do, I think that
> Kevin's observation is a good one.
> If there's a standard library function to do X, unless there're good
> reasons to not use it (e.g. compatibility with old library versions,
> bad performance), it should be preferred to hand-made solutions.
> If in GHC 6.x (or maybe y.x) they decide to improve it with a
> super-fast-hardware-accelerated-sse-mmx-altivec-parallel algorithm, we
> get the performance for free.

I agree in the abstract, but I also think that we should only change
the existing code (excluding refactorings) if a strong argument can be
made.  For example, proof that the performance is identical (or
better) with the function in System.Directory would be a strong
argument to me.  But its mere existence is not enough to sway me
(although usually I agree that using idiomatic code, e.g. standard
libs, is healthy and good).  Darcs IO has a long history where some
very bright Haskell hackers have done their best to squeeze out all
the performance they could.  There is a very real chance that this
hand-rolled function is an instance of that.  Perhaps Ian could
comment.  It may also be that by not copying permissions this version
performs much better when you have hundreds or thousands of files to
copy.  Especially if you're just copying them to run a test and then
throw away that copy of the repository.
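
To make the "proof of performance" idea concrete, here is a minimal
sketch of a timing harness.  It is not darcs code: the names timeIO,
timeCopies and timeStdCopy are made up for illustration, and it
measures wall-clock time only, so results will vary with disk caching.

```haskell
import Control.Monad (forM_)
import Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)
import System.Directory (copyFile)

-- | Wall-clock time taken by an IO action (hypothetical helper).
timeIO :: IO a -> IO NominalDiffTime
timeIO act = do
  t0 <- getCurrentTime
  _ <- act
  t1 <- getCurrentTime
  return (diffUTCTime t1 t0)

-- | Time copying 'src' to n numbered destinations (dstPrefix ++ "1",
-- dstPrefix ++ "2", ...) using whichever copy function is passed in.
timeCopies :: (FilePath -> FilePath -> IO ())
           -> Int -> FilePath -> FilePath -> IO NominalDiffTime
timeCopies copy n src dstPrefix =
  timeIO $ forM_ [1 .. n] $ \i -> copy src (dstPrefix ++ show i)

-- | The same measurement specialised to System.Directory.copyFile,
-- which also copies the source file's permissions.
timeStdCopy :: Int -> FilePath -> FilePath -> IO NominalDiffTime
timeStdCopy = timeCopies copyFile
```

Passing the hand-rolled copy function to timeCopies with the same
arguments as timeStdCopy would give the two numbers to compare.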

> Anyway, in GHC 6.6.1, copyFile is implemented in Haskell with raw
> buffered copy, using hGetBuf / hPutBuf.

Well, GHC's copyFile could be changed to use ByteStrings now that
those are bundled with GHC.  That could potentially be a nice upgrade
for the 6.8 release, assuming that switching it to buffering with
ByteString makes a performance difference.  One might need to play
around and see if a different buffer size is optimal.
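
A rough sketch of what a ByteString-based copy with a tunable buffer
size might look like.  This is not GHC's actual copyFile; the name
copyFileChunked is made up here, and unlike copyFile it does not copy
permissions.

```haskell
import qualified Data.ByteString as B
import System.IO (IOMode (ReadMode, WriteMode), withBinaryFile)

-- | Copy 'src' to 'dst' in chunks of at most 'bufSize' bytes.
-- Hypothetical helper; 'bufSize' is assumed to be positive.
copyFileChunked :: Int -> FilePath -> FilePath -> IO ()
copyFileChunked bufSize src dst =
  withBinaryFile src ReadMode $ \hIn ->
    withBinaryFile dst WriteMode $ \hOut ->
      let loop = do
            chunk <- B.hGet hIn bufSize   -- empty chunk signals end of file
            if B.null chunk
              then return ()
              else B.hPut hOut chunk >> loop
      in loop
```

Calling it with different values of bufSize (say 4096 versus 65536) is
one way to look for a good buffer size.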

> For the same reason, I think that we should seriously think to switch
> to ByteStrings. Isn't Darcs PackedString just a "prerelease" version
> of ByteStrings? They have a pretty similar API. And now (in 6.7)
> ByteStrings have a
> super-fast-hardware-accelerated-sse-mmx-altivec-parallel algorithms
> for stream fusion (more or less ;) ).

ByteStrings are based on FastPackedStrings (which darcs uses).  I
don't think a single darcs dev is against ByteString per se.  The
things slowing down the adoption are as follows:
1) No one has verified that ByteStrings have the same semantics in all
cases as FastPackedStrings (handling of line endings is the main
thing to watch out for).
2) FastPackedStrings may have a few utility functions not (currently)
in ByteString (so someone needs to find those and port them over).
3) Darcs aims to support more than just the most recent release of
GHC.  So that means, as long as 6.4 is still supported
FastPackedStrings must stick around even if ByteString is used when
compiling with 6.6 and newer.
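
For #3, one possible shape is a small compatibility layer selected
with CPP.  This is only a sketch: the names Packed, packP, unpackP and
readFileP are invented for illustration, and the FastPackedString
imports in the #else branch are assumed names that should be checked
against the darcs source.

```haskell
{-# LANGUAGE CPP #-}

#if __GLASGOW_HASKELL__ >= 606
-- GHC 6.6 and newer: use ByteString.
import qualified Data.ByteString.Char8 as BC

type Packed = BC.ByteString

packP :: String -> Packed
packP = BC.pack

unpackP :: Packed -> String
unpackP = BC.unpack

readFileP :: FilePath -> IO Packed
readFileP = BC.readFile
#else
-- Older GHC: fall back to darcs's FastPackedString.
-- The imported names below are assumptions, not verified.
import FastPackedString (PackedString, packString, unpackPS, readFilePS)

type Packed = PackedString

packP :: String -> Packed
packP = packString

unpackP :: Packed -> String
unpackP = unpackPS

readFileP :: FilePath -> IO Packed
readFileP = readFilePS
#endif
```

The rest of the code would then use only the Packed interface, so the
two implementations can coexist until 6.4 support is dropped.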

There could be more reasons, but I can't think of them at the moment.

I think #3 is the main thing slowing down the adoption.  I've offered
to guide/tutor/mentor various people on #haskell and #darcs for this
very conversion, but currently I think my time is better spent
elsewhere, so outside of giving guidance I'm not going to be doing
this conversion until at least January.

Jason

