[darcs-users] darcs patch: switch Darcs.Patch.FileName to be ByteString.Char8 int...

Jason Dagit dagit at codersbase.com
Thu Sep 24 02:50:41 UTC 2009

Don't Apply This Yet (DATY)!

Why is this patch in your inbox if it's DATY?  Because I need help
with it of course!  I just hope my request for review doesn't drive
you batty!

My test case is as follows:
0) Get a tarball of the Linux source
1) cd linux; darcs add --recursive *
2) echo q | darcs record -m "import" +RTS -sstderr -p -hc -RTS
3) compare various modifications

(Note: -hc is just one nice profiling option, -hy is also nice but you
can only use one at a time.)

Doing the above test I have discovered that Darcs.Patch.FileName is a
very costly module, mainly in terms of space usage.  The space usage
forces the garbage collector to run far too frequently, which burns
up CPU time, allocates a ton of virtual memory, and wastes
significant amounts of RAM.  On my test machine, virtual memory usage
is just over 1GB when profiling, and the process uses 400-500 MB of
physical RAM.

Why is it so bad?  There are several reasons.  The main one is our use
of decode_white/encode_white.  These functions operate on strings and
replace white space in filenames with escaped character codes.  For
example, "hello world" becomes "hello\32\world".
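
To make the escaping concrete, here is a minimal String-based sketch of
the pair.  It is not the code from Darcs.Patch.FileName; the names
encodeWhite and decodeWhite are illustrative, and it assumes each
escape is written as a backslash, the decimal character code, and a
closing backslash, with backslashes themselves escaped too.

```haskell
import Data.Char (chr, isDigit, isSpace, ord)

-- | Escape whitespace and backslashes as \<decimal code>\ .
-- Assumption: the escape is closed by a trailing backslash.
encodeWhite :: String -> String
encodeWhite = concatMap escape
  where
    escape c
      | isSpace c || c == '\\' = '\\' : show (ord c) ++ "\\"
      | otherwise              = [c]

-- | Inverse of 'encodeWhite'.  Fails on a malformed escape.
decodeWhite :: String -> String
decodeWhite ('\\' : rest) =
  case span isDigit rest of
    (digits@(_ : _), '\\' : rest') -> chr (read digits) : decodeWhite rest'
    _                              -> error "decodeWhite: malformed escape"
decodeWhite (c : cs) = c : decodeWhite cs
decodeWhite []       = []
```

Every character passes through a cons cell here, and each escaped
character builds several short intermediate lists, which is the kind of
allocation that shows up in a heap profile when it runs once per
filename across a large tree.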

I'm reasonably confident that the algorithms have been preserved.
Note that the Diff module could use further cleanup to use more
ByteStrings; what I have there is really just enough to make it
compile.
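
For readers who have not opened the patch, the following is a rough
sketch of what the same escaping can look like over
Data.ByteString.Char8.  It is an illustration, not the code in the
attachment: encodeWhiteBS and needsEscape are hypothetical names, and
the ASCII-only whitespace test is my assumption (Data.Char.isSpace also
accepts U+00A0, and the byte 0xA0 can occur inside a multi-byte UTF-8
sequence).

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (ord)

-- | ASCII whitespace or backslash.  Restricting to ASCII is an
-- assumption made so that bytes >= 0x80 are never escaped.
needsEscape :: Char -> Bool
needsEscape c = c `elem` " \t\n\r\f\v" || c == '\\'

-- | Escape whitespace and backslashes as \<decimal code>\ .
-- Names without anything to escape are returned unchanged, so the
-- common case allocates nothing new.
encodeWhiteBS :: B.ByteString -> B.ByteString
encodeWhiteBS s
  | B.any needsEscape s = B.concatMap escape s
  | otherwise           = s
  where
    escape c
      | needsEscape c = B.pack ('\\' : show (ord c) ++ "\\")
      | otherwise     = B.singleton c
```

The fast path in the first guard is the design choice worth noting:
most filenames in a source tree contain no whitespace, so they can be
shared as-is instead of being rebuilt character by character.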

What I need help with is the conversion functions like fp2ps, ps2fn
and so on that used encode and unpackPSfromUTF8.  I have a basic
understanding of Unicode, UTF-8, bytes, and codepoints, but I'm not
familiar with how they are used here and in Darcs, so I could use a
bit of help from a careful eye.
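
One concrete hazard a reviewer may want to check for: packing a String
with Data.ByteString.Char8 keeps only the low 8 bits of each Char, so
it is not a UTF-8 encoder.  The sketch below only demonstrates that
library behaviour; roundTrips is a hypothetical helper, and nothing
here is taken from fp2ps or ps2fn.

```haskell
import qualified Data.ByteString.Char8 as B

-- | True when packing to Char8 and unpacking again preserves the
-- string, i.e. when every character is below U+0100.
roundTrips :: String -> Bool
roundTrips s = B.unpack (B.pack s) == s

-- | What a single character becomes after a Char8 round trip.
-- For U+03BB (0x3BB) only the low byte 0xBB survives.
truncated :: Char -> String
truncated c = B.unpack (B.pack [c])
```

In GHCi, roundTrips "hello" is True while roundTrips "\955" is False,
and truncated '\955' is "\187".  Any conversion that goes from a
decoded String to a Char8 ByteString without an explicit UTF-8 encoding
step would lose information in the same way.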

PS I'll try to reply to this with data to quantify the
performance gains.

Wed Sep 23 19:13:50 PDT 2009  Jason Dagit <dagit at codersbase.com>
  * switch Darcs.Patch.FileName to be ByteString.Char8 internally
  This switch gives significant performance gains in some use cases,
  such as recording the add of many files simultaneously.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/x-darcs-patch
Size: 31004 bytes
Desc: A darcs patch for your repository!
URL: <http://lists.osuosl.org/pipermail/darcs-users/attachments/20090923/fe8146ed/attachment-0001.bin>