[darcs-users] darcs patch: switch Darcs.Patch.FileName to be ByteString.Char8 int...
reinier.lamers at gmail.com
Thu Sep 24 08:13:53 UTC 2009
2009/9/24 Jason Dagit <dagit at codersbase.com>:
> Doing the above test I have discovered that Darcs.Patch.FileName is a
> very costly module. It is costly mainly in terms of space usage. The
> space usage forces the garbage collector to run far too frequently and
> this burns up CPU time, allocates a ton of virtual memory, and wastes
> significant amounts of RAM. On my test machine, virtual memory
> usage is just over 1GB when profiling, and the process uses 400-500
> megs of physical RAM.
Great that you do this research! If we keep up the current pace of
performance hacking, darcs will complete before you even hit the enter
key in a few years :-)
> What I need help with is the conversion functions like fp2ps, ps2fn
> and so on that used encode and unpackPSfromUTF8. I have a basic
> understanding of Unicode, UTF-8, bytes, and codepoints, but I'm not
> familiar with the usage here and in Darcs so I could use a bit of help
> from a careful eye.
I have no time to look at the code now, but be aware that filenames on
Unix are not really character strings. Filenames are usually chosen to
be strings of bytes that are meaningful in the text encoding of the
user, but the system calls will happily operate on any sequence of
bytes. When darcs stores filenames internally as character strings, it
becomes unable to operate on files whose name is not a valid byte
sequence for darcs's text encoding.
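
To make the point concrete, here is a minimal sketch, not taken from the
darcs source. It assumes UTF-8 as the text encoding and uses decodeUtf8'
and encodeUtf8 from the text package; decodeName, roundTrips, utf8Name and
latin1Name are hypothetical names I made up for illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper (not a darcs function): try to interpret the raw
-- bytes of a filename as UTF-8 text. Nothing means the bytes are not
-- valid UTF-8, even though Unix would accept them as a filename.
decodeName :: B.ByteString -> Maybe T.Text
decodeName raw = either (const Nothing) Just (TE.decodeUtf8' raw)

-- True when the name survives bytes -> text -> bytes unchanged.
roundTrips :: B.ByteString -> Bool
roundTrips raw = fmap TE.encodeUtf8 (decodeName raw) == Just raw

-- "caf" followed by U+00E9, encoded in UTF-8 as the two bytes 0xC3 0xA9.
utf8Name :: B.ByteString
utf8Name = B.pack [0x63, 0x61, 0x66, 0xC3, 0xA9]

-- The same name as a Latin-1 user would create it: a lone 0xE9 byte,
-- which is not a valid UTF-8 sequence.
latin1Name :: B.ByteString
latin1Name = B.pack [0x63, 0x61, 0x66, 0xE9]
```

With these definitions, roundTrips utf8Name holds, but decodeName
latin1Name gives Nothing, so a representation based on decoded text has no
way to name that file. Keeping the raw bytes avoids the problem.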
As usual, Windows and Mac OS X have their own ways of dealing with filenames.