[darcs-devel] DARCS for Windows international development
David Roundy
droundy at darcs.net
Wed May 30 15:39:13 PDT 2007
On Wed, May 30, 2007 at 02:39:47PM -0700, Paul Schauble wrote:
> Actually, no. If you just took the CR without taking the other byte of
> the 16 bit character, then the byte-to-character phase would be wrong
> for the following line. You have to actually handle the data as 16 bit
> characters.
>
> That's why I'm wondering if the underlying Haskell system can handle 16
> bit characters. If not, then how could this change be made?
>
> BTW, the usual problem with programs handling UTF-16 is the null
> characters contained within strings. This usually doesn't work out well
> unless the underlying language handles wide characters.
We don't use Haskell Chars for file data in darcs; it's just raw bytes.
And only two functions (well, maybe a couple more, counting the marking of
conflicts) would need to deal with UTF-16: breaking into lines and
concatenating the lines back together (linesPS and unlinesPS). So it'd
still be pretty easy to add support for UTF-16. The hard work would all be
in the options (similar to binary handling) that let users specify which
line-breaking they want.
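To make the line-splitting point concrete, here is a rough sketch of what a UTF-16-aware pair of functions could look like on raw bytes. It is not the darcs code: Data.ByteString stands in for darcs's own packed-string type, the names linesUtf16LE and unlinesUtf16LE are made up for illustration, and it assumes little-endian data with no BOM handling and LF-only line breaks.

```haskell
import qualified Data.ByteString as B

-- Hypothetical UTF-16LE counterpart of a byte-level "lines" function.
-- It splits on the code unit U+000A (bytes 0x0A 0x00) and only looks at
-- even byte offsets, so a 0x0A byte that is the second half of another
-- code unit (e.g. U+0A41 = bytes 0x41 0x0A) is never taken as a newline.
-- The separator is dropped from the result; a trailing newline yields a
-- final empty chunk, so joining the chunks restores the input exactly.
linesUtf16LE :: B.ByteString -> [B.ByteString]
linesUtf16LE bs = go 0 0
  where
    n = B.length bs
    go start i
      | i + 1 >= n = [B.drop start bs]          -- no full code unit left
      | B.index bs i == 0x0A && B.index bs (i + 1) == 0x00
          = slice start i : go (i + 2) (i + 2)  -- newline code unit found
      | otherwise = go start (i + 2)            -- advance one code unit
    slice from to = B.take (to - from) (B.drop from bs)

-- Hypothetical inverse: join chunks with the UTF-16LE newline code unit.
unlinesUtf16LE :: [B.ByteString] -> B.ByteString
unlinesUtf16LE = B.intercalate (B.pack [0x0A, 0x00])
```

The only real difference from the 8-bit case is stepping two bytes at a time and comparing both bytes of the code unit; nothing here ever decodes to Char.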
Null characters are no problem, as we don't use C strings.
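A tiny illustration of why the zero bytes don't matter, again using Data.ByteString as a stand-in for darcs's own packed strings (the names sample and zeroBytes are just for this example): the buffer carries its length explicitly, so a zero byte is data like any other rather than a terminator.

```haskell
import qualified Data.ByteString as B

-- "ab" encoded as UTF-16LE: every other byte is zero.
sample :: B.ByteString
sample = B.pack [0x61, 0x00, 0x62, 0x00]

-- Counts the zero bytes; they are ordinary elements of the buffer.
zeroBytes :: B.ByteString -> Int
zeroBytes = B.count 0
```

Here B.length sample is 4 and zeroBytes sample is 2, whereas C's strlen on the same bytes would stop after the first one.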
Haskell does use 32-bit characters for its Char type; darcs just doesn't
use this type for file data. It's a waste to convert from 8 bit to 32 bit
and back again, as I'm sure you'd imagine.
--
David Roundy
Department of Physics
Oregon State University