[darcs-users] high-level UTF-8 feedback

Eric Kow kowey at darcs.net
Wed Dec 23 17:40:49 UTC 2009

[Re-sent as I forgot to send the original to the list]

Hi Reinier,

We were just idly chatting in #darcs this morning when the new UTF-8
work (which just made it in, BTW, yay!) came up.

Since you and Juliusz are our resident Unicode gurus, I thought you would
be interested in the conversation.  Given that you've thought about and
discussed this so much, it may all be old territory.  However, here
is the conversation anyway in case there is new stuff in for you to


Juliusz: no requests on my part.  I thought you might want to see it
because there is an argument for going back to tagging new patches,
but don't feel obliged to weigh in if you're busy :-)

Summary for the impatient
The main principles here are that types are good and so is a
canonical representation.  -- Duncan

1. A type for when we know we're dealing with an internal representation of
   Unicode chars is a good idea, even just a ByteString newtype

2. We should make sure that there are not any hidden Unicode -> locale ->
   Unicode trips lurking in the code
   [and that it's strictly Unicode -> locale; or locale -> Unicode at most]

3. We should make sure that our Unicode -> locale is merely lossy and not

4. We should watch out for non-NFC UTF-8 patches.
   These should be treated as old-style.

5. Despite our confidence about UTF-8 detection, tagging is still a good idea
   [a] for sheer conservatism, [b] for potential efficiency gains, and [c]
   because it lets us reliably distinguish NFC UTF-8 from non-NFC UTF-8.
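
To make point 1 a bit more concrete, here is a minimal sketch of what a
ByteString newtype with a checking constructor might look like.  The names
Utf8Bytes, mkUtf8Bytes and toText are made up for illustration and are not
taken from the darcs source; the sketch also assumes the bytestring and text
packages are available.  It only checks that the bytes are well-formed UTF-8,
so it says nothing about NFC (point 4), which would need a separate
normalisation check.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical wrapper (not a darcs type): bytes known to be valid UTF-8.
newtype Utf8Bytes = Utf8Bytes B.ByteString
  deriving (Eq, Show)

-- Smart constructor: accept raw bytes only if they decode as UTF-8.
-- This checks well-formedness only, not NFC normalisation.
mkUtf8Bytes :: B.ByteString -> Maybe Utf8Bytes
mkUtf8Bytes bs = case TE.decodeUtf8' bs of
  Left _  -> Nothing
  Right _ -> Just (Utf8Bytes bs)

-- The single Unicode-side view of the wrapped bytes.
-- Safe to use the partial decoder here because the constructor validated them.
toText :: Utf8Bytes -> T.Text
toText (Utf8Bytes bs) = TE.decodeUtf8 bs

main :: IO ()
main = do
  -- "cafe" with an acute accent, encoded as UTF-8: accepted.
  print (mkUtf8Bytes (B.pack [0x63, 0x61, 0x66, 0xC3, 0xA9]))
  -- The same word encoded as Latin-1: rejected.
  print (mkUtf8Bytes (B.pack [0x63, 0x61, 0x66, 0xE9]))
```

If the Utf8Bytes constructor were kept private to its module, the type
checker would flag any place where unchecked bytes are passed off as the
internal representation, which is where a hidden round trip through the
locale (point 2) would otherwise go unnoticed.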

Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9