[darcs-devel] DARCS for Windows international development

Paul Schauble Paul.Schauble at ticketmaster.com
Thu May 31 15:18:12 PDT 2007


Well, as long as we're being pedantic...

It's a Byte Order Mark when it is the first character in the file. It's
a ZERO WIDTH NO-BREAK SPACE in any other position.
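To illustrate the position rule, here is a minimal Python sketch that treats U+FEFF as a signature only at offset 0. The function name `sniff_bom` is hypothetical; the byte constants are the standard ones from Python's `codecs` module. The signature is two bytes in UTF-16, three in UTF-8 and four in UTF-32, so the sketch compares whole signatures rather than a fixed byte count.

```python
import codecs

# Checked longest-first so that the UTF-32LE signature (FF FE 00 00)
# is not mistaken for the shorter UTF-16LE one (FF FE).
_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Hypothetical helper: return (encoding, signature_length) if data
    starts with an encoded U+FEFF, else (None, 0).

    Only offset 0 is inspected; U+FEFF anywhere else is an ordinary
    ZERO WIDTH NO-BREAK SPACE. Caveat: a UTF-16LE file whose first real
    character is U+0000 is indistinguishable from a UTF-32LE signature.
    """
    for bom, name in _SIGNATURES:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


if __name__ == "__main__":
    print(sniff_bom(b"\xef\xbb\xbfhello"))
    print(sniff_bom("a\ufeffb".encode("utf-8")))
```

The first call reports a three-byte UTF-8 signature; the second finds none, because the U+FEFF there follows an "a".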

In any case, in the Windows environment, files without a BOM are rare.
How do Linux systems format Unicode files?

    ++PLS

-----Original Message-----
From: darcs-devel-bounces at darcs.net
[mailto:darcs-devel-bounces at darcs.net] On Behalf Of Stephen J. Turnbull
Sent: Wednesday, May 30, 2007 6:17 PM
To: darcs-devel at darcs.net
Subject: RE: [darcs-devel] DARCS for Windows international development

Paul Schauble writes:

 > That last is exactly correct. The Byte Order Mark also identifies the
 > file as Unicode. It provides a unique signature for UTF-8, UTF-16LE,
 > and UTF-16BE files.

If people would call the character by its name -- "ZERO WIDTH NO-BREAK
SPACE" -- there would be less confusion.  It's a character in its own
right, and can occur anywhere in any Unicode file.  It happens that
the semantics are perfect for a signature (including BOM), and that
the constituent byte sequences in all UTFs are quite rare in any
natural text in other encodings.
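A small Python sketch of the point that the "signature" is just this one character encoded in each UTF. The helper name `signature_bytes` is hypothetical. It also shows that Python's `utf-8-sig` codec strips only a leading U+FEFF and keeps one in mid-text as an ordinary character.

```python
ZWNBSP = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE


def signature_bytes():
    """Hypothetical helper: map each UTF to the bytes U+FEFF encodes to.

    The explicit-endian codecs are used so Python does not prepend a
    BOM of its own.
    """
    encodings = ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le")
    return {enc: ZWNBSP.encode(enc) for enc in encodings}


if __name__ == "__main__":
    for enc, raw in signature_bytes().items():
        print(f"{enc:<10} {raw.hex(' ')}")

    # "utf-8-sig" treats only a leading U+FEFF as a signature;
    # the interior one survives decoding as a normal character.
    data = (ZWNBSP + "a" + ZWNBSP + "b").encode("utf-8")
    assert data.decode("utf-8-sig") == "a" + ZWNBSP + "b"
```

The byte sequences are EF BB BF (UTF-8), FE FF and FF FE (UTF-16 BE/LE), and 00 00 FE FF and FF FE 00 00 (UTF-32 BE/LE).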

Pedantic?  Of course!  I think it's worth being pedantic about Unicode
at this stage; the risk of backward-incompatible changes in repository
format due to mistaken implementation of Unicode is too high to be
ignored.
_______________________________________________
darcs-devel mailing list
darcs-devel at darcs.net
http://lists.osuosl.org/mailman/listinfo/darcs-devel
