[darcs-devel] DARCS for Windows international development
Paul Schauble
Paul.Schauble at ticketmaster.com
Thu May 31 15:18:12 PDT 2007
Well, as long as we're being pedantic...
It's a Byte Order Mark when it is the first character in the file (two
bytes in UTF-16, three bytes in UTF-8). It's a ZERO WIDTH NO-BREAK SPACE
in any other position.
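To make the positional distinction concrete, here is a minimal Python sketch. The helper name split_leading_bom is hypothetical and the sample bytes are invented for illustration; nothing here comes from darcs itself. It treats U+FEFF as a signature only when it is the very first character and leaves any later occurrence alone.

```python
import codecs

BOM = "\ufeff"  # U+FEFF: a signature at position 0, ZERO WIDTH NO-BREAK SPACE elsewhere


def split_leading_bom(text):
    """Return (had_bom, rest). Only a U+FEFF at position 0 is treated as a BOM.

    Hypothetical helper for illustration; any interior U+FEFF is preserved.
    """
    if text.startswith(BOM):
        return True, text[len(BOM):]
    return False, text


# Invented sample: a UTF-8 signature, then "a", an interior U+FEFF, then "b".
raw = codecs.BOM_UTF8 + "a\ufeffb".encode("utf-8")

# The plain "utf-8" codec keeps the leading U+FEFF in the decoded string.
decoded = raw.decode("utf-8")
had_bom, body = split_leading_bom(decoded)

# Python's "utf-8-sig" codec makes the same distinction: it drops only a
# leading signature and leaves the interior character in place.
via_codec = raw.decode("utf-8-sig")
```

Both routes leave the interior U+FEFF in the text, which is the point of the distinction: stripping every occurrence would change the content.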
In any case, in the Windows environment, files without a BOM are rare.
How do Linux systems format Unicode files?
++PLS
-----Original Message-----
From: darcs-devel-bounces at darcs.net
[mailto:darcs-devel-bounces at darcs.net] On Behalf Of Stephen J. Turnbull
Sent: Wednesday, May 30, 2007 6:17 PM
To: darcs-devel at darcs.net
Subject: RE: [darcs-devel] DARCS for Windows international development
Paul Schauble writes:
> That last is exactly correct. The Byte Order Mark also identifies the
> file as Unicode. It provides a unique signature for UTF-8, UTF-16LE,
> and UTF-16BE files.
If people would call the character by its name -- "ZERO WIDTH NO-BREAK
SPACE" -- there would be less confusion. It's a character in its own
right, and can occur anywhere in any Unicode file. It happens that
the semantics are perfect for a signature (including BOM), and that
the constituent byte sequences in all UTFs are quite rare in any
natural text in other encodings.
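As a rough illustration of the signature idea, the following Python sketch checks the start of a byte string against the three signatures named above. The function name sniff_signature and the table layout are assumptions made for this example, not anything darcs does. The byte constants come from Python's standard codecs module.

```python
import codecs

# The three signatures mentioned in the thread, as (bytes, encoding name).
# UTF-32 is deliberately left out: its little-endian signature begins with
# the same two bytes as UTF-16LE, so a fuller detector would test it first.
SIGNATURES = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
)


def sniff_signature(data):
    """Return (encoding, signature_length), or (None, 0) if no signature.

    Hypothetical helper: it inspects only the leading bytes and says
    nothing about whether the rest of the data is valid in that encoding.
    """
    for bom, name in SIGNATURES:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A caller would decode `data[length:]` with the returned encoding. When no signature is found, the sketch does not guess; choosing a fallback encoding is a separate policy decision.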
Pedantic? Of course! I think it's worth being pedantic about Unicode
at this stage; the risk of backward-incompatible changes in repository
format due to mistaken implementation of Unicode is too high to be
ignored.