[darcs-devel] DARCS for Windows international development
Stephen J. Turnbull
stephen at xemacs.org
Wed May 30 18:16:42 PDT 2007
Paul Schauble writes:
> That last is exactly correct. The Byte Order Mark also identifies the
> file as Unicode. It provides a unique signature for UTF-8, UTF-16LE, and
> UTF-16BE files.
If people would call the character by its name -- "ZERO WIDTH NO-BREAK
SPACE" (U+FEFF) -- there would be less confusion. It's a character in
its own right, and can occur anywhere in any Unicode file. It happens
that its semantics are perfect for a signature (including use as a
BOM), and that the byte sequences that encode it in all the UTFs are
quite rare in natural text in other encodings.
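To make the point concrete, here is a minimal Python sketch, not anything taken from darcs. The names ZWNBSP, SIGNATURES and sniff_signature are hypothetical and exist only for this illustration. It encodes U+FEFF in the three UTFs mentioned above and checks whether a byte string starts with one of the resulting sequences. It deliberately ignores UTF-32, whose little-endian signature begins with the same two bytes as the UTF-16LE one, so a real detector would need more care.

```python
from typing import Optional

# U+FEFF ZERO WIDTH NO-BREAK SPACE: an ordinary character that doubles
# as a signature when it is the first character of a file.
ZWNBSP = "\ufeff"

# Hypothetical table: the signature bytes are just the character encoded
# in each UTF. Using the explicit -le/-be codecs keeps Python from
# adding a BOM of its own.
SIGNATURES = [
    ("utf-8", ZWNBSP.encode("utf-8")),          # EF BB BF
    ("utf-16-le", ZWNBSP.encode("utf-16-le")),  # FF FE
    ("utf-16-be", ZWNBSP.encode("utf-16-be")),  # FE FF
]


def sniff_signature(data: bytes) -> Optional[str]:
    """Return the UTF whose encoding of U+FEFF starts `data`, else None."""
    for name, signature in SIGNATURES:
        if data.startswith(signature):
            return name
    return None


if __name__ == "__main__":
    for name, signature in SIGNATURES:
        print(name, signature.hex(" "))
    # The same character in the middle of a text is not a signature.
    print(sniff_signature("a\ufeffb".encode("utf-8")))
```

Only the leading position is treated as a signature here. The same bytes in the middle of the data are simply the character itself, which is why the last call reports no signature.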
Pedantic? Of course! I think it's worth being pedantic about Unicode
at this stage; the risk of backward-incompatible changes in repository
format due to mistaken implementation of Unicode is too high to be
ignored.