[darcs-users] Default binary masks
Sean E. Russell
ser at germane-software.com
Tue Nov 25 03:43:52 UTC 2003
On Monday 24 November 2003 17:49, Trevor Talbot wrote:
> UTF-16 is the most balanced of the Unicode formats to deal with.
Why "most balanced"? It seems to combine the worst of all worlds:
* Incompatibility with 7-bit ASCII
* Two byte orderings, big-endian and little-endian (UTF-16BE and UTF-16LE);
haven't we struggled enough with byte ordering problems?
* Overlapping start characters -- given a byte in the middle of a file, you
have to know two pieces of information to start scanning characters: the
offset of the byte (so you know whether it is the high or low byte of a code
unit) and the byte ordering (which is recorded all the way at the start of
the byte stream).
* You get none of the advantages of a fixed-width encoding, yet everybody
seems to think it is a fixed-width encoding.
Most of this makes UTF-16 inferior to UTF-8 in terms of interoperability,
which is why UTF-8 is more common than UTF-16 in transport protocols and data
storage, and why XML defaults to UTF-8 encoding.
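
For anyone who wants to check the points in the list above, here is a rough
sketch using Python's built-in codecs. The sample string and the helper names
(utf16_widths, utf8_char_start) are made up for illustration, and the
resynchronisation helper assumes well-formed UTF-8 input.

```python
# Illustrative sketch only; sample text and helper names are hypothetical.
SAMPLE = "A\u00e9\U0001D11E"  # 'A', e-acute, and U+1D11E (outside the BMP)

# ASCII compatibility: UTF-8 keeps 'A' as the single byte 0x41,
# while UTF-16 pads it to two bytes.
ASCII_UTF8 = "A".encode("utf-8")        # b"A"
ASCII_UTF16LE = "A".encode("utf-16-le")  # b"A\x00"

# Byte ordering: the same two bytes mean different characters
# depending on which ordering the reader assumes.
AS_BIG_ENDIAN = b"\x00A".decode("utf-16-be")     # "A"
AS_LITTLE_ENDIAN = b"\x00A".decode("utf-16-le")  # "\u4100"


def utf16_widths(text):
    """Return the number of bytes each character occupies in UTF-16."""
    return [len(ch.encode("utf-16-le")) for ch in text]


def utf8_char_start(data, pos):
    """Back up from an arbitrary byte offset to the start of its character.

    UTF-8 continuation bytes always look like 0b10xxxxxx, so no byte
    ordering or offset parity is needed. Assumes well-formed UTF-8.
    """
    while pos > 0 and (data[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos


if __name__ == "__main__":
    print(utf16_widths(SAMPLE))  # [2, 2, 4]: not a fixed-width encoding
    data = SAMPLE.encode("utf-8")
    print(utf8_char_start(data, 5))  # 3: start of the 4-byte character
```

The UTF-16 equivalent of utf8_char_start would need to know both the byte
ordering and whether the offset is odd or even before it could even look for
a surrogate pair.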
> It's also NT's native format. Most non-trivial Unicode APIs on other
> platforms seem to use UTF-16 as well.
I admit to a western bias, for which UTF-8 is much better suited (being
compatible with 7-bit ASCII). In favor of UTF-16, Java uses UTF-16, as does
ECMAScript (although, if that's not a good reason to *not* choose UTF-16, I
don't know what is), and UTF-16 is recommended by the Unicode Consortium.
Despite this, UTF-8 is more widely used than UTF-16; Japanese users do prefer
UTF-16 to UTF-8, but they tend to prefer EUC and Shift-JIS even more, so no
I think this is starting to stray off-topic, though.
### http://www.germane-software.com/~ser jabber.com:ser ICQ:83578737
### GPG: http://www.germane-software.com/~ser/Security/ser_public.gpg