Unicode

If all you're ever going to do is read and write ASCII documents, you can safely ignore all this material about Unicode and ISO 10646. On the other hand, if you're in the other 98% of the computing profession, then:

You need to know about this stuff!

SGML had an even more general notion of what a character is than XML does; so general, in fact, that it was hard to be interoperable. XML takes a very simple view: characters are just numbers, and the numbers mean what the Unicode standard (and the thankfully identical ISO 10646) says they mean.
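To make the "characters are just numbers" view concrete, here is a minimal sketch in Python. The language choice and the helper name `describe` are illustrative assumptions, not anything the XML spec prescribes. It shows each character's number (its code point) along with the name Unicode assigns to that number, and then checks that a numeric character reference in an XML document resolves to the character with that same number.

```python
import unicodedata
import xml.etree.ElementTree as ET


def describe(text):
    """Return (code point, Unicode name) pairs, one per character in text.

    `describe` is a hypothetical helper written for this illustration.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# Three characters: ASCII, Latin-1 range, and a CJK ideograph.
for code, name in describe("a\u00e9\u4e2d"):
    print(code, name)
# U+0061 LATIN SMALL LETTER A
# U+00E9 LATIN SMALL LETTER E WITH ACUTE
# U+4E2D CJK UNIFIED IDEOGRAPH-4E2D

# An XML character reference names a character purely by its number.
element = ET.fromstring("<c>&#x4E2D;</c>")
print(element.text == "\u4e2d")  # True
```

Nothing in the sketch depends on how the document was encoded on disk; once parsed, the text is a sequence of numbered characters, which is the whole point of the simple view.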

If you don't know about this stuff but think you ought to, relax: there's good news. All you need to do is get the Unicode spec (available at good technical bookstores or from the Unicode Web site). OK, so it's a little expensive; go ahead and get it anyhow.

Once you've got it, you may blanch at its huge size and weight and wonder what you've got yourself into. Relax; 80% of it is tables of Chinese, Japanese, and Korean characters, which you can safely ignore (unless you work with electronic texts in one of those languages, in which case you'll be glad to have them). The first 60 pages or so contain everything you really need to know to do international character processing, presented sanely, logically, and with really great typography.

If you're a bibliophile, you'll enjoy reading this. If your friends and loved ones are too, then it's a coffee table book.

Copyright © 1998, Tim Bray. All rights reserved.