The Trouble With ASCII

As the spec says, pure ASCII files are UTF-8 as they sit, and thus don't require an encoding declaration. The problem is that a lot of ASCII files are not quite pure. Modern Microsoft operating systems, in particular, make it easy to type in words like "Español", with the "ñ" encoded according to some "code page" that may or may not line up with ISO 8859-1 or any other standard.
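One rough way to tell whether a file is "pure" in this sense is to check that every byte falls in the 7-bit range. The Python sketch below assumes the file contents are already in memory as bytes; the helper name `is_pure_ascii` is made up for illustration.

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range 0x00-0x7F."""
    return all(b < 0x80 for b in data)


pure = b"Espanol"
impure = b"Espa\xf1ol"  # 0xF1 is n-with-tilde in ISO 8859-1

print(is_pure_ascii(pure))    # True
print(is_pure_ascii(impure))  # False

# Pure ASCII bytes give the same text whether read as ASCII or as UTF-8.
print(pure.decode("ascii") == pure.decode("utf-8"))  # True
```

A single byte at or above 0x80 is enough to make the file something other than ASCII, at which point the encoding has to be stated or guessed.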

In the document you are now reading, the "ñ" is encoded as the single byte #xF1, which happens to be correct per ISO 8859-1 but is definitely not legal UTF-8. Thus this document, to be well-formed, really needs to have an XML declaration like this:

<?xml version='1.0' encoding='ISO-8859-1'?>
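The difference is easy to see by decoding the same bytes both ways. This is a small Python sketch using only the standard codecs; the sample bytes are an assumption standing in for the word as it would be stored in ISO 8859-1.

```python
raw = b"Espa\xf1ol"  # the word as stored in ISO 8859-1

# Read as ISO 8859-1, byte 0xF1 maps directly to code point U+00F1.
as_latin1 = raw.decode("iso-8859-1")
print(ascii(as_latin1))  # 'Espa\xf1ol'

# Read as UTF-8, the same bytes are rejected: 0xF1 announces a
# multi-byte sequence, but the next byte is not a continuation byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)  # False

# In UTF-8 the same character takes two bytes.
as_utf8 = as_latin1.encode("utf-8")
print(as_utf8)  # b'Espa\xc3\xb1ol'
```

Without the declaration, a processor that assumes UTF-8 has no reason to try the first interpretation and should reject the document.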


Copyright © 1998, Tim Bray. All rights reserved.