The Trouble With ASCII

As the spec says, pure ASCII files are UTF-8 as they sit, and thus don't require an encoding declaration. The problem is that a lot of ASCII files are not quite pure. Modern Microsoft operating systems, in particular, make it easy to type in words like "Español", with the "ñ" encoded according to some "code page" that may or may not line up with ISO 8859-1 or any other standard.
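As a rough illustration of what "not quite pure" means, the following Python sketch reports the offsets of any bytes outside the 7-bit ASCII range. The function name `non_ascii_offsets` is hypothetical, introduced only for this example.

```python
from typing import List


def non_ascii_offsets(data: bytes) -> List[int]:
    """Return the offsets of bytes that fall outside 7-bit ASCII (0x00-0x7F)."""
    return [i for i, b in enumerate(data) if b > 0x7F]


pure = b"Spanish"
impure = b"Espa\xf1ol"  # the accented letter stored as the single byte 0xF1

print(non_ascii_offsets(pure))    # []
print(non_ascii_offsets(impure))  # [4]
```

An empty result means the bytes are pure ASCII and can be read as UTF-8 without any declaration; anything else means the encoding has to be identified some other way.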

In the document you are now reading, the "ñ" is encoded as the single byte #f1, which happens to be correct per ISO 8859-1 but is definitely not legal UTF-8. Thus this document, to be well-formed, really needs to have an XML declaration like this:

<?xml version='1.0' encoding='ISO-8859-1'?>
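A short Python sketch makes the difference concrete. It assumes the word is stored as the bytes described above, with the accented letter as #f1; the variable names are illustrative only.

```python
raw = b"Espa\xf1ol"  # the accented letter as the single byte 0xF1

# Decoding as UTF-8 fails: 0xF1 announces a multi-byte sequence, and the
# byte that follows it ("o", 0x6F) is not a valid continuation byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding as ISO 8859-1 succeeds, because every byte maps to a character.
latin1_text = raw.decode("iso-8859-1")

# In UTF-8 the same character (U+00F1) takes two bytes instead of one.
utf8_bytes = "\u00f1".encode("utf-8")

print(utf8_ok)                        # False
print(latin1_text == "Espa\u00f1ol")  # True
print(utf8_bytes.hex())               # c3b1
```

Re-encoding the text as UTF-8 would be another way out, in which case the encoding declaration would no longer be required.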


Copyright © 1998, Tim Bray. All rights reserved.