
Article:
 XML on the Web Has Failed
Subject: nobody cares
Date: 2004-07-21 23:46:14
From: Julian Bond

In the English-speaking world, nobody cares. People just put up with the bad characters that appear when the original author cut and pasted from Word and MS helpfully changed all the quotes.
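To make "bad characters" concrete, here is a minimal Python sketch. It assumes the pasted text was saved as Windows-1252 (`cp1252`), the usual Windows encoding of the time. The sample sentence is made up for illustration.

```python
# Made-up sample text as Word might "helpfully" produce it: straight quotes
# replaced by curly ones (U+201C, U+201D) and a curly apostrophe (U+2019).
text = "\u201cIt\u2019s fine,\u201d he said."

# Assumption: the text ends up on disk as Windows-1252 bytes.
cp1252_bytes = text.encode("cp1252")

# A reader that assumes UTF-8 cannot decode bytes 0x93/0x92/0x94;
# with errors="replace" each one becomes U+FFFD.
as_utf8 = cp1252_bytes.decode("utf-8", errors="replace")

# The reverse mistake: UTF-8 bytes read as Windows-1252 turn the
# apostrophe into the familiar three-character sequence.
utf8_bytes = "It\u2019s".encode("utf-8")
as_cp1252 = utf8_bytes.decode("cp1252")

# ascii() escapes non-ASCII so printing works on any terminal.
print(ascii(as_utf8))
print(ascii(as_cp1252))
```

Either way the reader still gets the gist, which is why English-language readers tend to shrug it off.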


In the rest of the world, this is a real problem, especially for the million or so RSS, Atom and FOAF feeds coming out of Japan, and for the increasing number of feeds coming out of China.


If you only ever read XML, or only care about XML in your own language, then you can assume your own default charset and 95% of the time it will work. You'll still get some individual character exceptions, as with US-ASCII, but hey, brains are good at ignoring bad chars and still extracting meaning.


And there'll be places (Japan) where the dominant charset doesn't even have a 95% share.


UTF-8 and UTF-16 were supposed to solve all this once and for all, and we should have put it behind us by now. Shame, then, that it's the English-speaking software industry that can't manage to support them properly. Just one example: MySQL's UTF-8 support is still in beta and missing from most, if not all, deployed installations.


Mark also points to the unholy mess that is MIME types. There's XML out there served with every MIME type you could think of. There's even XML-RDF commonly served with a MIME type that Firefox/Mozilla fails to display. Basing the character encoding of an XML doc's contents on the external transport settings was a recipe for disaster. So the disaster happened. I really don't understand why this was necessary. It's the same doc whether it was delivered by HTTP, FTP, SMTP, sneakernet or whatever.
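The "transport decides the encoding" rule can be sketched in a few lines. This is a simplified Python approximation, assuming the precedence rules of RFC 3023 (XML Media Types): an explicit `charset` parameter wins; `text/*` without one defaults to US-ASCII regardless of the document's own declaration; `application/*` without one falls back to the BOM or XML declaration, then UTF-8. The function name `resolve_xml_encoding` is hypothetical, and the sketch skips UTF-16 detection and other details.

```python
import re

# Matches the encoding pseudo-attribute in an XML declaration, e.g.
# <?xml version="1.0" encoding="Shift_JIS"?>
XML_DECL = re.compile(rb"""^<\?xml[^>]*encoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")


def resolve_xml_encoding(content_type: str, body: bytes) -> str:
    """Hypothetical helper approximating RFC 3023 precedence (simplified)."""
    parts = [p.strip() for p in content_type.split(";")]
    media_type = parts[0].lower()
    charset = None
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip('"').lower()

    # 1. An explicit charset on the transport wins, even if the
    #    document's own declaration says something else.
    if charset:
        return charset

    # 2. text/* with no charset defaults to us-ascii; the XML
    #    declaration inside the document is ignored.
    if media_type.startswith("text/"):
        return "us-ascii"

    # 3. application/* with no charset: look inside the document.
    if body.startswith(b"\xef\xbb\xbf"):  # UTF-8 byte order mark
        return "utf-8"
    match = XML_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


# Same bytes, different answers depending on the MIME type.
feed = b'<?xml version="1.0" encoding="Shift_JIS"?><rdf:RDF/>'
print(resolve_xml_encoding("text/xml", feed))             # us-ascii
print(resolve_xml_encoding("application/rdf+xml", feed))  # shift_jis
```

The same Japanese feed is read as US-ASCII or as Shift_JIS purely because of a server setting the author may never have touched.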


Arrrgh!



