The encoding problem has always been there. Be it web pages or text files, it's the same problem. The only difference is that your text editor or browser doesn't break when it encounters an incorrect or broken encoding; it just displays a funny character.
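To make that difference concrete, here is a small Python sketch. The mislabeled bytes are an assumed example. The same bytes are decoded once leniently, roughly the way a browser or editor treats them, and once strictly, which is closer to what an XML parser has to do, since XML 1.0 treats bytes that are illegal in the declared encoding as a fatal error.

```python
# Assumed example: Latin-1 bytes for "Müller" that a sender labels as UTF-8.
data = "Müller".encode("latin-1")  # b'M\xfcller'

# Lenient decoding, roughly what a browser or text editor does:
# the invalid byte becomes U+FFFD (the "funny character") and life goes on.
lenient = data.decode("utf-8", errors="replace")
print(lenient)

# Strict decoding, closer to what an XML parser must do: stop with an error.
try:
    data.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict decode failed:", exc.reason)
```

Same input, same bug; only the lenient path hides it.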
Remember industrial-strength data exchange in pre-XML times? You'd discuss a format, each side would spend two weeks implementing it, and a month later the system went *bang* because some data set was out of spec. Was that better?
Now we say, whoa, there's XML: send me your XML, maybe your DTD (but who looks at those anyway...), and I'll write a stylesheet for my parser in an hour. Done. This all works, even in the real world, but when you feed it broken input it goes boom, just as CSV did, and just as those text files padded with blanks to a fixed field length did. So don't blame it all on RFC 3023 (although I think it's an exceptionally stupid one) or on XML in general; it's the same old emperor in new clothes.
The real problem is that there are programmers who, by the mercy of chance, grew up in countries where they can happily get by with ASCII and never run into any problems. So they write code that assumes ASCII, they build libraries out of that code, and those libraries become time bombs.
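A hypothetical example of such a time bomb, sketched in Python: a helper that cuts a field to a fixed number of bytes, a habit left over from the padded-field days. With ASCII input it never fails; the first accented name cuts a character in half. The names `truncate_field` and `truncate_field_safe` are made up for illustration.

```python
def truncate_field(text: str, max_bytes: int) -> bytes:
    """Hypothetical ASCII-minded helper: blindly cut the encoded text."""
    return text.encode("utf-8")[:max_bytes]


def truncate_field_safe(text: str, max_bytes: int) -> bytes:
    """Cut to max_bytes, then drop any incomplete trailing UTF-8 sequence.

    The input was valid UTF-8, so the only invalid part of the cut prefix
    can be a partial character at the end; errors="ignore" removes it.
    """
    cut = text.encode("utf-8")[:max_bytes]
    return cut.decode("utf-8", errors="ignore").encode("utf-8")


print(truncate_field("cafe", 4))  # ASCII: works for years
broken = truncate_field("café", 4)  # 'é' is two bytes in UTF-8
try:
    broken.decode("utf-8")
except UnicodeDecodeError:
    print("truncated mid-character:", broken)

print(truncate_field_safe("café", 4))
```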
XML, being strict about encoding, sets off a lot of these time bombs. You really want to blame XML for that? Really?
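Here's a sketch of what that strictness looks like with Python's standard-library `xml.etree.ElementTree`; the document content is an assumed example. The same Latin-1 bytes are rejected when the XML declaration claims UTF-8, and parse fine when the declaration tells the truth.

```python
import xml.etree.ElementTree as ET

# Assumed example payload: "Müller" encoded as Latin-1 (0xFC for 'ü').
body = "<name>Müller</name>".encode("latin-1")

mislabeled = b'<?xml version="1.0" encoding="UTF-8"?>' + body
labeled = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body

# The declaration says UTF-8, but 0xFC is not valid UTF-8: the parser refuses.
try:
    ET.fromstring(mislabeled)
    mislabeled_ok = True
except ET.ParseError as exc:
    mislabeled_ok = False
    print("rejected:", exc)

# Same bytes, correct declaration: parses without complaint.
root = ET.fromstring(labeled)
print(root.text)
```

The parser isn't the bug here; it's just the first component that refuses to guess.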
What we must do is teach programmers that there is more than ASCII. That a character can take more than one byte. That the number of bytes that make up a character can vary. That along with the data we distribute, we must also distribute its encoding, and that this encoding must be CORRECT and MATCH the data EXACTLY, not 99% of the time!
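A quick Python illustration of both points, using a few arbitrarily chosen sample characters: UTF-8 needs anywhere from one to four bytes per character, and the same bytes read with the wrong encoding don't necessarily fail; they can silently turn into different text.

```python
samples = ["A", "é", "€", "😀"]

# UTF-8 byte length varies per character: 1 to 4 bytes.
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
for ch in samples:
    print(f"U+{ord(ch):04X} -> {ch.encode('utf-8').hex(' ')} ({byte_lengths[ch]} bytes)")

# Reading UTF-8 bytes as Latin-1 raises no error; it just produces other text.
mojibake = "é".encode("utf-8").decode("latin-1")
print(mojibake)  # Ã©
```

The second case is the nasty one: nothing crashes, the data is simply wrong, which is why "close enough" is not good enough for an encoding label.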