Although the article provides useful information,
it is misleading. It makes people believe that
RFC 3023 hampers interoperability of XML, but
this is not true.
It is not RFC 3023 but MIME (RFC 2046) and
HTTP (RFC 2616) that specify the charset
parameter of text/* as the only authoritative
information. RFC 3023 introduces nothing new
about text/xml.
Then, should MIME and HTTP be blamed? I do not
think so. We first have to understand why they
use the charset parameter for every protocol and
every text/* media type. Should we keep
introducing an ad-hoc encoding declaration mechanism
whenever a new format (e.g., XQuery and RNC)
is invented?
Apart from text/xml, RFC 3023 provides
application/xml, which is often more appropriate
than text/xml. One reason is that, for application/xml,
omitting the charset parameter means "Use the encoding
declaration or BOM". A new I-D for XML media types
makes this point clearer and even deprecates text/xml.
http://www.ietf.org/internet-drafts/draft-murata-kohn-lilley-xml-00.txt
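The difference between the two media types can be sketched
roughly as follows. This is only a simplified illustration, not
a conforming implementation: the function name
effective_xml_encoding is hypothetical, the BOM check covers
only UTF-8 and UTF-16, and the encoding declaration is found
with a regular expression rather than a real XML parser.

```python
import re

# Matches the encoding pseudo-attribute of an ASCII-compatible XML declaration.
_ENCODING_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def effective_xml_encoding(media_type, charset, body):
    """Return the encoding a receiver would assume (simplified sketch)."""
    if charset:
        # An explicit charset parameter is authoritative for both types.
        return charset.lower()
    if media_type == "text/xml":
        # text/* without charset falls back to the MIME default.
        return "us-ascii"
    # application/xml without charset: look at the BOM first ...
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    # ... then at the encoding declaration ...
    match = _ENCODING_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    # ... and otherwise use the XML default.
    return "utf-8"


doc = b'<?xml version="1.0" encoding="Shift_JIS"?><a/>'
print(effective_xml_encoding("text/xml", None, doc))
print(effective_xml_encoding("application/xml", None, doc))
```

With text/xml the encoding declaration inside the document is
ignored, while with application/xml it is honored. That is the
reason application/xml is often the safer choice.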
One weak point of the article: it does not show
how many "text/xml and non-ASCII" documents provide
correct encoding declarations. If quite a few
documents fail to specify the encoding declaration correctly,
the problem boils down to "I18N is hard".
MURATA Makoto (co-author of RFC 3023)