XML.com: XML From the Inside Out


Article:
 Which XML Technologies Are Beautiful?
Subject: Examples
Date: 2007-08-14 18:15:38
From: mikeday
Response to: Examples

XML is defined in terms of Unicode, which is a good thing, especially compared with SGML, which predates Unicode and therefore has to deal with different character sets, not just different encodings.


Also, XML files "know their encoding": it is signalled by the Unicode byte order mark or stated explicitly in the encoding declaration inside the XML declaration, and it defaults to UTF-8 when neither is present. This avoids the problems that plague text formats with no declared encoding, which are easily misinterpreted when read on a different platform.
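
To make that concrete, here is a rough sketch in Python of how a reader might work out the encoding before decoding anything. The function name sniff_xml_encoding is made up for illustration, and the sketch is deliberately simplified: it only handles a byte order mark or an ASCII-compatible XML declaration, and it ignores the cases the XML spec's autodetection appendix also covers (UTF-16 without a BOM, EBCDIC, and so on).

```python
import codecs
import re

# Byte order marks, checked in order. UTF-32 must come before UTF-16
# because the UTF-32-LE BOM starts with the same bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the data (assumes an ASCII-compatible encoding).
_DECL = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document (hypothetical helper)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no encoding declaration: XML says to assume UTF-8.
    return "utf-8"
```

A real parser does all of this for you; the point is only that the file itself carries enough information to be decoded correctly.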


I think the restrictions on which characters can occur in an XML document are reasonable, considering that XML is a textual format and that higher-level markup should use elements rather than legacy control codes. In fact I would go further and disallow the C1 control codes (U+0080 to U+009F), which often indicate an incorrect encoding declaration (e.g. text that is really Windows-1252 being declared as Latin-1). XML 1.1 took a step in this direction by allowing most C1 controls only as character references, but it was bogged down with other problems and does not look likely to achieve widespread deployment.
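
As an illustration of why C1 controls are such a good warning sign, here is a small Python sketch. The helper find_c1_controls is hypothetical, not part of any XML library; it simply reports characters in the range U+0080 to U+009F. Note that this range includes U+0085 (NEL), which XML 1.1 treats as a line end rather than a restricted character.

```python
import re

# C1 control characters: U+0080 through U+009F.
_C1 = re.compile("[\u0080-\u009f]")


def find_c1_controls(text: str) -> list:
    """Return (offset, code point) pairs for every C1 control in text."""
    return [(m.start(), f"U+{ord(m.group()):04X}") for m in _C1.finditer(text)]


# Curly quotes occupy bytes 0x93 and 0x94 in Windows-1252.
raw = "\u201cquoted\u201d".encode("cp1252")

# Decoding those bytes as Latin-1 (the wrong declaration) silently
# turns the quotes into C1 control characters.
wrong = raw.decode("latin-1")
right = raw.decode("cp1252")

print(find_c1_controls(wrong))  # C1 controls at both ends of the string
print(find_c1_controls(right))  # nothing flagged
```

Latin-1 never fails to decode, so the mistake goes unnoticed unless something checks for the stray control characters afterwards.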




