XML.com: XML From the Inside Out


Article:
 Which XML Technologies Are Beautiful?
Subject: Examples
Date: 2007-08-14 04:57:15
From: halukasaurus

Could you explain the "getting Unicode right" comment? In particular, do you mean that XML is good because it uses Unicode, or that it is good to support only a strict subset of Unicode (e.g. you cannot put an escape character in text)? I find the fact that it is not full Unicode rather annoying.



  • Examples
    2007-08-14 18:15:38 mikeday

    XML is defined in terms of Unicode, which is a good thing, especially in contrast to SGML, which predates the introduction of Unicode and hence has to worry about different character sets, not just encodings.


    Also, XML files "know their encoding", which is indicated by the Unicode byte order mark or stated explicitly in the encoding pseudo-attribute of the XML declaration. This avoids the problems that plague text formats with no explicitly declared encoding, which can easily be misinterpreted when read on different platforms.
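To make the "know their encoding" point concrete, here is a rough Python sketch of how a reader might work out the encoding from the first bytes of a file. The helper name `sniff_xml_encoding` is made up for illustration, and the sketch is deliberately incomplete: it ignores UTF-32, it only reads the XML declaration when the file is in an ASCII-compatible encoding, and it assumes UTF-8 when neither a byte order mark nor a declaration is found. A real parser follows the fuller detection procedure in the XML specification.

```python
import codecs
import re

# Byte order marks this sketch recognizes (UTF-32 is deliberately left out).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Matches encoding="..." inside a leading XML declaration.
# Only works when the declaration itself is readable as ASCII bytes.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document from its first bytes.

    Hypothetical helper: checks for a byte order mark first, then for an
    explicit encoding in the XML declaration, and otherwise assumes UTF-8.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


if __name__ == "__main__":
    print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
    print(sniff_xml_encoding(b"\xef\xbb\xbf<a/>"))
    print(sniff_xml_encoding(b"<a/>"))
```

The point of the sketch is only that the document itself carries enough information to pick a decoder, which a plain text file does not.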


    I think that the restrictions on characters that can occur in an XML document are reasonable, considering that XML is a textual format and that higher-level markup should use elements, not legacy control codes. In fact, I would go further and disallow the C1 control codes, which often indicate an incorrect encoding declaration (e.g. Windows-1252 text labelled as Latin-1). XML 1.1 took this step for literal characters (C1 controls other than NEL may appear only as character references), but it was bogged down with other problems and does not look likely to achieve widespread deployment.
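The two points above can be illustrated with a short Python sketch. The function names `is_xml10_char` and `c1_controls` are invented for this example. The first checks a character against the XML 1.0 `Char` production, which rejects the escape character (U+001B) but accepts the C1 range. The second shows why C1 characters are a useful warning sign: Windows-1252 "smart quotes" decoded as Latin-1 come out as C1 control codes.

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)            # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF          # includes the C1 range U+0080..U+009F
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def c1_controls(text: str) -> list[str]:
    """Return the C1 control characters (U+0080..U+009F) found in text."""
    return [ch for ch in text if 0x80 <= ord(ch) <= 0x9F]


# Curly quotes encoded as Windows-1252 occupy bytes 0x93 and 0x94.
raw = "\u201cquoted\u201d".encode("cp1252")

wrong = raw.decode("latin-1")   # mislabelled as Latin-1: bytes become C1 controls
right = raw.decode("cp1252")    # correct label: bytes become curly quotes

if __name__ == "__main__":
    print(is_xml10_char("\x1b"))               # escape character
    print([hex(ord(c)) for c in c1_controls(wrong)])
    print(c1_controls(right))
```

Under XML 1.0 the mis-decoded text is still well-formed, so the mistake goes unnoticed; a parser that rejected literal C1 characters would catch it.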




Copyright © 2008 O'Reilly Media, Inc. | (707) 827-7000 / (800) 998-9938