This week XML Deviant reports on a Namespace-related debate holding up XML work at the W3C, and the final release of SAX2/Java.
Simon St. Laurent, pointing out that the latest DOM Level 2 specification gives no end date for its Candidate Recommendation phase, provided the first inkling that there may be problems at the W3C. Eagle-eyed St. Laurent was referring to comments added to the status of the specification:
This specification is still in the Candidate Recommendation phase. A coordination issue has arisen, which necessitates an extended Candidate Recommendation phase. It will end when the coordination issue is resolved.
The lack of an end date is, strictly speaking, in violation of the published W3C process. However, Ian Jacobs clarified that the process itself is undergoing its six-monthly review:
In this case, our internal process is changing, but the Process Document has not been reissued yet to reflect that evolution. In fact, I'm revising the document as we speak.
While this explained the break from process, it didn't provide any details concerning the coordination issue. Lauren Wood drew attention to a brief explanatory note included in the status:
The coordination issue affects the handling of namespace URIs. The resolution of the coordination issue may necessitate changes to the DOM Level 2 Core module.
Michael Champion attempted to throw some additional light onto the problem, without breaking W3C confidentiality:
There is a coordination issue, no one knows how long it will take to resolve that issue, and the DOM Level 2 will proceed to [Proposed Recommendation] or else fix the problems raised by the "coordination issue" when it IS resolved. The DOM WG carefully considered the wording quoted above to convey the message that the current DOM Level 2 draft is still a "Candidate Recommendation" which may or may not change before it becomes a "Proposed Recommendation", and we don't want to raise expectations about the outcome or timing of events beyond our control.
Champion also indicated that the issue not only affects the DOM, "but other W3C specs as well." An intriguing mystery....
Diagnosing the Problem
However, the mystery wasn't to last long. In an anonymous posting to XML-DEV, someone calling themselves "Pope 32767" leaked the details of the "coordination issue." The author colorfully described the problem as a "mind virus infecting the W3C XML Activity," justifying the leak as a means to find a quick solution to the problem:
I don't think it should all be kept undercover, instead outside groups like XML-Dev need to use some pressure where ever they can (over a beer or whatever) to get the mess cleaned up and the Working Drafts moved on.
So what exactly is the problem? Readers unfamiliar with Namespaces may wish to first take a look at Tim Bray's article "Namespaces by Example," and a short article by Jon Bosak (posted to XML-DEV last year), for some useful introductory material. The Namespace Recommendation (which probably merits several readings!) is also available online.
The anonymous contributor summarized the problem, which relates specifically to Namespace names that are relative rather than absolute URIs:
The Namespaces Recommendation said that two namespaces were the same if they matched exactly char-by-char. (The attribute values not the prefixes, that is). It also said that they were URI references. Those two ideas conflict because the same-looking relative URL means different things depending on what document it's in, but strings are strings no matter what the context is.
That leads to 3 ideas: forbid relative URLs in namespace declarations, always convert relative to absolute before comparing, or just say that namespace names aren't URLs at all but just strings that look like them (keep the exact-match idea). All these ideas either break existing documents or existing software or both and they are incompatible with each other.
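To make the conflict concrete, here is a minimal Python sketch of the three proposals. The function names, the two base URIs, and the use of `urllib.parse` to resolve references are assumptions made for illustration; none of this comes from the Namespaces Recommendation or from any W3C draft.

```python
from urllib.parse import urljoin, urlsplit

def is_relative(ns_name):
    """Proposal 1 (forbid): a namespace name with no scheme would be rejected."""
    return urlsplit(ns_name).scheme == ""

def same_after_absolutizing(ns_a, base_a, ns_b, base_b):
    """Proposal 2 (absolutize): resolve each name against its document's base first."""
    return urljoin(base_a, ns_a) == urljoin(base_b, ns_b)

def same_as_strings(ns_a, ns_b):
    """Proposal 3 (literal): compare char-by-char and ignore the document context."""
    return ns_a == ns_b

# Hypothetical inputs: the same relative name declared in two documents
# that live at different locations.
NS = "/pub/xmldeviant"
BASE_ONE = "http://www.xml.com/columns/doc.xml"
BASE_TWO = "http://example.org/mirror/doc.xml"

print(is_relative(NS))
print(same_as_strings(NS, NS))
print(same_after_absolutizing(NS, BASE_ONE, NS, BASE_TWO))
```

Under these assumptions the literal comparison treats the two declarations as the same namespace, while the absolutizing comparison treats them as different, which is why software written for one rule disagrees with software written for the other.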
He continued by saying that the members of the W3C's XML Activity are divided as to which solution should be selected.
So why might you use a relative URI (e.g., /pub/xmldeviant) over an absolute URI (e.g., http://www.xml.com/pub/xmldeviant)? The issue actually boils down to what you want to do with the Namespace identifier. The Namespace specification states:
The namespace name, to serve its intended purpose, should have the characteristics of uniqueness and persistence. It is not a goal that it be directly usable for retrieval of a schema (if any exists).
In other words, namespaces don't have to point to anything. But that is not specifically ruled out either, so you are free to provide a schema or other resource (e.g., a Java class) at the target URI. That freedom encourages the use of relative URIs (which are valid under the Namespace specification), because they make resource management easier: depending on the base URI currently in effect, the schema can be retrieved from the local file system (where it may have been supplied with the document) rather than from the Internet.
Rick Jelliffe summarized the alternative viewpoints, indicating that the Namespace Recommendation is the root cause of the problem:
It is clear that there are two very strong camps: one wants to be able to track down a schema from the NS identifier; the other wants the NS clear to be equally useful for any processing. At the extreme, these views are "namespace = schema" and "namespace = string".
Ignoring the extremes as bogeymen, the two camp's views are not in fact irreconcilable. They are the inevitable result of the unsatisfactory state of the XML NS REC.
This isn't the first time the Namespace Recommendation has received adverse comment. In fact, it's probably the single most hotly debated W3C specification: Namespace debates have become part of XML-DEV folklore. This is the first time that differing views within the W3C XML Activity have become so public, however.
Looking for a Cure
Tim Berners-Lee, the Director of the W3C, chose to meet the debate head-on, setting up a mailing list to solicit discussion of the issue within a public forum. In his introductory message to the list, Berners-Lee described his goal:
...to persuade people of the importance of URIs in namespaces. I am using this mailing list, instead of just writing a note, because it seems the differences in understanding are of the sort which need a back and forth to resolve. I have written notes about this before - in http://www.w3.org/DesignIssues/ for example. I have lectured on it and set it forth as the basic underlying architectural tenet upon which the web past present and future is built. [S]o when a set of intelligent and thoughtful people produce specifications which do not reference the URI spec for their global identifiers, then clearly I need to understand their thoughts.
In response to the anonymous leak, Berners-Lee acknowledged that the summary of the basic issues was correct, but clarified his role as that of a "judge of consensus." The current lack of consensus meant that he wished to openly debate the problem and avoid enforcing an arbitrary decision.
Berners-Lee stated his belief that Namespace identifiers are extremely important, and that an alternative approach based on string comparison was technically broken:
When it comes to XML namespaces, the namespace identifier is very special. It is (or they are) the key to the whole document. The namespace identifies the terms which are used, and upon the meaning of those terms rests the meaning of the document...It is really important for electronic commerce and, indeed, the whole future of the web, that XML documents be defined in terms of namespaces which are considered to identify the meaning of the language they are written in.
Berners-Lee's vision for a Semantic Web of interconnected, machine-readable resources is founded on the need to attach names and ultimately semantics to these resources. Not surprisingly, Berners-Lee is keen to see the URI mechanism used to facilitate this, and is against reducing Namespace identifiers to simple strings. If this happens, then how do the semantics get attached? Even with a URI-based scheme, however, there is no specification that defines how a schema (or other resource) should be attached to a namespace. Another gap in the XML framework?
Returning to the immediate problem of relative URIs, Simon St. Laurent observed that if the base URI is defined by location, then serious problems arise:
More frightening for me at least, is the prospect that moving a document from one location on a site to another might in fact change the understood vocabulary of the document. That seems too risky - to me - to allow.
St. Laurent also commented that if XML Base is used to define the base URI for a Namespace, then there are backwards compatibility issues:
If XBase affects namespace URIs, then applications which understand XBase will see different namespaces than those that don't. We already have parsers and applications that understand XML 1.0 but not namespaces; we may end up with parsers and applications that understand XML 1.0 and xml:base but not namespaces, as well as parsers and applications that understand XML 1.0 and namespaces but not xml:base. At some point, unless everyone is upgrading consistently, XML ceases to be remotely interoperable across different implementations.
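The interoperability worry can be sketched in a few lines of Python. This is only an illustration of the hypothetical St. Laurent describes: the sample document, the `namespace_seen` helper, and the rule "resolve the namespace name against `xml:base`" are all assumptions, not behavior defined by the XML Base draft or implemented by any parser.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree reports xml:base under the reserved XML namespace.
XML_BASE_ATTR = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical document: a relative namespace name plus an xml:base attribute.
DOC = '<column xmlns="xmldeviant" xml:base="http://www.xml.com/pub/"/>'

def namespace_seen(doc_text, honor_xml_base):
    """Return the namespace name an application would attribute to the root element."""
    root = ET.fromstring(doc_text)
    # ElementTree encodes the tag as "{namespace-name}localname".
    ns_name = root.tag[1:].split("}", 1)[0]
    base = root.get(XML_BASE_ATTR)
    if honor_xml_base and base is not None:
        # Assumed rule: an xml:base-aware application absolutizes the name.
        return urljoin(base, ns_name)
    # An application unaware of xml:base keeps the literal string.
    return ns_name

print(namespace_seen(DOC, honor_xml_base=False))
print(namespace_seen(DOC, honor_xml_base=True))
```

The two calls return different namespace names for the same document, so two applications that differ only in their `xml:base` support would disagree about which vocabulary the document uses.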
The XML Base Working Draft does note its impact on the Namespace Recommendation. Compatibility issues may therefore require an update to the Namespace Recommendation. Tim Berners-Lee noted that he would prefer to see relative URIs remain legal:
I admit I could live with a solution in which relative URIs were banned, but I would much prefer one which pointed out that you only use them when that is what you mean, and that some older XML software might occasionally get confused in doing validity checking of your document.
This discussion lends additional weight to the rationalization of the XML framework recently proposed by Rick Jelliffe. (See "Filling In The Gaps" for more details.) Whatever the ultimate outcome (and debate has already started on the XML-URI mailing list), it's important that a decision be made relatively quickly.
It's always good to finish on a high note, so the Deviant is pleased to point out that SAX2/Java has now been finalized. In his announcement, David Megginson summarized what SAX2 adds:
SAX2 extends SAX1 by adding support for Namespaces, for filter chains, and for querying and setting features and properties.
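The three additions can be seen together in a short sketch. It uses Python's standard `xml.sax` module, which follows the SAX2 interfaces, rather than SAX2/Java itself; the `NameCollector` and `DropNamespace` classes, the `collect_names` helper, and the sample document with its `urn:example:internal` namespace are hypothetical names chosen for illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.saxutils import XMLFilterBase

class NameCollector(ContentHandler):
    """Records the (namespace URI, local name) pair of every element start."""
    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        self.names.append(name)

class DropNamespace(XMLFilterBase):
    """One link in a filter chain: hides elements from a single namespace."""
    def __init__(self, parent, uri):
        super().__init__(parent)
        self.uri = uri

    def startElementNS(self, name, qname, attrs):
        if name[0] != self.uri:
            super().startElementNS(name, qname, attrs)

    def endElementNS(self, name, qname):
        if name[0] != self.uri:
            super().endElementNS(name, qname)

def collect_names(doc_bytes, hidden_uri):
    reader = xml.sax.make_parser()
    chain = DropNamespace(reader, hidden_uri)    # filter wraps the real parser
    chain.setFeature(feature_namespaces, True)   # feature setting, passed to the parser
    collector = NameCollector()
    chain.setContentHandler(collector)
    chain.parse(io.BytesIO(doc_bytes))
    return collector.names

DOC = (b'<d:column xmlns:d="http://www.xml.com/pub/xmldeviant" '
       b'xmlns:i="urn:example:internal"><d:item/><i:note/></d:column>')

print(collect_names(DOC, "urn:example:internal"))
```

With Namespace processing switched on through the feature mechanism, the handler receives each element name as a URI and local-name pair, and the filter sits between the parser and the handler without either needing to know about it.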
Megginson thanked the community for their efforts, and confirmed that he would be working on SAX2/C++ before handing over the reins:
There is a great need for a single SAX API for C++, and I plan to help out with that. When SAX2/C++ is released, it is my intention to find a new person or organization to take over the maintenance of SAX.
David should be congratulated on his efforts. Thanks David!