
Cracks in the Foundation

November 8, 2006

Micah Dubinko

The last week in October wasn't the smoothest for the W3C HTML Working Group. First, a notable blog entry criticized their handling of XML namespaces, leading to a formal objection. On top of that, Tim Berners-Lee blogged that new and separate HTML and forms Working Groups would be chartered to "incrementally" update HTML, in contrast with the groups' present approach. More on that later. As has always been the case, XML Annoyances aims to stimulate discussion on XML topics by challenging entrenched views. This article digs beneath the surface issues and encourages others to do the same.

The objectionable technical issue relates to what is commonly called a chameleon schema, that is, the ability for elements defined in a vocabulary to appear in more than one namespace -- in this case, for XForms elements to appear without additional qualification in XHTML. True enough, this seems to fly in the face of the goal of globally unique namespace-qualified elements. As one blogger writes:

If you believe there is a single, more important, absolute requirement in the land of XML than that of the proper usage of XML namespaces: You obviously don't understand XML.

Similarly, in a section dedicated to XML namespaces, O'Reilly's XML Hacks by Michael Fitzgerald asserts in hack #59:

Though controversial, XML namespaces are a necessity if you want to manage XML documents in the wild.

Controversial, yes. But a "necessity"? Many such statements are treated as axiomatic bedrock. Instead of blithely accepting these aphorisms, let's look at some specific evidence.

Introductions to XML Namespaces

Nearly any contemporary XML reference will include material on XML namespaces to help novices get up to speed on a potentially tricky topic. Beginners need to be shown powerful examples of problems that led to the requirement to have namespaces in XML in the first place. Let's examine a few real-world examples from my bookshelf.

The previously mentioned coverage in XML Hacks is interesting in that it doesn't even show a multiple-namespace example.

O'Reilly's excellent XML in a Nutshell, Third Edition, by Elliotte Rusty Harold and W. Scott Means, devotes an entire chapter to namespaces. The opening example uses a mythical markup language for a catalog of paintings, which has conflicts on elements named title, date, and description. Even then, the description elements are equivalent in content model and overall purpose, only appearing in different contexts -- one describes a page and one describes a painting.

Another highly regarded work, Ted Leung's Professional XML Development with Apache Tools dives in and addresses namespaces on the third page. The example? An Apache-XML-Book vocabulary, wherein one "can easily imagine the element name, title, or author being used in another XML grammar, say one for music CDs."

The point here is not to criticize the writing of these books. Quite the opposite; I'm relying on the talented writing in these books to make my point. Perhaps there's something funny going on with the problem statement of a technology that relies on unrealistic examples to justify its existence.

Numerous similar examples exist on your bookshelf, too. Post your favorite good or bad examples in the comments section below. For such an indispensable layer of XML processing, compelling real-world examples are hard to find.

Speed Bumps on the Road to Deployment

By the end of the 1990s, the markup world looked bright. The XML Recommendation was still fresh and new, and the Namespaces in XML Recommendation fresher and newer still. Combined, the two would usher in a new era of stricter error-checking, leaving behind the messy habits of HTML authors past. Unfortunately, many of the early implementations failed to implement namespaces in conformance with the spec.

Internet Explorer 5, in particular, set an unfortunate precedent. In a 1999 article, Tim Bray writes:

What you have to do is declare a namespace prefix, and that namespace prefix has to be html -- no other string will work! You have to declare it, but you don't have to map it to any namespace name in particular (do a "view source" on this page to see what I mean). This is a huge violation of the essence of the namespace spec, which would suggest that Microsoft somehow Just Doesn't Get It about namespaces, except for we know that they do. Puzzling.

This kind of thinking persisted. Even years later, in 2002, Ben Hammersley wrote about namespace problems revolving around the RSS 2.0 spec, made worse by a flaw in Dave Winer's reference implementation in a software package called Radio. The problem?

This is because, as many other people might not realise either, it forgets that the namespace prefix can change. A proper XML parser takes it only as a reference to the URI. It is the URI that matters.

Even today, the concept of an arbitrary, changeable mapping from short prefixes to URLs is confusing and nonintuitive to many. Further examples are welcome in the comments section.
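The point Hammersley makes can be shown in a few lines. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser as a stand-in for "a proper XML parser"; the prefix foo and the helper expanded_name are invented for illustration and do not come from any of the articles quoted above.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Two serializations that differ only in the prefix bound to the XHTML URI.
# The prefix "foo" is an arbitrary, illustrative choice.
doc_a = f'<html:p xmlns:html="{XHTML_NS}">hello</html:p>'
doc_b = f'<foo:p xmlns:foo="{XHTML_NS}">hello</foo:p>'


def expanded_name(xml_text):
    """Return the root element's name in ElementTree's {uri}local form."""
    return ET.fromstring(xml_text).tag


name_a = expanded_name(doc_a)
name_b = expanded_name(doc_b)

# The parser keeps the URI and the local name; the prefix is gone.
print(name_a)
print(name_a == name_b)
```

Software that compares the literal string html:p, as Internet Explorer 5 and Radio effectively did, would treat these two documents differently even though a namespace-aware parser sees the same element.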

Con-Fusion

Other specific choices made in the development of XML namespaces cause persistent confusion among markup practitioners. For one, namespace prefixes are largely incompatible with DTDs, which, although unfashionable, are still a built-in part of XML and intimately connected to any use of XML involving DOCTYPEs or named character entities. In other words, they're still important to nearly any web developer.

Does a namespace declaration apply to elements or attributes? A reasonable answer would be "yes and no"; the subtleties still pop up in mailing lists. Ron Bourret covered this and more in the seminal Namespace Myths Exploded article previously published on this site. The way things ended up, attributes can't be placed in a specific namespace without an explicit prefix regardless of whether a conflict is even possible -- a decision that would cause problems later.
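The "yes and no" answer is easier to see in a parsed document. Here is a rough sketch, again assuming Python's xml.etree.ElementTree; the namespace urn:example:links and the attribute values are hypothetical and chosen only to make the difference visible.

```python
import xml.etree.ElementTree as ET

# The same URI is declared twice: once as the default namespace and once
# with the prefix "x". The URI itself is a made-up example.
doc = (
    '<a xmlns="urn:example:links" xmlns:x="urn:example:links" '
    'href="plain" x:href="prefixed"/>'
)

root = ET.fromstring(doc)

# The default namespace applies to the unprefixed element name...
element_name = root.tag

# ...but not to the unprefixed attribute, which stays in no namespace.
# Only the explicitly prefixed attribute picks up the URI.
attribute_names = sorted(root.attrib)

print(element_name)
print(attribute_names)
```

Both attributes are legal on the same element because, once expanded, they have different names: one is plain href and the other is href qualified by the namespace URI.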

Another trouble spot lies in the use of "namespace names," or strings that look like URLs, which add a truckload of URL baggage to the spec and lead many enthusiastic new learners to wonder what will happen if they visit that URL in a browser. This is an old debate, one that won't be rehashed here. But direct confusion from the XML Namespaces spec is only the beginning.

Collateral Damage

QNames in content: what does this phrase bring to mind? The Namespaces in XML spec defined QNames but remained silent on the topic of using them in content; in practice, the usage arrived with XPath 1.0, which needed a way to refer to element and attribute names. At the time, making XPath identifiers look the same way as the elements themselves was justified as the most sensible way to deal with two-part names with one part bound to a longer third name. Over time, though, this practice has fallen out of favor and been compared to "using TCP packets as delimiters in an application protocol." This bit of unpleasantness has become firmly entrenched in XML vocabularies, at a minimum including any that use XPath. The lineage of this practice can be traced straight back to namespaces.
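To make the trouble with QNames in content concrete, consider a path expression stored as an ordinary string. This is an illustrative sketch using the limited XPath support in Python's xml.etree.ElementTree; the two namespace URIs, the prefixes, and the element text are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Two made-up vocabularies that both define a "p" element.
NS_ONE = "urn:example:one"
NS_TWO = "urn:example:two"

doc = ET.fromstring(
    f'<root xmlns:a="{NS_ONE}" xmlns:b="{NS_TWO}">'
    "<a:p>first vocabulary</a:p>"
    "<b:p>second vocabulary</b:p>"
    "</root>"
)

# A QName in content: just a string until something binds the prefix "x".
path = "x:p"

# The same string selects different elements depending on a prefix map
# that lives outside the string itself.
bound_to_one = [e.text for e in doc.findall(path, {"x": NS_ONE})]
bound_to_two = [e.text for e in doc.findall(path, {"x": NS_TWO})]

print(bound_to_one)
print(bound_to_two)
```

The string x:p carries no meaning on its own. Any tool that copies, stores, or transforms it must also carry along the prefix bindings that happened to be in scope, which is exactly the coupling critics object to.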

Aside from using QNames in content, the XML Schema Part 1 specification has been criticized for its complexity. I started counting how many times the word "namespace" or its plural appears in the document and gave up somewhere around 400. How much of this complexity is caused by namespace-think?

Then there's XLink, once a promising branch of XML technology. XLink failed to meet a key requirement:

It must be possible to apply XML link semantics to existing documents by modifying the documents' DTDs only, requiring no modification to the document instances themselves.
For example by supplying appropriate information in an element's definition (in the DTD), such as a default ROLE attribute. This provides for layering of XML link semantics onto large bodies of XML documents without requiring modification of those documents.

The syntactic restrictions introduced by namespaces caused this conflict. It wasn't possible to meet this requirement and define XLink as a distinct namespaced vocabulary.

Even when vocabularies use namespaces, there's no guarantee of coordination. If anything, scoping encourages folks to go off and do their own thing. Already within the W3C we have paragraphs as html:p as well as speech:p, where html and speech map to the "namespace names" for XHTML and Speech Synthesis Markup Language, respectively. (Don't even get me started on wml:p.) Several markup vocabularies also define anchors as elements named a, including XHTML, SMIL, and others outside of W3C. So the problem attributed to chameleon namespaces at the outset of this article has already come to pass. In aggregate, the amount of time that has been wasted debating and rehashing such issues in standardization and development communities is staggering.

Now What?

There are a few other lines of evidence that will have to go in another article, like the mobile industry reaction to namespaces or examining the continual stream of proposed alternatives. Individually, any of the objections recounted here wouldn't amount to much. Collectively, though, they form a composite sign that suggests that the XML community might need to reconsider not only its approach and attitudes toward HTML, but also the foundation of namespaces itself.

What would the XML world look like without namespaces, or with a less intrusive version thereof? DocBook and HTML would continue to be fine. Newer languages like SVG would be slightly different in details, but overall the same. The big question is what compound documents would look like, but it's hard to imagine the situation much worse than what we have today.

So far, the W3C hasn't posted an official notice to the effect of what Tim Berners-Lee wrote on his blog. Nevertheless, it's encouraging to see a willingness to change course when needed. Let's hope this willingness extends deep enough to reconsider namespaces. The tension between incremental HTML 4 philosophies and XML namespace practice will only get stronger.

The formal objection noted at the start of this article concludes with these words:

If we're going to go changing the namespace for every host language that comes along, we might as well not have namespaces in the first place.

Actually, that doesn't sound like such a bad alternative.

Disclosure: the author of this article is a former editor for the XForms and HTML Working Groups, and is a contributor to XML Hacks. He submitted this article in namespace-free HTML.