XML.com: XML From the Inside Out

Cracks in the Foundation

November 08, 2006

The last week in October wasn't the smoothest for the W3C HTML Working Group. First, a notable blog entry criticized their handling of XML namespaces, leading to a formal objection. On top of that, Tim Berners-Lee blogged that new and separate HTML and forms Working Groups would be chartered to "incrementally" update HTML, in contrast with the groups' present approach. More on that later. As has always been the case, XML Annoyances aims to stimulate discussion on XML topics by challenging entrenched views. This article digs beneath the surface issues and encourages others to do the same.

The objectionable technical issue relates to what is commonly called a chameleon schema, that is, the ability for elements defined in a vocabulary to appear in more than one namespace -- in this case, for XForms elements to appear without additional qualification in XHTML. True enough, this seems to fly in the face of the goal of globally unique namespace-qualified elements. As one blogger writes:
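To make the chameleon problem concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`, which reports every element by its expanded name in `{namespace}local` form. It assumes the XForms namespace name `http://www.w3.org/2002/xforms` and a simplified XHTML page, and it illustrates only what a namespace-aware parser sees, not the markup the Working Group actually proposed. An unprefixed `input` inside an XHTML document inherits the XHTML default namespace, so a processor that matches on expanded names sees an XHTML element rather than an XForms one.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
XFORMS = "http://www.w3.org/2002/xforms"

# Chameleon style: XForms-like markup with no prefix. The default namespace
# declared on <html> applies, so <input> becomes an XHTML element.
chameleon_doc = f"""<html xmlns="{XHTML}">
  <body>
    <input ref="name"><label>Name</label></input>
  </body>
</html>"""

# Qualified style: the same control explicitly placed in the XForms namespace.
qualified_doc = f"""<html xmlns="{XHTML}" xmlns:xf="{XFORMS}">
  <body>
    <xf:input ref="name"><xf:label>Name</xf:label></xf:input>
  </body>
</html>"""


def control_name(doc: str) -> str:
    """Return the expanded {namespace}local name of the first element in <body>."""
    root = ET.fromstring(doc)
    body = root.find(f"{{{XHTML}}}body")
    return body[0].tag


print(control_name(chameleon_doc))  # {http://www.w3.org/1999/xhtml}input
print(control_name(qualified_doc))  # {http://www.w3.org/2002/xforms}input
```

The helper `control_name` is hypothetical, written only for this illustration; the point is that the two documents look alike to a human reader but differ in the one property namespaces exist to fix.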

If you believe there is a single, more important, absolute requirement in the land of XML than that of the proper usage of XML namespaces: You obviously don't understand XML.

Similarly, in a section dedicated to XML namespaces, O'Reilly's XML Hacks, by Michael Fitzgerald, asserts in Hack #59:

Though controversial, XML namespaces are a necessity if you want to manage XML documents in the wild.

Controversial, yes. But a "necessity"? Many such statements are treated as an axiomatic bedrock. Instead of blithely accepting these aphorisms, let's look at some specific evidence.

Introductions to XML Namespaces

Nearly any contemporary XML reference will include material on XML namespaces to help novices get up to speed on a potentially tricky topic. Beginners need to be shown powerful examples of problems that led to the requirement to have namespaces in XML in the first place. Let's examine a few real-world examples from my bookshelf.

The previously mentioned coverage in XML Hacks is interesting in that it doesn't even show a multiple-namespace example.

O'Reilly's excellent XML in a Nutshell, Third Edition, by Elliotte Rusty Harold and W. Scott Means, devotes an entire chapter to namespaces. The opening example uses a hypothetical markup language for a catalog of paintings, which has conflicts on elements named title, date, and description. Even then, the two description elements are equivalent in content model and overall purpose; they differ only in context, since one describes a page and the other describes a painting.
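The collision in that opening example is the textbook case namespaces are meant to solve. The sketch below (Python, `xml.etree.ElementTree`) gives page-level and painting-level elements different namespace names so that each `description` can be selected on its own. The namespace URIs, the element content, and the helper `texts` are hypothetical stand-ins, not the book's actual markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names, for illustration only.
CATALOG = "http://example.com/ns/catalog"
PAINTING = "http://example.com/ns/painting"

doc = f"""<catalog xmlns="{CATALOG}" xmlns:p="{PAINTING}">
  <title>Selected Works</title>
  <description>A page listing selected paintings.</description>
  <p:painting>
    <p:title>Untitled Landscape</p:title>
    <p:date>1870</p:date>
    <p:description>A river scene at dusk.</p:description>
  </p:painting>
</catalog>"""


def texts(root: ET.Element, qname: str) -> list:
    """Return the text of every element (root included) whose expanded name is qname."""
    return [el.text for el in root.iter(qname)]


root = ET.fromstring(doc)
print(texts(root, f"{{{CATALOG}}}description"))   # ['A page listing selected paintings.']
print(texts(root, f"{{{PAINTING}}}description"))  # ['A river scene at dusk.']
```

As in the book's example, both description elements carry the same kind of content here; the namespace restates a distinction that the nesting of `p:painting` inside `catalog` already makes.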

Another highly regarded work, Ted Leung's Professional XML Development with Apache Tools, dives in and addresses namespaces on its third page. The example? An Apache-XML-Book vocabulary, wherein one "can easily imagine the element name, title, or author being used in another XML grammar, say one for music CDs."

The point here is not to criticize the writing of these books. Quite the opposite; I'm relying on the talented writing in these books to make my point. Perhaps there's something funny going on with the problem statement of a technology that relies on unrealistic examples to justify its existence.

Numerous similar examples exist on your bookshelf, too. Post your favorite good or bad examples in the comments section below. For such an indispensable layer of XML processing, compelling real-world examples are hard to find.

Speed Bumps on the Road to Deployment

By the end of the 1990s, the markup world looked bright. The XML Recommendation was still fresh and new, and the Namespaces in XML Recommendation fresher and newer still. Combined, the two would usher in a new era of stricter error-checking, leaving behind the messy habits of HTML authors past. Unfortunately, many early implementations did not handle namespaces in a way that conformed to the spec.

Internet Explorer 5, in particular, set an unfortunate precedent. In a 1999 article, Tim Bray writes:

What you have to do is declare a namespace prefix, and that namespace prefix has to be html -- no other string will work! You have to declare it, but you don't have to map it to any namespace name in particular (do a "view source" on this page to see what I mean). This is a huge violation of the essence of the namespace spec, which would suggest that Microsoft somehow Just Doesn't Get It about namespaces, except for we know that they do. Puzzling.
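By Bray's account, IE5 keyed on the prefix string itself. A conforming processor does the reverse: the prefix is only a local abbreviation, and an element's identity comes from the namespace name the prefix is bound to. A minimal Python sketch with `xml.etree.ElementTree`, using a hypothetical `urn:example:not-xhtml` namespace for contrast and a hypothetical helper `expanded_name`:

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

docs = {
    # The prefix IE5 insisted on, bound to the XHTML namespace name.
    "html prefix, XHTML URI": f'<doc xmlns:html="{XHTML}"><html:p>Hi</html:p></doc>',
    # The same prefix bound to an unrelated (hypothetical) namespace name.
    "html prefix, other URI": '<doc xmlns:html="urn:example:not-xhtml"><html:p>Hi</html:p></doc>',
    # A different prefix bound to the XHTML namespace name.
    "h prefix, XHTML URI": f'<doc xmlns:h="{XHTML}"><h:p>Hi</h:p></doc>',
}


def expanded_name(doc: str) -> str:
    """Return the {namespace}local name of the root's first child; the prefix is discarded."""
    return ET.fromstring(doc)[0].tag


for label, doc in docs.items():
    print(f"{label}: {expanded_name(doc)}")
```

The first and third documents yield the same name, `{http://www.w3.org/1999/xhtml}p`, while the second, despite its `html` prefix, does not: exactly the opposite of the behavior Bray describes.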

This kind of thinking persisted. Even years later, in 2002, Ben Hammersley wrote about namespace problems revolving around the RSS 2.0 spec, made worse by a flaw in Dave Winer's reference implementation in a software package called Radio. The problem?

This is because, as many other people might not realise either, it forgets that the namespace prefix can change. A proper XML parser takes it only as a reference to the URI. It is the URI that matters.
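The bug Hammersley describes is easy to reproduce in miniature. The sketch below contrasts a hypothetical prefix-matching lookup with a namespace-aware one on two RSS 2.0 fragments that differ only in the prefix bound to the Dublin Core element namespace. Dublin Core is used purely as an example extension namespace, and neither function is a reconstruction of Radio's code.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Two feeds that are identical to a namespace-aware parser:
# only the prefix bound to the Dublin Core namespace differs.
feed_a = (
    f'<rss version="2.0" xmlns:dc="{DC}"><channel><item>'
    "<dc:creator>Ann</dc:creator></item></channel></rss>"
)
feed_b = (
    f'<rss version="2.0" xmlns:meta="{DC}"><channel><item>'
    "<meta:creator>Ann</meta:creator></item></channel></rss>"
)


def creator_by_prefix(feed: str):
    """Fragile (hypothetical): assumes the author used the literal prefix 'dc'."""
    start = feed.find("<dc:creator>")
    if start == -1:
        return None
    start += len("<dc:creator>")
    return feed[start:feed.index("</dc:creator>", start)]


def creator_by_uri(feed: str):
    """Namespace-aware: matches the expanded name {namespace URI}creator."""
    el = ET.fromstring(feed).find(f".//{{{DC}}}creator")
    return None if el is None else el.text


print(creator_by_prefix(feed_a), creator_by_prefix(feed_b))  # Ann None
print(creator_by_uri(feed_a), creator_by_uri(feed_b))        # Ann Ann
```

The prefix-based lookup silently misses the second feed even though both are equivalent under the Namespaces in XML Recommendation.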

Even today, the concept of an arbitrary, changeable mapping from short prefixes to namespace URIs is confusing and nonintuitive to many. Further examples are welcome in the comments section.
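Part of what makes the mapping nonintuitive is that it is lexically scoped: a prefix can be rebound on any element, and the earlier binding returns when that element closes. The sketch below uses `ElementTree.iterparse` with its `start-ns` events to trace the bindings alongside the resulting expanded names; the `urn:example:` namespace names and the `trace` helper are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical namespace names. The prefix "a" is bound on <a:outer>,
# rebound on <a:inner>, and reverts to its outer binding after </a:inner>.
doc = """<a:outer xmlns:a="urn:example:one">
  <a:inner xmlns:a="urn:example:two">
    <a:leaf/>
  </a:inner>
  <a:sibling/>
</a:outer>"""


def trace(xml_text: str) -> list:
    """List each prefix binding and each element's expanded name, in document order."""
    steps = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item  # start-ns yields a (prefix, uri) pair
            steps.append(f"bind {prefix!r} -> {uri}")
        else:
            steps.append(f"element {item.tag}")
    return steps


for step in trace(doc):
    print(step)
```

All four elements are written with the same `a:` prefix, yet `a:leaf` lands in `urn:example:two` and `a:sibling` in `urn:example:one`; only the in-scope binding decides.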
