The Value of Names in Attributes
February 6, 2002
The Future of History
When the Gibbon or Thucydides of computer scientists sits down in 20 or 30 years to write the history of markup technologies, an obvious periodization will fall out of the data, like the fall and rise of empires or the ebb and flow of ancient wars. She will divide the world into the Era of SGML and the Era of XML, neatly severing, as every historian is wont to do, the technical, social, and historical continuities which many of us call our professional lives.
If she is very clever, she may recognize, as she peers through reconstructed mailing list and other archives, accessible only by using a "personal computer" on exhibit at the Smithsonian, further, more subtle, but no less substantial historical periods. And so the very clever historian will further divide the Era of XML into Before Namespaces and After Namespaces.
All of which is another way of saying that, despite the self-flagellations which inevitably accompany them, the XML development community's repeated attempts to reckon all the implications of namespaces are as unavoidable as they are unfinished.
Namespaces in Attribute Values
The latest scuffle in the long namespace saga concerns qualified names in attribute values, the use of which has prompted some XML programmers, outside the confines of the XML-DEV mailing list, to question the value of attributes at all. As for the XML-DEV list, this time around Evan Lenz raised the initial alarm.
I believe that XSLT 1.0 was the original culprit, but I could be wrong. Now XML Schemas, Canonical XML, and other specs rely on the prefixes and scope of namespace declarations as significant information to be passed to the application...This has introduced an amazing amount of complexity...XML Namespaces are actually getting a worse rap than they deserve, thanks to this increasingly common and fully-W3C-sanctioned practice.
Is it too late to fix this? W3C specs would have to change. My question is: who else uses QNames in attribute values? Is there no turning back?
But there have been suggestions before that qualified names in attribute values were problematic. "As far as I can tell," Simon St. Laurent said, in March of 2000, "this usage isn't sanctioned by the Namespaces in XML Recommendation...I'm not sure I can recommend [it] as best practice." Presaging some of the present discussion, St. Laurent said that one reason he couldn't call it a best practice was that "it isn't clear that applications - which should be processing these attribute values - will ever get the prefix information, given some strongly held beliefs that prefixes are throwaway info for the parser only." "I'm quite aware that both of these specs," he concluded, "are probably too far gone for this critique to have an impact on their development, but I think this issue merits further examination..."
Eric van der Vlist agreed with Lenz that allowing qualified names in attribute values is a mistake, offering two reasons, one practical and one abstract. "...I find it a very bad design practice," since, he said, "XML could have had a nice layered design which is violated by the usage of QNames in attributes (or elements) values." But, further, "there is another major issue...some applications (such as XPath) do not support default namespaces while others (such as W3C XML Schema QName datatype) do!"
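The inconsistency van der Vlist describes can be sketched in a few lines. The following is a minimal illustration, not code from any of the specifications: the function name `expand_qname`, the `default_applies` flag, and the `http://example.com/app` namespace are all hypothetical. It shows the same unprefixed name expanding to two different results depending on whether the default namespace is consulted, as it is for the W3C XML Schema QName datatype, or ignored, as it is for names in XPath 1.0 expressions.

```python
XSD_NS = "http://www.w3.org/2001/XMLSchema"


def expand_qname(qname, in_scope, default_applies):
    """Expand 'prefix:local' or 'local' into a (namespace URI, local name) pair.

    in_scope maps prefixes to URIs; the key "" holds the default namespace.
    default_applies is a hypothetical switch between the two policies.
    """
    prefix, sep, local = qname.rpartition(":")
    if sep:
        # Prefixed names are treated the same way under both policies.
        return (in_scope[prefix], local)
    if default_applies:
        # Policy of the XML Schema QName datatype: use the default namespace.
        return (in_scope.get(""), local)
    # Policy of XPath 1.0: an unprefixed name is in no namespace.
    return (None, local)


# Hypothetical declarations in scope at the point where the value appears.
in_scope = {"": "http://example.com/app", "xs": XSD_NS}

schema_style = expand_qname("person", in_scope, default_applies=True)
xpath_style = expand_qname("person", in_scope, default_applies=False)
```

Here `schema_style` names `person` in the `http://example.com/app` namespace, while `xpath_style` names `person` in no namespace at all, even though both start from the same string and the same declarations.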
Suggesting that qualified names in attribute values neither introduce as much complexity as Lenz implied nor negatively affect the already battered reputation of namespaces among their detractors, Michael Brennan said,
I think it's too late to turn back on this. Too many people are finding this too useful, and too many specs have incorporated this in a fundamental fashion. Most of the uses I've seen are simply for namespace-scoped enumeration values, and I think that's useful...It strikes me as a very elegant way to provide such a mechanism that plays well in the XML world.
Why must namespaces be restricted to only scoping element and attribute names? That seems to me to be unnecessarily restrictive. Other artifacts need name-scoping mechanisms, as well, and I don't see why we should force them to find another means of accomplishing this.
While this usage of namespaces may well clash with some people's expectations about the principle of least surprise, Brennan suggests that a principle of conceptual parsimony may also be at work. That is, if other bits of an XML instance -- in this case, an attribute's value -- need to be "name-scoped", why not use the mechanism by which element and attribute names are scoped? Brennan did not directly address the question of whether every bit of an XML instance needs to be qualified or whether it makes sense to qualify attribute values at all.
Lenz subsequently offered a direct response to Brennan's claims about parsimony:
...there's no way for an XML processor to tell whether QNames are used in values. Consequently, all scope information, i.e. exactly where every xmlns declaration is and what prefix it uses, must always be passed to the application, regardless of whether it's needed or not...This practice blurs the distinction between the XML processor and the XML application, forcing the processor to pass a bunch of redundant information to the application...If each layer had its own namespace declaration mechanism (one for element/attribute names, and one for application-specific content), then it would always be possible to throw away scope information...The point of a namespace declaration is to enable the resolution of QNames. Currently, it is forced to do a lot more than that.
Which provides, if nothing else, an illumination of van der Vlist's claim about the felicity of a layered design. Lenz rejects the idea that one and only one mechanism (as opposed to there being more than one syntactical construct, of course) should be used to qualify names, suggesting, instead, that there should be a mechanism for qualifying element and attribute names and another for qualifying application data. As Lenz implied, the issue has more to do with long-term design coherence than what people are actually doing today. "I could easily be persuaded," he said, "that we have to stick with the status quo, that it will cause more damage to remove the knife than to leave it in. But I don't anticipate being convinced that inflicting the wound was the right thing to do in the first place."
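Lenz's point about scope information can be made concrete with a short sketch using Python's standard `xml.etree.ElementTree` module. It assumes a made-up document in which a `type` attribute holds a QName; the element names, the `resolve_type_attributes` function, and the `http://example.com/app` URIs are illustrative assumptions only. Because the parser cannot know that `type` contains a QName, the application has to ask for every namespace declaration event and maintain the scope stack itself.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: "app" is redeclared on <group>, so the position
# of each declaration matters when resolving the attribute values.
DOC = b"""<schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:app="http://example.com/app">
  <field name="title" type="xs:string"/>
  <group xmlns:app="http://example.com/app/v2">
    <field name="owner" type="app:person"/>
  </group>
</schema>"""


def resolve_type_attributes(xml_bytes):
    """Return (name, namespace URI, local name) for each prefixed type attribute."""
    scopes = [{}]   # stack of prefix -> URI maps, one entry per open element
    pending = {}    # declarations seen since the last start tag
    resolved = []
    events = ("start-ns", "start", "end")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            # Declarations are reported before the element that carries them.
            prefix, uri = payload
            pending[prefix] = uri
        elif event == "start":
            scopes.append({**scopes[-1], **pending})
            pending = {}
            qname = payload.get("type")
            if qname and ":" in qname:
                prefix, local = qname.split(":", 1)
                resolved.append((payload.get("name"), scopes[-1][prefix], local))
        else:
            # "end": the element's declarations go out of scope.
            scopes.pop()
    return resolved


resolved = resolve_type_attributes(DOC)
```

Every element pushes a scope whether or not it carries a QName-valued attribute, which is the redundant bookkeeping Lenz objects to: the application cannot discard any declaration, because it cannot know in advance which ones an attribute value will need.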
In addition to Brennan's abstract defense, others suggested that the fairly widespread use of qualified names in attribute values, particularly in W3C specifications, meant that in some sense it was too late to change, even if it were a bad idea. Jonathan Borden, sounding more resigned than resolved, made this point, one that's analogous to the free software community's "running code trumps an unimplemented, superior design" practice:
...the fact is that usage of QNames in attribute values is a fact of life and has been since XSLT 1.0. I _have_ dealt with these issues when writing XSLT transforms, and they can be a pain, yet the issues also can be dealt with. I suppose the takehome message is that you cannot blindly rename prefixes, even though this is against the intentions of XML namespaces, unless you can be sure that the content of attributes will not be affected. Oh well.
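Borden's warning about renaming prefixes is easy to reproduce with a general-purpose toolkit. The sketch below assumes Python's standard `xml.etree.ElementTree` module and a made-up element and namespace (`app:field`, `http://example.com/app`); the helper name `round_trip` is hypothetical. The module resolves element and attribute names to namespace URIs on parsing and generates its own prefixes on output, but it treats attribute values as opaque strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the "xs" prefix is used only inside an attribute value.
SOURCE = (
    '<app:field xmlns:app="http://example.com/app" '
    'xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'name="title" type="xs:string"/>'
)


def round_trip(xml_text):
    """Parse a document and serialize it again without modifying the tree."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")


output = round_trip(SOURCE)
print(output)
```

In the output the `app` prefix has been replaced by a generated one, which is harmless for the element name, but the `xs` declaration has been dropped because no element or attribute name uses it. The attribute value still reads `xs:string`, and nothing in the serialized document says what `xs` means.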
The third justification for not undoing qualified names in attribute values is their relative utility. Using a qualified name is simply easier, Ronald Bourret suggested, than the alternative, using a URI and a local name. In XML-DBMS, Bourret said, "...when we refer to an element type name, we use a QName. We could have used two separate values (URI and local name), but if you've ever typed this in more than once, you'll quickly realize that QNames are much friendlier..."
Resolving the Issue
Many questions about this issue remain, of course: how big a problem are we talking about, and for how many people? Is it primarily an issue for tool makers, or are XML application developers going to have to deal with the issue and the overhead, too? And, further, if the existing patterns of usage are sufficient to prevent any radical changes, is this something that should be clarified for XML 2.0? Can the implementation conventions and assumptions be clarified or more widely publicized? For example, should default namespaces apply to qualified names in attribute values, or should prefixes always be stated explicitly?
At the very least, that Lenz asks in 2002 whether it's too late to change what seemed to St. Laurent in early 2000 already too far gone is an indication that the issue, as hard as this is to accept, may not have been discussed widely enough. That the developer community disagrees about the utility, the design, the impact, and the implementation of qualified names in attribute values is a very good indication that it is a widely, essentially contested issue. I suspect that it, like many such issues, will inevitably be presented to the W3C's Technical Architecture Group, the TAG, for adjudication; which, as we will discover, does not mean that the TAG will actually decide the issue. The TAG bottleneck -- it's accepted nine issues already -- will become apparent all too quickly.