How Do I Hate Thee?
November 3, 2004
For a group of people who spend so much time working with and talking about XML, it's not surprising that the members of the XML-DEV mailing list know exactly what it is that they dislike about XML. Over the last week, the list has hosted a festival of complaint about, and loathing of, XML's misfeatures. But amongst the bile, there's also a very interesting debate: what exactly is it about namespaces that makes people mad?
Let Me Count the Ways
Two of XML-DEV's stalwarts, Mike Champion and Len Bullard, started the trouble. Champion posted to the list observing that five years on from the XML simplification effort in 1999, it might be time to revisit the topic.
So, five years later ... is it NOW time to think seriously about cleaning up the core XML specs to address the challenges that real-world non-XML geeks have with them (hopefully without throwing out the interoperability baby with the bathwater), is it time to redouble efforts to educate non-XMLgeeks on why they should eat their XML 1.0 veggies and stop whining, will better tools and best practice guidelines solve the problems, or what?
As usual, Champion's mail sparked a lengthy series of responses, not all of which can be covered in this article. What I will follow is the entertaining thread about XML's faults provoked by Len Bullard. In fitting with the season, Bullard somewhat impishly suggested everybody list their top five problems with XML. Everybody loves lists, and the results turned out to be interesting. Bullard forecast that the real problem would be XML namespaces.
Of the cases presented, isn't the really gnarly one namespaces? In other words, if the edges of that were tidied, how much pain would go away?
Robin Berjon was the first to oblige, picking on DTDs as his bête noire.
- other legacy cruft
- more legacy cruft you always forget is there
Bill de hÓra again mentioned DTDs, but had a broader mix in his top five:
- Default namespaces
- No Clark notation in XPath (or XML)--see 1 for details.
While Robin Berjon agreed that the namespace notation was a pain in XPath, he didn't want to see "Clark notation" (where the namespace URI is written in full), preferring instead the use of the xmlns() XPointer scheme.
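For readers who haven't met Clark notation, here is a minimal sketch using Python's standard-library ElementTree, which happens to report names in the {namespace-URI}local-name form. The document, the example.org URI, and the "ex" prefix are all made up for illustration; the point is only to show why default namespaces and prefixed paths trip people up.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the namespace URI and element names are invented.
DOC = '<doc xmlns="http://example.org/ns"><item>one</item></doc>'
root = ET.fromstring(DOC)

# ElementTree exposes every name in Clark notation: {namespace-URI}local-name.
print(root.tag)  # {http://example.org/ns}doc

# An unprefixed path step does not pick up the document's default namespace,
# so this lookup finds nothing.
unqualified = root.find("item")

# Clark notation spells the URI out in the path itself; no extra context needed.
clark = root.find("{http://example.org/ns}item")

# The prefix-based alternative only works with a separate prefix-to-URI mapping.
prefixed = root.find("ex:item", {"ex": "http://example.org/ns"})
```

The Clark form is verbose but self-contained, while the prefixed form is short but meaningless without the mapping that travels alongside it.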
Eric Hanson picked up on one of my own wishlist items, XML packaging, though I'm not sure I'd characterize it as a major flaw:
There is no way to look up, discover and retrieve the library of resources that support with a namespace-qualified element. If you come across a piece of data, there may be hundreds of supporting resources like XSL transformations, schemas, xforms, text documentation, etc. We need a way to link the resources to the data.
Sean McGrath has had more opportunity to polemicize over the faults of XML than most, and unsurprisingly, his list of faults has a broader outlook. Now, if you asked XML-DEV for a list of five things and were actually expecting to get replies with five unique items, you'd be crazy. So McGrath gave six and a neologism or two to boot.
- The lack of sane, simple roundtrippability. I read in some XML, I write it straight back out again. I [lose] stuff on the way ...
- Namespaces--specifically defaulting and the "declare 'em anywhere you like buddy" aspects.
- No sane, simple pull based XPath 1.0 subset.
- W3C XML Schema--pretty much everything about it.
- Doctype. We should have left assertions about schema compliance (and consequently the entire idea of an embedded document type declaration subset) on the clipping room floor...
- Fuzziness over the use of terms like "XML parser" and "XML Editor" and "XML aware" and "XML compliant" ... Interop problems are the inevitable result.
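McGrath's first complaint, roundtrippability, is easy to reproduce. The sketch below is only an illustration with one particular toolkit (Python's standard-library ElementTree with its default parser) and a made-up memo document; other tools lose different things, which is rather the point.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a small document with a DOCTYPE, an internal entity,
# a comment, single-quoted attributes, and an empty-element tag.
SOURCE = """<?xml version="1.0"?>
<!DOCTYPE memo [ <!ENTITY who "XML-DEV"> ]>
<memo status='draft'><!-- reviewed? --><to>&who;</to><body/></memo>"""

# Read it in, write it straight back out.
root = ET.fromstring(SOURCE)
result = ET.tostring(root, encoding="unicode")
print(result)

# What did not survive the trip with this toolkit's defaults:
lost_doctype = "DOCTYPE" not in result   # the document type declaration
lost_comment = "<!--" not in result      # the comment
lost_entity = "&who;" not in result      # the entity reference, now expanded text
```

The output is equivalent as far as this parser's data model is concerned, but it is not the document that went in.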
Rick Jelliffe also picks up on the conformance issue and the general pain around DTDs. Jelliffe comes from a document-oriented XML processing background, so his list of XML faults brings a different perspective to the debate.
- Needs adjusted conformance levels: no-DTD, or DTD+validating ...
- Needs to reserve ISO standard entity names with ISO meanings, so that no-DTD processors can be used in the publishing industry ...
- Need to have namespace-aware DTDs. Even just to allow that @xmlns and @xmlns:* do not need to be declared in the DTD would be a giant step forward ...
- xml:space="strip" for use with no DTD.
- W3C needs to endorse ISO Schematron ...
- Whingers who dissipate real opportunities for change ... I certainly think it is time for XML 2.0, but to remove specific problems with the existing syntax, not to reduce the infoset or adopt some different syntax or disenfranchise publishing people further.
Jelliffe's list will certainly strike a chord with anyone who's ever written more than a handful of documents in XML. His point of view is a welcome reminder of XML's role in the publishing world and the seeming blindness of the W3C working groups to its applications there.
So far, the sword of simplicity dangles dangerously over both namespaces and DTDs. But what else is on the chopping block? A recent weblog post from Derek Denny-Brown, a Microsoft developer working on XML products, attempted to document where "XML goes astray." In a fascinating post, Denny-Brown explains the difficulties of XML, designed as a document format, applied in data scenarios.
- XML's treatment of whitespace confuses developers.
- The limitation in the range of allowed characters in XML is a hassle which the Microsoft XML team sees customers complain about on a weekly basis.
- Namespaces are close to a disaster [but not quite, that dubious honor goes to W3C XML Schema]
Elliotte Rusty Harold, however, was unequivocal in his disagreement with Denny-Brown.
This article is absolute crap, and a typical example of Microsoft think. It blames XML for the very problems Microsoft created and which don't exist in other tools and on other platforms.
He goes on, in a similar vein, to assert that many of the problems Microsoft's customers face are due to a misunderstanding of XML as implemented in Microsoft's APIs, not problems with XML itself. Read the full post if you wish to steep yourself in vitriol. One point Harold picked up on is worth mentioning: the second item as summarized by Obasanjo, the restriction on character ranges in XML 1.0, would seem to be solved by XML 1.1. That is, assuming the complaint wasn't a confusion between the characters permitted in XML text content and those permitted in XML names.
As the W3C's Liam Quin noted, we'll no doubt expect rapid deployment of XML 1.1 from Microsoft.
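To make the character-range complaint concrete, here is a small sketch of the XML 1.0 rule for text content, as I read the Char production in the spec: tab, line feed, carriage return, and then everything from U+0020 up, minus the surrogate block and U+FFFE/U+FFFF. The helper names are my own invention, and the check covers content characters only, not the separate rules for XML names.

```python
import xml.etree.ElementTree as ET

def is_xml10_char(ch: str) -> bool:
    """True if ch is allowed in XML 1.0 content (the spec's Char production)."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def first_illegal(text: str):
    """Index of the first character XML 1.0 cannot carry, or None if all are fine."""
    for index, ch in enumerate(text):
        if not is_xml10_char(ch):
            return index
    return None

# A control character such as U+0001 is rejected by an XML 1.0 parser
# even when it is written as a character reference.
try:
    ET.fromstring("<a>&#1;</a>")
    rejected = False
except ET.ParseError:
    rejected = True
```

This is the kind of data that turns up in database fields and then refuses to serialize. XML 1.1 relaxes the rule for most control characters, provided they are written as character references.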
Now, on to the most oft-cited XML fault: namespaces.
What Exactly Is the Problem with Namespaces?
Adding his bugbears to the list, Robert Koberg mentioned that he doesn't see what the problem is with namespaces. Peter Hunsberger clearly doesn't agree, citing namespaces five times over as his favorite problem with XML. So, what exactly is the issue?
Joe English writes that his complaint is the hassle of carrying around a namespace URI and a local name:
When I complain about namespaces, it's just the opposite: I don't want to have to use URI/localname pairs everywhere. I'd rather treat element type names and attribute names as simple, atomic strings. This is possible with a sane API, but most XML APIs aren't sane.
Robin Berjon highlighted another common problem, the expectation generated by the use of URIs as namespace names.
People [think] the URIs resolve to something magical (which they should, but usually don't). Then they think that they inherit to descendant elements or to attributes. This is usually dealt with by repeating ten times over that namespaces are dumb.
Michael Kay explained that having something as fundamental as naming as an added extra to XML 1.0 was a bad idea.
Naming is architecturally fundamental: changing the naming architecture of XML by means of a bolt-on to the core standard was an incorrect layering that was bound to lead to many practical problems.
Kay continues, identifying more issues:
It has always been ambiguous whether prefixes are significant or not.
The indirection between prefixes and URIs makes the interpretation of many textual fragments (XML entities, XPath expressions, XQueries, even schema documents) context-dependent.
The use of URIs as namespace names has always been fuzzy around the edges, as exemplified by the "relative URI" debacle.
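Kay's first two points can be seen in a few lines. The sketch below uses Python's standard-library ElementTree and invented example.org URIs; the "ref" attribute is a hypothetical QName-valued attribute of the kind schemas and stylesheets are full of.

```python
import xml.etree.ElementTree as ET

# Three hypothetical one-element documents.
a = ET.fromstring('<x:doc xmlns:x="http://example.org/a" ref="x:thing"/>')
b = ET.fromstring('<y:doc xmlns:y="http://example.org/a"/>')
c = ET.fromstring('<x:doc xmlns:x="http://example.org/b"/>')

# Different prefix, same URI: the same name as far as the parser is concerned.
same_name = a.tag == b.tag

# Same prefix, different URI: a different name, although the markup looks identical.
different_name = a.tag != c.tag

# Serializing picks a fresh prefix for the element, because the parser treated
# "x" as insignificant. The attribute value still says "x:thing", though, and
# nothing in the output declares "x" any more.
out = ET.tostring(a, encoding="unicode")
print(out)
```

A tool that decides prefixes don't matter breaks every QName hiding in content; a tool that decides they do matter has to carry the prefix bindings around everywhere.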
If namespaces are so bad, can we live without them? Gavin Thomas Nicol thinks so, and said so in his list of XML bugbears:
- Namespaces (who *needs* them?)
- DTD's (should be broken out of the core)
- External Entities (not really necessary)
"But what about XSLT?" asked Robin Berjon. Nicol expands:
XSLT does not need namespaces as such, and could have got along fine with just alpha-renaming (i.e. like elisp packages), and even that wasn't strictly necessary.
History shows this isn't the first time Nicol has suggested this idea. Alpha-renaming is the process of rewriting names to scope them locally, but it's not entirely clear what Nicol is proposing. Any explanations will be gratefully received.
Michael Kay presented a more tangible solution, echoing Bill de hÓra's earlier wish for a "Clark notation" for namespaces.
I have advocated one change which I believe would alleviate the problems: there should be a lexical representation of expanded names that uses the URI and local name, rather than prefix and local name, and this representation should be permitted in any context where a lexical QName is permitted, including in element names and attribute names in source XML, in QName-valued attributes, and in path expressions. This would mean that any XML fragment, XPath expression, etc, could be "namespace-normalized" to make it context free.
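A rough sketch of what "namespace-normalizing" a path might look like follows. Kay's message as quoted doesn't specify a syntax, so I've borrowed the brace form of Clark notation as a stand-in; the function names are hypothetical, the URIs are invented, and the helper only copes with simple child steps separated by slashes.

```python
import xml.etree.ElementTree as ET

def to_clark(qname: str, bindings: dict) -> str:
    """Expand prefix:local to {uri}local; unprefixed names are left alone."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        return qname
    return "{%s}%s" % (bindings[prefix], local)

def normalize_path(path: str, bindings: dict) -> str:
    """Rewrite each step of a simple a/b/c path so it no longer needs bindings."""
    return "/".join(to_clark(step, bindings) for step in path.split("/"))

# The path was written with prefix "x"; the document happens to use "p".
path = normalize_path("x:doc/x:item", {"x": "http://example.org/a"})
print(path)  # {http://example.org/a}doc/{http://example.org/a}item

root = ET.fromstring(
    '<p:list xmlns:p="http://example.org/a">'
    "<p:doc><p:item>one</p:item></p:doc>"
    "</p:list>"
)

# The normalized path carries its own context, so no prefix map is passed.
item = root.find(path)
```

Once normalized, the expression can be pasted into any context without dragging its prefix declarations along, which is the property Kay is after. The price is readability.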
Births, Deaths, and Marriages
The latest announcements from the XML-DEV mailing list.
- JAXP 1.3 RI is public on java.net
JAXP 1.3 Reference Implementation showcases a variety of new features in the Java API for XML processing, including DOM L3 Core, DOM L3 LS, SAX 2, XML 1.1, XInclude, and a new schema-independent validation framework.
- freebXML CC
freebXML CC is "a set of tools developed to facilitate the work of domain experts managing data dictionaries," designed to work with ebXML Core Components and interoperate with ebXML implementations.
- W3C updated XQuery/XPath working drafts
The new working drafts include a number of changes made in response to comments received during the Last Call period that ended on Feb. 15, 2004.
In return, can I get my name spelt correctly? ... can I ... please? ... XML pedants in a swing state ... a firestorm of 297 messages to XML-DEV last week, Len rating 8% (firestarter!) ... more enforcement of schema philosophy in XML editors ... and more talk of namespaces not to everybody's taste ... the LMNL meme will never die.