XML 2.0 -- Can We Get There From Here?

February 20, 2002

It seems inevitable that the W3C will eventually offer a standards document which it calls XML 2.0. The post-XML 1.0 period has seen the development of too many attendant technologies, has heard too many pleas for refactoring from the development community, and has exposed too many widely conceded warts in the XML family of standards for XML 1.0 to last indefinitely. The only interesting questions which remain unanswered are what XML 2.0 will look like and how politically nasty the process that creates it will be.

What Will XML 2.0 Look Like?

The W3C's Technical Architecture Group (TAG) was chartered primarily to make the tough, overarching decisions about how all the various parts and pieces of "Web technology" are supposed to fit together. In Mary Shelley's famous Gothic novel, Frankenstein, Victor Frankenstein creates an eight-foot-tall creature out of bits and pieces of ordinary humans stolen from graves and charnel houses. While it is too harsh to suggest that W3C Working Groups are like charnel houses, one gets the sense that the work of the TAG is going to be rather more hodge-podge and ad hoc than one might assume from its lofty name. The range, quality, and sheer number of the requests for TAG adjudication already suggest that there are a lot of corner cases lurking in the dark.

Perhaps it should surprise no one, then, that out of the ad hoc, variegated work of TAG comes one of the first substantive proposals or trial runs or "thought experiments" (though that's a misnomer, really) for what XML 2.0 might look like. It is fitting that Tim Bray, who was so instrumental in XML 1.0, would first offer a draft for what XML 2.0 might eventually become: Extensible Markup Language - SW (for Skunkworks). Though, it should be said at the outset, Bray offers XML-SW as a highly provisional proposal; or, as he put it, "nobody so far - not even me - has taken the stand that this is a good idea". But it is a start.

XML-SW is a conglomeration of XML 1.0 2nd edition minus the DTD machinery (entities included), with the addition of namespaces, XML Base, and the XML Infoset. The result, in Bray's view, as well as that of some other notable XML developers, is a net gain of simplicity and elegance. Bray described some of the changes in detail.

All the endless circumlocutions around parameter entities: gone. "For interoperability": gone. The attribute value normalization and line-end handling migrate into the infoset, where they belong. xml:base goes with xml:lang and xml:space into a section about reserved attributes. Namespaces go into the discussion of elements and attributes, where they belong. "standalone=": gone. There's a nice "other markup" section for comments, PIs, and a vestigial doctype declaration. The vestigial doctype is defined purely syntactically and has no internal subset - a low-cost way to let people do DTD validation with XML 1.0 processors. The conformance section has real content, including the error-handling, which has migrated out of its awkward home in the definitions list. All the links out of infoset and namespaces are internal.

As interesting as these changes may be, what's even more interesting, I think, is the degree to which the process that will finally settle these questions, as well as all of the wrangling and squabbling leading up to the start of that process, isn't technical at all. Whatever XML 2.0 eventually becomes technically, the process that creates it will be more social and political than anything else, and it's that process which seems perilous and fragile at best.

Bray is certainly aware of all the non-technical work which goes into making, to say nothing of remaking, an important public standard like XML. He even suggests a kind of in-advance rule or guideline. "The temptation to introduce," Bray says, "JUST A FEW little obvious improvements that nobody could possibly disagree with is overwhelming, but that is a slippery slope leading into the most noisome of ratholes". And so it is both a "noisome rathole" and an "overwhelming temptation".

So overwhelming, in fact, that no one could resist it and still make a substantive proposal for XML 2.0. The whole point of creating a new version of the standard is to improve upon the existing one. And so, as Bray clearly knows, his message announcing XML-SW abjures precisely what it accomplishes: XML-SW itself makes many small improvements, such as dropping the DTD and entity machinery, even as Bray pleads with others to refrain from suggesting their pet improvements.

Whatever else may be said for the value of XML-SW, or of any such proposal, Bray's in-advance rule for structuring the XML 2.0 process is doomed to fail; that is, it's certain to be ignored by everyone who is centrally or peripherally involved in the conversation from which the world will get a revised XML specification. If anyone has moral standing to go first in proposing what XML 2.0 might look like, Bray and a few others have it. Someone has to go first. Someone has to make a proposal which will initiate the ideally collaborative and consensual process. To put this point in Shelleyean terms, Frankenstein had to create the monster, but after that it gained a life of its own.

But no one has enough standing to ask others to refrain from suggesting the "little obvious improvements that nobody could possibly disagree with". Sorting out those suggested improvements, saying yes to some and no to others, just is the process, and it will be messy and political, and there isn't anything anyone can do to prevent it.

Frankenstein's Monster's Neck Bolts

As if to illustrate this general point, most of the subsequent discussion of Bray's XML-SW draft focused on whether XML 2.0 should drop processing instructions (PIs). PIs became something akin to the neck bolts in the classic Karloff version of Frankenstein: a part of the monster it was never entirely clear he needed but without which he wouldn't be the monster. Which is to say that PIs are an XML feature that some people cannot imagine living without and for which others cannot imagine ever having a good use.

David Orchard responded to XML-SW by saying that "[t]his is really great stuff. While I think that PIs should also be lopped off, and XInclude for entity replacement and an optional XML Schema validation level added, I can certainly live with this." But, Elliotte Rusty Harold responded, PIs can be very useful. Tim Berners-Lee took a sort of mixed approach, agreeing that PIs are useful, but suggesting they be excised nonetheless. "I feel they are harmful," he said, "because they bypass all the extensibility power one has with namespaces to make well-defined extensions. PIs also add a barnacle onto the XML syntax which it really doesn't need."

In response to the anti-PI clamor, Simon St. Laurent suggested that PIs could not be replaced by elements unless "you'd be willing to throw validation out". Further, St. Laurent added, "I would hope that the W3C would drop its continuing institutional animus against processing instructions. If there is a need to blast some bit of XML's SGML heritage as incompatible with the Web, may I suggest notations, unparsed entities, or both."

However syntactically inelegant or practically useful PIs may be, there is some cost to removing them, just as there would have been some cost to the monster of removing the neck bolts, even if he no longer needed them. Dan Connolly made exactly this point when he said that "the cost of keeping PIs is lower than the cost of getting rid of them...The cost of keeping PIs is no more than the cost of comments, as far as I can tell: one method in the SAX API, a few lines in the XPath spec, etc."
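To make Connolly's "one method in the SAX API" concrete, here is a minimal sketch using the SAX binding in Python's standard library. The PI target `example-fo` and its `page-break` data are hypothetical, invented for illustration; nothing in the discussion specifies them. The only PI-specific code is the single `processingInstruction` callback.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class PICollector(ContentHandler):
    """Records every processing instruction the parser reports."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    # The one SAX method devoted to PIs: called once per instruction
    # with its target and the remaining data as plain strings.
    def processingInstruction(self, target, data):
        self.instructions.append((target, data))


# Hypothetical document; the PI target "example-fo" is an assumption.
DOC = b'<?xml version="1.0"?><para>Hello<?example-fo page-break?> world</para>'


def collect_pis(doc_bytes):
    """Parse the document and return its PIs as (target, data) pairs."""
    handler = PICollector()
    xml.sax.parseString(doc_bytes, handler)
    return handler.instructions


print(collect_pis(DOC))
```

A handler that never overrides `processingInstruction` inherits a do-nothing default, which is roughly the sense in which keeping PIs costs an API little more than keeping comments does.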

Norm Walsh, chair of DocBook's Technical Committee, offered a strikingly concrete example of the practical, everyday goodness of PIs, which really ought to be required reading for anyone who suggests they be removed.

What I really want here is, uh, how can I describe this? What I want is an instruction that I can insert into my document that will tell a particular processor that it should do something special. I want a, wait for it, a processing instruction! ...

The PI is entirely harmless (and invisible) to processors that don't care about it, but provides useful information for processors that go out of their way to look for it.
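Walsh's claim can be sketched with two toy processors reading the same document. This is an illustration under stated assumptions, not Walsh's actual example: the PI target `example-fo`, the sample document, and both function names are hypothetical. The first processor uses Python's ElementTree, whose default parser discards PIs; the second uses minidom, which keeps them, and goes out of its way to look for one target.

```python
import xml.etree.ElementTree as ET
from xml.dom import Node, minidom

# Hypothetical document; the PI target "example-fo" is an assumption.
DOC = (
    "<article><title>Report</title>"
    "<?example-fo page-break?>"
    "<para>Next page.</para></article>"
)


def indifferent_processor(text):
    """A processor that does not care about PIs.

    ElementTree's default parser drops processing instructions,
    so only the element children are visible here.
    """
    root = ET.fromstring(text)
    return [child.tag for child in root]


def interested_processor(text, target="example-fo"):
    """A processor that looks for PIs addressed to a given target."""
    found = []

    def walk(node):
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == target):
            found.append(node.data)
        for child in node.childNodes:
            walk(child)

    walk(minidom.parseString(text))
    return found


print(indifferent_processor(DOC))
print(interested_processor(DOC))
```

The indifferent processor sees an ordinary article with a title and a paragraph; the interested one recovers the instruction meant for it, and a processor looking for some other target would find nothing.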

Lastly, Elliotte Rusty Harold suggested that PIs are an important part of the extensibility of XML documents. PIs are useful as a way to add processing "information to documents written in XML vocabularies we do not control and cannot change. Perhaps schema languages should be written in a more permissive fashion so that they automatically allow anything from other namespaces...[but] that is not how either DTDs or the W3C XML Schema Language is written." Which is a nicely abstract way of making the very concrete point Norm Walsh made.

Conclusion

As the discussion about processing instructions makes amply clear, the indispensably useful feature of one constituency is the unbearably ugly wart of another. What goes for PIs is sure to go for the DTD machinery Bray excised from his XML-SW and perhaps other things, to say nothing of all the other improvements which lurk in the minds of XML devotees.

The problem lies not in figuring out which XML 2.0 we'll end up with. The real problem lies in managing the process of getting from here to there, a process that shows every sign of being far more politically difficult than creating XML 1.0 was in the first place. What the XML world needs is proposals and thought experiments about what that process will look like, how it will be managed, whether the corporate entities that sponsor the W3C are willing to consider industry-wide and public goods in addition to institutional self-interest, and so on.