XML Ain't What It Used To Be

February 28, 2001

Simon St. Laurent

Current XML development at the W3C threatens to obliterate the original promise of XML -- a clean, cheap format for sharing information -- by piling on too many features and obscuring what XML does best. While users may demand some of those features for some applications, features for some users are turning into nightmares for others. Rather than creating modules users can apply when appropriate, the W3C is growing a jungle of specifications which intertwine, overlap, and get in the way of implementors and users.

More was Less, Less was More

XML got its start because SGML did too much. SGML provided significant flexibility, addressing the needs of a diverse user community by providing an astounding array of choices. The SGML declaration let users identify which choices had been made by the creators of a particular document, and a dazzling array of syntax variations was perfectly legal. SGML's focus on user needs produced a specification that was extremely difficult for developers to implement. SGML remained in its original niches, a tool for those organizations that could afford its substantial cost.

XML 1.0 reduced flexibility and showed the world that markup could be powerful and easy to implement. XML's creators explicitly aimed at the 80/20 mark, providing 80% of the capability with only 20% of the complexity. "Less is more" is a fundamental theme in most tellings of XML 1.0's creation.

XML exploded across the computing landscape. It made markup a credible and even elegant solution to a wide variety of problems. Developers took on the mild challenge of producing XML parsers often enough that there were lots of them, mostly cheap or free. Other developers took that work and integrated it with their own applications, taking advantage of free tools, open formats, and what looked like free interoperability.

More May Yet Be Less

Various W3C activities seem to be converting XML documents from labeled, structured content to labeled, structured, and typed content. The primary mechanism for performing this transformation is the W3C XML Schema Definition Language, the most complex and controversial of all of the XML specifications, and the only one that's generated credible competition hosted at other organizations (RELAX through ISO, TREX through OASIS). Widespread grumbling about W3C XML Schemas is a constant feature of the XML landscape, with no sign of fading.
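The difference between labeled, structured content and typed content can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the order document, its element names, and the DECLARED_TYPES table are invented here, and the table is a stand-in for the kind of information a schema supplies, not an implementation of W3C XML Schema.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical document, invented for illustration.
DOC = "<order><quantity>12</quantity><shipDate>2001-02-28</shipDate></order>"

root = ET.fromstring(DOC)

# Labeled, structured view: element names and nesting, every value a string.
untyped = {child.tag: child.text for child in root}

# Typed view: needs declarations from outside the document itself.
# This table is a hypothetical stand-in for what a schema would declare.
DECLARED_TYPES = {"quantity": int, "shipDate": date.fromisoformat}

typed = {tag: DECLARED_TYPES[tag](text) for tag, text in untyped.items()}

print(untyped)
print(typed)
```

The first view needs nothing beyond an XML 1.0 parser. The second requires every tool that touches the document to also understand the type declarations, which is the added burden at issue here.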

The release of the Requirements for both XSLT 2.0 and XPath 2.0 suggests that the W3C plans to drive W3C XML Schema technologies deeply into the rest of XML. The requirements describe operations which both require a "post-schema validation infoset" (PSVI) and depend on parts of the W3C XML Schema spec, like the regular expression syntax defined in Appendix E of XML Schema: Datatypes.

This interweaving of specifications has a number of consequences. First, it raises the bar yet again for developers creating XML tools. While borrowing across specifications may reduce some duplication, it also requires developers to interpret tools in new contexts. (As the recent XPointer draft demonstrates, there can be unexpected consequences.) Developers with existing code bases now have to teach that code about complex types. Since none of these documents offer conformant subsets, they have to be swallowed in large chunks.

During a 'Schema Schism' panel at XMLDevCon 2000 last fall, Henry Thompson suggested that the complexity of W3C XML Schema would fall on a few implementors writing validators, and that developers could simply rely on that work. It isn't clear, however, that document validation is the primary purpose of many schemas, or that validators are a useful tool for developers who want to do other things with schemas. In a large number of cases, complexity is limiting.

For users, there are a number of painful side-effects to the growing number of features. The learning curve for XML is growing rapidly, and even XML "experts" can no longer keep track of every specification. Monitoring XML's growth is a full-time job. 1200 pages is no longer enough to describe the XML family of specifications in any kind of depth, even without getting into best practices.

Learning from the Initial XML Experience

One of the key lessons that XML's "founders" seemed to have learned from SGML is that large lists of features can prevent a standard from being widely adopted. This challenges the conventional wisdom that users choose tools based on features, but it certainly worked for XML. Microsoft, Sun, IBM, and Oracle were able to apply XML to their problems, while open source tools gave developers many more options and lower price tags. XML was simple, cheap, and easy enough to apply almost anywhere.

XML's simplicity didn't resolve underlying processing issues, but it helped developers with some problems while adding few new ones. XML didn't make computers magically talk to each other, but it made it easier to share information between computers, especially when information had to cross system or environment boundaries. Java servlets, COBOL programs, and cell phones could work in the same framework.

Performing complex tasks often requires complex tools. That doesn't mean that developers performing relatively simple tasks want all the power available in the complex tools, and those power tools can actually hinder simple projects. The easiest route to information sharing often involves removing, rather than adding, options. Every added feature is a future added negotiation. The underlying problem of information description is complicated enough already -- why pile on more levels of discussion just to share a common view of a document?

Moving Forward

It's easy to complain about the W3C, but it's clear that it has a difficult job -- balancing a vision of the Web with member needs is hard under the best of circumstances, but it's unquestionably difficult when a gold rush sets in.

There are, however, a number of things the W3C could do, including simply doing less. "Internet time" is fast enough; "XML time" seems to be even faster. Doing less and doing it more slowly might give development communities time to provide meaningful feedback. Allowing specifications to mature through implementation before layering on top of them would reduce the risk of building on uncured, uncertain foundations.

Similarly, presenting W3C output in smaller pieces, deliberately fragmenting specifications into useful atoms, would help developers absorb and implement the standards. Smaller standards are easier to implement and integrate.

Current W3C activity suggests that this is not the route its members want to follow. Given the rate at which XML, interpreted as "the XML family of standards," is growing, developers and vendors might want to consider another option. The W3C claims that it creates Recommendations, not standards, and is set up to a significant extent as a research lab. Taking that seriously means that developers should treat everything the W3C is doing as experimental and potentially unstable, worthy of careful evaluation rather than strict observance.

Perhaps someday, as the realization hits that XML has grown every bit as complex as its parent, and just as limited, a new group of developers will sort through the Recommendations and other practices, figure out what is most worthwhile, and take new aim at that 80/20 mark.