The State of XML

April 21, 2004

Edd Dumbill

Editor's Note: This article is based on the closing keynote speech that Edd Dumbill delivered to the XML Europe 2004 conference in Amsterdam.

It is four years since I first presumed to summarize the state of the world of XML, for the closing keynote of XML Europe in 2000. In preparation for this speech I read back through that earlier summary and was startled at its continuing relevancy, despite the ongoing development of the technology. One reason for this, of course, is that things always take much longer to change than you think. We probably have the speed of the Internet and TV to blame for that perception. For a technology to get real adoption over the Web, a five-year wait is a reasonable minimum expectation. Despite that, some of the lasting themes are also inherent to XML and its community. I'll reprise some of them later.

In my first summary I noted that XML was a disruptive technology, fostering change in many of the areas to which it was applied. Now, it's fair to say that XML is an essential technology. We've moved from "Why are you using XML?" to "Why aren't you using XML?" XML support in all major programming languages is good, and is furthermore generating innovation in XML APIs. The need for interoperability and adherence to basic standards is now accepted, for the time being at least. The state of XML, at least from a distance, is good.

However, it's healthy to continually challenge the status quo. If XML is what the emperor's wearing today, we owe it to ourselves to look for where it's ill-fitting or threadbare. As XML was itself disruptive, we must continue to be alert for the next disruption.

For me to cover every place where XML is used is now more or less impossible. Additionally, it would be quite dull! Instead I'll pick on some important and favorite topics. Change is happening at these points and will over time radiate out toward the more complex or specialized uses.

Refinement of the XML Core

For a while it seemed to those who cared about fixing the problems at the most basic levels of XML that the W3C had lost interest in pursuing this. Being a vendor-led consortium, it goes where the money pushes it. Some areas remained unaddressed. More charitably, you could contend that XML's core needed to stand still in order that the necessary innovation could happen on top of it. Perfecting prematurely has killed off many a technology.

One of the unaddressed issues that was particularly important to me was the "packaging" issue: How do you relate a document, its stylesheet, accompanying resources, and schemas in a way that's better than a mish-mash of processing instructions and attributes? Happily, work looks like it's getting under way in the form of defining the XML processing model. Another such issue being addressed is that of identifiers: many technologies such as CSS depend on finding the ID of an element. XML had no document-type independent way of doing this. The xml:id draft provides it.
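
As a rough illustration of what a document-type independent identifier makes possible, the sketch below looks up elements by their xml:id attribute without consulting any DTD or schema. It uses Python's standard ElementTree module, which does not implement xml:id processing itself; the code simply scans for the attribute. The sample document and the find_by_xml_id helper are invented for this example.

```python
import xml.etree.ElementTree as ET

# The reserved "xml" prefix is always bound to this namespace URI,
# so ElementTree reports xml:id under this expanded name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented sample document: no DTD, yet identifiers are marked uniformly.
DOC = """\
<report>
  <section xml:id="intro"><title>Introduction</title></section>
  <figure xml:id="fig1" src="chart.png"/>
</report>"""


def find_by_xml_id(root, wanted):
    """Return the first element whose xml:id equals `wanted`, or None."""
    for element in root.iter():
        if element.get(XML_ID) == wanted:
            return element
    return None


root = ET.fromstring(DOC)
print(find_by_xml_id(root, "fig1").tag)
```

The same lookup works on any vocabulary that adopts the attribute, which is the point: the identifier no longer depends on a particular document type declaring it.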

I'm glad to see this renewed activity at the heart of XML. It may not be the most glamorous of activities, but addressing the lower levels is required to prevent a precarious tower of custom proprietary hacks being built to solve the same problems, with the consequent reduction in interoperability.

Standards Development Broadened from the W3C

XML standards development has been pursued in other forums than the W3C for some time. However, broadly successful specifications developed outside have been very few in number. For many companies, the W3C is perceived as the only game in town, and with reason.

There are both good and bad aspects to taking standards development outside of the W3C. The good side is the freedom to challenge the emperor's clothing, and specifically so in the face of baffling increases in complexity. The big triumph on this front has been the RELAX NG schema language. Its simplicity and ease of authoring have proved a compelling contrast to W3C XML Schema. (It's interesting that the commitment to manual authoring of schemas is so strong that RELAX NG has a non-XML syntax for convenience.) Various W3C working groups and even Microsoft are using RELAX NG internally, even if they have to convert to W3C XML Schema for interchange later. Incidentally, the RELAX NG success can be framed as a case of design-by-inspired-individuals vs. design-by-committee just as well as it can be seen as an OASIS vs. W3C thing.

The bad aspect of operating outside the W3C is the loss of the coordination with web architecture and the time-tested processes. Even W3C specs aren't automatically blessed with success, and detached from that authority a technology really has to stand on its own feet. The higher-level web services specifications taken outside of the W3C are a disaster in the making, with no real underlying strategy or guarantee of longevity.

Divergence of Web Services

The area of web services is such a conundrum. In one way it represents many of the enjoyable aspects of XML: cross-system interoperability and the integration and composition of applications. In another way it represents much that is hateful: overblown hype, poor specification, and spiraling complexity. Either way, it now seems one step removed from XML itself.

Some exciting changes have happened with web services. The computer desktop is now less of an island than it used to be. REST has emerged and been deployed successfully at Amazon and similar services. But it's also fair to say that the web services world has descended into necessary (for some) but dull complexity.

It seems that many of the components of distributed computing we already had with CORBA and similar technologies are simply coming to light again, wrapped in angle brackets. There is something deeply displeasing about the squandering of the possibilities XML offers simply to re-implement what existed before.

As a software developer I feel increasingly unhappy with the development of a monolithic mass of technology building up, only reasonably accessible behind a Java or .NET API. In contrast, the REST model of composed, simple interactions seems more controllable and containable, and you can still see the angle brackets in order to check that things are working. There is still plenty of work and experimentation to be done with the notion of more document-oriented web services.
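
To make the contrast concrete, here is a minimal sketch of a REST-style interaction: the request is nothing more than a URL, and the response is an XML document you can read directly. The endpoint, query parameters, and response body are all hypothetical, invented for illustration; no real service's API is being described, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "http://api.example.com/books"  # hypothetical endpoint


def build_request_url(keyword, page=1):
    """A REST-style query is just a URL with its parameters in plain view."""
    return BASE_URL + "?" + urlencode({"q": keyword, "page": page})


# Hypothetical response body, standing in for what a GET might return.
SAMPLE_RESPONSE = """\
<results>
  <book><title>First Sample Title</title></book>
  <book><title>Second Sample Title</title></book>
</results>"""


def titles(xml_text):
    """Pull the book titles out of a response document."""
    root = ET.fromstring(xml_text)
    return [book.findtext("title") for book in root.iter("book")]


print(build_request_url("xml"))
print(titles(SAMPLE_RESPONSE))
```

Nothing here is hidden behind a generated stub: both the request and the response can be inspected, logged, or pasted into a browser.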

New Technical Problems to Solve

A lot of hard standards problems in XML have moved one level up, away from the basic mechanics of the technology. One of the hardest problems centers around metadata. Increasingly developers of systems are recognizing the importance of metadata. This isn't just in classical, large-data management situations. We each generate more and more data ourselves. At home: email, photographs, music collections. At work: email and electronic forms.

Consequently, even at the low level of operating systems, vendors are seeing the need for, and the advantages of, implementing metadata storage and manipulation.

This is good. We have the tools to support this, whichever way you swing on the technology issues. RDF & OWL, Topic Maps, W3C XML Schema: all have the right machinery. Unfortunately that's not the biggest issue. The main problem is which terms, schemas, and ontologies to use. That's just not clear right now for most if not all metadata applications. At best, we'll get inconsistently classified information, which defeats the promise of interoperability. More typically, we'll end up with little tagged metadata and islands of de facto proprietary information.

So there are problems to solve here. We'll never get many globally consistent ontologies, but we should try. At the same time we need the semantic equivalents of XSLT. And finally, we increasingly need to solve the user interface issues inherent in making use of our rich metadata.

New Constituencies

Among others, I observed some time ago that the platform on which XHTML will really come through is unlikely to be the desktop web browser. That former crucible of innovation has long since petrified. One of the largest applications driving XHTML is actually the mobile device area. Thankfully they're emerging from the specification and complexity disaster of WAP into using more conventional web standards. And on the other side, the W3C is taking on board the requirement for a binary serialization of XML: resource-scarce mobile applications genuinely do need this.

The petrification of the browser is a problem for those wanting to create modern web-connected user interfaces. As a result we're seeing the emergence of technologies such as Microsoft's XAML, a user interface markup language expressed in XML, into which code can be added in pretty much the same way it can in web pages. Given .NET languages' easy access to web services, this combination presents a real challenge to browser applications. This has a lot of vendors concerned, from both the proprietary and open source worlds.

Software Patents and Copyright Legislation

I won't go into great detail on the issues that intellectual property law raises increasingly in our field. Numerous others know more and express themselves more eloquently. But it is worth emphasizing that we cannot afford to ignore these issues. Many of us have spent so long in the fields of the web and data processing that we can't actually see the massive change that's happening in society.

Information resources are now first class objects in our world, as much as streets, cars, and houses are. And just as streets, cars, and houses are regulated, so progressively is the world of information technology. Unfortunately, it seems that it's not always the benign parties that are influencing this increasing regulation.

The protection of copyright in creative works is a good thing, but the restriction of the freedom to use a means of expression is more troubling. We've seen this fought out multiple times already with the GIF and MP3 file formats. The downside of globally recognized schemas is the issue of control. Nor, in fact, is such central control very "web-like." There are some interesting avenues to pursue in seeing how this can be avoided via local expression of semantics, and translations at the point of interchange. It's not perfect, but it will work and may be preferable to heavy global regulation.
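
One way to picture local semantics with translation at the point of interchange is a small mapping table applied only when data leaves the system. The sketch below assumes a hypothetical local vocabulary (headline, author, published, desk) and maps it onto Dublin Core element names; the mapping, the record, and the to_interchange helper are illustrative assumptions, not an established method.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Hypothetical local field names mapped to shared Dublin Core terms.
LOCAL_TO_SHARED = {
    "headline": "title",
    "author": "creator",
    "published": "date",
}


def to_interchange(record):
    """Translate a local record (a dict) into Dublin Core XML.

    Fields with no mapping are left out rather than guessed at.
    """
    root = ET.Element("metadata")
    for local_name, value in record.items():
        shared_name = LOCAL_TO_SHARED.get(local_name)
        if shared_name is not None:
            ET.SubElement(root, "{%s}%s" % (DC, shared_name)).text = value
    return root


local_record = {
    "headline": "The State of XML",
    "author": "Edd Dumbill",
    "published": "2004-04-21",
    "desk": "features",  # purely local term with no shared equivalent
}

ET.register_namespace("dc", DC)  # serialize with the conventional dc: prefix
print(ET.tostring(to_interchange(local_record), encoding="unicode"))
```

Internally the system keeps whatever terms suit it; only the exported document commits to the shared vocabulary, and the mapping table is the one place that has to change when either side does.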

XML's Great Strengths Endure

I've noted a few of the new challenges and developments for XML. To conclude, I'd like to celebrate a few of XML's qualities that endure and make it the truly unique technology it is.

  • Intimate relationship with the network: XML's coexistence with the Web has unlocked a great deal of power in developing distributed applications. The simple concept of the URI, in particular, has facilitated both elegance and potential in XML applications.
  • Human readable and editable: The importance of this aspect of XML cannot be overstated. In 2003, over 80% of developers surveyed used a text editor to create XML, even if they used another tool, too. The adoption of simple forms of documents on the Web, such as HTML and RSS, is a testament to this. A successful document type is manifestly a readable document type. Microsoft gets this one, too: look at their latest XML creations such as XAML and the WinFS metadata notations. They've come through the complexity fire back down to readable markup. (As an RDF fan, the realization of this truth causes me some pain. The way out is to stop thinking of RDF as an XML application, and look to easier syntaxes such as Turtle and N3.)
  • Dedicated individuals who keep the flame: Were it not for our XML heroes, we'd be mired in second-rate, committee-led technology. The number of those on this list is continually growing, and we ought to be thankful for them.

Finally, and in summary of all XML's good qualities, I want to note its unique conjunction of best practice from multiple disciplines. Where else do computing, documentation, database, library, multimedia, and mathematics meet? XML is a remarkable conjunction of different disciplines, and its strength has been in meeting most of the requirements upon it as well as it can, while adding the flavor of the internationalized Web to them all. It is in this pervasiveness that XML has been truly revolutionary.