The State of XML

June 16, 2000

Nothing Too Ambitious

Attempting to describe the state of XML is an ambitious undertaking, and my paper in the proceedings covers a lot more than I'm able to say today. In particular, I want to bring out some interesting developments in XML that will affect us over the next few years.

I will, however, say a few words about XML's past development, and what's going on at the moment.

The Past

XML was, and continues to be, a revolutionary technology. Tim Bray likes to say that XML came in "fast and low and under the radar," taking the world by storm. Since then we've seen it explode from a minority interest to a pervasive technology.

However, since those first days, very little has come in "fast and low," and the road forward has involved some very hard work from those sitting on standards committees -- to which I pay sincere public tribute. Real successes of the last few years include XSLT and the XML DOM. Other specifications have been more frustrating in their pace of development -- XML Schemas come to mind here.

It's definitely worth analyzing why the really successful specifications have worked. I'm convinced that a key part of it was open-source implementations, developed in parallel with the specifications. (A couple of days ago I heard Brad Husick talking about the CPExchange group, and mentioning that part of their deliverables was an open-source reference implementation, something I heartily applaud.)

At some point we need to proclaim that core development on XML is at an end. Nothing has had quite the impact of XML 1.0, and one feels that W3C development of XML specifications is gradually getting mired as the requirements of XML users grow disparate, and the financial interests in XML grow larger.

The Present: XML Is About Money

Now that XML has hit the big time, there's a lot of money flowing into it, and into companies basing their businesses on it. This is a good thing, and testament to the value of an interoperable syntax. However, we need to see interoperability going all the way, and XML being more than a buzzword used to gain acceptance for a product.

Users need to realize that the mere fact that a product uses XML is no guarantee of either interoperability or openness. You may be aware that on XML.com we have conducted XML 1.0 conformance tests using the suites developed by OASIS and the US NIST. In our last round of tests we found the big vendors -- Oracle and Microsoft -- lagging significantly behind various open-source XML parsers.

The inference here is obvious -- vendors with a platform lock-in don't attach as much importance to interoperability. As users, though, we need to demand it. Poor interoperability even at the most basic level limits who you will be able to do business with, and increases your communication costs. Until we really start demanding conformance, we won't get it.

I welcome the continuing work by OASIS on conformance, and look forward to seeing the work from the XSLT and XML Schema conformance groups.

Yet the onus isn't just on the vendors. If standards bodies create specifications too impenetrable or complex to implement completely, there's little chance of getting true interoperability, as the cost of implementation will be too high for vendors. There seems to be a real risk of this happening for XML Schemas. Standards bodies must ensure their work is usable by implementers, or it defeats their purpose.

Another observation I'd like to make on the use of XML today has to do with its relationship to the network and communication. The majority of XML transfers are point to point, under known contracts. This becomes important to recognize in the context of the future.

The Future: XML = Boring?

We want the XML of the future to be boring. It's true! What those building communication infrastructures on top of XML require is, above all, stability. XML's economic engine is becoming established, and more and more of the standardization activity is in applied horizontal standards (e.g., ebXML) and vertical industry-specific vocabularies.

To quote Tim Bray once more, "XML is the new ASCII" (or, as this is Europe, the new ISO Latin 1). I can testify to this personally -- as editor of XML.com I receive many news releases. For some, their only significance is that they use XML somewhere as a data format -- the product may be literally anything. Nobody expects to see a press release saying a product uses the ASCII character set -- this is why I am more than a little cynical about XML being used as a marketing buzzword.

Seriously though, we are reaching the point where the XML core needs to stop growing and developing. Some great things are happening in industry collaborations and vertical standards, and these need a stable base.

However, not everything is boring. I'd like to look at an alternate future, and some of the technologies I think will play an important part in the next phase of XML's life.

The Real Future: XML and the Network

XML belongs to the network

From its inception, XML has been inextricably tied to the network, and to the Internet in particular. Its close bond with the URI spec has made sure of that. However, at the moment XML is much more common inside intranets and in point-to-point business relationships than as a format widespread on the Internet.

Fun things happen when it gets out of control

We've seen with HTML that when things get out of hand, we can have a lot of fun and a lot of innovation. Yes, it's horribly messy. I'm sure many SGML users here both rejoiced and were dismayed simultaneously at the success of HTML (not exactly the most elegant of SGML applications).

With XML on the Internet, we need to reach the "HTML point," where the rewards and complexity of use are sufficiently balanced to see widespread deployment. I don't think this means XHTML: initial indications aren't that optimistic for a quick uptake of XHTML, and the reward over using just normal HTML is not sufficient to cause a landslide of adoption.

What is really exciting me about XML at the moment is the prospect of decentralized, distributed XML. I want to examine first distributed applications that use XML as their connectors, and then move on to distributed XML data.

SOAP

You will all know that an XML application called SOAP is causing a lot of excitement and comment at the moment. The Simple Object Access Protocol (developed initially by Microsoft, DevelopMentor, and UserLand Software, with IBM since added to the list of authors) provides a protocol for routing data between applications using XML.

Although it is the fashion among the cognoscenti to de-emphasize it, one of the most important features of SOAP is that it will allow remote invocation of program code using just HTTP and XML. This particular use of SOAP as RPC is anathema to long-standing and experienced developers of Internet protocols and distributed systems.
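To make "remote invocation using just HTTP and XML" concrete, here is a minimal sketch of what such a call might look like on the wire. It only builds and reads the XML envelope; in practice the text would travel as the body of an HTTP POST. The envelope namespace is the SOAP 1.1 one, but the application namespace, the method name, and the parameter are hypothetical, chosen purely for illustration.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:quotes"  # hypothetical application namespace (assumption)


def build_request(method, params):
    """Wrap a method call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Recover the method name and parameters, as a receiving server might."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]  # first child of Body is the method element
    method = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag: child.text for child in call}
    return method, params


# Hypothetical call: ask a remote quote service for one symbol.
request_xml = build_request("GetLastTradePrice", {"symbol": "DIS"})
method, params = parse_request(request_xml)
```

Notice how easily the element names end up being just the function and argument names of the code behind the service, which is the very temptation that makes the RPC style contentious.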

However, SOAP allows the desktop to escape on to the Internet. Application services that previously ran on your machine (a mail service, a directory service) can now be anywhere on the Internet. Applications have a Web-wide communication channel, without much overhead on their part.

This is both good and bad. Good, as we will see more and more integration with the Web and XML from programs, and some interesting new ways of programming using the Internet. Bad, because there are associated confusions and risks with using SOAP as an RPC mechanism. One thing that worries me is that SOAP users won't design their messages independently from their code -- leading to lock-in, fragmentation, and semantic erosion. You lose your interoperability as soon as you let your SOAP messages mirror your internal data structures.

If SOAP takes off, though, I can't help feeling that the nay-sayers will be in the same position as SGMLers over HTML. It may suck, but it's doing something revolutionary.

SOAP & distributed applications

The SOAP future distributes logic around the Internet. This raises the possibility of not just open-source, but open-service. Applications will be able to make their functions freely available to callers. This could possibly be seen as a greater democratization of programs than open-source itself, as ultimately less technical ability may be required to use these open services, and the platform shouldn't matter. There are some interesting possibilities around using the equivalent of the Unix pipe to chain together and integrate disparate applications into one view for the user.

Yet this view of the future is still a little dull. It's moved the desktop playing field to the Internet, but it hasn't really escaped the point-to-point pre-determined modus operandi that we currently have. How do we reach something that's truly Web-like?

Escaping Point-to-Point

There are three things that to me seem truly important for escaping point-to-point XML communication. The first of these is interoperability: if you can't pre-determine your communication partner, then you need to be sure you can interoperate. As I've said before, XML still has a way to go on that one, and as users we need to make sure we're asking for interoperability.

The second key feature to me is discoverability. There need to be ways for programs to discover the existence of a remote service, as well as its capabilities. (For that matter, the human-readable Web still hasn't solved that problem to a great degree. I'm happy to see a lot of work happening to remedy that in evidence at this conference.)

A third key feature is notification. Being stateless and connectionless, the Web has no real notion of communicating state changes. Solutions to this problem on a point-to-point basis exist, but how do we handle distributed notification of state changes?

I hope by now your mind is full of possible solutions to these points I've raised. Please hold them in your mind, while I take a look at the other side of the coin, distributed data.

The Other Side -- Distributed Data

So how does distributed data differ from distributed applications? In this case, rather than traveling point-to-point between applications, the data is distributed around the Web. The processing application is the one doing the traveling, roaming around the multiple resources that it needs to complete its tasks. An example would be an application connecting up travel information with hotel information and your personal schedule.

Some of the qualities of this application of XML:

Semantics are not hidden in the application logic: because a web site doesn't know who is going to process its data, all the required semantic information must be present in the XML data. This encourages openness and discourages lock-in -- not a property that the SOAP vision of the future has.
Not dependent on particular software: the flip-side of the above point. Anyone can write software to process an open data format.
Slower: it's currently the case that the development schedule might well be slower for applications working this way. Software support for this programming model is certainly more immature.
Paradigm difference: this "declarative" way of creating applications does not come naturally to a lot of programmers. Document-heads, librarians, and Prolog programmers may well find it the most natural thing in the world, but your imperative-language-focused Perl or VB hacker won't take naturally to doing things this way.
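
The travel example mentioned earlier can be sketched roughly as follows. The three documents stand in for data published by three independent sites; their element and attribute names are entirely hypothetical, and a real application would fetch them over HTTP rather than embed them. The point is only that the application carries no private knowledge of any one site: everything it needs to connect the sources is in the data itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents, standing in for data from three unrelated sources.
FLIGHTS = """<flights>
  <flight number="XY123" to="Paris" date="2000-06-16"/>
  <flight number="XY456" to="Berlin" date="2000-06-20"/>
</flights>"""

HOTELS = """<hotels>
  <hotel name="Hotel Exemple" city="Paris"/>
  <hotel name="Pension Beispiel" city="Berlin"/>
</hotels>"""

SCHEDULE = """<schedule>
  <event title="Conference talk" city="Paris" date="2000-06-16"/>
</schedule>"""


def plan_trips(flights_xml, hotels_xml, schedule_xml):
    """For each scheduled event, collect matching flights and hotels."""
    flights = ET.fromstring(flights_xml).findall("flight")
    hotels = ET.fromstring(hotels_xml).findall("hotel")
    plans = []
    for event in ET.fromstring(schedule_xml).findall("event"):
        city, date = event.get("city"), event.get("date")
        plans.append({
            "event": event.get("title"),
            # Flights arriving in the event's city on the event's date.
            "flights": [f.get("number") for f in flights
                        if f.get("to") == city and f.get("date") == date],
            # Hotels located in the event's city.
            "hotels": [h.get("name") for h in hotels if h.get("city") == city],
        })
    return plans


plans = plan_trips(FLIGHTS, HOTELS, SCHEDULE)
```

The join only works because all three sources happen to agree on how a city and a date are written, which is exactly the shared-vocabulary problem.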

We Need Lots of Data

To prove this concept and model, we need the presence of a large amount of data on the Web. We also need a number of shared vocabularies with which to leverage our connections between applications. Developing vocabularies is hard, and promoting them even harder. While I applaud the work done by the Dublin Core committee, the average XML programmer's knowledge of Dublin Core is pretty poor. Vocabularies remain a very difficult problem.

It's obvious that at the moment we're not at the HTML point. RDF, a W3C XML-based language that could achieve this vision of distributed data, has been around for a good while now and yet has still not caught on. There are examples, however, of this working.

One such example that we've developed at O'Reilly uses the RSS file format; this is a format (developed initially by Netscape) that allows sites to describe themselves and to maintain a list of "hot pages" on their web sites. By far the most common use of this has been as a lightweight syndication format for links to news stories on sites.

O'Reilly's Meerkat application is an aggregator for all this data (it has nearly two thousand sources now, I believe). You can search over all these stories, connecting them by your personal criteria (e-business, for instance), and use Meerkat to develop your own personal news service. Meerkat would be useless if it weren't for the feedback loop that caused web site owners to create XML-based metadata about their documents.
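
As a rough illustration of the idea, and not a description of how Meerkat itself is built, the sketch below merges the items from two RSS channels and filters them by keyword. The feeds are hypothetical samples in the style of RSS 0.91, embedded as strings so the example stays self-contained; the function names are my own.

```python
import xml.etree.ElementTree as ET

# Two hypothetical feeds; a real aggregator would fetch these over HTTP.
FEED_A = """<rss version="0.91"><channel>
  <title>Example News</title>
  <link>http://www.example.com/</link>
  <description>Hypothetical feed A</description>
  <item><title>E-business vocabularies update</title><link>http://www.example.com/a1.html</link></item>
  <item><title>Parser conformance results</title><link>http://www.example.com/a2.html</link></item>
</channel></rss>"""

FEED_B = """<rss version="0.91"><channel>
  <title>Example Trade Wire</title>
  <link>http://www.example.org/</link>
  <description>Hypothetical feed B</description>
  <item><title>New e-business exchange launches</title><link>http://www.example.org/b1.html</link></item>
</channel></rss>"""


def read_items(rss_text):
    """Return the stories in one channel, tagged with the channel title."""
    channel = ET.fromstring(rss_text).find("channel")
    source = channel.findtext("title")
    return [{"source": source,
             "title": item.findtext("title"),
             "link": item.findtext("link")}
            for item in channel.findall("item")]


def aggregate(feeds):
    """Merge the stories from every feed into one list."""
    stories = []
    for rss_text in feeds:
        stories.extend(read_items(rss_text))
    return stories


def search(stories, keyword):
    """Case-insensitive keyword match on story titles."""
    keyword = keyword.lower()
    return [s for s in stories if keyword in s["title"].lower()]


stories = aggregate([FEED_A, FEED_B])
matches = search(stories, "e-business")
```

Almost no code is needed once the sites agree on a format; the hard part is getting enough of them to publish it.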

We're really waiting for more killer applications in the vein of Meerkat, and more visionary web site owners who can see the potential in exposing their services in the form of XML data sources.

When SOAP and RDF Meet

So, what happens when distributed application logic and distributed data meet up? One is in the ideal position to support the other. As I mentioned, distributed SOAP services suffer from a lack of discoverability, and a lack of ability to describe themselves. (SOAP description formats are currently in development to support description, but I think they run the risk of ignoring discoverability, as they seem mostly to exist to support programming rather than communication.)

RDF and other metadata interchange formats have the capability to create a decentralized directory service, connected by URIs over the Web. Perhaps one reason that people don't get the "aha!" on this yet is that we're still at the "CERN point." In the early days of the Web, Tim Berners-Lee and his colleagues at CERN were able to maintain a list of all web servers in existence on their page at CERN. This has since become impractical, and any attempts to create centralized directories are at best insufficient (think of the disappointment suffered by your average first-time user of a search engine).

Enterprise intranets, and the Internet as a whole, can still enumerate services without too much effort. There'll come a time when we won't be able to do that, and discoverability and description will play an important part in connecting autonomous applications.

Other Network Models

I'd like to briefly survey other network models that could well play a part in the world of distributed XML. One of these is the model employed by Gnutella and WorldOS. These technologies work by a network node chattering away to its neighbors on the network, answering what queries it can, and passing questions on to its neighbors.

Rather than having a central agent zipping around the Web connecting RDF files together, it might be interesting if every network node had a knowledge database, and answered queries from its nearest neighbors.
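
A toy version of that idea is sketched below: each node holds a small store of facts, answers what it can, and otherwise passes the question to its neighbors. This is a simplified, synchronous model of my own and not the Gnutella or WorldOS protocol; the hop limit (ttl), the shared set of visited nodes, and the key format are all assumptions made to keep the sketch short.

```python
class Node:
    """A network node with a local knowledge store and a list of neighbors."""

    def __init__(self, name, facts=None):
        self.name = name
        self.facts = dict(facts or {})
        self.neighbors = []

    def connect(self, other):
        """Create a two-way link between this node and another."""
        self.neighbors.append(other)
        other.neighbors.append(self)

    def query(self, key, ttl=3, seen=None):
        """Answer locally if possible; otherwise ask neighbors until ttl runs out."""
        seen = set() if seen is None else seen
        if self.name in seen:
            return None  # already asked during this query; avoid loops
        seen.add(self.name)
        if key in self.facts:
            return (self.name, self.facts[key])
        if ttl == 0:
            return None  # hop limit reached; do not forward further
        for neighbor in self.neighbors:
            answer = neighbor.query(key, ttl - 1, seen)
            if answer is not None:
                return answer
        return None


# Hypothetical three-node chain: a -- b -- c, where only c knows the answer.
a = Node("a")
b = Node("b")
c = Node("c", {"hotel:Paris": "Hotel Exemple"})
a.connect(b)
b.connect(c)

found = a.query("hotel:Paris")             # reaches c through b
too_far = a.query("hotel:Paris", ttl=1)    # gives up at b
```

No node needs a map of the whole network, only of its neighbors, which is what makes the model attractive once central directories stop scaling.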

Another important network technology we don't really have in place with HTTP yet is notification. Push applications are still the "killer apps" for the Internet -- look at the popularity of e-mail and instant messaging clients. Yet there's no standard way for server-to-client or peer-to-peer notification of XML messages. I'm seeing some interesting technologies in this area appear, though, based around the premise that every node -- even a PC, or maybe in the future, your mobile phone -- runs an HTTP server. One project to look at in this area is Magi, http://magi.endeavors.org/.

Conclusion

In conclusion, I will refer back to a theme that started with the very beginning of XML -- XML is itself an inherently disruptive technology. Everybody's world, and often their business models too, is being upset by XML. This is one reason why XML is such a great opportunity.

Although the core of XML is getting pretty stable as it makes its way up through established patterns of communication, things are getting messy again -- and they need to. The questions over SOAP and XML protocols are an example of this.

We still have some really hard problems to solve -- shared vocabularies in particular. These are basically just going to take a lot of effort. When they're done, the effect will be remarkable. It's not just that point-to-point communication will be changed, but that there will be a network effect as XML is distributed over the Web.

The future of XML is a very challenging and exciting one!