The State of XML - Part 2

June 16, 2000

Edd Dumbill

Escaping Point-to-Point

There are three things that to me seem truly important for escaping point-to-point XML communication. The first of these is interoperability: if you can't pre-determine your communication partner, then you need to be sure you can interoperate. As I've said before, XML still has a way to go on that one, and as users we need to make sure we're asking for interoperability.

The second key feature to me is discoverability. There need to be ways for programs to discover the existence of a remote service, as well as its capabilities. (For that matter, the human-readable Web still hasn't solved that problem to a great degree. I'm happy to see that a lot of work to remedy that is in evidence at this conference.)

A third key feature is notification. Being stateless and connectionless, the Web has no real notion of communicating state changes. Solutions to this problem on a point-to-point basis exist, but how do we handle distributed notification of state changes?

I hope by now your mind is full of possible solutions to the points I've raised. Please hold on to them while I take a look at the other side of the coin: distributed data.

The Other Side -- Distributed Data

So how does distributed data differ from distributed applications? In this case, rather than traveling point-to-point between applications, the data is distributed around the Web. The processing application is the one doing the traveling, roaming around the multiple resources that it needs to complete its tasks. An example would be an application connecting up travel information with hotel information and your personal schedule.

Some of the qualities of this application of XML:

  • Semantics are not hidden in the application logic: because a web site doesn't know who is going to process its data, all the required semantic information must be present in the XML data. This encourages openness and discourages lock-in -- not a property that the SOAP vision of the future has.

  • Not dependent on particular software: the flip side of the above point. Anyone can write software to process an open data format.

  • Slower: for now, development schedules may well be slower for applications that work this way. Software support for this programming model is certainly less mature.

  • Paradigm difference: this "declarative" way of creating applications does not come naturally to a lot of programmers. Document-heads, librarians, and Prolog programmers may well find it the most natural thing in the world, but your imperative-language-focused Perl or VB hacker won't take easily to doing things this way.
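To make the travel example mentioned earlier a little more concrete, here is a minimal Python sketch of a processing application that combines three independently published XML documents. The element and attribute names (`flight`, `hotel`, `busy`, `arrives`, `freeFrom`, `until`), the sample data, and the `plan_trip` function are all invented for illustration; they do not correspond to any real vocabulary or service.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Three made-up documents standing in for data published by separate sites.
FLIGHTS_XML = """<flights>
  <flight city="Paris" arrives="2000-06-12"/>
  <flight city="Paris" arrives="2000-06-14"/>
</flights>"""

HOTELS_XML = """<hotels>
  <hotel name="Hotel Exemple" city="Paris" freeFrom="2000-06-13"/>
</hotels>"""

SCHEDULE_XML = """<schedule>
  <busy until="2000-06-13"/>
</schedule>"""


def plan_trip(flights_xml, hotels_xml, schedule_xml):
    """Return (city, arrival, hotel) options that fit the personal schedule."""
    # The traveller is unavailable up to and including the latest "until" date.
    busy_until = max(
        date.fromisoformat(busy.get("until"))
        for busy in ET.fromstring(schedule_xml).findall("busy")
    )
    hotels = ET.fromstring(hotels_xml).findall("hotel")

    options = []
    for flight in ET.fromstring(flights_xml).findall("flight"):
        arrival = date.fromisoformat(flight.get("arrives"))
        if arrival <= busy_until:
            continue  # clashes with the schedule
        for hotel in hotels:
            same_city = hotel.get("city") == flight.get("city")
            available = date.fromisoformat(hotel.get("freeFrom")) <= arrival
            if same_city and available:
                options.append(
                    (flight.get("city"), flight.get("arrives"), hotel.get("name"))
                )
    return options


trip_options = plan_trip(FLIGHTS_XML, HOTELS_XML, SCHEDULE_XML)
```

Note that everything the sketch needs to know is carried in the data itself; none of the three hypothetical publishers has to know that this application exists.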

We Need Lots of Data

To prove this concept and model, we need the presence of a large amount of data on the Web. We also need a number of shared vocabularies with which to leverage our connections between applications. Developing vocabularies is hard, and promoting them even harder. While I applaud the work done by the Dublin Core committee, the average XML programmer knows very little about Dublin Core. Vocabularies remain a very difficult problem.

It's obvious that at the moment we're not at the HTML point. RDF, a W3C XML-based language that could achieve this vision of distributed data, has been around for a good while now and yet has still not caught on. There are examples, however, of this working.

One such example that we've developed at O'Reilly uses the RSS file format; this is a format (developed initially by Netscape) that allows sites to describe themselves and to maintain a list of "hot pages" on their web sites. By far the most common use of this has been as a lightweight syndication format for links to news stories on sites.

O'Reilly's Meerkat application is an aggregator for all this data (it has nearly two thousand sources now, I believe). You can search over all these stories, connecting them by your personal criteria (e-business, for instance), and use Meerkat to develop your own personal news service. Meerkat would be useless if it weren't for the feedback loop that caused web site owners to create XML-based metadata about their documents.
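The aggregation idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not a description of how Meerkat itself is built: the two feeds below are made-up RSS 0.91-style documents held in strings (a real aggregator would fetch them over HTTP), and the function names `parse_feed`, `aggregate`, and `search` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two invented RSS 0.91-style documents standing in for fetched feeds.
FEED_A = """<rss version="0.91">
  <channel>
    <title>Example News</title>
    <link>http://news.example.com/</link>
    <item>
      <title>E-business standards meeting announced</title>
      <link>http://news.example.com/1</link>
    </item>
    <item>
      <title>New XML parser released</title>
      <link>http://news.example.com/2</link>
    </item>
  </channel>
</rss>"""

FEED_B = """<rss version="0.91">
  <channel>
    <title>Example Weekly</title>
    <link>http://weekly.example.org/</link>
    <item>
      <title>E-business adoption survey</title>
      <link>http://weekly.example.org/a</link>
    </item>
    <item>
      <title>Conference report</title>
      <link>http://weekly.example.org/b</link>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Turn one RSS document into a list of story dictionaries."""
    channel = ET.fromstring(xml_text).find("channel")
    source = channel.findtext("title", default="")
    return [
        {
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in channel.findall("item")
    ]


def aggregate(feeds):
    """Merge the stories from every feed into one list."""
    stories = []
    for xml_text in feeds:
        stories.extend(parse_feed(xml_text))
    return stories


def search(stories, keyword):
    """Case-insensitive keyword match on story titles."""
    needle = keyword.lower()
    return [story for story in stories if needle in story["title"].lower()]


stories = aggregate([FEED_A, FEED_B])
ebusiness = search(stories, "e-business")
```

Because both hypothetical sites publish the same simple format, a single generic parser is enough to build a cross-site search; that is the feedback loop in miniature.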

We're really waiting for more killer applications in the vein of Meerkat, and more visionary web site owners who can see the potential in exposing their services in the form of XML data sources.

When SOAP and RDF Meet

So, what happens when distributed application logic and distributed data meet up? One is in the ideal position to support the other. As I mentioned, distributed SOAP services suffer from a lack of discoverability, and a lack of ability to describe themselves. (SOAP description formats are currently in development to support description, but I think they run the risk of ignoring discoverability, as they seem mostly to exist to support programming rather than communication.)

RDF and other metadata interchange formats have the capability to create a decentralized directory service, connected by URIs over the Web. Perhaps one reason that people don't get the "aha!" on this yet is that we're still at the "CERN point." In the early days of the Web, Tim Berners-Lee and his colleagues at CERN were able to maintain a list of all web servers in existence on their page at CERN. This has since become impractical, and any attempts to create centralized directories are at best insufficient (think of the disappointment suffered by your average first-time user of a search engine).

Enterprise intranets, and the Internet as a whole, can still enumerate services without too much effort. There'll come a time when we won't be able to do that, and discoverability and description will play an important part in connecting autonomous applications.

Other Network Models

I'd like to briefly survey other network models that could well play a part in the world of distributed XML. One of these is the model employed by Gnutella and WorldOS. These technologies work by a network node chattering away to its neighbors on the network, answering what queries it can, and passing questions on to its neighbors.

Rather than having a central agent zipping around the Web connecting RDF files together, it might be interesting if every network node had a knowledge database, and answered queries from its nearest neighbors.
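As a rough illustration of that idea, the following Python sketch models nodes that each hold a small knowledge database, answer what they can, and pass other questions on to their neighbors. It is loosely in the spirit of the neighbor-to-neighbor model described above and is not the actual Gnutella or WorldOS protocol; the `Node` class, the hop limit (`ttl`), and the sample facts are assumptions made for the example.

```python
class Node:
    """A network node with local knowledge and a list of neighbors."""

    def __init__(self, name, facts=None):
        self.name = name
        self.facts = dict(facts or {})  # this node's own knowledge database
        self.neighbors = []

    def connect(self, other):
        """Create a two-way link between this node and another."""
        self.neighbors.append(other)
        other.neighbors.append(self)

    def query(self, key, ttl=3, seen=None):
        """Answer locally if possible, otherwise ask the neighbors.

        ttl limits how many hops a question may travel; seen records the
        nodes already asked so that a question does not loop forever.
        Returns (node_name, answer), or None if nobody in range knows.
        """
        seen = set() if seen is None else seen
        if self.name in seen:
            return None
        seen.add(self.name)

        if key in self.facts:
            return (self.name, self.facts[key])
        if ttl == 0:
            return None  # out of hops: do not forward any further

        for neighbor in self.neighbors:
            answer = neighbor.query(key, ttl - 1, seen)
            if answer is not None:
                return answer
        return None


# A three-node chain: a <-> b <-> c, where only c knows about hotels.
a = Node("a")
b = Node("b", {"flights": "http://example.org/flights.rdf"})
c = Node("c", {"hotels": "http://example.org/hotels.rdf"})
a.connect(b)
b.connect(c)

found = a.query("hotels")            # forwarded a -> b -> c
too_far = a.query("hotels", ttl=1)   # the question dies at b
```

The sketch searches depth-first and shares one `seen` set per question, which keeps it short; a real network would send messages concurrently and would need its own way of recognizing duplicate questions.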

Another important network technology we don't really have in place with HTTP yet is notification. Push applications are still the "killer apps" for the Internet -- look at the popularity of e-mail and instant messaging clients. Yet there's still no standard way to deliver XML messages as server-to-client or peer-to-peer notifications. I'm seeing some interesting technologies appear in this area, though, based around the premise that every node -- even a PC, or maybe in the future, your mobile phone -- runs an HTTP server. One project to look at in this area is Magi.
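The premise that every node runs an HTTP server can be sketched with nothing more than the Python standard library. The example below is an assumption-laden illustration, not a description of Magi or of any standard: the `<event>` message format, the `/notify` path, and the function names are invented, and the "remote" node is simply a server on the local machine.

```python
import threading
import urllib.request
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # notifications this node has been sent


class NotifyHandler(BaseHTTPRequestHandler):
    """Accept XML state-change messages pushed to this node."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = ET.fromstring(self.rfile.read(length))
        received.append((event.get("resource"), event.get("state")))
        self.send_response(204)  # accepted, nothing to send back
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_node():
    """Start a node's HTTP server on a free local port, in the background."""
    server = HTTPServer(("127.0.0.1", 0), NotifyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def notify(port, resource, state):
    """Push a (hypothetical) <event> message to the node on the given port."""
    event = ET.Element("event", resource=resource, state=state)
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/notify",
        data=ET.tostring(event),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )
    # Ignore any proxy settings, since the target is the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return response.status


node = start_node()
status = notify(node.server_address[1], "http://example.org/flights/42", "delayed")
node.shutdown()
node.server_close()
```

The interesting property is that the party whose state changed initiates the connection, so the recipient does not have to poll.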


In conclusion, I will refer back to a theme that started with the very beginning of XML -- XML is itself an inherently disruptive technology. Everybody's world, and often their business models too, is being upset by XML. This is one reason why XML is such a great opportunity.

Although the core of XML is getting pretty stable as it makes its way up through established patterns of communication, things are getting messy again -- and they need to. The questions over SOAP and XML protocols are an example of this.

We still have some really hard problems to solve, and shared vocabularies are a particularly difficult one. Solving them is basically just going to take a lot of effort. When that work is done, the effect will be remarkable. It's not just that point-to-point communication will be changed, but that there will be a network effect as XML is distributed over the Web.

The future of XML is a very challenging and exciting one!