All Roads Lead to RDF

August 11, 2004

Edd Dumbill

August. Season of long, hot, lazy days, and not much else. While the vacation of all things XML continues, those eager RDF folks are still hard at work. Consequently, the XML-Deviant this week focuses largely on RDF-related topics.

RDF: The Natural Conclusion of Web Services?

For my main topic this week I am reaching again into the world of weblogs. Web services have never made for great mailing list discussions, but there are often quite thoughtful pieces to be found on the topic on the Web. One of the more prolific writers on web services has been Mark Nottingham. He is currently employed by BEA and has been an active figure in the development of web services specifications.

On his weblog, Nottingham has been pursuing an occasional series called "XML Heresies," and it is his latest installment that drew my attention this week. In "The 'Document' in Document-Oriented Messaging," he discusses exactly what is inside the payload of a web service.

As Nottingham points out, document-oriented messaging is characterized by specifying a protocol in terms of what goes over the wire, rather than the code that handles the protocol. These days, that content is likely to be in XML. The next natural step then is to look for a way of constraining these messages. Nottingham highlights EBNF, DTDs, W3C XML Schema, Relax NG, and OWL as all being suitable technologies.

Nottingham then raises the question that concerns the current state of web service specification:

The whole idea of web services is to give people a protocol construction toolkit that allows them to easily specify messages, suck them into code, and start working with them easily, on most any platform. So, why is it then that web services went shopping for these things and came back home with the XML Infoset as described by XML Schema, of all things?

Now, aside from general aesthetic concerns, what's the problem with using Infoset and Schema? There's simply too much in there, says Nottingham.

Think of it from an information theoretic standpoint; if the various Information Items and properties of an Infoset are each capable of carrying information, we've got a pretty big footprint to work with, and Schema doesn't give very precise tools for sorting the signal from the noise. Because each different tool chooses a different, incomplete portion of the Infoset to model, interoperability is hard.
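To make the "incomplete portion of the Infoset" point concrete, here is a minimal sketch using Python's standard ElementTree module. It is my illustration, not Nottingham's: the order vocabulary and the example.org namespace are made up. The two documents differ in namespace prefix, attribute order, empty-element syntax, and the presence of a comment, yet this particular tool's default view of them is identical, because it simply does not model those parts of the Infoset.

```python
import xml.etree.ElementTree as ET

# Two hypothetical serializations of the same message. They differ in
# prefix choice, attribute order, empty-element syntax, and a comment.
doc_a = '<o:order xmlns:o="http://example.org/order" id="1" qty="2"/>'
doc_b = ('<order xmlns="http://example.org/order" qty="2" id="1">'
         '<!-- audit note --></order>')

a = ET.fromstring(doc_a)
b = ET.fromstring(doc_b)


def view(elem):
    """Return the slice of the Infoset that this sketch chooses to look at:
    expanded element name, attributes, and number of child elements."""
    return (elem.tag, dict(elem.attrib), len(elem))


# ElementTree's default parser discards the prefix and the comment,
# so both documents collapse to the same view.
same_view = view(a) == view(b)
print(same_view)
```

A tool that preserved comments or prefixes would see two different documents here, which is the kind of mismatch that makes interoperability hard.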

Nottingham observes that both Atom and WSDL 2.0 eschew W3C XML Schema for describing their wire formats. The latter is particularly ironic, of course, given the general expectation on web services developers to use W3C XML Schema.

Something more than W3C XML Schema is needed, and something that addresses data modeling rather than merely syntax constraints. Nottingham writes:

Don't get me wrong; XML is a great foundation for syntax, but data models that directly map to it (such as the Infoset, PSVI, XQDM, etc.) are a horrible basis for a generic, interoperable protocol toolkit...

...The real trick, IMO, is getting the advantages of XML -- like platform neutrality, versioning, extensibility, nested data structures, self-description, and human readability -- without the complexity of the Infoset or the problems of XML Schema. A simpler, higher-level data model that has a mapping onto the Infoset while still providing these things could do the job.

Nottingham's next proposal is the one that should surprise most followers of web services. He points out there are two ways forward. The first way is to subset W3C XML Schema, an approach that the WS-I group seems to have started. The second way is to start over. But reinventing from the ground up might not be necessary:

...we might be able to just switch horses. A little while back, I made a direct comparison between the two stacks that the W3C is developing; one based on the Infoset, the other on the RDF data model. It's pretty clear to me that the RDF data model is simpler; the next step, I think, is to see if and how it (along with OWL) provides the purported benefits of XML, such as nesting, extensibility, and versioning. The first of these is pretty easy (it's a directed graph, so it's arguably superior); the latter two are beginning to be explored. Stay tuned.
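The remark about nesting is easy to picture with a small sketch. The following is not taken from Nottingham's post and does not use a real RDF library: it models statements as plain (subject, predicate, object) tuples, with made-up "ex:" identifiers, and treats nesting as nothing more than following edges. Unlike an XML tree, the graph can contain a cycle without any special handling.

```python
from collections import defaultdict

# Hypothetical statements; every identifier here is invented for illustration.
triples = [
    ("ex:order1", "ex:customer", "ex:alice"),
    ("ex:order1", "ex:item", "ex:item1"),
    ("ex:item1", "ex:product", "ex:widget"),
    ("ex:alice", "ex:placed", "ex:order1"),  # points back: a cycle
]


def outgoing(statements):
    """Index statements by subject so edges can be followed quickly."""
    index = defaultdict(list)
    for subject, predicate, obj in statements:
        index[subject].append((predicate, obj))
    return index


def reachable(start, index):
    """Collect every node reachable from start, tolerating cycles."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(obj for _, obj in index.get(node, []))
    return seen


index = outgoing(triples)
nested = reachable("ex:order1", index)
print(sorted(nested))
```

What XML expresses by containment, the graph expresses by reachability, which is why a directed graph can be argued to be the more general structure.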

Wow. However, this actually seems to make a lot of sense. Nottingham's notion may not be too surprising to XML document-heads who have wondered at the bizarre monster that is W3C XML Schema, nor to the semantic webbers who have marveled at the rush to cram all data into XML's tree-shaped structures.

So what reaction did Nottingham's piece receive? James Tauber agrees, and sets out a bulleted list of his beliefs about where XML and RDF stand in relation to each other. His conclusions are particularly interesting:

I therefore believe that when one develops a vocabulary (or "application" in the SGML sense of the term) it should include:

  • a schema for the RDF in something like OWL
  • a schema for the XML in something like RELAX NG
  • a mapping between the two (and RELAX NG should support inclusion of this mapping)
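Tauber does not spell out what such a mapping would look like, so the following is only a sketch under my own assumptions. It takes a made-up person vocabulary and a hypothetical convention (the root's id attribute names the subject, each child element is a predicate, and a ref attribute marks a resource rather than a literal) and turns the XML into triples. The function name, the element names, and the example.org base URI are all invented.

```python
import xml.etree.ElementTree as ET

# A made-up XML vocabulary; none of these names come from a real schema.
XML = """<person id="alice">
  <name>Alice</name>
  <knows ref="bob"/>
</person>"""


def to_triples(xml_text, base="http://example.org/"):
    """Map the XML onto (subject, predicate, object) triples using the
    hypothetical convention described in the text above."""
    root = ET.fromstring(xml_text)
    subject = base + root.attrib["id"]
    triples = []
    for child in root:
        predicate = base + child.tag
        if "ref" in child.attrib:
            # A ref attribute points at another resource.
            triples.append((subject, predicate, base + child.attrib["ref"]))
        else:
            # Otherwise the element's text is treated as a literal value.
            triples.append((subject, predicate, (child.text or "").strip()))
    return triples


result = to_triples(XML)
for triple in result:
    print(triple)
```

In Tauber's scheme a declarative mapping attached to the schema would do this job; the hand-written function just shows how little information the mapping needs to carry.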

Back on Nottingham's site, Randy Charles Morin speaks up for W3C XML Schema:

XML Schema hits the 80-20 mark. End of story.

Nottingham politely rebuts Morin:

Sorry, I need more convincing than you saying it's good enough. Lots of people -- including myself -- have done the work and found XML Schema lacking, so much so that they're looking for something better.

Sean McGrath, who's not without a good deal of web-services integration experience, is a little less reserved:

W3C XML Schema hits the 80/20 mark for schema languages the same way that a boiled egg hits the 80/20 mark for a balanced diet ...

If you want to see what a real 80/20 point looks like in the field of schema languages, look at Relax NG.

Analysing the problem on a broader scale, Bill de hOra doesn't think there is a silver bullet -- the real interoperability issues in web services are to do with humans -- and says the bigger benefits of RDF lie elsewhere:

RDF technologies will be useful insofar as they'll help drive the interop problem up the stack. But there will continue to be an interop problem since people won't even agree on vocabulary, never mind semantics.

But here's the thing -- RDF versus XML, or RDF as some kind of surrogate for XML, are xml-dev permathreads that must die. Really where RDF could have significant impact is not swapping out the XML stack, but in the business logic and mapping rules we've been busy embedding into in-systems programming languages for the last two decades -- in that sense it aligns nicely with data-directed languages like Schematron, SQL, and from way back Prolog (before it got tarnished with the AI brush).

I've not had room here to include the whole text of all the contributions, so I do recommend reading the pages concerned. It's good to know that there is still lots of debate and questioning taking place in web services.

The Growth of FOAF

The Friend-of-a-friend (FOAF) project is an RDF vocabulary for creating machine-readable home pages. (Read Leigh Dodds' Introduction to FOAF on XML.com.) Created by Dan Brickley and Libby Miller, it has proved, among other things, a useful testing ground for ideas and software for the Semantic Web. As a center of development, FOAF is coming of age this year by hosting two events for those working on or around the project.

The first of these is FOAF Camp, an informal gathering in the Netherlands on Aug. 19-20. Conducted in a self-organizing way similar to O'Reilly's Foo Camp, FOAF Camp promises to be a forum for fun and creative discussion about FOAF.

The second event is more formal, taking the shape of an academic workshop. The 1st Workshop on Friend of a Friend, Social Networking and the Semantic Web is part of the Semantic Web Advanced Development Europe initiative. It will be held Sept. 1-2 in Galway, Ireland. I've been fortunate enough to be on the program committee for this workshop, and can predict a rich and valuable meeting.

Births, Deaths, Marriages

The latest announcements from XML-DEV.

XML 2004 Program Released

Find out who's speaking at the main USA event for the XML community. New format places tutorials on either side of the conference, rather than in the preceding two days. Call for participation is still open for late-breaking news presentations, sponsors, and exhibitors. XML 2004 takes place in Washington, DC, at the Marriott Wardman Park Hotel, Nov. 15-19.

Stylus Studio 5 Home Edition

Not entirely sure who wants to take an XML IDE home. "Stylus Studio 5 Home Edition is specifically designed for learning or working with XML in educational, training, or home settings, and is available now for only $49 (USD) for a single-user license." Look out for forthcoming Barbie Edition.

Expat 1.95.8 Released

New release of venerable XML parser adds "suspendability." According to the announcement, this "allows for parsing a document in chunks without having to use a separate thread, and it makes it also possible to build a pull parser on top of Expat."

Nature RSS Newsfeeds

Nature Publishing Group embrace RSS 1.0 in style, adding in metadata from the PRISM standard.

Scrapings

Soundtrack of their lives: it's fun to program in XSLT!... fancy a 3,000-message flame war?... Messages to XML-DEV this week: 40 (vacation season), Len rating 7.5% (blame the blog) ... humor prematurely curtailed due to vacation. I'm off to Jordan for a week; I hope you enjoy your travels, too.