RDF, What's It Good For?
November 13, 2002
The Family Eccentric
RDF is like my eccentric old uncle. I don't know him as well as I'd like, which is partly his fault, since his eccentricities can be off-putting. Of course they're what make him so interesting and are the reason I want to get to know him better in the first place. Yeah, RDF is just like that.
The Resource Description Framework is still among the most interesting of W3C technologies. But it's got persistent troubles, including having had its reputation beaten up unfairly as a result of the many and often nasty fights about RSS. But, just like my eccentric old uncle, RDF is not entirely blameless. In a previous XML-Deviant article ("Go Tell It On the Mountain") I argued that RDF's trouble might have something to do with its having been the victim of poor technical evangelism.
In some sense that's still true. Recently I googled for a comprehensive, up-to-date RDF tutorial, which proved as elusive as finding Uncle's dentures the morning after nickel beer night at the bingo hall. In fact, I was hard pressed to find an RDF tutorial which looked like it had been updated this year. And one which I did find simply listed 13 different ways to express the same basic set of assertions, which not only makes a terrible tutorial, but also exemplifies another of RDF's persistent troubles: its XML serialization.
During the time I've tracked RDF in the XML community, I can't recall running across even one enthusiastic defender of RDF's XML serialization. Everyone, or so it seems, thinks it's a nasty kludge at best. Now, I've been using RDF in some of my recent Python programming, using Daniel Krech's excellent rdflib (which, as Andrew Kuchling reminded me, now does fully distributed storage, thanks to its new ZODB/ZEO storage layer). One virtue of rdflib is that it shields me, the carefree application hacker, from having to deal with RDF's XML serialization. I never think about it or about its warts. I rarely even see it. Which is perfect. As long as, when I send the XML-serialized dump of my RDF triple store to someone else, they end up with the same graph of assertions, I'm happy.
But not everyone's needs are as easy to satisfy, I suspect. For my recent apps (a "knowledge base" of persistent news URLs used to generate a static Web site, and a URL-annotating IRC bot) RDF is the thing: the XML serialization is simply a way I can share the RDF assertions with very little pain (perhaps with more pain than shipping N-Triples around, though that's unclear and likely moot).
However, some applications require that RDF be embedded in XML, often in an extant XML language. This is one reason why, in these pages two weeks ago, John Cowan and Bob DuCharme, in an article called "Make Your XML RDF-Friendly", offered some tips for arranging your XML so that it's more, rather than less, likely that RDF processors will be able to make sense of it.
RDF: Mundane Metadata and a Relational Model Alternative
I won't discuss Cowan and DuCharme's suggestions, but I will review XML-DEV's reaction to them. Their article offered 8 ideas, which I paraphrase thus:
- Every element should belong to a namespace
- Use rdf:ID, not ID attributes
- Put the URI of a described resource in an rdf:about attribute
- Put the URI of a referenced resource in an empty element's rdf:resource attribute
- Use URIs from existing ontologies
- Take care with containers
- Avoid mixed content
- Check assertions with an RDF processor
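To make a few of these tips concrete (namespaces everywhere, rdf:about for the described resource, rdf:resource for a referenced one), here is a minimal Python sketch. The sample document, the example.org URIs, and the helper name striped_triples are all invented for illustration. The reader handles only this one narrow shape and is in no way a conforming RDF/XML parser; checking real assertions still calls for a real RDF processor.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TYPE = RDF_NS + "type"

# Hypothetical sample: every element is namespaced, the described resource
# is named with rdf:about, and a referenced resource sits in an empty
# element's rdf:resource attribute.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:ex="http://example.org/terms#">
  <ex:Article rdf:about="http://example.org/articles/rdf-friendly">
    <dc:title>Make Your XML RDF-Friendly</dc:title>
    <ex:column rdf:resource="http://example.org/columns/xml-deviant"/>
  </ex:Article>
</rdf:RDF>
"""


def expand(tag):
    """Turn ElementTree's '{namespace}local' form into a single URI string."""
    namespace, _, local = tag[1:].partition("}")
    return namespace + local


def striped_triples(xml_text):
    """Yield (subject, predicate, object) tuples for the narrow shape above.

    Assumption: one level of typed node elements, each holding simple
    property elements. Literals and URIs are both returned as plain strings.
    """
    root = ET.fromstring(xml_text)
    for node in root:
        subject = node.get("{%s}about" % RDF_NS)
        if subject is None:
            continue  # this sketch skips anything without rdf:about
        # A typed node element implies an rdf:type statement.
        yield (subject, RDF_TYPE, expand(node.tag))
        for prop in node:
            target = prop.get("{%s}resource" % RDF_NS)
            obj = target if target is not None else (prop.text or "").strip()
            yield (subject, expand(prop.tag), obj)


for triple in striped_triples(SAMPLE):
    print(triple)
```

Because every element name expands to a URI and the subject is stated outright, no guesswork is needed to get from the markup to the three statements.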
As is often the case on XML-DEV, a post by Simon St. Laurent initiated a wide-ranging and often helpful conversation -- this time St. Laurent suggested that Cowan and DuCharme's rendering of XML amenable to RDF was, among other things, overly intrusive: "I can't imagine," St. Laurent confessed, "telling XML vocabulary developers to do those things while keeping a straight face". He further suggested that an approach that extracted RDF triples from XML by way of, presumably, XSLT transformations might make more sense: "At some point it seems that it makes a lot more sense for ordinary mortals to work in XML and let geniuses write transformations if they want to reuse the information in RDF processing. Creating markup in a straitjacket can be a lot of fun, but only if you're genuinely fond of the straitjacket".
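St. Laurent's alternative, leaving the XML alone and deriving triples from it with a separate transformation, might look roughly like the following. He presumably had XSLT in mind; this sketch swaps in Python's ElementTree for brevity. The catalog vocabulary, the example.org base URI, and the MAPPING table are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Ordinary XML, written with no thought for RDF (hypothetical vocabulary).
PLAIN = """\
<catalog>
  <book id="b1"><title>First Book</title><author>A. Writer</author></book>
  <book id="b2"><title>Second Book</title></book>
</catalog>
"""

BASE = "http://example.org/books/"  # assumed base for minting subject URIs
MAPPING = {  # child element name -> predicate URI (an assumed mapping)
    "title": "http://purl.org/dc/elements/1.1/title",
    "author": "http://purl.org/dc/elements/1.1/creator",
}


def extract_triples(xml_text):
    """Derive triples from the plain XML; the XML itself stays untouched."""
    triples = []
    for book in ET.fromstring(xml_text).iter("book"):
        subject = BASE + book.get("id")
        for child in book:
            predicate = MAPPING.get(child.tag)
            if predicate is not None:  # unmapped elements are simply ignored
                triples.append((subject, predicate, child.text))
    return triples


for triple in extract_triples(PLAIN):
    print(triple)
```

All the RDF knowledge lives in the mapping and the extraction function, so the vocabulary's designer never has to hear the word "triple". The cost is that someone has to write and maintain that mapping for each vocabulary.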
St. Laurent seemed especially critical of Cowan and DuCharme's No. 7, about avoiding mixed content. As Simon said, "'Eschew mixed content' seemed the most ridiculous (and memorable) at the time, and I'd been having particular annoyances with general failures to appreciate mixed content at that point...Looking at the whole project in more detail and with examples, I find the whole thing repulsive, at least when taken as an approach to creating XML generally. On the other hand, as a human-readable syntax for RDF, it's far better than anything else I've seen".
Bob DuCharme responded to St. Laurent's comments by pointing out that sometimes the expressiveness of mixed content is outweighed by the pain of processing it. DuCharme added that "[w]ith all the additional constraints of RDF-conformant XML, it's even less expressive, and often even easier to process, so it's well-suited to certain applications". One of those applications is metadata, where RDF is having considerable success, particularly among the library and information science communities (understandable, since metadata was one of its first intended uses). This fails to count as a "real world application" only for those who are blinded by corporate IT, and only insofar as they haven't had to implement a heterogeneous document repository or knowledge management application.
"I still find it a little ironic," DuCharme said, "that while RDF has gotten so much publicity as a technology for warm and fuzzy AI pie-in-the-sky technology, it's gotten most of its traction in the mundane world of metadata".
Adam Turoff's reaction was the opposite of St. Laurent's, praising the practicality of Cowan & DuCharme's suggestions, rather than condemning their specifics: "I like this article because it...discuss markup design issues for people who want to make their vocabularies RDF-friendly". And, Turoff added, there is very little of that sort of practical design advice around. "Instead, we are left with," he said, "a hodgepodge of vocabularies where the primary design goal is, for example, mimicking a particular database structure, not vocabularies where the primary design goals are to be used as XML files per se".
Stepping back from the presuppositions of Cowan and DuCharme's article, Mike Champion cast doubt on RDF itself, suggesting that he didn't see much interest in it and that resistance to it during the RSS debates evinced the general lack of interest. "I would really like to understand," Champion said, "what benefit one might really get from using an RDF-friendly XML syntax...I'm not hostile to RDF...just skeptical that it's worth a significant investment of my time".
Danny Ayers tried to answer some of Champion's questions, pointing out that "RDF allows extensibility with minimal extra work". Ayers offered several projects where RDF is being used, including dmoz.org, the Mozilla browser, Adobe's RDF metadata initiative, the unfairly maligned RSS 1.0, the Stanford TAP project, MusicBrainz, and Mitch Kapor's vaporware open source PIM. I'll add two other projects, off the top of my head: MIT's DSpace digital repository and, since I mentioned him already, Andrew Kuchling's Biographical RDF.
Responding to both St. Laurent's claim about straitjackets and to Champion's plea for a demonstration of RDF's utility, Eric van der Vlist said that lots of things -- like RDBMS and XML -- are straitjackets, that every storage or representation technology has advantages and disadvantages, including RDF. "RDF and its triples," van der Vlist claimed, are "really lightweight when you have the right tools to manipulate them. I like to think of them as a RDBMS with a variable geometry: each 'row'...can have a variable number of columns..."
Van der Vlist nicely makes the point I made earlier about Python's rdflib. Being able to use RDF as a loose storage system, without having to worry about outgrowing (or even fully specifying, in advance) an RDBMS schema, can be very helpful in at least two situations: first, when you don't know what the data schema really is yet, owing either to problem domain constraints or to an extended prototype stage; and, second, when the storage schema needs to stay very flexible and extensible for the lifetime of the project. Or, as van der Vlist said, RDF is "like a RDBMS which you could populate before having written any schema, that's really very flexible..."
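The "variable geometry" idea is easy to sketch without any RDF library at all. What follows is a toy in-memory store in plain Python, not rdflib's API. The TripleStore class, the urn:example subjects, and the short predicate names (which stand in for full predicate URIs) are all assumptions made for illustration.

```python
from collections import defaultdict


class TripleStore:
    """A toy in-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if subject in (None, t[0])
            and predicate in (None, t[1])
            and obj in (None, t[2])
        )

    def rows(self):
        """Group triples by subject, giving one variable-width 'row' each."""
        grouped = defaultdict(dict)
        for s, p, o in self._triples:
            grouped[s].setdefault(p, []).append(o)
        return dict(grouped)


store = TripleStore()
# No schema was declared; each statement simply adds a "column" to a "row".
store.add("urn:example:story1", "title", "RDF, What's It Good For?")
store.add("urn:example:story1", "date", "2002-11-13")
store.add("urn:example:story2", "title", "Go Tell It On the Mountain")
# A property nobody planned for can arrive later, with no schema migration.
store.add("urn:example:story2", "annotatedBy", "ircbot")

for subject, columns in sorted(store.rows().items()):
    print(subject, sorted(columns))
```

The two "rows" end up with different "columns", and nothing had to be declared up front. An equivalent relational table would need either nullable columns for every property anyone might ever want or an ALTER TABLE each time a new one showed up.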
In next week's XML-Deviant column I'll continue to look at RDF, its beauty marks and its warts, what it's good and bad at. In particular I'll describe Tim Bray's proposal for a new, simplified XML serialization of RDF graphs. Just like everyone's eccentric old uncle, we may discover, once we get past all the blemishes and oddities, that RDF has more going for it than it often seems.