For as long as RDF has existed, people have been trying to fix it. My predecessor in this spot, Leigh Dodds, wrote a column in the summer of 2000 ("Instant RDF") in which he discussed efforts to respond to complaints about RDF's complexity. At that relatively early point, the two dominant approaches to relating XML and RDF, as Dodds explained, were that RDF should be embedded in XML documents or that RDF should be extracted from, but not embedded in, XML documents.

In last week's column, I claimed, following conversations in the XML development community, that RDF was good for representing "mundane metadata", to use Bob DuCharme's phrase, and as an alternative to RDBMS storage; that is, as a kind of unstructured or semistructured data storage model. My goal was to route around complaints about RDF's XML serialization by suggesting ways in which it didn't matter (not much, anyway) what that serialization looked like, since the goal was to avoid writing it by hand or reading it, as it were, by eye.

I suggested using a programmatic triple or RDF store from a host programming language, many of which have interesting RDF triplestore implementations (for example, Redland works with several languages). By means of a triplestore API one makes 3-tupled assertions and combines them into graphs, using ontologies (of various degrees of formality and publicity) of terms and predicates, both of which are named by URIs, and values, which may be named by URIs or may be asserted literally.
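To make the triplestore idea concrete, here is a minimal in-memory sketch in Python. It is not Redland's API, nor that of any real library: the names TripleStore, URI, Literal, add, and match are hypothetical, chosen only to illustrate asserting 3-tuples whose subjects and predicates are named by URIs and whose values are either URIs or literals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    """A resource or predicate named by a URI (hypothetical helper type)."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A value asserted literally rather than named by a URI."""
    value: str


class TripleStore:
    """Toy in-memory graph: a set of (subject, predicate, value) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, value):
        # Subjects and predicates must be URIs; values may be URIs or literals.
        if not isinstance(subject, URI) or not isinstance(predicate, URI):
            raise TypeError("subject and predicate must be URIs")
        if not isinstance(value, (URI, Literal)):
            raise TypeError("value must be a URI or a Literal")
        self._triples.add((subject, predicate, value))

    def match(self, subject=None, predicate=None, value=None):
        # None acts as a wildcard for that position.
        return {
            (s, p, v)
            for (s, p, v) in self._triples
            if (subject is None or s == subject)
            and (predicate is None or p == predicate)
            and (value is None or v == value)
        }


# Usage: two assertions about one resource, then a query by predicate.
store = TripleStore()
site = URI("http://monkeyfist.com/")
title = URI("http://foo.com/#Title")
store.add(site, title, Literal("Our Monkey, Your Fist"))
store.add(site, URI("http://foo.com/#Type"), URI("http://foo.com/#Resource"))
titles = store.match(subject=site, predicate=title)
```

Nothing in this sketch touches an XML serialization, which is the point: the graph is built and queried entirely through the host language.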

In this scenario one avoids some of the constraints of SQL and RDBMSes, but also gives up most of their maturity, performance, and wider tool support, in return for a considerable grant of flexibility and extensibility. And if the XML serialization of these graphs of triples, which might be used for exchanging graphs or simply for on-disk storage, was terribly ugly or hard for most people to write and read, who cares? No one is being asked to do so. Except for the people who develop the triplestore implementations, but they're RDF theoretic model wireheads anyway. If you're troubled by the idea that some things are simply to be ignored by some people, think of an RDBMS like MySQL, which is widely and successfully used by thousands of developers, most of whom haven't the slightest idea about the technical details of, say, ISAM table storage. They don't know; don't want, care, or need to know. Perhaps RDF's XML serialization is like that?

In other words, if you don't like or understand or prefer RDF's XML serialization, find a way to avoid dealing with it directly. Using an RDF triplestore from a high-level language is one such way, and it retains some, perhaps all, of the benefits of RDF's data model. So, my argument is a more focused variant of the suggestion Shelley Powers has been making repeatedly on XML-DEV lately: if you don't like or understand or prefer RDF, just don't use it. This seems fair enough.

Most recent discussion of RDF, which has bubbled over the bounds of XML-DEV and moved out into the broader confines of the Web development community, has been by turns absurd and sublime. From foundational debates about whether RDF is complex, or fights over how to characterize its complexity, to awfully redundant discussions about whether its XML serialization is all that user-unfriendly, to meta-debates in which various sides jockey for position to see which side can be described as unfair or "politically correct" (whatever that could possibly mean in this context) or dismissive or narrow-minded or high-handed -- and on and on.

Yet the debate has also been productive at times; one of its products is Tim Bray's RPV proposal.

Resources, Properties, and Values

Bray says his RPV proposal "is an XML-based language for expressing RDF assertions ... designed to be entirely unambiguous and highly human-readable." That two-part design goal is worth spending some time with insofar as it's emblematic of a good deal of the underlying debate over RDF. To say that an XML language is or should be "entirely unambiguous" and "highly human-readable" is to say that it should be as easily digestible by machines as by humans. It's that tension which runs all the way from XML to RDF.

Further, Bray suggests that RDF has failed to gain traction because of this tension: his RPV proposal "is motivated by a belief that RDF's problems are rooted at least in part in its syntax." He elaborates on this point by saying, first, that RDF's XML serialization is "scrambled and arcane," preventing people from easily reading or writing it; second, that the XML serialization uses qualified names in a way that's not user-friendly and is in some conflict with the TAG's idea that significant resources be identified by URI; third, that metadata folks don't seem to have any general problem thinking of things in terms of RDF's 3-tuples; fourth, that some alternatives to RDF-XML, like n3, suffer because, as non-XML, they can't get the network effect of ubiquitous XML support; and, fifth, that the idea of embedding RDF in XML languages, which seemed in the summer of 2000, both to Leigh Dodds and much of the rest of the XML development community, like a viable approach, "has failed resoundingly in the marketplace."

To put it more plainly: RDF needs a new XML serialization because the existing one is overly complex, and it should be possible to do better. Bray's RPV proposal has at least one immediate virtue: simplicity. It contains only two elements, R and PV -- for resources and property-value pairs, respectively. Which means a simple triple in RPV can be as straightforward as

<R r="http://xml.com/">
  <PV p="http://foo.com/#siteType" v="http://foo.com/#xml" />
</R>

The resource identified by the R element has the property identified by the URI in PV's p attribute, which has the value identified by the URI in its v attribute. Since there can be any number of PVs within an R, one can easily add other properties to the resource by adding other PV elements. As the object of a property can also be a literal, RPV says that when the v attribute is missing from a PV, the value of the property being predicated of the resource is the content of the PV element:

<R r="http://monkeyfist.com/">
  <PV p="http://foo.com/#Title">Our Monkey, Your Fist</PV>
</R>

An attributeless R means that the element itself is (or represents) the resource being described:

<R>
  <PV p="http://foo.com/#Type" v="http://foo.com/#Resource" />
</R>

A resource element with an id attribute, the value of which must be unique within the XML document, can be referred to at other points in the document:

<R r="http://monkeyfist.com/" id="r1">
  <PV p="http://foo.com/#Publisher">Monkeyfist Collective</PV>
</R>

<R r="#r1">
  <PV p="http://foo.com/#Subject">politics</PV>
</R>

That's about all there is to RPV (save for namespaces, which I've omitted above, and some bits about relative URIs and reification). RDF-RPV is clear and simple, easy to write and read; more importantly, it makes the triples plainly visible. The murkiness of the triples is one complaint people often make about RDF-XML.
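One way to see how plainly the triples show through is to pull them back out with a few lines of code. What follows is a rough Python sketch, not a conforming RPV processor. The function name rpv_triples and the wrapping doc element are hypothetical, namespaces are omitted as in the examples above, and the handling of id (an r value of "#name" is resolved to the r of the element carrying that id) is only my reading of the example, not something the proposal is quoted as specifying. Attributeless R elements, relative URIs, and reification are not handled.

```python
import xml.etree.ElementTree as ET


def rpv_triples(xml_text):
    """Return (resource, property, value, is_literal) tuples from RPV markup."""
    root = ET.fromstring(xml_text)
    # Assumed reading: an id names the resource given by r on the same element.
    ids = {
        r.get("id"): r.get("r")
        for r in root.iter("R")
        if r.get("id") and r.get("r")
    }
    triples = []
    for r in root.iter("R"):
        subject = r.get("r")
        if subject is not None and subject.startswith("#"):
            # Resolve a same-document reference; fall back to the raw value.
            subject = ids.get(subject[1:], subject)
        for pv in r.findall("PV"):
            prop = pv.get("p")
            if pv.get("v") is not None:
                # The value is a resource named by a URI.
                triples.append((subject, prop, pv.get("v"), False))
            else:
                # No v attribute: the element content is a literal value.
                triples.append((subject, prop, (pv.text or "").strip(), True))
    return triples


# The "doc" wrapper is a hypothetical container for several R elements.
example = """<doc>
<R r="http://monkeyfist.com/" id="r1">
  <PV p="http://foo.com/#Publisher">Monkeyfist Collective</PV>
</R>
<R r="#r1">
  <PV p="http://foo.com/#Subject">politics</PV>
</R>
<R r="http://xml.com/">
  <PV p="http://foo.com/#siteType" v="http://foo.com/#xml" />
</R>
</doc>"""

for triple in rpv_triples(example):
    print(triple)
```

Each PV element becomes exactly one tuple, with the enclosing R supplying the subject, which is about as direct a mapping from markup to triples as one could ask for.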

Whether the RDF Working Group will consider alternative syntaxes and whether something like RPV could possibly be adopted remain open questions. The value of Bray's RPV proposal is its demonstration that an XML serialization of the RDF model does not have to be complex or hard for humans to read.

One of the parts of RDF which people seem to like is the clarity of tuples of subjects (resources), predicates (properties), and objects (values). The 3-tuple isn't ideal for every situation and, yes, some people aren't interested in thinking of things in terms of graphs of triples. For those who are, however, having an XML serialization of RDF which makes the triples obvious and plain seems to be an unambiguously good thing.


What do you think of Tim Bray's RPV proposal? Is another syntax enough to fix up RDF in the eyes of its doubters?
  • RPV comments
    2002-11-21 02:07:00 Ian Young

    I'm not sure we should be commenting on something that hasn't even had the obvious typos fixed yet.


    Nonetheless...
    RPV as currently described lacks a way of referencing blank nodes in the serialization. Some mechanism like RDF/XML's rdf:nodeID is needed.


    You give an example of an <R> element with both id and r attributes. I think this is a misunderstanding: it only makes sense for one or the other to be given.
    In fact, since the id attribute is equivalent to RDF/XML's rdf:ID and the r attribute to rdf:about, only the r attribute is required in RPV -- there is no difference in semantics between <R id="name"> and <R r="#name">.


    More work is needed on typed literals, particularly XML literals.


    Section 4 should be dropped. It is not necessary to introduce new syntax to encode RDF statements that introduce a 'reification'. At present, this section is also incorrect in that it requires the statement being reified to be asserted as well as its reification.


    • RPV comments
      2002-11-21 14:31:18 Kendall Clark

      I trust you've sent these comments to Tim Bray? I'm not sure I agree about the r and id attributes of the R element, but no matter.


      I think you may have misunderstood the point of the XML-Deviant column, which is precisely to comment upon stuff that's being actively worked on and developed. Note that I didn't tell anyone to go out and bet the farm on RPV, merely that some parts of the recent RDF donnybrook have been productive, unlike most of the rest of it.

  • Looks Good
    2002-11-21 12:24:20 Doug Ransom

    I like it. I get it. I don't feel dumb trying to read it, and I might be able to deal with it in xslt.

  • interesting
    2002-11-21 18:05:06 Fred Grott

    Having delved into rdf and DAML+OIL in doing different automation projects, I find that even though I am automating the process of having rdf written, it still would be easier if the RPV approach was adopted, not only for those who hand write rdf but for us overworked programmers as well.

  • human readable language
    2002-11-21 21:36:41 Richard H. McCullough

    I agree that a human readable language is needed.
    If you think so, take a look at the KR language which was designed to be simple yet powerful, unambiguous and English-like.
    http://rhm.cdepot.net/doc/KEtutorial.txt

  • Human Readable != single-letter names
    2002-11-23 16:48:40 Micah Dubinko

    Looks good, though the single-letter names make it harder to read than it could be.


    Where to look for better names? How about <meta> for starters? Hmmm.. cf. XHTML 2.0.


    -m

  • It's growing on me
    2002-11-26 12:23:11 James Ham, Jr.

    In all honesty, XML didn't do it for me at all until I discovered RDF nearly a year ago. Via RDF, I no longer have to reinvent the wheel in the early stages of designing an XML app for personal use. RDF makes the underlying data self-descriptive which is more than can be said for bare XML. If Mr. Bray's RPV proposal will make it easier for mere mortals to grasp RDF, then I'm all for it.