Should Atom Use RDF?

by Mark Pilgrim
August 20, 2003

Four Independent Issues

Here are four related but completely independent issues:

  1. The RDF model: statements are triples; use graphs not trees

  2. The RDF/XML serialization: a popular syntax for expressing individual RDF documents

  3. RDF tool support: RDFLib for Python, Drive for .NET, etc.

  4. The Semantic Web

And here are four related but completely independent counterarguments:

  1. The RDF conceptual model is overkill for specific applications, or is always overkill, or is simply the wrong model.

  2. The RDF/XML serialization is wretchedly complex and breaks the "view-source" principle for RDF documents.

  3. No RDF tools exist for my favorite language.

  4. The Semantic Web is an unattainable pipe dream, or is too fluidly defined to ever come about, or something.

The problem with discussing RDF (where that means, "I think this data format should be RDF") is that you can support any of these four RDF issues (model, syntax, tools, vision), in any combination, while vigorously arguing against the others. People who believe that the RDF conceptual model is a good thing may think that the RDF/XML serialization is wretched, or that there are no good RDF tools for their favorite language, or that the Semantic Web is an unattainable pipe dream, or any combination of these things. People who are familiar with robust RDF tools (such as RDFLib for Python) -- and, thus, never have to look at the RDF/XML serialization because their tools hide it from them completely -- may nonetheless think that RDF/XML is wretched. People who defend the RDF/XML syntax may have nothing polite to say about the vision of the Semantic Web. And around and around it goes...

This is a problem with "I think this format should be RDF" discussions. Many people who are thought to be pro-RDF are, in fact, against it in one or more ways (the model is limiting, the syntax is wretched, the tools are buggy or nonexistent, the vision is stupid). And many people who are perceived as anti-RDF are in fact in favor of it in one or more ways (the model is good, the serialization is no more complex than straight XML, the tools work well enough, the Semantic Web is worth the wait).

For the record, I think that the RDF model is sound, the tools work for me, the serialization is wretched, and the Semantic Web is an unattainable pipe dream. If I appear to be wavering over time, sometimes pro-RDF, sometimes anti-RDF, it may be that I'm simply arguing different facets.

RDF and Atom

Why do I bring this up? Because, as it happens, the Atom project is creating a new format for syndicating content and an API for a new web service. For the past week and a half it has been completely engulfed in an all-out flame war over whether it should use RDF. The discussion has been almost entirely unproductive because the question is really four questions, corresponding to the four issues:

  1. Can Atom benefit from the RDF conceptual model?
  2. Should Atom feeds use the RDF/XML syntax directly?
  3. Can I use RDF tools to consume Atom feeds?
  4. Is Atom part of the Semantic Web?

My answers? Yes, no, it depends, and I don't care.

A Wise Teacher

I sat in on an IRC chat with Sam Ruby, Shelley Powers, Sean Palmer, Joe Gregorio, and others who have contributed heavily to Atom over the past few months. About half of these people are traditionally considered pro-RDF, half anti-RDF; but as you've seen, these simplistic labels are really just another source of confusion, so I won't tell you which person is which. The focus of the chat was to come up with an RDF serialization of Atom by taking the examples from the Atom 0.2 snapshot (which are straight XML) and creating an XSLT transformation into RDF.

During the course of this chat, all of the four issues (model, syntax, tools, vision) came up. As you might imagine, some were more constructive than others. The model was really the most constructive, in that it taught us two key things:

  1. Cardinality is vitally important to figure out up front, and the RDF model forces you to figure it out up front. This is a good thing. For example, an Atom <feed> can contain one or more <entry> elements. If you had a feed with one entry, it would look like this in XML:

    <feed version="0.2" xmlns="http://purl.org/atom/ns#">
      <!-- some feed-level metadata omitted for brevity -->
      <entry>
        <title>Atom 0.2 snapshot</title>
        <link>http://diveintomark.org/2003/08/05/atom02</link>
        <id>tag:diveintomark.org,2003:3.2397</id>
        <issued>2003-08-05T08:29:29-04:00</issued>
        <modified>2003-08-05T18:30:02Z</modified>
        <summary>The Atom 0.2 snapshot is out.  Here are some sample feeds.</summary>
      </entry>
    </feed>

    Now suppose you wanted to add a second entry. You just add a second <entry> element:

    <feed version="0.2" xmlns="http://purl.org/atom/ns#">
      <!-- ... -->
      <entry>
        <title>Atom 0.2 snapshot</title>
        <link>http://diveintomark.org/2003/08/05/atom02</link>
        <!-- ... -->
      </entry>
      <entry>
        <title>Atom API primer</title>
        <!-- ... -->
      </entry>
    </feed>

    In other words, straight XML doesn't force you to think about cardinality until it's too late. If you looked at the first example (with only one entry) and said "Aha! A feed has an entry in it!" and went off to write code based on that assumption, you'd be borked when your code hit the second example (with two entries).

    But in RDF, collections of things are always explicit, so a feed with one entry would look like this:

    <rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:atom="http://purl.org/atom/ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:dcterms="http://purl.org/dc/terms/">
    <atom:Feed rdf:about="tag:diveintomark.org,2003:3">
      <!-- ... -->
      <atom:entries rdf:parseType="Collection">
        <atom:Entry rdf:about="tag:diveintomark.org,2003:3.2397">
          <dc:title>Atom 0.2 snapshot</dc:title>
          <atom:link rdf:resource="http://diveintomark.org/2003/08/05/atom02"/>
          <dcterms:issued>2003-08-05T08:29:29-04:00</dcterms:issued>
          <dcterms:modified>2003-08-05T18:30:02Z</dcterms:modified>
          <dcterms:created>2003-08-05T12:29:29Z</dcterms:created>
          <dc:description>The Atom 0.2 snapshot is out.  Here are some sample feeds.</dc:description>
        </atom:Entry>
      </atom:entries>
    </atom:Feed>
    </rdf:RDF>

    See the difference? Entries are always wrapped in an <entries rdf:parseType="Collection"> container element. If there's one entry, you get a collection of one; if there are two entries, you get a collection of two. But you know up front that it's a collection.

  2. The other big thing that the RDF model forced us to clarify was the concept of ordering. In XML, entries within a feed are in a particular order. Is that order accidental or intentional? This, honestly, is not something we'd given any thought to. The primary use-case for syndicated feeds is that the client parses a number of feeds from different sources and puts all the entries in chronological (or reverse chronological) order. Each entry has a required <modified> date for this purpose, so the issue of the structural order of entries within an individual feed wasn't a big concern.

    However, RDF forces it to be a concern because there are different container types for ordered and unordered lists. Once again the rigorous RDF model forced us to consider this up front, exposing an ambiguity in our current specification. The process of converting Atom-XML into Atom-RDF forced us to clarify these issues in our conceptual model.

So is the RDF model a good thing? I think that it is; considering it made our format better, regardless of the syntax.
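
Both lessons carry over to anyone parsing Atom as plain XML. As a minimal sketch, assuming Python and only its standard library (the helper name `read_entries` and the second entry's timestamp are invented for illustration, and timestamps are assumed to be UTC with a trailing "Z"), a consumer can treat entries as a list of any length and sort on <modified> rather than relying on document order:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = {"atom": "http://purl.org/atom/ns#"}

SAMPLE_FEED = """\
<feed version="0.2" xmlns="http://purl.org/atom/ns#">
  <entry>
    <title>Atom 0.2 snapshot</title>
    <modified>2003-08-05T18:30:02Z</modified>
  </entry>
  <entry>
    <title>Atom API primer</title>
    <!-- hypothetical timestamp; the article elides this entry's details -->
    <modified>2003-08-12T09:00:00Z</modified>
  </entry>
</feed>
"""


def read_entries(feed_xml):
    """Return (modified, title) pairs, newest first, for zero or more entries."""
    root = ET.fromstring(feed_xml)
    entries = []
    # findall always returns a list, so one entry and many entries
    # go through exactly the same code path.
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        modified_text = entry.findtext("atom:modified", namespaces=ATOM_NS)
        # Assumption: <modified> is present and written in UTC ("Z" suffix).
        modified = datetime.strptime(
            modified_text, "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        entries.append((modified, title))
    # Sort explicitly instead of trusting the order of elements in the feed.
    entries.sort(key=lambda pair: pair[0], reverse=True)
    return entries


for modified, title in read_entries(SAMPLE_FEED):
    print(modified.isoformat(), title)
```

Code written against the single-entry example alone would likely have fetched one <entry> and stopped; iterating over a list from the start avoids that trap.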

But the Syntax...

However, as you can see from the above snippets and the full final Atom-RDF prototype, the RDF/XML syntax is far more complex than the equivalent just-XML version. (Depending on your browser, you may need to view the source of either or both of those examples.)

Part of the problem stems from the very thing that RDF is supposed to be good at, namely, reusing and combining ontologies in a single document. You see, we kind of cheated when we created Atom-XML. The specification defines a number of elements (such as <title>) in terms of Dublin Core, but when you look at the actual Atom-XML document, you can see that we really redefined them in the Atom namespace. As a result, the XML version looks simpler at first glance because all the elements are in a single namespace which is defined as the default namespace.

In theory, you could cheat in this same way in RDF and put everything in a single namespace. But then you've pretty much negated one of the main benefits of RDF because you've redefined parts of existing ontologies and made it harder for people to integrate your RDF documents with other RDF data. Now they'll need to transform or map all your redefined elements back to their original ontologies. Since we were creating an XSLT transformation and could make the RDF look like whatever we wanted, we all agreed that we should do the right thing and reuse existing ontologies as much as possible. (This was actually the bulk of the discussion time, bickering about which ontologies to use.)

This highlights the crux of the perennial flame wars about RDF/XML: it can almost be as simple as pure XML. In fact with a few DTD tricks to default the parseType attributes, it can look virtually identical, but only if you cheat and redefine everything in your own ontology and force everyone else to map it back to other ontologies later. Or you can do the right thing and reuse existing ontologies from the beginning and then the syntax gets hellishly complex. There's always an additional cost; you can put it wherever you want, but you can't get rid of it.
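
To make the "map it back later" cost concrete, here is a small Python sketch of what a consumer would have to do if everything were published in a single namespace. The equivalence table is hypothetical: it is read off the correspondences visible in the two snippets above (title and dc:title, summary and dc:description, and so on) and is not a normative mapping. Statements are modeled as plain (subject, predicate, object) tuples rather than with any RDF library.

```python
ATOM = "http://purl.org/atom/ns#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical equivalences between single-namespace Atom properties
# and the existing Dublin Core terms they redefine.
EQUIVALENT = {
    ATOM + "title": DC + "title",
    ATOM + "summary": DC + "description",
    ATOM + "issued": DCTERMS + "issued",
    ATOM + "modified": DCTERMS + "modified",
}


def map_back(triples):
    """Rewrite predicates to their original ontology where one is known."""
    return [(s, EQUIVALENT.get(p, p), o) for s, p, o in triples]


# Data as it might arrive if the publisher kept everything in one namespace.
single_namespace = [
    ("tag:diveintomark.org,2003:3.2397", ATOM + "title", "Atom 0.2 snapshot"),
    ("tag:diveintomark.org,2003:3.2397", ATOM + "link",
     "http://diveintomark.org/2003/08/05/atom02"),
]

mapped = map_back(single_namespace)
for statement in mapped:
    print(statement)
```

The table is short here, but every consumer who wants to merge the data with other Dublin Core data needs some version of it, which is where the cost ends up when the publisher skips it.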

So should Atom use the RDF/XML syntax directly? I vote "NO".

The Best of Both Worlds

RDF (the model) is a good thing; RDF (the syntax) is a bad thing. "But," I hear you cry, "I don't care about the syntax because I have good RDF tools!" How can we allow you to use your RDF tools on Atom, and do the right thing with reusing existing ontologies, and keep the syntax simple for people who simply want to parse Atom feeds in isolation, as XML?

We can make the XSLT transformation normative. Here it is, the result of a 4-hour IRC chat. We should include it in the specification, maintain it as the format changes, and mandate that it is the One True Way to use Atom syndicated feeds as RDF.

Is this more work for the RDF folk? Sure. Now they need an XSLT processor as well as their favorite RDF tool. But every platform that has robust RDF tools (a small but growing number) also has robust XSLT tools.

But Atom-as-RDF is not the primary mode of consuming Atom feeds. There are dozens, perhaps more than 100, tools that consume syndication feeds now. Some of them have already been updated to consume Atom feeds and the format hasn't even been finalized yet. Most will be updated once the format is stable. And, to my knowledge, only one (NewsMonster) handles them as RDF, and it already has the infrastructure to transform XML because it does this for six of the seven formats called "RSS" (the seventh is already RDF).

In other words, we're hedging our bets. Whether a vocal minority likes it or not, RDF is very much a minority camp right now. It has a lot to offer -- I saw that first-hand as it forced us to clarify our model -- but it hasn't hit the mainstream yet. On the other hand, it seems perpetually poised to spring into the mainstream. Tool support is obviously critical here (since they help hide the wretched syntax), and the tools are definitely maturing.

So should Atom be consumed as RDF? It depends. If you want to, and have the right tools, you can. You'll need to transform it into RDF first, but we'll provide a normative way to do that. If you don't want to, then you don't have to worry about it. Atom is XML.
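
To show roughly what "transform it into RDF first" produces, the following Python sketch turns one Atom-XML entry into subject-predicate-object statements. It is a hand-written stand-in for the XSLT step, not the transformation itself; the function name `entry_to_triples` is an assumption, and the predicate choices simply mirror the Atom-RDF snippet shown earlier.

```python
import xml.etree.ElementTree as ET

ATOM = "http://purl.org/atom/ns#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Illustrative predicate choices, mirroring the Atom-RDF snippet above.
PREDICATES = {
    "title": DC + "title",
    "summary": DC + "description",
    "link": ATOM + "link",
    "issued": DCTERMS + "issued",
    "modified": DCTERMS + "modified",
}

SAMPLE_ENTRY = """\
<entry xmlns="http://purl.org/atom/ns#">
  <title>Atom 0.2 snapshot</title>
  <link>http://diveintomark.org/2003/08/05/atom02</link>
  <id>tag:diveintomark.org,2003:3.2397</id>
  <modified>2003-08-05T18:30:02Z</modified>
</entry>
"""


def entry_to_triples(entry):
    """Turn one Atom-XML <entry> element into (subject, predicate, object) tuples."""
    # The entry's <id> becomes the subject of every statement about it.
    subject = entry.findtext(f"{{{ATOM}}}id")
    triples = []
    for child in entry:
        # Strip the "{namespace}" prefix ElementTree puts on tag names.
        local_name = child.tag.split("}", 1)[-1]
        predicate = PREDICATES.get(local_name)
        if predicate is not None and child.text:
            triples.append((subject, predicate, child.text.strip()))
    return triples


triples = entry_to_triples(ET.fromstring(SAMPLE_ENTRY))
for statement in triples:
    print(statement)
```

People who only want the XML never run this step; people who want RDF run it once and hand the statements to their tools.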

What About the Semantic Web?

I don't care about the Semantic Web. Next question?


Comments

  • Recent developments
    2006-01-10 03:44:15 Danny Ayers

    Worth repeating: Atom format is now RFC 4287!


    Work on expressing Atom data in RDF (with OWL) continues, see:


    http://atomowl.org/

  • Schema languages as a wise teacher instead
    2003-08-24 05:08:01 Martijn Faassen

    In the section 'A Wise Teacher' it is claimed thinking in terms of the RDF model is useful as it helps one think about cardinality and order. Doesn't an XML schema language help you think about the same thing? If you're going to write a schema for your XML format you will have to capture cardinality and order aspects in it. So if these examples are the main benefits of considering the RDF model then I don't really see the case for using RDF as opposed to schema here.

    Of course there's a case for reusing RDF tools. There's also probably a case to think in terms of the RDF model, which I don't know much about. It's just that the supplied examples didn't really work for me -- writing a schema seems to be a more natural way to be made to consider such issues.

    The nicest part of the article to me is the analysis of the 'related but completely independent' (sic) issues. I'm "interesting" on the model, "ugh" on the serialization, "would like to play with it using some Python tool one day" on tools and "small directly useful steps are good and we'll see where they'll lead us" on the semantic web.

    Martijn

  • Closer than you think
    2003-08-23 01:45:13 Carl Garland

    I think, as Avdi states, you are actually one of the leading advocates of the Semantic Web, and, as the other XML.com article states, the Semantic Web is closer to appearing than anyone realizes. As I stated over at my lame blog, I often think of the Semantic Web the same way as CSS. While initially buggy and underimplemented, I think a large portion of tomorrow's WWW will be SW-enabled, largely through a few applications that will make their way into the mainstream, and I think Atom will be the highway that many of the tools use.

  • Semantic Web
    2003-08-22 13:26:46 Avdi Grimm

    I find it amusing to read these protestations of "I don't care about the semantic web!" from people working on things like Atom. The fact is, if you are working on Atom you *do* care, because Atom is as much the semantic web as anything else. Atom is about exposing semantic information on the web for consumption by software tools. That is, by definition, the Semantic Web; claiming otherwise is just wishful thinking.

    Ironically, with his work on making the web more accessible to those with disabilities, along with his recent work on Atom, Mark is currently one of the leading advocates of the real Semantic Web. It's a shame he refuses to acknowledge that fact.

  • RDF "debate.".....Paradigm Lost
    2003-08-21 15:38:35 Wayne Yuhasz

    Nice article and good distinction made between different aspects of the technology, the model's character and larger view of its contribution to the web's fundamental value (inclusive or restrictive)

  • Incorrect Assumptions
    2003-08-21 11:40:33 Shelley Powers

    Mark has made several incorrect assumptions. I've addressed these externally


    http://weblog.burningbird.net/fires/001550.htm

  • Halfway...
    2003-08-21 09:00:35 Danny Ayers

    ...to being convincing.


    The use of a single namespace for Atom in RDF isn't really the big deal you suggest, so the syntax comparison you make isn't at all balanced. It simply isn't comparing like with like.

    It's easy enough to express dc:title owl:isEquivalentTo atom:title or whatever in a schema, well out of the way of the feeds themselves. Tools smart enough to grok the RDF model shouldn't have much trouble grokking the equivalence. Simpler syntax, no extra work for the average developer.


    The current web is a pipe dream. Saying that doesn't affect the constructive work anyone's doing.

  • Me Too!
    2003-08-21 01:35:30 Julian Bond

    - RDF is actually quite neat. I just wish I could understand what they're talking about half the time.
    - RDF/XML serialization sucks.
    - There's not enough mature tools and too many platforms are badly or buggily supported.
    - Semantic what?

  • Namespace Cheating...
    2003-08-21 00:09:13 Nasseam Elkarra

    Why not use dcterms:modified instead of modified? It makes no sense to put an element in the Atom namespace if it is already defined in another namespace people are familiar with.


    It is funny how people want to make the syntax as simple as possible when at the end there will be tools to abstract it all away. I don't care about a simple syntax, I want what is right. If that means making Atom a little more complex and based on RDF and Dublin Core, so be it.


    XML is great. One of the biggest problems people find with XML is that when it was being passed around, people are inconsistently naming and structuring data. So specifications and recommendations have been released in hopes that people would start using consistent terms. RDF and Dublin Core come along and do just that and then people complain that it is complex. Of course it is going to be complex! We are talking about trying to build consistent vocabularies and terms here!


    I recently switched over one of my projects to RDF/DC and I am really pleased with the consistency it gives the metadata. I am still working out the quirks to take full advantage of RDF but once I run my documents through the RDF validator, the advantages become clear: consistent terms and lots of tools.


    My final piece of advice is: don't cheat. Some XML parsers cheat and actually allow non-XML behavior to sneak in. We cannot risk cheating in implementing XML as well. Namespaces help us define our terms so let us take advantage of this feature. Let us not be scared of complexity. Chances are when a developer talks about complexity they mean the secretary-at-my work-won't-understand kind of complexity which in fact isn't complex for real developers. If it is going to take complexity to do something right then don't worry, your average Joe Blogger won't be writing his Atom in Notepad, he will be using a blogging tool.


    -Nasseam
    http://www.myspotter.com
    http://www.opensec.org

  • On Tools...
    2003-08-20 22:15:54 Micah Dubinko

    At the point where you can only effectively work with a data format using tools, you have already lost.


    And for the record, like Mark I think that the RDF model is sound, the serialization is wretched, and the Semantic Web is an unattainable pipe dream. -m