The boom of weblogs has boosted interest in techniques for syndicating news-like material. In response, a family of applications known as aggregators or newsreaders has been developed. These consume and display metadata feeds derived from the content. Currently there are two major formats for these data feeds: RSS 1.0 and RSS 2.0. Mark Pilgrim covers these two flavors of RSS in his XML.com article "What is RSS?"
The names are misleading: the specifications differ not only in version number but also in philosophy and implementation. If you want to syndicate simple news items, there is little difference between the formats in terms of capability or implementation requirements. However, if you want to extend into distributing more sophisticated or diverse forms of material, then the differences become more apparent.
The decision over which RSS version to favor really boils down to a single trade-off: syntactic complexity versus descriptive power. RSS 2.0 is extremely easy for humans to read and generate manually. RSS 1.0 isn't quite so easy, as it uses RDF. It is, however, interoperable with other RDF languages and is eminently readable and processable by machines.
This article shows how the RDF foundation of RSS 1.0 helps when you want to extend RSS 1.0 for uses outside of strict news item syndication, and how existing RDF vocabularies can be incorporated into RSS 1.0. It concludes by providing a way to reuse these developments in RSS 2.0 feeds while keeping the formal definitions made with RDF.
RSS 1.0 Terms Have a Formal Definition
RSS 1.0 documents conform to the RDF/XML Syntax Specification. This means that they are expressed in the language described in RDF Concepts and Abstract Syntax, which has the precise formal semantics defined in RDF Semantics. Unless you're a logician or have masochistic tendencies, you probably won't want to follow the path all the way down to the formal base. For most developers, the RDF Primer contains plenty to get started with. The take-home message is that, unlike plain XML, which is just syntax, an RDF/XML document has a well-defined meaning that programs can derive from it.
There is another part of the RDF specification that we need to consider when talking about RSS 1.0: RDF Schema. In the jargon, the RDF Schema specification defines an ontology language. An ontology gives names to concepts and relationships between those concepts. An ontology is really just a tightly controlled vocabulary; to some extent, in this context the words "ontology", "vocabulary", and "schema" are interchangeable (in the RSS world, "module" is often used to refer to essentially the same thing).
RSS 1.0 may be a format defined in human language in the main specification document, but it is also an ontology that is specified in formal language in the RSS 1.0 RDF schema. Consider the RSS 1.0 snippet below.
<title>The Joy of Blogs</title>
The example uses the <item> and <title> terms, and they can be found in the RSS 1.0 schema, defined like this:
rdfs:comment="An RSS item.">
<rdfs:isDefinedBy rdf:resource="http://purl.org/rss/1.0/" />
rdfs:comment="A descriptive title for the channel.">
<rdfs:isDefinedBy rdf:resource="http://purl.org/rss/1.0/" />
The main things being said here are that item in RSS 1.0 is an RDF class and that title is a property. RDF classes roughly correspond to concepts, and properties describe the relationships between those concepts. Returning to our example, we can now see that it says more than is immediately obvious:
<title>The Joy of Blogs</title>
This says, first of all, that the resource identified by the item's URI is an instance of the class item. RDF/XML syntax gives a specific interpretation to the nesting of the XML, which allows us to determine that the resource has a title property and that the value of that property is the literal string "The Joy of Blogs". This still doesn't seem to offer much advantage over plain XML. But what we have isn't just described in human-readable documentation; it is given unambiguous definitions throughout, traceable back to the logical formalism of RDF. These semantics allow us not only to make statements about the item but also to reason programmatically with those statements.
The RDF Schema definition of the title property also says that it is a subproperty of dc:title, an element defined by the Dublin Core Metadata Initiative. We can then infer from these statements that the literal "The Joy of Blogs" is also related to the item as a dc:title.
If, for example, a browser-like application were reading the data but didn't know how to render title, it could reasonably fall back on its renderer for dc:title.
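The inference just described is mechanical enough to sketch in a few lines. The Python below is a minimal illustration, not an RDF toolkit: it stores triples as plain string tuples, applies the RDF Semantics subproperty rule (rdfs7) until nothing new appears, and uses a hypothetical item URI, since the original one is not shown.

```python
# Triples are plain (subject, predicate, object) tuples of strings.
RDFS_SUBPROPERTY = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
RSS_TITLE = "http://purl.org/rss/1.0/title"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
ITEM = "http://example.org/item/"  # hypothetical item URI

def infer_subproperties(triples):
    """Apply rdfs7 ((p subPropertyOf q) + (s p o) => (s q o)) until nothing new appears."""
    graph = set(triples)
    while True:
        pairs = {(sub, sup) for sub, pred, sup in graph if pred == RDFS_SUBPROPERTY}
        new = {(s, sup, o)
               for s, p, o in graph
               for sub, sup in pairs
               if p == sub} - graph
        if not new:
            return graph
        graph |= new

def find_value(graph, subject, predicate):
    """Return one object for (subject, predicate), or None if there is none."""
    for s, p, o in graph:
        if s == subject and p == predicate:
            return o
    return None

data = {
    (RSS_TITLE, RDFS_SUBPROPERTY, DC_TITLE),  # statement from the RSS 1.0 schema
    (ITEM, RSS_TITLE, "The Joy of Blogs"),     # statement from the feed
}
graph = infer_subproperties(data)

# An application that only understands dc:title still finds a title.
print(find_value(graph, ITEM, DC_TITLE))  # The Joy of Blogs
```

A real application would load the schema and the feed with an RDF library rather than hand-building tuples; the point is only that the fallback to dc:title needs no title-specific code.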
What do we gain from all this formal grounding? If RSS processing alone is our universe, maybe not a lot. But as soon as we want to start integrating our RSS with other RDF data, or merge other data into our RSS, we start to reap rewards.
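One reason merging is cheap in RDF is that a graph is, at bottom, a set of triples, and two graphs that use the same URIs for the same things can simply be combined. The sketch below assumes that simplified model (it ignores blank nodes, which a real merge has to keep distinct); the item URI and the second graph's data are hypothetical.

```python
DC = "http://purl.org/dc/elements/1.1/"
ITEM = "http://example.org/item/"  # hypothetical item URI

# Triples taken from an RSS 1.0 feed.
feed_graph = {
    (ITEM, "http://purl.org/rss/1.0/title", "The Joy of Blogs"),
}
# Hypothetical triples about the same item from some other RDF source.
other_graph = {
    (ITEM, DC + "creator", "A. N. Author"),
    (ITEM, DC + "subject", "weblogs"),
}

# With no blank nodes involved, merging is plain set union.
merged = feed_graph | other_graph

def describe(graph, subject):
    """Return (predicate, object) pairs for one subject, sorted for stable output."""
    return sorted((p, o) for s, p, o in graph if s == subject)

for predicate, value in describe(merged, ITEM):
    print(predicate, value)
```

Nothing in the merge step knows about RSS or Dublin Core; the shared item URI is what ties the two sources together.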
Extending RSS: Software Releases
As an example of extending RSS, we'll take a software company's product announcement RSS feed. Periodically they release updates to their product, and they would like the announcement of the update to be an automated part of the release process. So when a new release build is made, an item will be inserted into their news feed that contains the product name and the release version.
We create an RSS module by defining the properties we need, explaining their usage and associating them with a unique namespace. On the face of it, this is a trivial exercise -- for the update module we can just define a couple of simple elements:
product - the name of the product. A character string.
version - the version of the release, expressed as a string in the format x.y, where x is the major version number and y the minor version number.
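Because version is just a string in x.y form, consumers will usually want to parse it before comparing releases. This hypothetical helper assumes both parts are non-negative integers, which the definition above implies but does not state outright.

```python
import re
from typing import Tuple

# Hypothetical helper for the rel:version convention "x.y". Assumption: x and
# y are non-negative integers (the module only says "a string in the format x.y").
VERSION_RE = re.compile(r"(\d+)\.(\d+)")

def parse_version(value: str) -> Tuple[int, int]:
    """Return (major, minor), or raise ValueError if value isn't major.minor."""
    match = VERSION_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a major.minor version: {value!r}")
    return int(match.group(1)), int(match.group(2))

print(parse_version("2.3"))  # (2, 3)
# Comparing tuples orders releases numerically, so 2.10 sorts after 2.9,
# which a plain string comparison would get wrong.
print(parse_version("2.10") > parse_version("2.9"))  # True
```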
For a namespace we just need a URI, ideally one under our control. So if we have registered the domain name supersemantics.com, then we could use that as a base.
It's a good idea to recommend a prefix to use for the namespace within XML documents, and here we shall use rel.
Here's what this might look like in our RSS 1.0 feed.
xmlns:rel="http://supersemantics.com/ns/release/"
...
<item rdf:about="http://supersemantics.com/release/2003/06/19#9">
  <title>New Release</title>
  <dc:date>2003-06-19T14:02:33+01:00</dc:date>
  <rel:product>IronBoard</rel:product>
  <rel:version>2.3</rel:version>
...
The date is expressed in the W3C Date and Time Format (W3CDTF), a profile of the ISO 8601 standard.
By using RDF/XML syntax here, we actually say more than we would with plain XML. The product and version elements are RDF properties relating the item resource to literal strings. Two statements are being made here, each of which can be expressed as a subject (what's being described), a predicate (the property), and an object (the value of that property):
http://supersemantics.com/release/2003/06/19#9 rel:product "IronBoard"
http://supersemantics.com/release/2003/06/19#9 rel:version "2.3"
The (subject, predicate, object) statement is an important concept in the RDF world and is usually referred to as a triple. The subject of one triple may be the object of another and vice versa. This means the triples can also be thought of as a joined-up structure, and that structure is the RDF graph.
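To see that the RDF/XML nesting really does yield these triples, here is a deliberately tiny Python reader built on the standard library's ElementTree. It is a sketch under strong assumptions, not an RDF/XML parser: it wraps the item from above in an rdf:RDF root with the usual RSS 1.0, Dublin Core, and rel namespace declarations, and understands only node elements carrying rdf:about that contain literal-valued property elements. A real application would use an RDF toolkit.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The feed item from above, wrapped in a minimal rdf:RDF root so it parses.
FEED = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:rel="http://supersemantics.com/ns/release/">
  <item rdf:about="http://supersemantics.com/release/2003/06/19#9">
    <title>New Release</title>
    <dc:date>2003-06-19T14:02:33+01:00</dc:date>
    <rel:product>IronBoard</rel:product>
    <rel:version>2.3</rel:version>
  </item>
</rdf:RDF>"""

def expand(tag):
    """Turn ElementTree's '{namespace}local' form into a full URI."""
    ns, _, local = tag[1:].partition("}")
    return ns + local

def item_triples(xml_text):
    """Read node elements with rdf:about and their literal property elements."""
    root = ET.fromstring(xml_text)
    triples = []
    for node in root:
        subject = node.get("{%s}about" % RDF_NS)
        # A typed node element such as <item> states the resource's rdf:type.
        triples.append((subject, RDF_NS + "type", expand(node.tag)))
        for prop in node:
            triples.append((subject, expand(prop.tag), (prop.text or "").strip()))
    return triples

for triple in item_triples(FEED):
    print(triple)
```

Note that the element names become full property URIs by simple concatenation of namespace and local name, which is how rel:version ends up as http://supersemantics.com/ns/release/version.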
So what's the big deal? The relationship between the item and the product name and version number is already defined. We can load our RSS file into any RDF-aware toolkit (and there are plenty; see Dave Beckett's Resource Guide) and have it immediately know that an item has the properties product and version. We don't need any more programmer logic to extend the data model.
If we wish to offer our new module for reuse by others, we can provide a schema with formal definitions for our terms, in the same way that the RSS 1.0 terms are defined in the RSS 1.0 RDF Schema.
Working with Existing Vocabularies
We noted earlier that the RSS 1.0 title property is actually a subproperty of Dublin Core's title. Some parts of the vocabulary used in RSS 1.0, such as dc:creator, are taken directly from Dublin Core.
Generally speaking, it's good practice to use existing vocabularies directly wherever possible, as that is the best route to interoperability. A common scenario is that a general-purpose vocabulary contains a term close to what we're looking for, but our requirement is more specific. The solution here is to define our own term as a subclass or subproperty of the existing term (depending on whether the term applies to an entity or to a relationship between entities). The child class (or property) then takes on the same characteristics as its parent, in addition to anything specific to the child.
As it happens, there is at least one existing vocabulary designed to describe software releases. The release vocabulary at eikster.com contains terms (name and version) that correspond directly to our product and version. We can inherit their descriptions by making our properties subproperties of them.
There is one significant difference between eikster.com's properties and ours: their schema provides a Release class, to which the properties apply. Looking back at the RSS 1.0 example, we have our product and version properties applied to an RSS item; the resource on the left-hand side of the triples is an item, and on the right-hand side we have a string literal. We can use RDF Schema to say we want the domain (left-hand side) of our properties to be instances of item and the range (right-hand side) to be literals.
Note that the domain and range are primarily descriptive: they don't in themselves impose the kind of constraint found in W3C XML Schema (WXS). It's up to applications to interpret them as they wish (more expressive restrictions can be stated using the Web Ontology Language, OWL).
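The difference from W3C XML Schema validation shows up in how an RDFS reasoner treats rdfs:domain: under the RDF Semantics rule rdfs2, it adds a type statement rather than rejecting the data. A minimal sketch with plain tuples, assuming the domain declaration for rel:product suggested above and a hypothetical untyped subject:

```python
# Triples are plain (subject, predicate, object) string tuples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
RSS_ITEM = "http://purl.org/rss/1.0/item"
REL_PRODUCT = "http://supersemantics.com/ns/release/product"

# Schema: the subject of rel:product is an RSS item.
schema = {(REL_PRODUCT, RDFS_DOMAIN, RSS_ITEM)}
# Data that never states a type for its subject (hypothetical URI).
data = {("http://example.org/untyped-thing", REL_PRODUCT, "IronBoard")}

def apply_domain_rule(triples):
    """rdfs2: from (p rdfs:domain C) and (s p o), infer (s rdf:type C)."""
    domains = {(prop, cls) for prop, pred, cls in triples if pred == RDFS_DOMAIN}
    inferred = {(s, RDF_TYPE, cls)
                for s, p, _ in triples
                for prop, cls in domains
                if p == prop}
    return triples | inferred

result = apply_domain_rule(schema | data)
# No error is raised: the subject is simply inferred to be an item.
print(("http://example.org/untyped-thing", RDF_TYPE, RSS_ITEM) in result)  # True
```

An application that wants validation-style behavior has to add that check itself, which is the sense in which domain and range are left to applications to interpret.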
A few more things that are easy to add to the schema, and likely to be useful, are human-readable labels and comments for each property and references to their definition. Including a reference to the definition might seem a little redundant as part of the definition itself, but the statements in an RDF Schema may be used outside their original context.
<rdfs:comment>The official name of a software package</rdfs:comment>
<rdfs:subPropertyOf rdf:resource="http://eikster.com/2003/release#name" />
<rdfs:comment>The release version of a software package,
given in major.minor format, e.g. 2.3</rdfs:comment>
<rdfs:subPropertyOf rdf:resource="http://eikster.com/2003/release#version" />
Together with this schema, RDF Schema-aware software that understands eikster.com's Release class and properties will also be able to understand our RSS items, as we have defined how they relate.
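Only fragments of the schema appear above. As a sketch of what a complete version might contain, this Python builds RDF/XML declaring both properties with labels, comments, rdfs:isDefinedBy, subproperty links to eikster.com's terms, and the domain and range discussed earlier. The labels, the isDefinedBy target (our namespace URI, mirroring how the RSS 1.0 schema points at its own namespace), and rdfs:Literal as the range are assumptions.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
REL = "http://supersemantics.com/ns/release/"

ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)

# local name -> (hypothetical label, comment, eikster.com parent property)
PROPERTIES = {
    "product": ("Product", "The official name of a software package",
                "http://eikster.com/2003/release#name"),
    "version": ("Version", "The release version of a software package, "
                "given in major.minor format, e.g. 2.3",
                "http://eikster.com/2003/release#version"),
}

def build_schema():
    """Return an RDF/XML string declaring each property in PROPERTIES."""
    root = ET.Element("{%s}RDF" % RDF)
    for local, (label, comment, parent) in PROPERTIES.items():
        prop = ET.SubElement(root, "{%s}Property" % RDF, {"{%s}about" % RDF: REL + local})
        ET.SubElement(prop, "{%s}label" % RDFS).text = label
        ET.SubElement(prop, "{%s}comment" % RDFS).text = comment
        ET.SubElement(prop, "{%s}isDefinedBy" % RDFS, {"{%s}resource" % RDF: REL})
        ET.SubElement(prop, "{%s}subPropertyOf" % RDFS, {"{%s}resource" % RDF: parent})
        # Subjects are RSS 1.0 items; values are plain literals.
        ET.SubElement(prop, "{%s}domain" % RDFS,
                      {"{%s}resource" % RDF: "http://purl.org/rss/1.0/item"})
        ET.SubElement(prop, "{%s}range" % RDFS, {"{%s}resource" % RDF: RDFS + "Literal"})
    return ET.tostring(root, encoding="unicode")

print(build_schema())
```

Generating the schema from a table like this keeps the human-readable comments and the machine-readable links in one place, though a hand-written RDF/XML file works just as well.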
Bringing RSS 2.0 to the Party
There are various reasons, substantially matters of personal preference, why some may prefer the RSS 2.0 format. If we can map RSS 2.0 with our extension module unambiguously to the equivalent RSS 1.0 version, then we have effectively turned the XML syntax into a task-specific serialization of RDF, getting all the semantic goodness of RDF in the simple XML packaging of RSS 2.0. This is the approach taken by my project, Simple Semantic Resolution (SSR), which is itself defined as an RSS 2.0 module. A step-by-step description, SSR-Enabling an RSS 2.0 Module, is available, but we have already covered most of those steps here. What we haven't done yet is define the mapping. In SSR this is done by supplying an XSLT stylesheet that transforms documents using our module in combination with RSS 2.0 into their RSS 1.0/RDF counterparts.
A stylesheet is available (thanks to Sjoerd Visscher) that can convert core RSS 2.0 into RSS 1.0, so all we have to add is whatever is needed to convert our XML elements and their contents into RDF properties and objects via a syntactic transformation. For our software release module, that is nothing at all: Sjoerd's XSLT passes through unchanged any XML that isn't recognised as RSS, and that's exactly what we want for our syntax.
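Sjoerd's stylesheet is XSLT. Purely to illustrate the pass-through idea, here is a rough Python analogue (not a port of that stylesheet) that maps a few core RSS 2.0 elements into the RSS 1.0 namespace and copies namespaced module elements such as rel:product untouched. The sample item, the subset of core elements handled, and the use of <link> as the item's URI are all assumptions.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1 = "http://purl.org/rss/1.0/"
CORE_RSS2 = {"title", "link", "description"}  # the subset this sketch maps

# A hypothetical RSS 2.0 item using the release module.
RSS2_ITEM = """<item xmlns:rel="http://supersemantics.com/ns/release/">
  <title>New Release</title>
  <link>http://supersemantics.com/release/2003/06/19#9</link>
  <rel:product>IronBoard</rel:product>
  <rel:version>2.3</rel:version>
</item>"""

def rss2_item_to_rss1(item_xml):
    """Map one RSS 2.0 item to an RSS 1.0 item element.

    Assumption: the item's <link> serves as its RDF URI. Core elements outside
    CORE_RSS2 (pubDate, guid, ...) are simply dropped in this sketch."""
    src = ET.fromstring(item_xml)
    out = ET.Element("{%s}item" % RSS1, {"{%s}about" % RDF: src.findtext("link")})
    for child in src:
        if child.tag in CORE_RSS2:
            # Un-namespaced RSS 2.0 core element: move it into the RSS 1.0 namespace.
            ET.SubElement(out, "{%s}%s" % (RSS1, child.tag)).text = child.text
        elif child.tag.startswith("{"):
            # Namespaced module element: copy it unchanged. Under RDF/XML rules a
            # child element like this is read as a property of the item.
            out.append(child)
    return out

ET.register_namespace("rdf", RDF)
ET.register_namespace("rss", RSS1)
ET.register_namespace("rel", "http://supersemantics.com/ns/release/")
print(ET.tostring(rss2_item_to_rss1(RSS2_ITEM), encoding="unicode"))
```

The module-specific branch is empty of logic, which is the point made above: because the release elements are already shaped as RDF/XML properties, they need no translation.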
So all we have to do to give instances of our extended RSS 2.0 the RDF semantics is to use SSR to identify the transform that defines the mapping. All this takes is the insertion of an extra element into the RSS just below the root level, so our enriched RSS 2.0 will look like this:
<ssr:rdf transform="http://ideagraph.net/xmlns/ssr/source/rss2rdf.xsl" />
<pubDate>Thu, 19 Jun 2003 14:02:33 +0100</pubDate>
A regular RSS 2.0 client can understand this, as there is no change to the core format.
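The two formats also differ in how they write dates: RSS 1.0's dc:date uses W3CDTF, while RSS 2.0's pubDate uses RFC 822-style dates, so a mapping between them would typically need to convert one form to the other. A minimal sketch using only Python's standard library, assuming full date-time values with an explicit offset or "Z" (W3CDTF also allows reduced forms such as 2003-06, which this does not handle):

```python
from datetime import datetime
from email.utils import format_datetime, parsedate_to_datetime

def w3cdtf_to_rfc822(value: str) -> str:
    """Convert a full W3CDTF date-time (RSS 1.0 dc:date) to RFC 822 (RSS 2.0 pubDate)."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"  # fromisoformat rejects 'Z' before Python 3.11
    return format_datetime(datetime.fromisoformat(value))

def rfc822_to_w3cdtf(value: str) -> str:
    """Convert an RFC 822 date with a numeric offset back to W3CDTF."""
    return parsedate_to_datetime(value).isoformat()

print(w3cdtf_to_rfc822("2003-06-19T14:02:33+01:00"))        # Thu, 19 Jun 2003 14:02:33 +0100
print(rfc822_to_w3cdtf("Thu, 19 Jun 2003 14:02:33 +0100"))  # 2003-06-19T14:02:33+01:00
```

Keeping the numeric offset in both directions avoids silently shifting times to UTC; the day of the week in the RFC 822 form is computed from the date rather than copied.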
RSS 1.0's strong point is its use of the RDF model, which enables information to be represented in a consistent fashion. This model is backed by a formal specification which provides well-defined semantics. From this point of view, RSS 1.0 becomes just another vocabulary that uses the framework. In contrast, outside of the relationships between the handful of syndication-specific terms defined in its specification, RSS 2.0 simply doesn't have a model. There's no consistent means of interpreting material from other namespaces that may appear in an RSS 2.0 document. It's a semantic void. But it doesn't have to be that way since it's relatively straightforward to map to the RDF framework and use that model.
The scope of applications is often extended, and depending on how you look at it, that's either enhancement or feature creep. Either way, it usually means diminishing returns: the further you get from the core domain, the more additional work is required for every new piece of functionality. But if you look at the web as one big application, then we can get a lot more functionality with only a little more effort.