XML.com: XML From the Inside Out


Why Choose RSS 1.0?

July 23, 2003

Creating New Applications through Extensibility

RSS, a set of lightweight XML syndication technologies primarily used for relaying news headlines, has been adapted to a wide range of uses from sending out web site descriptions to disseminating blogs. This article looks at a new application area for RSS: syndicating tables of contents for serials publications.

Serials newsfeeds -- especially scientific newsfeeds -- differ from regular newsfeeds in that a key requirement for the reader, or more generally for the consumer, of the feed is to be able to cite, or produce a citation for, a given article within the serial. This need for additional information exists across many types of publishing activities. A user may also have good reason not to follow a link directly to the content, for example preferring to access a locally stored version of the resource. Both cases require that rich metadata describing the article be distributed along with the article title and link. The need to include metadata within the feed raises the following questions:

  • Which version of RSS best supports the delivery of metadata to users?
  • Which metadata term sets are best employed for supply to users?

This article examines both of these issues and then considers how such extensions can actually be used in practice.

The Many Faces of RSS

RSS is not a single technology but rather a family of loosely related specifications developed by separate groups. Even the moniker RSS cannot be agreed upon: it is both acronym and non-acronym, and as an acronym it supports multiple interpretations. Its earliest origins go back to 1995, in Apple's Meta Content Framework (MCF) and the HotSauce application (MCF subsequently evolved into RDF, the Resource Description Framework). From there RSS proceeded via Microsoft's Channel Definition Format, used by applications such as PointCast, to reemerge as an RDF profile in Netscape's portal service "My Netscape Network". A revision of this RDF profile was later released as an RSS 0.9 draft. Some felt this version to be overly complex, so as a compromise a simpler, non-RDF version, RSS 0.91, was released.

However, the desire for a truly extensible format with support for namespaces and plugin modules led the RDF Site Summary Working Group to develop a new RDF profile, released in December 2000 as the RSS 1.0 specification [1]. This was immediately challenged by Dave Winer, CEO of UserLand Software, with another new non-RDF version, RSS 0.92, which was later redesigned and repackaged in September 2002 as RSS 2.0 [2]. (Note that on July 15, 2003, UserLand Software transferred ownership of its hitherto proprietary RSS 2.0 specification to the Berkman Center for Internet & Society at Harvard Law School.)

It is difficult to know what the underlying cause of all this angst is. The principal suspect is surely RDF, which is perceived to be somehow "difficult". Although RDF is built on a simple triples data model, it has no fixed XML serialization, since abbreviated XML syntaxes are also supported, and so it is not easy to capture neatly in an XML schema language. Further, RDF makes liberal use of XML Namespaces; but if RDF is to manage multiple schemas, it manifestly needs to be able to label elements according to their respective schemas. There is also the widely held belief that native XML markup is somehow intuitive -- and by implication good enough -- and that the additional baggage of any common relational data model is just so much further complexity. The problem with this view is that it doesn't scale.
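To make the serialization point concrete, the Python sketch below (standard library only) writes the same two statements about one resource in two RDF/XML forms: the "striped" form with property elements and the abbreviated form with property attributes. Both reduce to the same set of triples. The `literal_triples` helper is hypothetical and deliberately handles only this tiny subset (rdf:Description nodes with literal, namespace-qualified properties); the example.org URI and the values are placeholders. A real application would use a full RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Striped form: each property is a child element.
STRIPED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/article/1">
    <dc:title>An Example Article</dc:title>
    <dc:date>2003-07-23</dc:date>
  </rdf:Description>
</rdf:RDF>"""

# Abbreviated form: the same literal properties written as attributes.
ABBREVIATED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/article/1"
      dc:title="An Example Article" dc:date="2003-07-23"/>
</rdf:RDF>"""


def to_uri(clark_name: str) -> str:
    """Turn ElementTree's '{namespace}local' form into one predicate URI."""
    namespace, local = clark_name[1:].split("}", 1)
    return namespace + local


def literal_triples(doc: str) -> set[tuple[str, str, str]]:
    """Hypothetical mini-extractor for (subject, predicate, literal) triples.

    Assumes every property is namespace-qualified and has a literal value.
    """
    root = ET.fromstring(doc)
    triples = set()
    for node in root.findall(f"{{{RDF}}}Description"):
        subject = node.get(f"{{{RDF}}}about")
        # Property attributes (abbreviated form); skip rdf:* syntax attributes.
        for name, value in node.attrib.items():
            if not name.startswith(f"{{{RDF}}}"):
                triples.add((subject, to_uri(name), value))
        # Property elements (striped form).
        for child in node:
            triples.add((subject, to_uri(child.tag), (child.text or "").strip()))
    return triples


print(literal_triples(STRIPED) == literal_triples(ABBREVIATED))  # True
```

An XML schema has to describe both spellings, while the triples they denote are identical; that is the mismatch the paragraph above describes.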

At the time of writing it should be noted that there is a further effort to define yet another new RDF-free specification of RSS with the principal aim of syndicating blogs. Formerly known as "Project Echo", this has now been tentatively renamed "The Atom Project" [3]. [The curse of RSS nomenclature seemingly continues -- I just checked out the wiki before sending this text off and see that it's now being labeled "The (Not)Atom Project".] The Atom Project is essentially a reworking of RSS 2.0, and as such it remains a much more focused technology than RSS 1.0.

Atom may well lend support for including enriched metadata descriptions via extension modules (as does RSS 2.0 already), although sadly not as RDF. The use of XML Namespaces can help to resolve element naming conflicts in XML documents but cannot of itself resolve any semantic interpretation that may be placed upon the use of a particular schema. Inserting arbitrary namespaced elements into an RSS document does not necessarily help either a human or a machine understand the purpose of the element or the meaning of its value. Further, there may or may not be a schema specification located under the XML Namespace URI, but even if there were, it might not help the human or machine to interpret the context within which an RSS element is found. Adopting a public data model such as RDF makes all these ambiguities vanish: in the RDF data model the context supplies the meaning.
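The naming half of this argument is easy to demonstrate. In the Python sketch below (standard library only), ElementTree expands every prefixed name into its namespace URI plus local name, so the choice of prefix is irrelevant and the RSS `<title>` stays distinct from `<dc:title>`. The two sample documents and the `expanded_names` helper are illustrative assumptions. Note that nothing in an expanded name says what the element means; that is the gap the paragraph above attributes to bare namespaces.

```python
import xml.etree.ElementTree as ET

DOC_A = """<item xmlns="http://purl.org/rss/1.0/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Short display title</title>
  <dc:title>Full untruncated title</dc:title>
</item>"""

# Same content, but the Dublin Core namespace is bound to a different prefix.
DOC_B = """<item xmlns="http://purl.org/rss/1.0/"
    xmlns:x="http://purl.org/dc/elements/1.1/">
  <title>Short display title</title>
  <x:title>Full untruncated title</x:title>
</item>"""


def expanded_names(doc: str) -> list[tuple[str, str]]:
    """Return (expanded '{namespace}local' name, text) for each child of the root."""
    root = ET.fromstring(doc)
    return [(child.tag, (child.text or "").strip()) for child in root]


a = expanded_names(DOC_A)
b = expanded_names(DOC_B)
# The prefix does not matter: both documents yield identical expanded names,
# and the two "title" elements remain distinct because their namespaces differ.
for name, text in a:
    print(name, "->", text)
```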

Thus, for those wishing to create new applications with RSS, it is the extensible RSS 1.0 version with its explicit support for new modules built on a common data model that attracts the real interest. Although in practice there is not much difference in actual complexity between the various strains of RSS, which side of the divide one comes down on has become almost a religious issue. Desktop clients generally have a good level of support for most versions of RSS. And, of course, it should be noted that these feeds are typically not generated by hand but rather by applications or library packages, leaving the creators free of any real preference other than extensibility.

It is interesting to note that while early incarnations in the RSS lineage focused on general metadata descriptions and newsfeeds, a recent trend has been to realign RSS as a blog syndication technology and thus to limit its descriptive powers. However, the extensibility mechanisms built into RSS 1.0 ensure that at least this strain of RSS is not tied to any specific set of applications but can be repurposed as required. In this sense, RSS 1.0 provides a reliable framework for future-proofing.

Enriched Metadata Descriptions using RSS 1.0

As regards the metadata term sets, or vocabularies, that can be used, only one RSS 1.0 module for descriptive metadata is currently available -- the Dublin Core (DC) module. DC is a general-purpose term set for describing resources and effectively operates as a lingua franca for descriptive metadata on the Web. However, a downside of DC is the lack of any specificity in the core element set. The Dublin Core Metadata Initiative [4] defines an extended set of terms for qualifying the basic DC elements, but this is not available as an RSS 1.0 module and may not suit all purposes.

A better match for serials publishers may well be PRISM [5]. Created by a working group of publishers and vendors, the Publishing Requirements for Industry Standard Metadata (or PRISM) defines an XML metadata vocabulary for syndicating, aggregating, post-processing and multi-purposing magazine, news, catalog, book, and mainstream journal content. PRISM provides a framework for the interchange and preservation of content and metadata, a collection of elements to describe that content, and a set of controlled vocabularies listing the values for those elements.

PRISM actually defines a set of vocabularies. Building on the DC term set, it defines a more industry-specific PRISM term set. This is augmented with three further vocabularies to support a simple rights language (PRL), the creation of controlled vocabularies (PCV), and inline markup (PIM). Here, I restrict my attention to the main DC and PRISM vocabularies, which suffice to cover the simple metadata terms required for describing serials tables of contents. Although PRISM in proper usage embraces all these vocabularies, I use it here in a more limited sense to refer to the term set under the 'prism' namespace, and I refer to the PRISM terms under the 'dc' namespace as DC.

While an RSS 1.0 module for DC has long been accepted as a standard RSS 1.0 module, a new proposal for a companion PRISM module has just recently been made on the public rss-dev and www-rdf-interest mailing lists. This comprises some 40+ terms as defined in the PRISM specification.

Example 1 shows an RSS 1.0 feed item enriched with both DC and PRISM metadata. Note that although some elements may appear to overlap, they do indeed serve separate purposes. For example, the <title> element serves for the display title and may need to be truncated, whereas the <dc:title> element contains the full (untruncated) title. Likewise, the <description> element contains a combined title and author string, whereas these are carefully broken out into individual <dc:title> and <dc:creator> elements.

Example 1 -- Use of DC and PRISM terms

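<rdf:RDF xmlns="http://purl.org/rss/1.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">

<!-- channel and other items omitted -->

<item rdf:about="http://www.sciencedirect.com/science/...">

  <title>New 5-(2-ethenylsubstituted)-3(2H)-furanones with
    in vitro antiproliferative activity</title>
  <description>New 5-(2-ethenylsubstituted)-3(2H)-furanones
    with in vitro antiproliferative activity, Pages 5215-5223
    Stefano Chimichi, Marco Boccalini, Barbara Cosimelli,
    Francesco Dall'Acqua and Giampietro Viola</description>

  <!-- Dublin Core terms: full title and one element per author -->
  <dc:title>New 5-(2-ethenylsubstituted)-3(2H)-furanones
    with in vitro antiproliferative activity</dc:title>
  <dc:creator>Stefano Chimichi</dc:creator>
  <dc:creator>Marco Boccalini</dc:creator>
  <dc:creator>Barbara Cosimelli</dc:creator>
  <dc:creator>Francesco Dall'Acqua</dc:creator>
  <dc:creator>Giampietro Viola</dc:creator>

  <!-- PRISM terms: page range, as given in the description -->
  <prism:startingPage>5215</prism:startingPage>
  <prism:endingPage>5223</prism:endingPage>
  <!-- further DC and PRISM terms omitted -->
</item>

</rdf:RDF>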
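Because the authors and title are broken out into separate elements, a consumer can assemble a citation without parsing the free-text description. The Python sketch below (standard library only) does this for a trimmed copy of the Example 1 item. Several things are assumptions: the item URI is an example.org placeholder, the page numbers are taken from the "Pages 5215-5223" text in the description, the PRISM namespace URI is assumed to be the PRISM 1.2 basic namespace, and the citation layout is an ad hoc illustration rather than any formal citation style.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
    # Assumed PRISM 1.2 basic namespace URI.
    "prism": "http://prismstandard.org/namespaces/1.2/basic/",
}

# Trimmed copy of the Example 1 item; the item URI is a placeholder and the
# page numbers come from the <description> string.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">
  <item rdf:about="http://example.org/article/1">
    <dc:title>New 5-(2-ethenylsubstituted)-3(2H)-furanones with in vitro antiproliferative activity</dc:title>
    <dc:creator>Stefano Chimichi</dc:creator>
    <dc:creator>Marco Boccalini</dc:creator>
    <dc:creator>Barbara Cosimelli</dc:creator>
    <dc:creator>Francesco Dall'Acqua</dc:creator>
    <dc:creator>Giampietro Viola</dc:creator>
    <prism:startingPage>5215</prism:startingPage>
    <prism:endingPage>5223</prism:endingPage>
  </item>
</rdf:RDF>"""


def citations(feed: str) -> list[str]:
    """Build a simple (hypothetical-format) citation for each RSS 1.0 item."""
    root = ET.fromstring(feed)
    result = []
    for item in root.findall("rss:item", NS):
        # One dc:creator per author, in document order.
        authors = [c.text or "" for c in item.findall("dc:creator", NS)]
        title = item.findtext("dc:title", default="", namespaces=NS)
        first = item.findtext("prism:startingPage", default="", namespaces=NS)
        last = item.findtext("prism:endingPage", default="", namespaces=NS)
        result.append(f"{', '.join(authors)}. {title}. pp. {first}-{last}.")
    return result


for line in citations(FEED):
    print(line)
```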




Example 2 shows an RSS feed of the Nature Publishing Group title Nature in a typical desktop application -- here FeedDemon.

Example 2 -- Application screenshot of RSS Feed of Nature

Example 3 shows an HTML page rendered directly from an RSS 1.0 feed of the Elsevier title Tetrahedron by way of an XSLT transform. The interface shown here is purely for demo purposes; it shows how the structured metadata terms are already present in the feed, ready for whatever use case may be defined.
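The same idea can be sketched without XSLT. The Python example below swaps the XSLT transform for ElementTree plus string formatting and renders each item as a linked title followed by its authors. The `render_toc` function, the HTML layout, and the two example.org items are hypothetical; the point is only that no free-text parsing is needed because the terms arrive already structured.

```python
import html
import xml.etree.ElementTree as ET

NS = {
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical two-item feed; titles and URIs are placeholders.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/a/1">
    <link>http://example.org/a/1</link>
    <dc:title>First Article</dc:title>
    <dc:creator>A. Author</dc:creator>
    <dc:creator>B. Author</dc:creator>
  </item>
  <item rdf:about="http://example.org/a/2">
    <link>http://example.org/a/2</link>
    <dc:title>Second Article &amp; Notes</dc:title>
    <dc:creator>C. Author</dc:creator>
  </item>
</rdf:RDF>"""


def render_toc(feed: str) -> str:
    """Render RSS 1.0 items as an HTML list of linked titles and authors."""
    root = ET.fromstring(feed)
    rows = []
    for item in root.findall("rss:item", NS):
        link = item.findtext("rss:link", default="", namespaces=NS)
        title = item.findtext("dc:title", default="", namespaces=NS)
        authors = ", ".join(c.text or "" for c in item.findall("dc:creator", NS))
        # Escape all feed-supplied text before placing it in HTML.
        rows.append(
            f'<li><a href="{html.escape(link)}">{html.escape(title)}</a>'
            f"<br/>{html.escape(authors)}</li>"
        )
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


print(render_toc(FEED))
```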

Example 3 -- HTML Rendering of RSS Feed of Tetrahedron

Feeds of Feeds

The primary purpose of syndicating tables of contents for serials is to provide a notification service to inform feed subscribers that a new issue has been published. There are, however, secondary uses for such a syndication service -- that is, to provide access to archival issues resident within a feed repository. The hierarchical storage arrangements for archival issues suggest that one possible resource discovery mechanism might be to have feeds of feeds whereby a feed for an archival volume of issues would syndicate the access URIs for the feeds of the respective issues contained within that volume.

This arrangement could even be propagated up the hierarchy, whereby a subscription year for a given serial might contain the feed URIs for the volumes within that year, or a serial feed might contain the feed URIs for the subscription years for that serial.
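A volume-level feed of feeds can be built with ordinary RSS 1.0 structure: the channel lists its items in an rdf:Seq, and each item's link is the access URI of an issue feed. The Python sketch below (standard library, Python 3.8+ for the `default_namespace` argument) shows one way to generate such a document. The `volume_feed` function, its parameters, and the example.org URIs are hypothetical, and the channel's `rdf:about` is simply taken to be the volume feed's own URI.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
ET.register_namespace("rdf", RDF)


def volume_feed(volume_uri: str, title: str, issue_feeds: list[tuple[str, str]]) -> str:
    """Build a hypothetical RSS 1.0 volume feed whose items point at issue feeds.

    issue_feeds is a list of (issue title, issue feed URI) pairs.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    channel = ET.SubElement(root, f"{{{RSS}}}channel", {f"{{{RDF}}}about": volume_uri})
    ET.SubElement(channel, f"{{{RSS}}}title").text = title
    ET.SubElement(channel, f"{{{RSS}}}link").text = volume_uri
    ET.SubElement(channel, f"{{{RSS}}}description").text = f"Issue feeds for {title}"
    # The channel's table of contents: an ordered rdf:Seq of item URIs.
    seq = ET.SubElement(ET.SubElement(channel, f"{{{RSS}}}items"), f"{{{RDF}}}Seq")
    for _, uri in issue_feeds:
        ET.SubElement(seq, f"{{{RDF}}}li", {f"{{{RDF}}}resource": uri})
    # One item per issue; its link is the access URI of that issue's own feed.
    for issue_title, uri in issue_feeds:
        item = ET.SubElement(root, f"{{{RSS}}}item", {f"{{{RDF}}}about": uri})
        ET.SubElement(item, f"{{{RSS}}}title").text = issue_title
        ET.SubElement(item, f"{{{RSS}}}link").text = uri
    # Serialize with RSS 1.0 as the default (unprefixed) namespace.
    return ET.tostring(root, encoding="unicode", default_namespace=RSS)


feed = volume_feed(
    "http://example.org/serial/volume-1.rdf",
    "Example Serial, Volume 1",
    [("Issue 1", "http://example.org/serial/volume-1/issue-1.rdf"),
     ("Issue 2", "http://example.org/serial/volume-1/issue-2.rdf")],
)
print(feed)
```

The same function could produce a subscription-year feed by passing volume feed URIs instead of issue feed URIs.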

Another way of using a feed of feeds would be for a publisher to publish an RSS feed of all sites that it wanted to syndicate. As an example of such a feed, Nature Publishing Group now has a feed located at <http://nurture.nature.com/rss/rss.rdf> which delivers the access URIs for all its current production feeds. This feed will be updated as new production feeds are made available.
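A client could discover the individual feeds by walking such a hierarchy from the top. The Python sketch below (standard library only) makes a simplifying assumption: a URI counts as a feed exactly when the injected `fetch` function returns a document for it, and a feed whose items point at no further feeds is treated as a leaf. The fetch function is injected so the example runs offline against an in-memory dictionary of hypothetical example.org feeds; it does not contact the Nature Publishing Group feed. The helper names `item_links`, `leaf_feeds`, and `make_feed` are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

RSS = "http://purl.org/rss/1.0/"


def item_links(feed_xml: str) -> list[str]:
    """Return the <link> of every RSS 1.0 item, in document order."""
    root = ET.fromstring(feed_xml)
    return [item.findtext(f"{{{RSS}}}link", default="") for item in root.iter(f"{{{RSS}}}item")]


def leaf_feeds(uri: str, fetch: Callable[[str], Optional[str]],
               seen: Optional[set] = None) -> list[str]:
    """Walk a feed-of-feeds hierarchy and return the URIs of the leaf feeds.

    fetch(uri) returns feed XML, or None when the URI is not a feed.
    A real client would cache fetches; this sketch calls fetch more than once.
    """
    seen = set() if seen is None else seen
    if uri in seen:
        return []  # guard against cycles
    seen.add(uri)
    xml_text = fetch(uri)
    if xml_text is None:
        return []
    children = [link for link in item_links(xml_text) if fetch(link) is not None]
    if not children:
        return [uri]
    leaves = []
    for child in children:
        leaves.extend(leaf_feeds(child, fetch, seen))
    return leaves


def make_feed(links: list[str]) -> str:
    """Hypothetical helper: a minimal RSS 1.0 document with one item per link."""
    items = "".join(
        f'<item rdf:about="{link}"><title>{link}</title><link>{link}</link></item>'
        for link in links
    )
    return ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
            f'xmlns="{RSS}">{items}</rdf:RDF>')


# In-memory stand-in for the web: publisher feed -> volume feeds -> issue feeds.
STORE = {
    "http://example.org/feeds.rdf": make_feed(["http://example.org/vol1.rdf", "http://example.org/vol2.rdf"]),
    "http://example.org/vol1.rdf": make_feed(["http://example.org/vol1/issue1.rdf", "http://example.org/vol1/issue2.rdf"]),
    "http://example.org/vol2.rdf": make_feed(["http://example.org/vol2/issue1.rdf"]),
    "http://example.org/vol1/issue1.rdf": make_feed(["http://example.org/article/a"]),
    "http://example.org/vol1/issue2.rdf": make_feed(["http://example.org/article/b"]),
    "http://example.org/vol2/issue1.rdf": make_feed(["http://example.org/article/c"]),
}

print(leaf_feeds("http://example.org/feeds.rdf", STORE.get))
```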


[1] RSS 1.0 Specification, <http://purl.org/rss/1.0/>. (For links to modules, policies and procedures see also the RDF Site Summary 1.0 Specification Working Group at <http://groups.yahoo.com/group/rss-dev/>.)
[2] RSS 2.0 Specification, <http://blogs.law.harvard.edu/tech/rss>. (For links to change notes and modules, see also the Berkman Center page <http://blogs.law.harvard.edu/tech/directory/5/specifications>.)
[3] The Atom Project. See the wiki at <http://www.intertwingly.net/wiki/pie/FrontPage>.
[4] Dublin Core Metadata Initiative, <http://dublincore.org/>.
[5] Publishing Requirements for Industry Standard Metadata, Version 1.2f, First Public Draft, <http://prismstandard.org/spec1.2f.pdf>.
