XML.com: XML From the Inside Out

RSS: Lightweight Web Syndication

July 17, 2000

RSS is...

RSS is a portal content language. RSS is a lightweight syndication format. A content syndication system. And a metadata syndication framework. In its brief existence, RSS has undergone only one revision, but that hasn't stopped its adoption as one of the most widely used web site XML applications to date. The RSS format's popularity and utility have found it uses in many more scenarios than its creators originally anticipated.

A Portal Language

RSS v0.9, standing at that time for "RDF Site Summary," was introduced in 1999 by Netscape as a channel description framework for their My Netscape Network (MNN) portal. While the "My" concept itself wasn't anything earth-shattering, Netscape's content-gathering mechanism was rather novel. This simple XML application established a mutually beneficial relationship between Netscape, content providers, and end-users.

By providing a simple snapshot-in-a-document, web site producers acquired audience through the presence of their content on My Netscape. End-users got one-stop-reading, a centralized location into which content from their favorite web sites flowed, rather than just the sanitized streams of content syndicated into most portals. And My Netscape, of course, acquired content for free.

Headline Syndication

A by-product of MNN's work was that RSS could be used as an XML-based lightweight syndication format for headlines, taking them outside of the My Netscape site. RSS-based portals, such as xmlTree, began springing up in various forms, catering to both general subject matter and vertical markets. Carmen's Headline Viewer even broke RSS free of the Web and onto the Windows desktop. RSS quickly became an alternative to ad-hoc syndication systems like that used by Slashdot, and practical in many scenarios where heavyweight standards like XMLNews were overkill. Today's RSS feeds carry an array of content types: news headlines, discussion forums, software announcements, and various bits of proprietary data.
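
To make the "headlines in XML" idea concrete, here is a minimal sketch in Python of how a consumer might pull headlines out of a feed. The sample document is made up for illustration and follows the general shape of an RSS 0.91 channel; the function name `extract_headlines` and the example.com URLs are assumptions, not part of any tool named above.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 0.91-style document; channel and items are illustrative only.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example News</title>
    <link>http://www.example.com/</link>
    <description>Headlines from an example site</description>
    <language>en-us</language>
    <item>
      <title>First headline</title>
      <link>http://www.example.com/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://www.example.com/2</link>
    </item>
  </channel>
</rss>"""


def extract_headlines(rss_text):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(rss_text)
    headlines = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        headlines.append((title, link))
    return headlines


for title, link in extract_headlines(SAMPLE_RSS):
    print(f"{title} <{link}>")
```

A title and a link per item is all a headline viewer needs, which is a large part of why the format is so easy to consume.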

The groundswell of support surrounding RSS led to its deployment in a number of application roles, all taking advantage of RSS in one way or another.

The Registry

Internet Alchemy's OCS format affords RSS providers a way of listing the channels they have available. OCS is employed by the likes of Carmen's Headline Viewer, Apache Jetspeed, Internet.com, and Network54. However, the abundance of feeds in RSS (and other formats) brought about the need for registries. xmlTree, a directory of XML content, provides just such a facility: RSS content may be registered and classified by subject, content-type, geography, and language. My.UserLand provides another simple registration facility.

The Aggregator

Along came My.UserLand, an RSS-based portal with a difference: archiving. While MNN displayed only the latest version of a particular channel, UserLand archived snapshots on an hourly basis. The RSS "aggregator" was born. Aggregation brings with it a new concept, the decoupling of items (stories) from their parent channels. Rather than a set of web sites being boiled down into rectangular news-boxes, RSS can be presented as a confluence of feeds from disparate sources with a focus on timeliness rather than channel. While maintaining an item's original association with its channel, Meerkat ("An Open Wire Service") presents items in reverse-chronological order, also allowing filtering, searching, grouping, and sharing.
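
The decoupling of items from channels can be sketched in a few lines of Python. This is an illustration of the idea only, not how My.UserLand or Meerkat is implemented. Because the RSS versions discussed here carry no item-level date, the sketch assumes the aggregator stamps each item with the time it first saw it; the `Item` class, the `first_seen` field, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Item:
    title: str
    link: str
    channel: str          # the original channel association is preserved
    first_seen: datetime  # assumed: stamped by the aggregator on first fetch


def aggregate(channels):
    """Merge the items of many channels into one newest-first list."""
    merged = [item for items in channels.values() for item in items]
    return sorted(merged, key=lambda item: item.first_seen, reverse=True)


# Hypothetical snapshot of two channels.
channels = {
    "Site A": [
        Item("A1", "http://a.example/1", "Site A", datetime(2000, 7, 17, 9, 0)),
        Item("A2", "http://a.example/2", "Site A", datetime(2000, 7, 17, 12, 0)),
    ],
    "Site B": [
        Item("B1", "http://b.example/1", "Site B", datetime(2000, 7, 17, 10, 30)),
    ],
}

river = aggregate(channels)
for item in river:
    print(item.first_seen.isoformat(), item.channel, item.title)
```

Each item still knows which channel it came from, so grouping or filtering by source remains possible after the merge.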

The Scraper

Moreover, dubbing itself a "webfeed company," specializes in scraping, cleaning, and categorizing online sources and repackaging the data into outgoing syndicated feeds. At the time of this writing, Moreover gathered data from over 1500 sources, grouped into 281 categories. Moreover makes their feeds available in RSS, as well as other XML-based output formats.

The Synthesizer

Decoupling items from channels enables the creation of new content streams, mixing items by topic rather than by parent channel. These might be used for the purposes of incorporation into a web site, further syndication, or commentary. O'Reilly's Meerkat tool, for example, takes in RSS in the form of channels and preserves that relationship in its database, but doesn't restrict outgoing RSS to do the same. Via a simple URL-based mechanism, any user can build a feed from customizable combinations of items, channels, categories, and search results.
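
As a rough sketch of what synthesizing a new stream involves, the Python below selects stored items by keyword and writes them out as a fresh RSS 0.91-style channel. It does not reproduce Meerkat's URL mechanism; the function `synthesize_feed`, the item dictionaries, and the example.com URLs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored items, each still tagged with its parent channel.
ITEMS = [
    {"title": "XML schema update", "link": "http://a.example/1", "channel": "Site A"},
    {"title": "Perl news", "link": "http://b.example/1", "channel": "Site B"},
    {"title": "Parsing xml quickly", "link": "http://b.example/2", "channel": "Site B"},
]


def synthesize_feed(items, keyword, feed_title, feed_link):
    """Build an RSS 0.91-style channel from items whose title mentions keyword."""
    rss = ET.Element("rss", version="0.91")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = f"Items matching '{keyword}'"
    ET.SubElement(channel, "language").text = "en-us"
    for item in items:
        # Case-insensitive match; items from any parent channel may qualify.
        if keyword.lower() in item["title"].lower():
            node = ET.SubElement(channel, "item")
            ET.SubElement(node, "title").text = item["title"]
            ET.SubElement(node, "link").text = item["link"]
    return ET.tostring(rss, encoding="unicode")


print(synthesize_feed(ITEMS, "xml", "XML picks", "http://www.example.com/xml"))
```

The outgoing channel mixes items from two different sources, which is exactly what the incoming channel structure would not have allowed.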

Content Syndication

RSS 0.91, re-dubbed "Rich Site Summary," followed shortly on the heels of 0.9. It had dropped its roots in RDF and sported new elements from scriptingNews, a fatter syndication format focused on web writing, where each item is a paragraph containing links and images. This marked a transition toward syndication instead of the metadata "summary" aspect of its predecessor. RSS 0.91's new item-level <description> element brought RSS into the (lightweight) content syndication arena. A 500-character constraint on the description field provides enough room for a blurb or abstract, yet limits RSS's ability to carry deeper content.
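
A producer working within the description limit has to cut longer text down to a blurb. The Python sketch below shows one possible approach: collapse whitespace, then trim on a word boundary and mark the cut with an ellipsis. The helper name `make_blurb` and the trimming policy are assumptions; only the 500-character figure comes from the text above.

```python
MAX_DESCRIPTION_CHARS = 500  # the description limit mentioned in the text


def make_blurb(text, limit=MAX_DESCRIPTION_CHARS):
    """Collapse whitespace and trim text to at most `limit` characters."""
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    # Reserve three characters for the trailing ellipsis.
    cut = text[: limit - 3]
    # Back up to the last space so the blurb does not end mid-word.
    if " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut + "..."


story = "word " * 200  # roughly 1000 characters of filler
blurb = make_blurb(story)
print(len(blurb), blurb[-10:])
```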

The role of RSS as a vehicle for content (as opposed to metadata) syndication is still being hotly debated on and off the syndication mailing list. Opinions fall into three basic camps: a) content syndication support in the RSS core, b) the use of RSS for metadata and scriptingNews for content syndication, and c) the modularization of lightweight content syndication support in RSS.

Metadata Syndication

As RSS continues to be re-purposed, aggregated, and categorized, the need for an enhanced metadata framework grows. Channel- and item-level title and description elements are being overloaded with metadata and HTML. Some are even resorting to inserting unofficial ad-hoc elements (e.g., <category>, <date>, <author>) in an attempt to augment the sparse metadata facilities of RSS.

Discussion forum syndicators are forced to rely upon title-based threading. Aggregators are grappling with the problem of providing information about the original source of an item when removed from its channel context. News syndicators are wondering where to embed a company's stock symbol, currently relegated to <title>(YHOO) Yahoo! announced...</title> silliness.
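
The stock-symbol case shows how fragile title overloading is. A consumer ends up writing something like the following Python sketch, which guesses at a leading ticker in parentheses. The pattern and the name `split_ticker` are assumptions for illustration; no convention for this exists in RSS itself, which is the point.

```python
import re

# Assumed convention: an upper-case symbol in parentheses at the start of the title.
TICKER_PREFIX = re.compile(r"^\((?P<symbol>[A-Z.]{1,6})\)\s*(?P<headline>.+)$")


def split_ticker(title):
    """Return (symbol, headline); symbol is None when no ticker prefix is found."""
    cleaned = title.strip()
    match = TICKER_PREFIX.match(cleaned)
    if match is None:
        return None, cleaned
    return match.group("symbol"), match.group("headline")


print(split_ticker("(YHOO) Yahoo! announced..."))
print(split_ticker("Plain headline"))
```

Any title that happens to open with a parenthesized upper-case word would be misread as a ticker, so a dedicated metadata element would be far more reliable.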

Solutions to these and future RSS metadata needs have primarily centered around a) the inclusion of more optional metadata elements in the RSS core, b) XML-namespace based modularization, and c) putting the RDF back into RSS. For an overview of the modularization versus core extension discussion, take a look at Leigh Dodds' recent XML-Deviant column, "RSS Modularization."
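
To give a feel for option (b), here is a hedged Python sketch of an item extended through an XML namespace. It assumes, purely for illustration, that a module borrows the Dublin Core element set for author and date; the sample item, the choice of Dublin Core, and the function `read_item` are not taken from any RSS specification.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Hypothetical item: core elements stay unqualified, extras live in a namespace.
ITEM_XML = """<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Yahoo! announced...</title>
  <link>http://www.example.com/story</link>
  <dc:creator>A. Writer</dc:creator>
  <dc:date>2000-07-17</dc:date>
</item>"""


def read_item(item_xml):
    """Read core fields plus any namespaced metadata the item carries."""
    item = ET.fromstring(item_xml)
    ns = {"dc": DC_NS}
    return {
        "title": item.findtext("title"),
        "link": item.findtext("link"),
        # These are None when the module's elements are absent.
        "creator": item.findtext("dc:creator", namespaces=ns),
        "date": item.findtext("dc:date", namespaces=ns),
    }


print(read_item(ITEM_XML))
```

A consumer that knows only the core elements can still read the title and link and skip the namespaced additions, which is the backward-compatibility argument for modularization.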

Finding a Way Forward

So where does that leave us?

RSS is going to have to evolve or die as it gets pulled in different directions. If it can't support the directions required by different developers, it will fade in favor of more special purpose formats. Whether a successful evolution takes the form of a larger flat-file core RSS, or a more comprehensive relational framework, movement is nonetheless needed.

RSS has seen a large degree of adoption from independent content producers, yet has failed to grab the attention of mainstream content providers. Perhaps the high eyeball/effort ratio message just hasn't been delivered. Or is it the "terminal beta" feel of RSS with its < 1.0 versioning that makes anyone but early adopters nervous? The word needs to get out through executive summaries, white papers, and adoption by more key mainstream web sites.

RSS also needs more "killer apps," which can be provided (in this author's opinion) by a richer metadata framework within which to build. Scalable extensibility is a must if RSS is to continue being re-purposed. Yet this extensible RSS must remain relatively simple (somewhere between HTML and hard-core RDF should do!) and backward-compatible in a way that will bring the current user-base along, rather than leaving it behind.

Further Reading