XML.com: XML From the Inside Out

RSS Modularization

July 05, 2000

This week, XML-Deviant takes a look at a proposal to develop RSS, the popular web site syndication format, and reports on some discussion regarding potential new applications.

What is RSS?

RSS is an XML format for syndicating metadata about online content. RSS is the format that powers the 'My' portals such as My.Netscape and My.Userland, and content syndicators such as Moreover, Meerkat, and xmlTree.

Perhaps the key benefit of RSS is its simplicity. The syntax for the format is easy to understand, and there are only a few self-explanatory tags to learn. This makes RSS files relatively trivial to produce. Dave Winer of Userland has recently published new online documentation for RSS 0.91 that adds historical notes and captures details of common usage patterns.

Developers on the RSS and Syndication mailing lists are now discussing future directions for RSS, the hope being to build on current successes and provide richer functionality.

Modular RSS

One way to extend RSS is to modularize the format, allowing additions to be made to the core format in an extensible way. The proposal, which splits the RSS tag-set into a central core and four supporting modules, was originally made by Ian Davis:

My proposal for the advancement of RSS is to split it into a number of modules. There would be a core module consisting of the basic RSS elements... On top of this would be a number of other modules activated through the use of namespaces...

...This way we can make RSS as extensible as we like rather than packing in new features that overcomplicate the spec. Aggregators can ignore the bits they don't need and authors can create simple RSS documents as they always have been able to.

Rael Dornfest later announced that he had produced a visual demonstration of RSS modularization, which neatly illustrates the suggested modular structure. Dornfest has also assembled a Modular RSS Homepage to collate information on the proposal.
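To make the modular idea concrete, here is a minimal sketch in Python of how an aggregator might read the core elements of an item and ignore, or selectively pick up, elements from a module namespace. The feed, the namespace URI, and the thr:replies element are all invented for illustration; they are not taken from Davis's proposal or from any published module.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for an imagined "threading" module.
THREADING_NS = "http://example.org/rss/modules/threading/"

# Hypothetical feed: core RSS elements plus one namespaced module element.
FEED = """<?xml version="1.0"?>
<rss version="0.91" xmlns:thr="http://example.org/rss/modules/threading/">
  <channel>
    <title>Example Channel</title>
    <link>http://example.org/</link>
    <description>A channel using a hypothetical module</description>
    <item>
      <title>First story</title>
      <link>http://example.org/story/1</link>
      <thr:replies>3</thr:replies>
    </item>
  </channel>
</rss>"""


def core_fields(item):
    """Return only the un-namespaced (core) child elements of an item."""
    # ElementTree reports namespaced tags as "{uri}local", so core tags
    # are the ones that do not start with "{".
    return {child.tag: child.text for child in item
            if not child.tag.startswith("{")}


def module_fields(item, namespace):
    """Return the child elements of an item that belong to one namespace."""
    prefix = "{" + namespace + "}"
    return {child.tag[len(prefix):]: child.text for child in item
            if child.tag.startswith(prefix)}


item = ET.fromstring(FEED).find("channel/item")
print(core_fields(item))                  # what a simple aggregator sees
print(module_fields(item, THREADING_NS))  # what a module-aware one adds
```

An aggregator that knows nothing about the module can call only core_fields and carry on as before, which is the behavior the proposal relies on.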

The proposal has been generally well received. Stephen Downes observed that modularization is an elegant extension mechanism:

I think it's very elegant. I could name a half dozen modules which should be candidates for inclusion. But what's better - we don't have to agree on these modules, just on how to point to them.

Jonathan Eisenzopf supported the use of Namespaces:

Using namespaces has always been an option. It's a nice option because it doesn't invade on the RSS namespace, as in, you're not adding new elements to RSS, rather, you're including elements from another namespace. A registry of these namespaces would make the proposition even nicer.

Dave Winer preferred a different approach to the development of RSS, wanting to draw content developers into the debate:

I find the activity towards "modularization" to be dry and uninteresting. I'd like to see some new information float through RSS-space, asap. I believe that imperfect element names are fine, I am interested in getting new information to flow from writers to readers.

Winer believed that new module-based features could instead be added directly to RSS, avoiding the complexity of using Namespaces:

You can go the namespaces route, make RSS into a fully buzzword compliant spec, and if you get support from content developers, then we'll probably all read that format as well as earlier formats and we'll have happiness. Or maybe we'll find that the ideas in the modules are what are really important, and not the modularization itself, and then we can have a simple spec, and leave the buzzword people to their gyrations, and keep RSS realllly simple.

Rael Dornfest disagreed, believing that continual revision would leave RSS "bloated":

I regard the work we're doing as laying the foundation for an RSS framework upon which to build. Content developers want to syndicate more than just headlines. We've already seen the requests for RSS support for categorization, aggregation, threading, discussion, job listings, and so on. We just can't anticipate the plethora of uses already in the minds of content developers. Nor can/should we revisit the core RSS specification, repurpose tools and parsers, and go through these gyrations each time a new use is found -- all time better spent writing amazing applications.

RSS, Namespaces, and RDF

Perhaps one of the stumbling blocks for this proposal is its use of Namespaces. Citing a namespace discussion at Userland, Dave Winer observed that it's the busy web developer who needs to be convinced:

If you want to learn about how busy web developers view namespaces, go challenge them, ask them questions, try to explain namespaces and why they are good, but please do it respectfully.

Winer's comments are well placed, highlighting the need to retain the simplicity that has made RSS so successful. At the same time, XML Namespaces are the W3C-recommended mechanism for combining multiple vocabularies within a single document, and recent W3C specifications, such as XHTML 1.1, are taking a modular approach to provide future extensibility. RSS may well be a proving ground that demonstrates how well these approaches will be accepted by the wider web developer community.

The relationship of RSS to RDF is another common thread of debate. Both provide metadata about Internet resources, and earlier versions of RSS did include some RDF elements. Eric van der Vlist wondered why RDF meets such resistance:

Why is everyone so frightened by RDF? Looks like (and it's not specific to this list) everyone wants the features and flavors of RDF without the name...

Rael Dornfest commented that this might be one avenue where RDF can demonstrate its power:

I believe, however, that the argument against putting the RDF (back) into RSS is backward-compatibility and simplicity. Now the former can indeed be accomplished *provided* the right tools (read: stylesheets?) are made available. As for the latter, RSS is eminently grokable while RDF _can_ be as clear as mud. Perhaps this is one possible venue for RDF to show itself as powerful yet truly understandable.

Edd Dumbill has also recently published an article commenting on the relationship between RSS and RDF.

RSS seems to be in the unique position of being able to demonstrate the usefulness of both Namespaces and RDF--two very hotly debated specifications. This position is partly due to the enthusiasm of the RSS developer community, and partly because RSS is facilitating the next generation of web applications.

New Directions

So what are some of the potential applications for RSS being proposed? Here's a brief list of ideas that have been discussed so far:

Search engine syndication

Stephen Tyler has suggested that RSS is well-suited to the task of syndicating search engine results:

To me, the RSS standard seems to be an almost perfect match to the needs of search engine syndication. It is a list of urls with titles and descriptions. The only thing that has really changed is the notion of time ordering, something that is not a part of the spec, but instead only implied by the problems it was designed to solve.

Tyler has already implemented his suggestion, but it will be interesting to see if others pick up on the idea.

In this guise, RSS would provide a standardized format for expressing search engine results. This could allow searches to be posted to multiple search engines and the results amalgamated into a single set of results. De-duplication of results would be relatively simple through comparison of URIs. Transforming the RSS to HTML for display purposes is then a trivial operation.
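The amalgamation step described above can be sketched in a few lines of Python. This is an illustration only, not Tyler's implementation: the two result sets are invented, and URIs are compared as plain strings with no normalization (so a trailing slash or a change of case would count as a different result).

```python
import xml.etree.ElementTree as ET

# Two hypothetical result sets, as they might come back from two engines.
ENGINE_A = """<rss version="0.91"><channel>
  <title>Engine A results</title>
  <item><title>A</title><link>http://example.org/a</link></item>
  <item><title>B</title><link>http://example.org/b</link></item>
</channel></rss>"""

ENGINE_B = """<rss version="0.91"><channel>
  <title>Engine B results</title>
  <item><title>B (again)</title><link>http://example.org/b</link></item>
  <item><title>C</title><link>http://example.org/c</link></item>
</channel></rss>"""


def items_from_rss(xml_text):
    """Extract (title, link) pairs from the items of one RSS document."""
    root = ET.fromstring(xml_text)
    results = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:  # an item without a link cannot be de-duplicated by URI
            results.append((title, link))
    return results


def merge_results(*feeds):
    """Merge several result sets, keeping the first item seen for each URI."""
    seen = set()
    merged = []
    for xml_text in feeds:
        for title, link in items_from_rss(xml_text):
            if link not in seen:
                seen.add(link)
                merged.append((title, link))
    return merged


for title, link in merge_results(ENGINE_A, ENGINE_B):
    print(title, link)
```

Keeping the first occurrence is an arbitrary choice made for the sketch; a real aggregator might instead prefer the result with the better rank or the fuller description.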

Discussion group syndication

Eric van der Vlist suggested that an RSS module to describe discussion-based comments about content could lead to a "distributed Slashdot like bidirectional syndication system." In a response to the suggestion, Rael Dornfest demonstrated how this could be achieved using a threading module.

With the proliferation of discussion forums, a mechanism to provide cross-site collating, threading, and syndication of comments would bring the technology to its natural next level of sophistication.

Publish and Subscribe

Currently RSS applications use a "pull" based architecture: A web site aggregating RSS-based content is responsible for retrieving the information directly from its source. An alternative approach is a "push" based system, otherwise known as "publish and subscribe." In this scenario, an aggregator registers an interest with a content provider and is automatically notified when new content is available.
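The difference between the two architectures can be shown with a small in-process sketch in Python. The class and method names are hypothetical, and direct method calls stand in for what would be network requests in practice; this is not based on any published publish-and-subscribe specification for RSS.

```python
class Publisher:
    """A content provider holding a channel's items."""

    def __init__(self):
        self.items = []
        self.subscribers = []

    def subscribe(self, callback):
        # Push model: an aggregator registers its interest once.
        self.subscribers.append(callback)

    def publish(self, item):
        self.items.append(item)
        # Push model: every registered subscriber is notified immediately.
        for notify in self.subscribers:
            notify(item)


class Aggregator:
    """A site collecting items from a publisher."""

    def __init__(self):
        self.known = []

    def poll(self, publisher):
        # Pull model: fetch the whole channel and keep only what is new.
        for item in publisher.items:
            if item not in self.known:
                self.known.append(item)

    def on_new_item(self, item):
        # Push model: called by the publisher when something changes.
        if item not in self.known:
            self.known.append(item)


publisher = Publisher()
pulling, pushed = Aggregator(), Aggregator()
publisher.subscribe(pushed.on_new_item)

publisher.publish("New headline")
print(pushed.known)    # already has the item
print(pulling.known)   # has nothing until it polls
pulling.poll(publisher)
print(pulling.known)
```

The pulling aggregator must decide how often to call poll, and most of those calls will find nothing new; the pushed aggregator does no work until the publisher has something to report.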

Dave Winer announced his intention to write a specification for publish and subscribe using RSS:

I plan to write a spec for publish-and-subscribe via RSS by adding to the header of a channel enough information for a syndicator to contact a publisher saying "I want to be notified of any changes to this item." I haven't written the spec yet, so this is not a RFC, just a heads-up saying "I'm interested in this too."

Eric van der Vlist observed that there are different levels of subscription associated with a push based system. Aaron Swartz has also published a first draft of an XML-RPC interface for subscribers. Discussion of this idea is continuing. It will be interesting to see whether this system can succeed where previous push-based technology, despite its apparent promise, has failed.

As well as looking to the future, it's important to consolidate the current situation. For example, it would appear that while there are numerous web sites offering RSS-based syndication, not every site is taking care to ensure that the RSS files are valid. While tools like RSSLite provide a means to deal with badly formed and invalid RSS, it would be unfortunate if the HTML story repeated itself with RSS. We lose many of the benefits of an XML syndication format if the data is badly encoded.
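A content provider can catch the most common problems before publishing with a very small check. The Python sketch below tests that a file is well-formed XML and that the channel carries a few basic elements. The list of required elements is an illustrative subset chosen for this example, not the full set of rules in any RSS specification, and the function is not a substitute for validating against a DTD.

```python
import xml.etree.ElementTree as ET

# Illustrative subset only; consult the RSS documentation for the full rules.
REQUIRED_CHANNEL_ELEMENTS = ("title", "link", "description")


def check_rss(xml_text):
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The document is not well-formed XML, so nothing else can be checked.
        return ["XML parse error: %s" % err]

    problems = []
    if root.tag != "rss":
        problems.append("root element is <%s>, expected <rss>" % root.tag)
    channel = root.find("channel")
    if channel is None:
        problems.append("missing <channel>")
        return problems
    for name in REQUIRED_CHANNEL_ELEMENTS:
        if not (channel.findtext(name) or "").strip():
            problems.append("channel is missing <%s>" % name)
    return problems


# A typical mistake: an unescaped ampersand in a title.
broken = "<rss><channel><title>News & Views</title></channel></rss>"
print(check_rss(broken))
```

Running a check like this as part of the publishing step costs little and avoids pushing the repair work onto every aggregator that reads the file.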

We as developers need to work alongside the content providers to provide the tools and support required to ensure that RSS content is of the highest quality. If this can be successfully achieved, we can realize more of those "amazing applications."