Raising the Bar on RSS Feed Quality
November 19, 2002
RSS is an XML-based syntax for facilitating the exchange of information in a lightweight fashion through the distribution (or feeding) of resources. Publishers can use this versatile and increasingly essential format to assist end users in tracking and consuming content. Netscape originally developed the format but lost interest and eventually abandoned work on it. This created an identity crisis that devolved into varying interpretations, with dispute over even the meaning of the RSS acronym: RDF Site Summary, Rich Site Summary, or Really Simple Syndication. As these divergent efforts work to develop RSS, one result has been a diminished overall quality in RSS feeds.
In this article, I provide an overview of RSS's core syntax, then I examine the poor state of RSS feed quality and provide some recommendations for authoring more useful and effective feeds. This examination is not a review of the RSS specification, nor is it an emphatic plea for strict compliance. Instead, this article provides an approach to authoring RSS feeds that is neutral, practical, and conservative. RSS feeds are simply too useful a mechanism for information exchange services. It is imperative that we improve their effectiveness.
RSS Syntax Basics
Despite differences between RSS 1.0 and RSS 0.9x/2.0, these formats do share a common core of elements. Let's take a quick look at the primary elements of RSS syntax. (While these elements are common to all versions, this review does not apply to any one specific version per se. Some of these elements are required, and some are not. It depends on which specification you're referencing.)
It all begins with the <channel> tag, which contains metadata about the feed. The primary metadata elements are the <title>, <description>, and <link> tags, though others have been specified. Inside of the <channel> tagset is one or more <item> tags, each of which contains a <title>, a <description>, and a <link> tag. Here is a sample RSS feed.
    <rss>
      <channel>
        <title>tima thinking outloud.</title>
        <description>The thoughts of Timothy Appnel on emerging technology and trends.</description>
        <link>http://tima.mplode.com/</link>
        <item>
          <link>http://www.mplode.com/tima/archives/000137.html</link>
          <title>Released: mt-rssfeed v1.0 and mt-list v0.2</title>
          <description>I released two new versions of my MovableType plugins today.</description>
        </item>
      </channel>
    </rss>
All the RSS specifications wrap <channel> with a root tag. I've used <rss>, but the 0.90 and 1.0 specifications use <rdf:RDF> as the root tag. At this point, however, they're not necessary from a functional point of view.
That should give you a taste of RSS syntax. Now let's dive into my recommendations for improving the quality of RSS feeds.
Improving RSS Feed Quality
Over the centuries, we have developed time-tested, best practices for the written word. More recently, through mass media, we have developed information layering techniques. Different RSS formats aside, most aggregators and toolkits make a good effort to abstract information from any feed remotely like RSS. And with the significant adoption of RSS as well as resources like the Syndic8 directory, we can examine usage patterns and make informed assumptions. With some care and consideration we can publish effective, useful, and reliable content feeds with RSS today. Here are my recommendations:
All RSS feeds must be well formed XML.
I wasn't telling the whole truth when I said these were recommendations. This is the only real requirement because RSS is an XML format after all, and well-formedness is XML's baseline. If it's not well formed, it's not XML. Improperly encoded HTML and the use of HTML entities in RSS feeds are the most common offenses. (HTML is typically not well-formed XML, and XML predefines only five of the named entities that HTML supports.) If you're not sure whether a feed is well formed, try an online XML well-formedness checker such as RUWF. End users may not care about standards compliance, but they do want their content, and a feed that is not well formed may never reach them. It's not that hard to consistently comply with the XML standard. The remaining tips will help.
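As a rough illustration of the check described above, the following Python sketch runs a feed through a standard XML parser and reports the first error. The function name check_well_formed and the two sample strings are made up for this example; the second string uses an HTML-only entity and a bare ampersand, the two offenses mentioned above.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses as XML, else (False, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column of the first problem.
        return False, str(err)
    return True, None


# Hypothetical feeds: numeric references and &amp; are fine in XML,
# but &eacute; is an HTML entity and a bare "&" is never allowed.
good = "<rss><channel><title>Caf&#233; &amp; bar</title></channel></rss>"
bad = "<rss><channel><title>Caf&eacute; & bar</title></channel></rss>"

print(check_well_formed(good))
print(check_well_formed(bad))
```

A parser stops at the first error it finds, so a feed may need several passes before it is clean.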
Use the RSS Validator Service.
Testing your RSS feeds' syntax for errors is now much easier thanks to the RSS Validator Service. Developed by Mark Pilgrim, along with Sam Ruby and Bill Kearney, it checks RSS feeds for problems and generates friendly and instructive messages for fixing them. The service is optimized for RSS 2.0, but also supports other versions of the format. This recent development is significant because it provides a much-needed tool for alerting publishers to issues in their syndication feeds. (The RSS Validator Service can also tell you if your feed is well-formed XML.)
Use CDATA for embedding HTML in descriptions.
This is perhaps the most important recommendation I can make because it goes a long way toward avoiding malformed XML/RSS files, with almost no fuss. Entity-encoded HTML, also known as double entity-encoding, is quite common and not going away anytime soon, but avoiding it will save you and others some headaches. Besides being a nonstandard practice within the XML specification, this method requires more processing cycles and unnecessarily adds to the file size. It's also prone to occasional error.
Consider this example:
The original HTML is untouched when you use CDATA. And the file-size advantages become increasingly clear as the amount of HTML increases. When you consider that the entity-encoded HTML example above could also be an HTML example that was not encoded, you can begin to see the kinds of errors that entity-encoding can introduce.
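To make the comparison concrete, here is a minimal Python sketch that wraps the same HTML fragment both ways and prints the resulting sizes. It is an illustration only, not a reconstruction of any particular example: the fragment, the example.com URL, and the function names description_cdata and description_escaped are all assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical HTML fragment for the comparison.
html = '<p>I released <a href="http://example.com/">two plugins</a> today.</p>'


def description_cdata(fragment):
    """Wrap HTML in a CDATA section, leaving the markup itself untouched."""
    # "]]>" would end the section early, so split it across two sections.
    safe = fragment.replace("]]>", "]]]]><![CDATA[>")
    return "<description><![CDATA[" + safe + "]]></description>"


def description_escaped(fragment):
    """Entity-encode the HTML so that it reads as plain character data."""
    return "<description>" + escape(fragment) + "</description>"


for xml_text in (description_cdata(html), description_escaped(html)):
    # Both forms give an XML parser the same text back.
    assert ET.fromstring(xml_text).text == html
    print(len(xml_text), xml_text)
```

CDATA adds a fixed number of characters no matter how much markup it wraps, while entity-encoding adds a few characters for every angle bracket and ampersand, so the gap widens as the HTML grows.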
Minimize the use of HTML in descriptions.
Jon Postel's maxim on robust protocols says: "Be conservative in what you do." It's in this same spirit that I make this recommendation. None of the RSS specifications actually limits what you can embed in a <description> tag. It would be quite difficult and controversial if you tried. While feed consumers should be prepared to filter out unwanted formatting, it's simply good manners for content publishers to help avoid issues that could break a consumer's aggregator or layout. If you do include a hyperlink or image tag, be sure to use absolute URLs. RSS is considered a transient document, and a consumer is likely to be unable to resolve a relative URL.
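One way to follow the absolute-URL advice is to resolve every href and src against the page's own address before the HTML goes into the feed. The Python sketch below does this with a regular expression and the standard urljoin function. It assumes quoted attribute values, and the function name absolutize and the /img/logo.png path are invented for the example.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src="..." with either quote style.
ATTR = re.compile(r'''(\b(?:href|src)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def absolutize(html, base_url):
    """Rewrite relative href/src values in html as absolute URLs."""
    def fix(match):
        prefix, quote, url = match.groups()
        # urljoin leaves already-absolute URLs unchanged.
        return prefix + quote + urljoin(base_url, url) + quote
    return ATTR.sub(fix, html)


html = '<a href="archives/000137.html">post</a> <img src="/img/logo.png">'
print(absolutize(html, "http://www.mplode.com/tima/"))
```

A regular expression is enough for tool-generated markup like this. Hand-written HTML with unquoted attributes would call for a real HTML parser.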
Include a descriptive title for each item.
Examine any collection of written thought, such as a magazine, a newspaper, or a book, and you will note how information is organized in layers that can be easily scanned and processed by a reader. A good title, subtitle, or summary (referred to as heads, decks, and leads in media parlance) will not say anything that isn't contained in the main body of the piece. Without scannability, content consumption simply becomes too laborious and time intensive, to the point where most of us would hardly bother reading a thing. Try removing titles from any magazine or newspaper, and you'll appreciate what I'm referring to. Besides being good for scannability, descriptive titles are good for accessibility. Despite these time-tested best practices, many feeds fail to include titles, let alone informative ones. Some Webloggers claim it's too time-consuming and difficult to create a title for the numerous (and short) posts they make daily. I can appreciate their viewpoint, but even a generic title, such as the site or the collection name with a timestamp ("tima thinking outloud: November 1, 2002 20:13 -4:00"), is more helpful than none. The end user does not have to guess at a title or the contents by reading some small number of characters or words from the beginning of the description. "Today I saw something that...", for example, doesn't provide any sense of what the blogger has written about that day.
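For publishers who generate feeds with their own tools, the fallback described above is easy to automate. This Python sketch keeps the author's title when there is one and otherwise builds a title from the site name and the post's timestamp. The function name item_title is hypothetical, and the timestamp format is only one reasonable choice.

```python
from datetime import datetime, timedelta, timezone


def item_title(title, site_name, published):
    """Return the author's title, or a site-name-plus-timestamp fallback."""
    cleaned = (title or "").strip()
    if cleaned:
        return cleaned
    # Assumed format: month name, day, year, 24-hour time, UTC offset.
    stamp = published.strftime("%B %d, %Y %H:%M %z")
    return f"{site_name}: {stamp}"


posted = datetime(2002, 11, 1, 20, 13, tzinfo=timezone(timedelta(hours=-4)))
print(item_title("", "tima thinking outloud", posted))
print(item_title("Released: mt-rssfeed v1.0 and mt-list v0.2",
                 "tima thinking outloud", posted))
```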
Avoid embedding HTML in the title.
The channel and item titles in RSS, like their counterpart in HTML, are considered metadata and therefore are not expected to contain display elements such as HTML tags. Embedding markup in a title, even wrapped in CDATA, could cause your feed to break an end user's application. Keep HTML in the description only, if at all.
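If titles come from a system that allows markup, a feed generator can reduce them to plain text before writing them out. The sketch below uses Python's standard HTML parser to keep only the character data. The class and function names are assumptions made for this illustration.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and ignores every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_title(title):
    """Strip HTML tags from a title and collapse runs of whitespace."""
    parser = _TextOnly()
    parser.feed(title)
    parser.close()
    return " ".join("".join(parser.parts).split())


print(plain_title("Released: <b>mt-rssfeed</b> v1.0 &amp; <i>mt-list</i> v0.2"))
```

The result is plain text, not XML, so characters such as & and < still have to be escaped when the title is written into the feed.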
Consider writing a meaningful and concise excerpt for the description.
Just as I've recommended including a descriptive title, including a meaningful and descriptive summary improves the scannability, and thereby the utility, of your feeds. It helps readers determine if they want to continue reading, and it communicates the main point of the content for readers who lack time.
If you insist on including the full content of items, offer end users a choice of feeds.
This can be a bit of a controversial issue as some users prefer to have the full content of an item included in the feed so they can read the content in their aggregator. Others prefer concise excerpts that can be quickly scanned or consumed over low bandwidth connections. These viewpoints are neither right nor wrong. It is the content publisher's decision based on the use of their content and the needs of their intended audience. However, publishers would be wise to offer end users a choice. Since most feeds are generated by a tool, this is not difficult to provide. Also consider that end users may only be interested in a particular topic or resource. RSS is highly versatile and can be used to create feeds on a specific topic or on a resource like a calendar of events, mailing list archive, recent comments, or document repository.
Include contact information in your feed.
With vague documentation and varying interpretations of RSS, implementation issues happen. Publishing an email contact for the person or group responsible for the generation and management of the feed opens the lines of communication for rectifying any issues. It also provides an avenue for collecting feedback. The RSS 0.9x format uses the <managingEditor> and <webMaster> tags while RSS 1.0 uses Dublin Core metadata in the form of <dc:creator> and <dc:publisher> tags. (RSS 2.0 allows for any or all of them.) If you're developing your own template or toolkit for generating RSS feeds, it's also recommended that you include generator information for debugging purposes. An XML comment such as <!-- generator="foo generator/version" --> is sufficient. The RSS 1.0 Administrative module includes a <generatorAgent> element, if you are so inclined to use it.
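As a sketch of how a home-grown generator might carry both pieces of information, the Python below builds a channel with a <managingEditor> element and a generator comment using the standard ElementTree module. The function name build_channel, the example.com address, and the generator string are placeholders, not values from any real feed.

```python
import xml.etree.ElementTree as ET


def build_channel(title, link, description, editor, generator):
    """Serialize a minimal channel with contact and generator details."""
    rss = ET.Element("rss", version="2.0")
    # An XML comment naming the tool that produced the feed.
    rss.append(ET.Comment(f' generator="{generator}" '))
    channel = ET.SubElement(rss, "channel")
    fields = (
        ("title", title),
        ("link", link),
        ("description", description),
        ("managingEditor", editor),
    )
    for tag, text in fields:
        ET.SubElement(channel, tag).text = text
    return ET.tostring(rss, encoding="unicode")


feed = build_channel(
    "tima thinking outloud.",
    "http://tima.mplode.com/",
    "The thoughts of Timothy Appnel on emerging technology and trends.",
    "editor@example.com",        # hypothetical contact address
    "example-generator/0.1",     # hypothetical tool name and version
)
print(feed)
```

Building the feed through an XML library rather than string templates also means special characters in the text are escaped automatically.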
Advertise your RSS feeds.
Developing quality feeds is not enough. Make it easy for end users to find these resources. There are a number of ways that you can do this:
Embed an HTML <link> in your Web page. The prescribed syntax is <link rel="alternate" type="application/rss+xml" href="feed.rss" title="RSS feed for My Page">. Most aggregators will search an HTML page to discover feeds associated with the site or page, allowing end users to subscribe more easily.
Create hyperlinks in your Web pages to RSS feeds. If you have created more than a couple of feeds, consider creating a separate page of links.
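To show what the embedded <link> buys you, here is a rough Python sketch of the discovery step an aggregator might perform: scan a page for <link> tags with rel="alternate" and the RSS media type, then resolve each href against the page address. The names FeedLinkFinder and find_feeds and the www.example.com address are assumptions, and real aggregators may apply additional rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FeedLinkFinder(HTMLParser):
    """Collects (title, absolute URL) pairs for advertised RSS feeds."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        media_type = (attr.get("type") or "").lower()
        href = attr.get("href")
        if "alternate" in rel and media_type == "application/rss+xml" and href:
            self.feeds.append((attr.get("title"), urljoin(self.base_url, href)))


def find_feeds(html, base_url):
    """Return the feeds a page advertises through <link> tags."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds


page = ('<html><head><link rel="alternate" type="application/rss+xml" '
        'href="feed.rss" title="RSS feed for My Page"></head>'
        '<body></body></html>')
print(find_feeds(page, "http://www.example.com/blog/"))
```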
Examples of Good Form
If you're a learn-by-example person like I am, and you're curious as to what a feed in the real world looks like, look no further. Here are just a few examples:
In practicing what I preach there is also my feed: tima thinking outloud.
RSS is a versatile format for basic syndication and information exchange that can be put to good use today. Content developers and authors should not feel impeded by the identity crisis, the growing pains of the format, or the divergent opinions in the community. These issues will work themselves out eventually. The RSS Validator Service is a great example of the community at work. Use it.
Additionally, I encourage you to focus on developing useful quality feeds that use the common core elements of all three RSS formats, and that you apply proper writing and information-layering techniques.
The usage of RSS feeds, implementation issues, and the future direction of the format are being discussed by active members on the aggregators, RSS-DEV, and syndication mailing lists. You don't have to be an authoring-tool or toolkit developer to contribute; get involved. The RSS community needs more input from its users for RSS to improve and evolve.