Published on XML.com http://www.xml.com/pub/a/2003/01/02/tr.html


Never Mind the Namespaces: An XSLT RSS Client
By Bob DuCharme
January 02, 2003

RSS is an XML-based format for summarizing and providing links to news stories. If you collect RSS feed URIs from your favorite news sites, you can easily build dynamic, customized collections of news stories. In a recent XML.com article, Mark Pilgrim explained the history of RSS and the formats it uses. He also showed a simple Python program that can read RSS files conforming to the three RSS formats still in popular use: 0.91, 1.0, and 2.0. While reading Mark's article I couldn't help but think that the same thing would be really easy to do in XSLT.

Easy, that is, if you're familiar with the XPath local-name() function. In a past column I showed how this function retrieves the part of an element name that identifies it within its namespace. For example, an element with a qualified name of "blue:verse" has the local name "verse" (and not "blue", as I wrote in a typo in that column and only just now caught; "blue" is the namespace prefix).

Typical XSLT stylesheets care a great deal about an element's namespace. If a channel element in an RSS 1.0 file comes from the http://purl.org/rss/1.0/ namespace and a channel element from an RSS 2.0 file comes from the http://backend.userland.com/rss2 namespace, then an XSLT processor considers these two element types to be as different as a title element from a book publishing namespace and a title element from a human resources namespace. However, by basing match conditions (and, as we'll see later, select expressions in xsl:apply-templates instructions) on the local name of source tree elements, we can explicitly tell the XSLT processor to ignore the namespace of certain elements. For example, we can have a template rule that applies to all elements with a local name of "channel," regardless of their namespace.
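The same idea can be sketched outside XSLT. What follows is a minimal Python illustration, not part of the stylesheet: it relies on the standard library's ElementTree, which stores each element's name as {namespace-URI}local-name, and the two tiny feeds and the helper names local_name and find_by_local_name are made up for the example.

```python
import xml.etree.ElementTree as ET

# Two made-up, minimal feeds: their channel elements share a local name
# but not a namespace (the second one has no namespace at all).
RSS_10 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                     xmlns="http://purl.org/rss/1.0/">
  <channel><title>Example 1.0 feed</title></channel>
</rdf:RDF>"""

RSS_091 = """<rss version="0.91">
  <channel><title>Example 0.91 feed</title></channel>
</rss>"""


def local_name(tag):
    """Return the part of an ElementTree tag after any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def find_by_local_name(root, name):
    """Yield each element at or below root whose local name equals name."""
    for element in root.iter():
        if local_name(element.tag) == name:
            yield element


for feed in (RSS_10, RSS_091):
    root = ET.fromstring(feed)
    for channel in find_by_local_name(root, "channel"):
        title = next(find_by_local_name(channel, "title"))
        # The full tags differ between the feeds; the local names do not.
        print(channel.tag, "->", title.text)
```

Comparing only the local name plays the role that the *[local-name()='channel'] pattern plays in a stylesheet: both feeds' channel elements are found by the same test.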

The following stylesheet mimics the behavior of the rss1.py Python program in Mark's article:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.0">

  <xsl:output method="text"/>

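  <xsl:template match="*[local-name()='title']">
    <xsl:text>title: </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="*[local-name()='link']">
    <xsl:text>link: </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="*[local-name()='description']">
    <xsl:text>description: </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="dc:creator">
    <xsl:text>author: </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="dc:date">
    <xsl:text>date: </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="language"/>  <!-- suppress -->

</xsl:stylesheet>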

There is one slight difference: it doesn't print the "date:" and "author:" headers for news items that have no dc:creator or dc:date children. (RSS 0.91 doesn't use these two Dublin Core elements.) The first template rule in this stylesheet has an asterisk and a predicate inside of square brackets to specify that the XSLT engine should apply that rule to any element meeting the predicate condition: its local name is "title." The second and third template rules use a similar format to handle the RSS link and description elements. The fourth and fifth rules match dc:creator and dc:date by their qualified names, because those two elements always come from the Dublin Core namespace, and the last rule suppresses any language element that has no namespace.
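To see that behavior in isolation, here is a rough Python sketch (standard library only) of the same rule set applied to item elements: title, link, and description are picked up by local name in any namespace, while the author and date lines appear only when the Dublin Core children are present. The sample items and the helper names are assumptions made for illustration; this is not Mark's rss1.py.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace in ElementTree's {URI}name notation.
DC = "{http://purl.org/dc/elements/1.1/}"

# Made-up sample: one item with Dublin Core metadata, one without.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item>
    <title>With metadata</title>
    <link>http://example.com/1</link>
    <description>Has Dublin Core children</description>
    <dc:creator>A. Writer</dc:creator>
    <dc:date>2002-12-31</dc:date>
  </item>
  <item>
    <title>Without metadata</title>
    <link>http://example.com/2</link>
    <description>No Dublin Core children</description>
  </item>
</rdf:RDF>"""


def local_name(tag):
    """Return the part of an ElementTree tag after any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def child_text(parent, name):
    """Text of the first child with this local name, or None if there is none."""
    for child in parent:
        if local_name(child.tag) == name:
            return (child.text or "").strip()
    return None


def item_lines(item):
    """Build the labeled lines for one item, skipping absent elements."""
    lines = []
    # Namespace-agnostic part: any title, link, or description child.
    for label in ("title", "link", "description"):
        text = child_text(item, label)
        if text is not None:
            lines.append(f"{label}: {text}")
    # Namespace-aware part: only genuine Dublin Core children count.
    for label, tag in (("author", DC + "creator"), ("date", DC + "date")):
        element = item.find(tag)
        if element is not None:
            lines.append(f"{label}: {(element.text or '').strip()}")
    return lines


root = ET.fromstring(SAMPLE)
for item in (e for e in root.iter() if local_name(e.tag) == "item"):
    print("\n".join(item_lines(item)))
    print()
```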

I won't show the input and output for this stylesheet: they're essentially the same as the input and output in Mark's article. Instead, I'd rather take the stylesheet a few steps further to create a standalone news aggregator that requires no special software other than a web browser and an XSLT processor.

Three basic XSLT techniques make this possible:

There are plenty of RSS-based news aggregating clients around: Amphetadesk, NewzCrawler, and NetNewsWire, among many others. The advantage of using one written in XSLT is that you don't have to install new software on your machine or log in to a server-based aggregator that needs to look up a list of your favorite feeds. You can also more easily integrate the XSLT-based one into other applications -- for example, to add customized news feeds to your company's intranet site without relying on any software more expensive or exotic than an XSLT processor.

Our stylesheet will transform the following XML document, which links to summaries of several news feeds and blogs:

<?xml-stylesheet href="getRSS.xsl" type="text/xsl"?>
<RSSChannels>

  <!-- RSS 0.91 feeds -->
  <RSSChannel src="http://www.xml.com/cs/xml/query/q/19"/>
  <RSSChannel src="http://xml.coverpages.org/covernews.xml"/>
  <RSSChannel src="http://www.bbc.co.uk/syndication/feeds/news/ukfs_news/world/rss091.xml"/>

  <!-- RSS 1.0 feeds -->
  <RSSChannel src="http://www.ilrt.bristol.ac.uk/discovery/rdf/resources/rss.rdf"/>
  <RSSChannel src="http://www.smartmobs.com/index.rdf"/>
  <RSSChannel src="http://www.infoworld.com/rss/news.rdf"/>

  <!-- RSS 2.0 feeds -->
  <RSSChannel src="http://www.panix.com/~jbm/snappy/index.xml"/>
  <RSSChannel src="http://www.antipixel.com/blog/index.xml"/>
  <RSSChannel src="http://revjim.net/index.xml"/>

</RSSChannels>


As the document's comments tell us, it includes feeds from the three currently popular RSS formats. For now, most feeds using RSS 2.0 come from webloggers interested in playing with the latest technology, but I'm sure we'll see more commercial sites take advantage of the richer metadata possibilities offered by the post-0.91 releases.

The processing instruction in the document's first line identifies the stylesheet to use for dynamic rendering in a web browser. Before looking at how the stylesheet works, first watch it in action: unzip this file onto your hard disk and use a recent release of Internet Explorer to open RSSChannels.xml. There are a few caveats to remember:

Using IE to open up local copies of RSSChannels.xml and its accompanying getRSS.xsl stylesheet should work fine. A batch file or shell script can also use Xalan or Saxon and these two files to create an HTML file that any web browser can read. So, these caveats won't stand in the way of anyone developing their own XSLT RSS client -- they just get in the way of the flashy demo that I had originally planned.

Let's look at the getRSS.xsl stylesheet.

<!-- getRSS.xsl: retrieve RSS feed(s) and convert to HTML. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.0">

  <xsl:output method="html"/>

  <xsl:template match="RSSChannels">
    <html><head><title>Today's Headlines</title>
      <style type="text/css">

p         { font-size: 8pt;
            font-family: arial,helvetica; }

h1        { font-size: 12pt;
            font-family: arial,helvetica;
            font-weight: bold; }

a:link    { color:blue;
            font-weight: bold;
            text-decoration: none; }

a:visited { font-weight: bold;
            color: darkblue;
            text-decoration: none; }

      </style></head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="RSSChannel">
    <xsl:apply-templates select="document(@src)"/>
  </xsl:template>

  <!-- Named template outputs HTML a element with href link and RSS
       description as title to show up in mouseOver message. -->
  <xsl:template name="a-element">
    <xsl:element name="a">
      <xsl:attribute name="href">
        <xsl:apply-templates select="*[local-name()='link']"/>
      </xsl:attribute>
      <xsl:attribute name="title">
        <xsl:apply-templates select="*[local-name()='description']"/>
      </xsl:attribute>
      <xsl:value-of select="*[local-name()='title']"/>
    </xsl:element>
  </xsl:template>

  <!-- Output RSS channel name as HTML a link inside of h1 element. -->
  <xsl:template match="*[local-name()='channel']">
    <xsl:element name="h1">
      <xsl:call-template name="a-element"/>
    </xsl:element>
    <!-- Following line for RSS 0.91 -->
    <xsl:apply-templates select="*[local-name()='item']"/>
  </xsl:template>

  <!-- Output RSS item as HTML a link inside of p element. -->
  <xsl:template match="*[local-name()='item']">
    <xsl:element name="p">
      <xsl:call-template name="a-element"/>
      <xsl:text> </xsl:text>
      <xsl:if test="dc:date"> <!-- Show date if available -->
        <xsl:text>( </xsl:text>
        <xsl:value-of select="dc:date"/>
        <xsl:text>) </xsl:text>
      </xsl:if>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>

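Even with whitespace and comments, the whole thing is fewer than 80 lines. It has five template rules: one for the RSSChannels document element, which sets up the HTML page and its CSS; one for each RSSChannel element, which passes its src attribute to the document() function to read in the feed; the a-element named template, which builds an HTML a element from an RSS link, description, and title; and one each for channel and item elements in any namespace, which call a-element to output an h1 heading or a p paragraph.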

Ill-formed RSS?

One word of caution: as Mark mentioned in his article, not all RSS feeds are well-formed XML, and anything that you load into a source tree for XSLT processing must be well-formed XML. To process ill-formed RSS, you'll have to go beyond XSLT, and Mark will explain some strategies for that in a follow-up piece. In my research, I found very little ill-formed RSS, so this hasn't been a problem for me.
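A quick way to find out whether a given feed will load at all is to run it through any XML parser first. The sketch below is one possible pre-flight check in Python, using only the standard library; the function name well_formedness_error and the two sample strings are hypothetical, and it only detects ill-formed input without attempting any repair.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return None


# Made-up samples: the second contains a bare ampersand, a common
# reason for a feed to be rejected by an XML parser.
GOOD = "<rss version='0.91'><channel><title>Fine</title></channel></rss>"
BAD = "<rss version='0.91'><channel><title>AT&T news</title></channel></rss>"

for label, text in (("good", GOOD), ("bad", BAD)):
    print(label, "->", well_formedness_error(text) or "well-formed")
```

A feed that fails this check would also fail when an XSLT processor tries to load it with document(), so it can be dropped from the channel list or handled by other means.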

On December 31st I used Saxon to apply this stylesheet to the RSSChannels document shown above and created an HTML result version that you can see here. (Don't forget to try the mouseOvers...) If I applied the same stylesheet to the same XML document at a later date, the result would be different, with more up-to-date news. That's the beauty of RSS.

The actual HTML and CSS that I used create a pretty stark layout. Some simple additions to the stylesheet could add some glitz to the resulting appearance, but despite its visual simplicity, this stylesheet still does a great deal: it retrieves a customized set of news feeds listed in a simple, easily customizable file, and then displays a menu of the news items where you can see their titles, read their descriptions, and then follow the links to the actual stories. You could modify the layout to make it fancier, or you could modify it to make it simpler -- slight modifications will let you convert the RSS to WML, plain text delivery, or some new markup language being developed for new output devices. XSLT helps you grab these RSS feeds; what you do with them is up to you.
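For readers who want to compare the approach with a procedural one, the following Python sketch walks through the same steps: read the list of channels, load each feed, and emit an h1 link for each channel and a p link for each item, with the description as the link's title attribute. It is an illustration under stated assumptions, not a port of getRSS.xsl: the feeds are small in-memory stand-ins rather than real URIs, the helper names are invented, and the date formatting is simplified.

```python
import html
import xml.etree.ElementTree as ET

DC_DATE = "{http://purl.org/dc/elements/1.1/}date"

CHANNELS = """<RSSChannels>
  <RSSChannel src="feed-a.xml"/>
  <RSSChannel src="feed-b.rdf"/>
</RSSChannels>"""

# Hypothetical feeds held in memory so the sketch runs without a network.
FEEDS = {
    "feed-a.xml": """<rss version="0.91"><channel>
        <title>Feed A</title><link>http://example.com/a</link>
        <description>First feed</description>
        <item><title>Story one</title><link>http://example.com/a/1</link>
          <description>About story one</description></item>
      </channel></rss>""",
    "feed-b.rdf": """<rdf:RDF
          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns="http://purl.org/rss/1.0/"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
        <channel><title>Feed B</title><link>http://example.com/b</link>
          <description>Second feed</description></channel>
        <item><title>Story two</title><link>http://example.com/b/2</link>
          <description>About story two</description>
          <dc:date>2002-12-31</dc:date></item>
      </rdf:RDF>""",
}


def local_name(tag):
    """Return the part of an ElementTree tag after any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def child_text(parent, name):
    """Text of the first child with this local name, or '' if there is none."""
    for child in parent:
        if local_name(child.tag) == name:
            return (child.text or "").strip()
    return ""


def anchor(element):
    """HTML a element built from an RSS link, description, and title."""
    href = html.escape(child_text(element, "link"), quote=True)
    tooltip = html.escape(child_text(element, "description"), quote=True)
    label = html.escape(child_text(element, "title"))
    return f'<a href="{href}" title="{tooltip}">{label}</a>'


def render_feed(root):
    """Channels become h1 links and items become p links, in document order."""
    parts = []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "channel":
            parts.append(f"<h1>{anchor(element)}</h1>")
        elif name == "item":
            date = element.find(DC_DATE)  # show the date only if available
            suffix = f" ({(date.text or '').strip()})" if date is not None else ""
            parts.append(f"<p>{anchor(element)}{suffix}</p>")
    return parts


def aggregate(channels_xml, load):
    """Render every feed named in channels_xml; load maps a src to feed text."""
    body = []
    for channel in ET.fromstring(channels_xml).iter("RSSChannel"):
        body.extend(render_feed(ET.fromstring(load(channel.get("src")))))
    return "<html><body>\n" + "\n".join(body) + "\n</body></html>"


print(aggregate(CHANNELS, FEEDS.__getitem__))
```

Passing the loader in as a function keeps the sketch testable offline; to read live feeds, it could be replaced with a function that fetches each src with urllib.request.urlopen and returns the response body.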

Modify the stylesheet to your heart's content and change the URIs in the RSSChannels document as well. You can find a wide selection of feeds at WebReference.com, Alternative News on the Web, Yahoo's RSS News Aggregators category, and the massive news4sites list. Happy aggregating!

XML.com Copyright © 1998-2006 O'Reilly Media, Inc.