
Processing RSS

April 9, 2003

Ivelin Ivanov

Editor's note: welcome to the first installment of a new regular column on XML.com, Practical XQuery. Ivelin Ivanov and Per Bothner will be bringing us tips on the use of the XQuery language, as well as self-contained example applications.

The goal of this article is to demonstrate the use of XQuery to accomplish a routine yet interesting task: rendering an HTML page that merges RSS news feeds from two different weblogs. RSS has earned its popularity by allowing people to easily share news between web sites. For almost any programming language used on the Web, there is a good selection of libraries for consuming RSS.

Readers will benefit from a basic knowledge of the XQuery language. Per Bothner has written an informal introduction to XQuery.

Even though XQuery started as an XML-based version of SQL, the language has broad application on the Web. In what follows, I will show that XQuery allows RSS feeds to be consumed and processed easily. In fact, we will see that no specialized library is necessary: we will use only functions of the core language.

Jump Right In

If we were using another language, we would probably have started with a breakdown of the script's components and their individual responsibilities. But the XQuery script is so brief that there is not much to break apart.

I will let the code speak for itself; if you still think you need further analysis, stick around and read the text further below.

Listing 1: XQuery Script -- RSS Feed Merge


define function row ($link, $title)
{
  <div>
    RSS item <b> {$title}</b> is located at <b>
    {$link}</b>
  </div>
}

define function filter-rss ($url)
{
  for $i in document($url)/rss/channel/item
  return row($i/link/text(), $i/title/text())
}

<html>
  <body>
  <i>Remote RSS Feed Demo, written in XQuery.
    Compiled and Run by The Open Source QEXO.org engine.</i>
  <hr/>

  {filter-rss("http://www.javablogs.com/ViewDaysBlogs.jspa?view=rss")}
  {filter-rss("http://radio.weblogs.com/0109827/rss.xml")}

  </body>
</html>

If you want to see the result of this script immediately, visit http://www.cocoonhive.org/xquery/xqueryform.html. It will look similar to the output shown in Listing 2.

Listing 2: XQuery Script Output -- RSS Feed Merge

<html>
  <body>
  <i>Remote RSS Feed Demo, written in XQuery.
    Compiled and Run by The Open Source QEXO.org engine.</i>
  <hr />

  <div>
    RSS item <b> EJB Design Patterns</b> is located at
    <b> http://www.javablogs.com/Jump.jspa?id=20692</b>
  </div>
  <div>
    RSS item <b> There is a first for everything</b> is located at
    <b> http://www.javablogs.com/Jump.jspa?id=20667</b>
  </div>
  <div>
    RSS item <b> </b> is located at
    <b> http://radio.weblogs.com/0109827/2002/12/11.html#a1219</b>
  </div>
  <div>
    RSS item <b> Programmers are Speshal</b> is located at
    <b> http://radio.weblogs.com/0109827/2002/12/11.html#a1218</b>
  </div>
  </body>
</html>

Let's examine how the script works. It begins with the definitions of two functions. The main body follows the function definitions:

<html>
  <body>
  <i>Remote RSS Feed Demo, written in XQuery.
    Compiled and Run by The Open Source QEXO.org engine.</i>
  <hr/>

  {filter-rss("http://www.javablogs.com/ViewDaysBlogs.jspa?view=rss")}
  {filter-rss("http://radio.weblogs.com/0109827/rss.xml")}

  </body>
</html>

As you can see, it is plain HTML, except for the two lines that enclose calls to the function filter-rss() in curly braces. The curly braces indicate that an XQuery expression needs to be evaluated.
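To make the role of the curly braces concrete, here is a minimal sketch that is not part of the original script. The <p> element and the numbers are made up for illustration. Anything inside the braces is evaluated as an expression, while the rest of the element is copied through literally.

```xquery
(: Illustrative only: the <p> element and the arithmetic are
   assumptions, not taken from Listing 1. :)
<p>Two plus three is {2 + 3}</p>
```

An XQuery processor should evaluate the enclosed expression and return <p>Two plus three is 5</p>.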

The function filter-rss() is defined by:

define function filter-rss ($url)
{
  for $i in document($url)/rss/channel/item
  return row($i/link/text(), $i/title/text())
}

It loops over all XML nodes matched by the XPath expression "/rss/channel/item", which is applied to the XML document returned by the built-in function document(). This function itself is invoked with the $url argument passed to filter-rss(). The value of this argument is either http://www.javablogs.com/ViewDaysBlogs.jspa?view=rss or http://radio.weblogs.com/0109827/rss.xml.
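The same kind of loop can be tried without a network connection by binding a small feed to a variable. The following is only a sketch: the inline feed, its example.org links, and the <li> output element are invented for illustration. Because $feed is bound to the <rss> element itself rather than to a document, the path starts at channel instead of /rss.

```xquery
(: Made-up sample feed; in the article the data comes from document($url). :)
let $feed :=
  <rss version="0.91">
    <channel>
      <title>Example channel</title>
      <item><title>First</title><link>http://example.org/1</link></item>
      <item><title>Second</title><link>http://example.org/2</link></item>
    </channel>
  </rss>
(: Visit every <item> under <channel>, in document order. :)
for $i in $feed/channel/item
return <li>{$i/title/text()}</li>
```

The result is a sequence of two <li> elements, containing First and Second. The channel's own <title> is not selected, because the path only matches <title> elements inside <item>.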

The content of the XML documents located at these two URLs looks similar to:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="0.91">
  <channel>
    <title>java.blogs Day's Entries</title>
    <link>http://www.javablogs.com/</link>
    <description>Blog entries on 14/2/2003</description>
    <language>en-us</language>
    <item>
      <title>One thing not to do before a presentation</title>
      <link>http://www.javablogs.com/Jump.jspa?id=20740</link>
      <description>Just a helpful hint:...</description>
    </item>
    <item>
      <title>Reversible</title>
      <link>http://www.javablogs.com/Jump.jspa?id=20739</link>
      <description>Links back to pages that link to it. List of referrers
                and trackbacks. ...</description>
    </item>
  </channel>
</rss>

As you might expect, the for loop assigns each of the <item> elements of the target document to the variable $i in turn. For each value of $i, the function returns the result of invoking the other custom function in this script, row(), passing the textual values of the link and title sub-elements of the item. The row() function is straightforward: it simply returns an HTML <div> element that contains the textual values of its arguments.
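As a possible variation, and not something the original script does, row() could emit a real hyperlink and substitute a placeholder when an item has no title, as happens with one entry in Listing 2. The sketch below is written in the syntax of the final XQuery 1.0 Recommendation, where define function became declare function and user functions live in the local: namespace. The name row-link, the "(untitled)" text, and the example.org URLs are all assumptions.

```xquery
(: Hypothetical variant of row(); row-link and "(untitled)" are
   illustrative choices, not part of the article's script. :)
declare function local:row-link($link, $title)
{
  <div>
    <a href="{$link}">{
      (: empty() is true when no title text was passed in. :)
      if (empty($title)) then "(untitled)" else $title
    }</a>
  </div>
};

(: Sample calls with made-up values; the second item has no title. :)
(
  local:row-link("http://example.org/1", "First post"),
  local:row-link("http://example.org/2", ())
)
```

Each call returns a <div> wrapping an <a> element whose href is the link. The second one shows (untitled) as its link text instead of an empty element.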

Functional Benefits

I am not aware of another language endorsed by a standards body that can do the same thing more briefly and intuitively. The fact that XQuery treats XML nodes as first-class language constructs, combined with its familiar C-like syntax, makes it an attractive tool for the problems it was built to solve. Note that although it has a for loop construct, XQuery is a purely functional language. In short, this means that XQuery functions always return the same values given the same arguments. This important property allows advanced compiler optimizations that are not possible for C or Java.
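One way to see the functional character of the for construct is that it is an expression producing a value, not a statement that updates a variable. A minimal sketch, with arbitrary sample numbers:

```xquery
(: Each input item is mapped to a result item; $n is bound afresh
   for every item and is never reassigned. :)
for $n in (1, 2, 3)
return $n * 2
```

The value of the whole expression is the sequence 2, 4, 6, which can be passed to a function or embedded in an element like any other value.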

In the past decade, functional language compilers have shown significant advantages over imperative language compilers. However, the unconventional syntax of functional languages and the inertia of imperative ones keep them under the radar of mainstream development. The XQuery team seems to recognize these weaknesses and is attempting to overcome them.