ROME in a Day: Parse and Publish Feeds in Java

February 22, 2006

Mark Woodman

Ready to parse and publish RSS and Atom feeds in Java? In this step-by-step tutorial, we'll show you how to pull in an existing feed, add your own content, and publish the results in a new format, all in 100 lines of code. (200 lines with whitespace and comments.)

Knowing that RSS and Atom feeds are "just" XML, you might think that parsing and creating syndicated feeds in Java should be a snap. Pick any one type of RSS, and you might be right. Unfortunately, there are at least ten flavors of RSS and Atom out there: RSS 0.90, RSS 0.91 Netscape, RSS 0.91 Userland, RSS 0.92, RSS 0.93, RSS 0.94, RSS 1.0, RSS 2.0, Atom 0.3, and the newest addition to the bunch, Atom 1.0. Then there are all the namespace modules, like Dublin Core, Media, and so on. It's all messy enough to make a grown programmer cry. Wipe those tears, Java developers, and say hello to ROME.

When in ROME

In this tutorial, we'll be using ROME to do all the heavy lifting. ROME is an open source (Apache licensed) Java library which is designed to make it easy for you to parse and create syndicated feeds, regardless of format. In fact, all of the variants of RSS and Atom mentioned earlier are supported by ROME.

ROME doesn't just come with features; it also has a proven track record on sites like My AOL, CNET Networks, and Edmunds.com. The Powered By ROME wiki page describes how ROME is being used in these and other applications.

The basic approach of ROME is to parse any RSS or Atom feed item into a canonical bean interface. This lets you as a developer manage fairly homogeneous item beans regardless of their original format. Even better, ROME makes it easy to create a new RSS or Atom feed, using those very same beans. This tutorial is going to show you how to do just that.

Warming Up

To illustrate how to use ROME, we are going to mimic some features made popular by FeedBurner, a site which provides feed hosting and statistics for RSS and Atom publishers. FeedBurner itself doesn't use ROME (as far as I know), so we are going to mimic their end product, not their process.

FeedBurner offers a service called FeedFlare, by which publishers can add a contextual footer to each item in their RSS or Atom feed. (This is a great example of the Immediate Action pattern.) The links in the FeedFlare footer are built using data from the feed items, and allow the reader to easily email a link, bookmark the item in del.icio.us, and so on. Figure 1 shows a FeedFlare footer as displayed in NewsAlloy:

Figure 1. FeedBurner adds a FeedFlare footer to an RSS item.

To demonstrate how easy it is to use ROME, this tutorial will show you how to play the part of an--ahem--FeedWarmer. You will pull in any RSS x.x or Atom x.x feed of your choosing, read key information from each feed item, add an interactive footer, and then republish the results in a new format.

(If you have ever looked at the XML differences between RSS 1.0, RSS 2.0, and Atom 1.0, you'll realize that changing feed formats is no small feat. ROME, thankfully, handles all the hard work.)
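
To get a feel for what ROME spares you, here is a minimal sketch that pulls entry links out of RSS 2.0 and Atom 1.0 by hand, using only the JDK's DOM parser. The class and method names (ManualLinkReader, entryLinks) are hypothetical, and the sketch deliberately handles just two formats and one field. It is not how ROME works internally; it only shows how quickly per-format branches pile up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper, not part of ROME: reads entry links by hand. */
final class ManualLinkReader {

    private static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Returns the link of each item (RSS 2.0) or entry (Atom 1.0). */
    static List<String> entryLinks(String xml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }

        List<String> links = new ArrayList<String>();
        if ("rss".equals(root.getLocalName())) {
            // RSS 2.0: the link is the text inside <item><link>
            NodeList items = root.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                NodeList link = ((Element) items.item(i))
                        .getElementsByTagName("link");
                if (link.getLength() > 0) {
                    links.add(link.item(0).getTextContent().trim());
                }
            }
        } else if ("feed".equals(root.getLocalName())) {
            // Atom 1.0: the link is the href attribute of <entry><link>
            NodeList entries = root.getElementsByTagNameNS(ATOM_NS, "entry");
            for (int i = 0; i < entries.getLength(); i++) {
                NodeList link = ((Element) entries.item(i))
                        .getElementsByTagNameNS(ATOM_NS, "link");
                if (link.getLength() > 0) {
                    links.add(((Element) link.item(0)).getAttribute("href"));
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String rss = "<rss version=\"2.0\"><channel><title>Demo</title>"
                + "<item><title>A</title>"
                + "<link>http://example.com/a</link></item>"
                + "</channel></rss>";
        String atom = "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
                + "<title>Demo</title><entry><title>A</title>"
                + "<link href=\"http://example.com/a\"/></entry></feed>";
        System.out.println(entryLinks(rss));
        System.out.println(entryLinks(atom));
    }
}
```

Two formats and one field already need two code paths. With ROME, the same value comes from a single getLink() call on the entry bean, as the rest of this tutorial shows.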

Key Ingredients

Before you can jump into coding, you will need to download a few things. Here is the list of ingredients you will need for today's recipe:

Now that you have downloaded the above resources and have your development environment set up, let's get into the code.

Creating the FeedWarmer

ROME provides a series of bean interfaces which can be used to access the data of a syndicated feed, regardless of format. As we write the FeedWarmer class, we'll make use of the SyndFeed and SyndEntry interfaces to keep our code as clean as possible. To understand the broader scope of the classes we use, be sure to keep the ROME Javadocs close by.

Let's start with the basic structure of the FeedWarmer class. We'll build on this structure as we go through the tutorial, but you can also use the complete FeedWarmer.java source code as a reference.

public class FeedWarmer
{
    public FeedWarmer()
    {}

    public String warmFeed(URL url, String outFormat)
            throws IOException, FeedException
    {}

    private void addFooter(SyndEntry entry)
    {}

    private String createFooter(String original, String link,
                                String title)
    {}

    public static void main(String[] args) throws Exception
    {}
}

As shown above, the structure of our program is fairly simple. We'll fill out the details as we go, following this basic approach:

  1. We will create a default constructor, FeedWarmer(), to initialize any instance variables.
  2. The primary method used in this class is warmFeed(), which takes an RSS or Atom feed URL and a desired output format. It is the control point to parse a feed into a SyndFeed bean, modify the SyndEntry beans representing each feed item, and return the results in a new format.
  3. The addFooter() method will be used to add an interactive footer to each SyndEntry bean. These beans hold the data from a syndicated feed, and represent either an RSS item or an Atom entry.
  4. createFooter() is a utility method which uses the arguments passed in to create a footer. It appends the footer to the original feed HTML and returns the result.
  5. Last but not least, we'll use a standard main() method to try out our work.

Instance Variables and Constructor

Let's begin by setting up some instance variables and the constructor. The ROME library has utility classes for parsing and publishing syndicated feeds, so our FeedWarmer will need instances of SyndFeedInput and SyndFeedOutput. We will also be using a ROME module to handle <content:encoded /> XML elements. Modules are looked up by their namespace URI, so we'll also add a constant holding the URI of purl.org's content namespace:

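import java.io.IOException;
import java.io.StringWriter;
import java.net.URL;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// ROME imports. The package names assume the com.sun.syndication
// layout used by the ROME releases of this tutorial's era.
import com.sun.syndication.feed.module.content.ContentModule;
import com.sun.syndication.feed.synd.SyndContent;
import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.FeedException;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.SyndFeedOutput;
import com.sun.syndication.io.XmlReader;

public class FeedWarmer
{
    /** Namespace URI for content:encoded elements */
    private static final String CONTENT_NS =
            "http://purl.org/rss/1.0/modules/content/";

    /** Parses RSS or Atom to instantiate a SyndFeed. */
    private SyndFeedInput input;

    /** Transforms SyndFeed to RSS or Atom XML. */
    private SyndFeedOutput output;

    /**
     * Default constructor.
     */
    public FeedWarmer()
    {
        input = new SyndFeedInput();
        output = new SyndFeedOutput();
    }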

The warmFeed() Method

This is the heart of the FeedWarmer program. The inputs to this method are the URL of the feed we wish to "warm," and a String indicating the desired output format.

Our first order of business is to use our instance of the SyndFeedInput to build a SyndFeed bean from an input stream of the syndicated feed. The ROME library provides an XmlReader class to fetch the feed over HTTP, determine the character encoding, and provide the input stream we need.
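
To see why encoding detection deserves a helper of its own, consider this deliberately simplified sketch. It only inspects the encoding attribute of the XML prolog and falls back to UTF-8. The class and method names (PrologEncodingSniffer, sniff) are hypothetical, and this is an illustration of the problem, not a description of how XmlReader is implemented. A real implementation has more to consider, byte order marks and HTTP headers for instance.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified encoding sniffer; not ROME code. */
final class PrologEncodingSniffer {

    // Matches the encoding attribute of an XML declaration
    private static final Pattern ENCODING = Pattern.compile(
            "<\\?xml[^>]*?encoding\\s*=\\s*[\"']"
            + "([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /**
     * Returns the encoding named in the XML prolog, or UTF-8 if the
     * prolog does not name one. Only the first bytes are examined.
     */
    static String sniff(byte[] head) {
        int length = Math.min(head.length, 200);
        // The prolog itself is ASCII in ASCII-compatible encodings
        String prolog = new String(head, 0, length,
                StandardCharsets.US_ASCII);
        Matcher matcher = ENCODING.matcher(prolog);
        return matcher.find() ? matcher.group(1) : "UTF-8";
    }

    public static void main(String[] args) {
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><rss/>"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] bare = "<rss/>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(latin1));
        System.out.println(sniff(bare));
    }
}
```

Even this toy version misses cases such as UTF-16 documents, where the prolog is not readable as ASCII. Back in FeedWarmer, XmlReader takes care of the whole job in a single constructor call: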

/**
 * Add FeedWarmer footer to all items of any feed,
 * then republish in the format specified by outFormat.
 * @param url           The feed URL to input
 * @param outFormat     The feed type to output.  Can be:
 *                      rss_0.9, rss_0.91, rss_0.92, rss_0.93,
 *                      rss_0.94, rss_1.0, rss_2.0, atom_0.3,
 *                      or atom_1.0
 * @return              The warmed feed as XML in the output format
 * @throws IOException    if the feed cannot be read
 * @throws FeedException  if the feed cannot be parsed or generated
 */
public String warmFeed(URL url, String outFormat)
        throws IOException, FeedException
{
    // Load the feed, regardless of RSS or Atom type
    SyndFeed feed = input.build(new XmlReader(url));

Now that we have a populated SyndFeed bean, we can set the feed type of the output format we desire. This does not affect the structure or contents of the bean itself, but it will instruct the SyndFeedOutput on the desired XML output at the end of the method.

    // Set the output format of the feed
    feed.setFeedType(outFormat);

This is where the simplicity of ROME becomes evident. You don't need to use XPath or understand the structure of the feed, because you have the SyndFeed bean interface. Let's show that this feed has gone through FeedWarmer by altering the feed's title with a getter and setter:

    // Modify the feed title
    String newTitle = feed.getTitle() + " (Warmed)";
    feed.setTitle(newTitle);

Moving beyond the trivial, we need to grab and modify each RSS item (or Atom entry) and add a footer to it. The SyndFeed bean has a getEntries() method for just that purpose. We'll use that List of SyndEntry beans to get the job done. We haven't fleshed out the addFooter() method yet, but we'll put in the call here anyway:

    // Iterate through feed items, adding a footer to each item
    Iterator entryIter = feed.getEntries().iterator();
    while (entryIter.hasNext())
    {
        SyndEntry entry = (SyndEntry) entryIter.next();
        addFooter(entry);
    }

To finish up this method, we need to output the modified feed in the format passed in as an argument. This is a job for SyndFeedOutput, which takes a SyndFeed and a Writer. We'll use a StringWriter since the method signature for warmFeed() has us returning a String:

    // Generate XML in output format, regardless of original
    StringWriter writer = new StringWriter();
    output.output(feed, writer);
    return writer.toString();
}

Now that we have the basic control structure of the FeedWarmer, we need to write the internals of the addFooter() method we referenced.


The addFooter() Method

This method is handed a SyndEntry instance, which represents an RSS item or Atom entry. Our main task in this method is to determine the best place to add our HTML footer. Many feeds have escaped HTML in their <description /> elements, and many others put HTML in a CDATA block in a <content:encoded /> element. The latter is certainly preferable, so we should look for those first and fall back to escaped HTML in <description /> only as a last resort.

If the above makes you cringe, you aren't alone. The dirty secret of syndicated feeds is that XML best practices often have been flushed right down the toilet. But that isn't our problem to solve in this program, so let's move on.

As mentioned earlier, we're going to use ROME's ContentModule to discover whether there are <content:encoded /> elements we can use. This namespace isn't actually part of any RSS or Atom specification, but was rather added later to address the misery of escaped HTML. ROME uses modules to implement the specifics of such add-ons.
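
To make the two conventions concrete, the following sketch parses one snippet of each kind with the JDK's DOM parser and compares the text that comes out. The class name HtmlInXmlDemo and the sample markup are made up for illustration; no ROME classes are involved.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

/** Hypothetical demo: escaped HTML versus HTML in a CDATA block. */
final class HtmlInXmlDemo {

    /** Parses a single-element XML snippet and returns its text. */
    static String textOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Escaped HTML, as commonly found in <description/>
        String escaped = "<description>"
                + "&lt;p&gt;Hello &lt;b&gt;ROME&lt;/b&gt;&lt;/p&gt;"
                + "</description>";

        // The same HTML in a CDATA block inside <content:encoded/>
        String cdata = "<content:encoded xmlns:content="
                + "\"http://purl.org/rss/1.0/modules/content/\">"
                + "<![CDATA[<p>Hello <b>ROME</b></p>]]>"
                + "</content:encoded>";

        System.out.println(textOf(escaped));
        System.out.println(textOf(escaped).equals(textOf(cdata)));
    }
}
```

Both snippets carry the same HTML string once parsed. The difference lies in how readable the raw XML is and in which element a feed reader is expected to treat as markup.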

As we dig into this method, we'll need to grab the value of the SyndEntry's title and link. These values will be used in createFooter() to build the HTML provided by our FeedWarmer.

/**
 * Add FeedWarmer footer to an entry.
 * @param entry  The RSS item or Atom entry to modify
 */
private void addFooter(SyndEntry entry)
{
    // Grab the values used to build the footer
    String title = entry.getTitle();
    String link = entry.getLink();

    // Use the add-on ContentModule to handle
    // <content:encoded/> elements within the feed
    ContentModule module =
            (ContentModule) entry.getModule(CONTENT_NS);

We've asked for the ContentModule by referencing the namespace stored as a constant on the class itself. If the module isn't null, that means the entry does in fact have one or more <content:encoded /> elements. We can use that module to grab the encoded strings from the SyndEntry, add a footer to each, and set the "warmed" encoded strings back on the SyndEntry:

    // If content:encoded is found, use that.
    if (module != null)
    {
        // Container for footer-appended HTML strings
        List newStringList = new ArrayList();

        // Iterate through encoded HTML, creating footers
        Iterator oldStringIter =
                module.getEncodeds().iterator();
        while (oldStringIter.hasNext())
        {
            String original = (String) oldStringIter.next();
            newStringList.add(createFooter(original,
                              link, title));
        }

        // Set new encoded HTML strings on entry
        module.setEncodeds(newStringList);
    }

If we don't have a module, it means that only a description or its semantic equivalent is available on the bean. Since there may be more than one, we'll iterate through each and add our HTML footer. ROME will take care of escaping the HTML, if that is necessary, upon output.

    else
    {
        // Fall back to adding footer in description
        // This results in escaped HTML.  Ugly, but common.
        Iterator contentIter = entry.getContents().iterator();
        while (contentIter.hasNext())
        {
            // Target the description node
            SyndContent content =
                    (SyndContent) contentIter.next();

            // Create and set a footer-appended description
            String original = content.getValue();
            content.setValue(createFooter(original,
                             link, title));
        }
    }
}

That wraps up addFooter(). We now need to code up the createFooter() method, which is the last real work left.


The createFooter() Method

We have used ROME to do all the hard stuff, so all that remains is to create the HTML footer added to each SyndEntry. To mimic some of the features of FeedFlare, the inspiration for FeedWarmer, we'll add links for email, del.icio.us, and a Google Blog Search. None of this code is ROME-specific, so it should be fairly self-explanatory.

A word of caution: don't cut and paste the following section into your Java editor. The code shown has escaped HTML so that it will display properly in your web browser. Instead, use the actual source code in FeedWarmer.java, which contains unescaped HTML.

/**
 * Create a feed item footer of immediate actions
 * by using information from the feed item itself
 * @param original  The original text of the feed item
 * @param link      The link for the feed item
 * @param title     The title of the feed item
 * @return          The original text with the footer appended
 */
private String createFooter(String original, String link,
                            String title)
{
    // Start from the original HTML and append the footer
    StringBuffer sb = new StringBuffer(original);
    sb.append("\n\n<div class='feedwarmer'><hr/>");
    sb.append("<i>Getting Warmer:</i> ");

    // Add email link using item title and link
    sb.append("<a href='mailto:?subject=").append(title);
    sb.append("&body=Check this out: ");
    sb.append(link).append("'>Email this</a> | ");

    // Add delicious link using item title and link
    sb.append("<a href='http://del.icio.us/post/?url=");
    sb.append(link).append("&title=").append(title);
    sb.append("'>Add to delicious</a> | ");

    // Add Google Blog Search link using item title
    sb.append("<a href='http://blogsearch.google.com/");
    sb.append("blogsearch?hl=en&q=").append(title);
    sb.append("'>Blog Search this</a>");

    // Close the footer and return the result
    sb.append("</div>\n");
    return sb.toString();
}

These are just a few simple examples of what you can do with just the item's title and link. Once you realize how easy it is to get at a feed's data with ROME, a lot of opportunities open up to leverage that data when republishing the feed.
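
One refinement worth considering: the footer drops titles and links into URLs as-is, so a title containing a space, an ampersand, or an apostrophe can break the link. The sketch below shows one possible way to harden the two HTTP links with java.net.URLEncoder. The class and method names (EncodedFooterLinks, deliciousLink, blogSearchLink) are hypothetical and are not part of FeedWarmer.java. The email link is left out on purpose, because URLEncoder turns spaces into plus signs, which suits query strings but may not display well in a mailto body.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical variant of the footer links with encoded parameters. */
final class EncodedFooterLinks {

    /** URL-encodes a query parameter value as UTF-8. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    /** Builds the bookmarking link from an item's link and title. */
    static String deliciousLink(String link, String title) {
        return "<a href='http://del.icio.us/post/?url=" + encode(link)
                + "&amp;title=" + encode(title)
                + "'>Add to delicious</a>";
    }

    /** Builds the blog search link from an item's title. */
    static String blogSearchLink(String title) {
        return "<a href='http://blogsearch.google.com/blogsearch"
                + "?hl=en&amp;q=" + encode(title)
                + "'>Blog Search this</a>";
    }

    public static void main(String[] args) {
        System.out.println(
                deliciousLink("http://example.com/a b", "Tom & Jerry's"));
        System.out.println(blogSearchLink("ROME 0.8 Released"));
    }
}
```

The parameter separators are written as &amp; here because the footer is HTML, where a bare ampersand inside an attribute value is technically invalid. Whether you need that level of strictness depends on the readers you are targeting.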

The main() Method

We're in the home stretch. Let's put together a simple main() method which allows us to test the code from the command line. The first argument passed in will be the URL of any feed you like. This version writes the results out as RSS 2.0 to the console, but you could just as easily write them out to a file or other output stream.

    /**
     * Main method to demo from command line.
     * @param args        args[0] must be the URL of a feed
     * @throws Exception
     */
    public static void main(String[] args) throws Exception
    {
        // Create instance
        FeedWarmer warmer = new FeedWarmer();

        // "Warm" a feed using URL passed in,
        // designating the feed output desired
        String warmedFeed = warmer.warmFeed(new URL(args[0]),
                                            "rss_2.0");

        // Print to console to demo results
        System.out.println(warmedFeed);
    }
}
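
If you would rather save the result than print it, a few lines of standard Java will do. This is a minimal sketch that assumes the warmed XML declares UTF-8 as its encoding; the class name FeedFileWriter, the save() method, and the warmed.xml file name are all made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that saves a warmed feed to disk. */
final class FeedFileWriter {

    /**
     * Writes the feed XML to the target file as UTF-8, replacing
     * any existing file. Assumes the XML itself declares UTF-8.
     */
    static void save(String xml, Path target) {
        try {
            Files.write(target, xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the String returned by warmFeed()
        String warmedFeed = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<rss version=\"2.0\"><channel/></rss>";
        save(warmedFeed, Paths.get("warmed.xml"));
    }
}
```

In the real main() method you would pass the String returned by warmFeed() instead of the stand-in shown here.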

Try It Out

Compile FeedWarmer.java and take your new creation for a spin. The output XML should now have a footer added to each feed item or entry, as shown in Figure 2:

Figure 2. Our FeedWarmer class has added a footer to an RSS item.

Here is a sample of XML that has been run through the FeedWarmer. Note that the <description /> element is untouched, but the <content:encoded /> element has our footer at the end:

<div class='feedwarmer'><hr/>
   <i>Getting Warmer:</i>

   <a href='mailto:?subject=ROME 0.8 Released&body=Check this
   out: http://inkblots.markwoodman.com/2006/02/02/rome-08-released/'
   >Email this</a> |

   <a href='http://del.icio.us/post/?url=http://inkblots.markwoodman.
   com/2006/02/02/rome-08-released/&title=ROME 0.8 Released'>Add to
   delicious</a> |

   <a href='http://blogsearch.google.com/blogsearch?hl=en&q=ROME
   0.8 Released'>Blog Search this</a>
</div>

Moving On

As you can see, ROME can be a valuable way to parse and publish feeds with very little effort. It is fair to say, however, that there is much more to the library than we have covered here. ROME provides support for enclosures, podcasting, and a good deal more. If you would like to see more code examples and tutorials, be sure to visit the ROME Tutorials page.

Finally, if you get stuck and need a hand implementing ROME in your projects, the user and developer groups are always willing to help. (Full disclosure: I'm on the developer team.) Feel free to stop by the wiki site, join the mailing lists, and get involved.