ROME in a Day: Parse and Publish Feeds in Java
February 22, 2006
Ready to parse and publish RSS and Atom feeds in Java? In this step-by-step tutorial, we'll show you how to pull in an existing feed, add your own content, and publish the results in a new format, all in 100 lines of code. (200 lines with whitespace and comments.)
Knowing that RSS and Atom feeds are "just" XML, you might think that parsing and creating syndicated feeds in Java should be a snap. Pick any one type of RSS, and you might be right. Unfortunately, there are at least ten flavors of RSS and Atom out there: RSS 0.90, RSS 0.91 Netscape, RSS 0.91 Userland, RSS 0.92, RSS 0.93, RSS 0.94, RSS 1.0, RSS 2.0, Atom 0.3, and the newest addition to the bunch, Atom 1.0. Then there are all the namespace modules, like Dublin Core, Media, and so on. It's all messy enough to make a grown programmer cry. Wipe those tears, Java developers, and say hello to ROME.
When in ROME
In this tutorial, we'll be using ROME to do all the heavy lifting. ROME is an open source (Apache licensed) Java library which is designed to make it easy for you to parse and create syndicated feeds, regardless of format. In fact, all of the variants of RSS and Atom mentioned earlier are supported by ROME.
ROME doesn't just come with features, it also has a proven track record on sites like My AOL, CNET Networks, and Edmunds.com. The Powered By ROME wiki page describes how ROME is being used in these and other applications.
The basic approach of ROME is to parse any RSS or Atom feed item into a canonical bean interface. This lets you as a developer manage fairly homogeneous item beans regardless of their original format. Even better, ROME makes it easy to create a new RSS or Atom feed, using those very same beans. This tutorial is going to show you how to do just that.
Warming Up
To illustrate how to use ROME, we are going to mimic some features made popular by FeedBurner, a site which provides feed hosting and statistics for RSS and Atom publishers. FeedBurner itself doesn't use ROME (as far as I know), so we are going to mimic their end product, not their process.
FeedBurner offers a service called FeedFlare, by which publishers can add a contextual footer to each item in their RSS or Atom feed. (This is a great example of the Immediate Action pattern.) The links in the FeedFlare footer are built using data from the feed items, and allow the reader to easily email a link, bookmark the item in del.icio.us, and so on. Figure 1 shows a FeedFlare footer as displayed in NewsAlloy:
[Figure 1: A FeedFlare footer as displayed in NewsAlloy.]
To demonstrate how easy it is to use ROME, this tutorial will show you how to play the part of an--ahem--FeedWarmer. You will pull in any RSS x.x or Atom x.x feed of your choosing, read key information from each feed item, add an interactive footer, and then republish the results in a new format.
(If you have ever looked at the XML differences between RSS 1.0, RSS 2.0, and Atom 1.0, you'll realize that changing feed formats is no small feat. ROME, thankfully, handles all the hard work.)
Key Ingredients
Before you can jump into coding, you will need to download a few things. Here is the list of ingredients you will need for today's recipe:
- Java J2SE 1.4 or higher
- ROME 0.8 or higher
- ContentModule 0.3 or higher, a ROME add-on which handles XML elements with purl.org's content namespace
- JDOM 1.0, which is used for XML handling
Now that you have downloaded the above resources and have your development environment set up, let's get into the code.
Creating the FeedWarmer
ROME provides a series of bean interfaces which can be used to access the data of a syndicated feed, regardless of format. As we write the FeedWarmer class, we'll make use of the SyndFeed and SyndEntry interfaces to keep our code as clean as possible. To understand the broader scope of the classes we use, be sure to keep the ROME Javadocs close by.
Let's start with the basic structure of the FeedWarmer class. We'll build on this structure as we go through the tutorial, but you can also use the complete FeedWarmer.java source code as a reference.
public class FeedWarmer {

    public FeedWarmer() {}

    public String warmFeed(URL url, String outFormat)
            throws IOException, FeedException {}

    private void addFooter(SyndEntry entry) {}

    private String createFooter(String original, String link, String title) {}

    public static void main(String[] args) throws Exception {}
}
As shown above, the structure of our program is fairly simple. We'll fill out the details as we go, following this basic approach:
- We will create a default constructor, FeedWarmer(), to initialize any instance variables.
- The primary method used in this class is warmFeed(), which takes an RSS or Atom feed URL and a desired output format. It is the control point to parse a feed into a SyndFeed bean, modify the SyndEntry beans representing each feed item, and return the results in a new format.
- The addFooter() method will be used to add an interactive footer to each SyndEntry bean. These beans hold the data from a syndicated feed, and represent either an RSS item or an Atom entry.
- createFooter() is a utility method which uses the arguments passed in to create a footer. It appends the footer to the original feed HTML and returns the result.
- Last but not least, we'll use a standard main() method to try out our work.
Instance Variables and Constructor
Let's begin by setting up some instance variables and the constructor. The ROME library has utility classes for parsing and publishing syndicated feeds, so our FeedWarmer will need an instance of SyndFeedInput and SyndFeedOutput. We will also be using a ROME module to handle <content:encoded /> XML elements. Modules are invoked using their target schema, so we'll also add a URI for purl.org's content schema:
public class FeedWarmer {

    /** Namespace URI for content:encoded elements */
    private static final String CONTENT_NS =
        "http://purl.org/rss/1.0/modules/content/";

    /** Parses RSS or Atom to instantiate a SyndFeed. */
    private SyndFeedInput input;

    /** Transforms SyndFeed to RSS or Atom XML. */
    private SyndFeedOutput output;

    /**
     * Default constructor.
     */
    public FeedWarmer() {
        input = new SyndFeedInput();
        output = new SyndFeedOutput();
    }
The warmFeed() Method
This is the heart of the FeedWarmer program. The inputs to this method are the URL of the feed we wish to "warm," and a String indicating the desired output format.
Our first order of business is to use our instance of the SyndFeedInput to build a SyndFeed bean from an input stream of the syndicated feed. The ROME library provides an XmlReader class to fetch the feed over HTTP, determine the character encoding, and provide the input stream we need.
    /**
     * Add FeedWarmer footer to all items of any feed,
     * then republish in the format specified by outFormat.
     * @param url The feed URL to input
     * @param outFormat The feed type to output. Can be:
     *        rss_0.9, rss_0.91, rss_0.92, rss_0.93,
     *        rss_0.94, rss_1.0, rss_2.0, atom_0.3,
     *        or atom_1.0
     * @throws IOException
     * @throws FeedException
     */
    public String warmFeed(URL url, String outFormat)
            throws IOException, FeedException {

        // Load the feed, regardless of RSS or Atom type
        SyndFeed feed = input.build(new XmlReader(url));
Now that we have a populated SyndFeed bean, we can set the feed type to the output format we desire. This does not affect the structure or contents of the bean itself, but it tells the SyndFeedOutput which XML format to generate at the end of the method.
        // Set the output format of the feed
        feed.setFeedType(outFormat);
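Because outFormat is a plain String, the compiler cannot catch a typo such as "rss2". The following is a minimal sketch of an up-front check, assuming the feed types listed in the warmFeed() Javadoc are the ones you want to accept. OutputFormats and its methods are hypothetical names, not part of ROME, and the list is copied from this article rather than queried from the library.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper (not part of ROME): validates an output format string. */
class OutputFormats {

    // Assumption: this list mirrors the warmFeed() Javadoc in this article.
    private static final Set<String> KNOWN = Collections.unmodifiableSet(
        new HashSet<String>(Arrays.asList(
            "rss_0.9", "rss_0.91", "rss_0.92", "rss_0.93", "rss_0.94",
            "rss_1.0", "rss_2.0", "atom_0.3", "atom_1.0")));

    /** Returns true if the string is one of the listed feed types. */
    static boolean isKnown(String outFormat) {
        return outFormat != null && KNOWN.contains(outFormat);
    }

    /** Returns the format unchanged, or throws if it is not in the list. */
    static String require(String outFormat) {
        if (!isKnown(outFormat)) {
            throw new IllegalArgumentException(
                "Unsupported output format: " + outFormat);
        }
        return outFormat;
    }
}
```

With a helper like this, warmFeed() could call feed.setFeedType(OutputFormats.require(outFormat)) so that a bad format fails before any entries are modified.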
This is where the simplicity of ROME becomes evident. You don't need to use XPath or understand the structure of the feed, because you have the SyndFeed bean interface. Let's show that this feed has gone through FeedWarmer by altering the feed's title with a getter and setter:
        // Modify the feed title
        String newTitle = feed.getTitle() + " (Warmed)";
        feed.setTitle(newTitle);
Moving beyond the trivial, we need to grab and modify each RSS item (or Atom entry) and add a footer to it. The SyndFeed bean has a getEntries() method for just that purpose. We'll use that List of SyndEntry beans to get the job done. We haven't fleshed out the addFooter() method yet, but we'll put in the call here anyway:
        // Iterate through feed items, adding a footer to each item
        Iterator entryIter = feed.getEntries().iterator();
        while (entryIter.hasNext()) {
            SyndEntry entry = (SyndEntry) entryIter.next();
            addFooter(entry);
        }
To finish up this method, we need to output the modified feed in the format passed in as an argument. This is a job for SyndFeedOutput, which takes a SyndFeed and a Writer. We'll use a StringWriter since the method signature for warmFeed() has us returning a String:
        // Generate XML in output format, regardless of original
        StringWriter writer = new StringWriter();
        output.output(feed, writer);
        return writer.toString();
    }
Now that we have the basic control structure of the FeedWarmer, we need to write the internals of the addFooter() method we referenced.
The addFooter() Method
This method is handed a SyndEntry instance, which represents an RSS item or Atom entry. Our main task in this method is to determine the best place to add our HTML footer. Many feeds have escaped HTML in their <description /> elements, and many others put HTML in a CDATA block in a <content:encoded /> element. The latter is certainly preferable, so we should look for those first and fall back to escaped HTML in <description /> only as a last resort.
If the above makes you cringe, you aren't alone. The dirty secret of syndicated feeds is that XML best practices often have been flushed right down the toilet. But that isn't our problem to solve in this program, so let's move on.
As mentioned earlier, we're going to use ROME's ContentModule to discover whether there are <content:encoded /> elements we can use. This namespace isn't actually part of any RSS or Atom specification, but was rather added later to address the misery of escaped HTML. ROME uses modules to implement the specifics of such add-ons.
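To make the difference between the two styles concrete, here is a small sketch that renders the same HTML fragment both ways. FeedMarkupDemo is a hypothetical class written only for illustration; it is not part of ROME or of FeedWarmer, since ROME performs its own escaping on output.

```java
/** Hypothetical illustration (not ROME code) of escaped HTML versus CDATA. */
class FeedMarkupDemo {

    /** Replaces the XML markup characters &, < and > with entities. */
    static String escape(String html) {
        StringBuilder sb = new StringBuilder(html.length() + 16);
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Escaped HTML inside a description element. */
    static String asDescription(String html) {
        return "<description>" + escape(html) + "</description>";
    }

    /**
     * Raw HTML inside a CDATA block. Simplification: this sketch does not
     * handle input that itself contains the sequence "]]>".
     */
    static String asContentEncoded(String html) {
        return "<content:encoded><![CDATA[" + html + "]]></content:encoded>";
    }
}
```

For the input `<b>Hi</b> & bye`, asDescription() returns `<description>&lt;b&gt;Hi&lt;/b&gt; &amp; bye</description>`, while asContentEncoded() leaves the markup readable inside the CDATA block.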
As we dig into this method, we'll need to grab the value of the SyndEntry's title and link. These values will be used in createFooter() to build the HTML provided by our FeedWarmer.
    /**
     * Add FeedWarmer footer to an entry.
     * @param entry
     */
    private void addFooter(SyndEntry entry) {

        // Prep variables used in loops
        String title = entry.getTitle();
        String link = entry.getLink();

        // Use the add-on ContentModule to handle
        // <content:encoded/> elements within the feed
        ContentModule module =
            (ContentModule) entry.getModule(CONTENT_NS);
We've asked for the ContentModule by referencing the namespace stored as a constant on the class itself. If the module isn't null, that means the entry does in fact have one or more <content:encoded /> elements. We can use that module to grab the encoded strings from the SyndEntry, add a footer to each, and set the "warmed" encoded strings back on the SyndEntry:
        // If content:encoded is found, use that.
        if (module != null) {

            // Container for footer-appended HTML strings
            List newStringList = new ArrayList();

            // Iterate through encoded HTML, creating footers
            Iterator oldStringIter = module.getEncodeds().iterator();
            while (oldStringIter.hasNext()) {
                String original = (String) oldStringIter.next();
                newStringList.add(createFooter(original, link, title));
            }

            // Set new encoded HTML strings on entry
            module.setEncodeds(newStringList);
        }
If we don't have a module, it means that only a description or its semantic equivalent is available on the bean. Since there may be more than one, we'll iterate through each and add our HTML footer. ROME will take care of escaping the HTML, if that is necessary, upon output.
        else {
            // Fall back to adding footer in description
            // This results in escaped HTML. Ugly, but common.
            Iterator contentIter = entry.getContents().iterator();
            while (contentIter.hasNext()) {

                // Target the description node
                SyndContent content = (SyndContent) contentIter.next();

                // Create and set a footer-appended description
                String original = content.getValue();
                content.setValue(createFooter(original, link, title));
            }
        }
    }
That wraps up addFooter(). We now need to code up the createFooter() method, which is the last real work left.
The createFooter() Method
We have used ROME to do all the hard stuff, so all that remains is to create the HTML footer added to each SyndEntry. To mimic some features provided by the inspiration of FeedWarmer, we'll add links for email, del.icio.us, and a Google Blog Search.
None of this code is ROME-specific, so it should be fairly self-explanatory.
A word of caution: don't cut and paste the following section into your Java editor. The code shown has escaped HTML so that it will display properly in your web browser. Instead, use the actual source code in FeedWarmer.java, which contains unescaped HTML.
    /**
     * Create a feed item footer of immediate actions
     * by using information from the feed item itself
     * @param original The original text of the feed item
     * @param link The link for the feed item
     * @param title The title of the feed item
     * @return The original text with the footer appended
     */
    private String createFooter(String original, String link, String title) {

        // Start the buffer with the original item text
        StringBuffer sb = new StringBuffer(original);
        sb.append("\n\n<div class='feedwarmer'><hr/>");
        sb.append("<i>Getting Warmer:</i> ");

        // Add email link using item link
        sb.append("<a href='mailto:?body=Check this out: ");
        sb.append(link).append("'>Email this</a> | ");

        // Add delicious link using item title and link
        sb.append("<a href='http://del.icio.us/post/?url=");
        sb.append(link).append("&title=").append(title);
        sb.append("'>Add to delicious</a> | ");

        // Add Google Blog Search link using item title
        sb.append("<a href='http://blogsearch.google.com/");
        sb.append("blogsearch?hl=en&q=").append(title);
        sb.append("'>Blog Search this</a>");

        // Close the footer and return the result
        sb.append("</div>\n");
        return sb.toString();
    }
These are just a few simple examples of what you can do with just the item's title and link. Once you realize how easy it is to get at a feed's data with ROME, a lot of opportunities open up to leverage that data when republishing the feed.
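One caveat: createFooter() drops the title and link into query strings as-is, so a title containing spaces or an ampersand yields a malformed URL. Below is a hedged sketch of how the values could be encoded first with the JDK's java.net.URLEncoder. FooterLinks and its methods are hypothetical names introduced for illustration, and the choice of UTF-8 is an assumption about what the target sites expect.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper (not in FeedWarmer.java): builds encoded footer URLs. */
class FooterLinks {

    /** Form-encodes a value for use in a URL query string (assumes UTF-8). */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    /** del.icio.us bookmark URL, following the pattern used in createFooter(). */
    static String deliciousUrl(String link, String title) {
        return "http://del.icio.us/post/?url=" + encode(link)
            + "&title=" + encode(title);
    }

    /** Google Blog Search URL, following the pattern used in createFooter(). */
    static String blogSearchUrl(String title) {
        return "http://blogsearch.google.com/blogsearch?hl=en&q="
            + encode(title);
    }
}
```

For example, FooterLinks.encode("ROME 0.8 Released") returns "ROME+0.8+Released". URLEncoder uses form encoding, which is why the spaces become plus signs.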
The main() Method
We're in the home stretch. Let's put together a simple main() method which allows us to test the code from the command line. The first argument passed in will be the URL of any feed you like. This version writes the results out as RSS 2.0 to the console, but you could just as easily write them out to a file or other output stream.
    /**
     * Main method to demo from command line.
     * @param args args[0] must be the URL of a feed
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {

        // Create instance
        FeedWarmer warmer = new FeedWarmer();

        // "Warm" a feed using URL passed in,
        // designating the feed output desired
        String warmedFeed = warmer.warmFeed(new URL(args[0]), "rss_2.0");

        // Print to console to demo results
        System.out.println(warmedFeed);
    }
}
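If you would rather write the result to a file than to the console, something along these lines should work. This is a minimal sketch, not part of FeedWarmer.java: FeedFileWriter is a hypothetical name, the code uses java.nio.file and try-with-resources (so it needs a newer JDK than the J2SE 1.4 baseline listed earlier), and it assumes the generated XML declares UTF-8 as its encoding.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: saves a generated feed to disk as UTF-8. */
class FeedFileWriter {

    /**
     * Writes the XML string to the target path, replacing any existing file.
     * Assumption: the XML declaration in the string names UTF-8.
     */
    static void write(String xml, Path target) throws IOException {
        try (Writer out = new OutputStreamWriter(
                Files.newOutputStream(target), StandardCharsets.UTF_8)) {
            out.write(xml);
        }
    }
}
```

In main(), the System.out.println(warmedFeed) call could then be swapped for FeedFileWriter.write(warmedFeed, Paths.get("warmed.xml")), where the file name is only an example.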
Try It Out
Compile FeedWarmer.java and take your new creation for a spin. The output XML should now have a footer added to each feed item or entry, as shown in Figure 2:
[Figure 2: A feed item with the FeedWarmer footer added.]
Here is a sample of XML that has been run through the FeedWarmer. Note that the <description /> element is untouched, but the <content:encoded /> element has our footer at the end:
<div id='feedwarmer'><hr/>
<i>Getting Warmer:</i>
<a href='mailto:?subject=ROME 0.8 Released&body=Check this out: http://inkblots.markwoodman.com/2006/02/02/rome-08-released/'>Email this</a> |
<a href='http://del.icio.us/post/?url=http://inkblots.markwoodman.com/2006/02/02/rome-08-released/&title=ROME 0.8 Released'>Add to delicious</a> |
<a href='http://blogsearch.google.com/blogsearch?hl=en&q=ROME 0.8 Released'>Blog Search this</a>
</div>
Moving On
As you can see, ROME can be a valuable way to parse and publish feeds with very little effort. It is fair to say, however, that there is much more to the library than we have covered here. ROME provides support for enclosures, podcasting, and a good deal more. If you would like to see more code examples and tutorials, be sure to visit the ROME Tutorials page.
Finally, if you get stuck and need a hand implementing ROME in your projects, the user and developer groups are always willing to lend a hand. (Full disclosure: I'm on the developer team.) Feel free to stop by the wiki site, join the mailing lists, and get involved.