Hacking eBay: Turning Email Alerts into Atom

November 23, 2005

Bob DuCharme

From our geeky perspective, Atom and RSS seem to be sweeping through the internet, changing the way people and systems get notified about events. From a broader perspective, though, they've got a long way to go — we all have plenty of computer-literate friends who've never heard of either.

This means that plenty of opportunities remain to improve systems and applications using RSS or Atom. (Because Atom is the latest and greatest in the history of RSS formats, with endorsements from key representatives of the earlier formats, I'm going to focus on using Atom, but the basic ideas here would work for any flavor of RSS.) I see two basic categories of such opportunities: as demos to show how a given system can benefit from an Atom delivery option, and as personal utilities to make your own life easier. A given application can fall into both categories; I wrote something to convert eBay saved search notifications into an Atom feed to make my own life easier, but if I were an eBay employee I'd be showing it to my boss and saying "Hey! This is easy to implement and a great new way to spread the use of our product!"

An Atom version of an email notification system makes a great demo of Atom's power for several reasons. First, a popular incentive for Atom use is that it reduces our email in a controlled fashion, reducing not only the overall email load but also the chance of spam-checker false positives. Second, the difficult part has probably already been done for you — many online products and systems already have elaborate infrastructures in place to track who wants email notification of what. These systems may let recipients specify what they want to be notified about, the email address to send the notification, the frequency of the emails, the format of the emails, and other details.

For example, when you perform a search such as "Elvis black velvet" on eBay, the results screen includes an Add to Favorite Searches link that lets you name the search and add it to a list accessible from your My eBay screen. In addition to naming the search, you can tell the eBay system to send you an email when relevant new items appear.

Figure 1. Black velvet Elvis portrait

Elvis Black Velvet

My wife and I have a room in the basement with a lot of Elvis stuff. The room has its own bathroom with more Elvis stuff, so we refer to the combination as the "Presley Suite." We're not fanatical Elvis fans — I don't think we have more than a dozen CDs and vinyl Presley albums — but he was responsible for a lot of great rock-and-roll and a lot of silly kitsch that's nearly as entertaining.

Our Presley Suite lacked any painted black velvet Elvis pictures, a serious omission in a collection of Elvis kitsch, so I created an eBay saved search on the phrase "Elvis black velvet." As with a lot of eBay searches, you find some great deals that aren't as great once you consider shipping costs, and you see the same people selling the same things over and over. I eventually bought one of the recurring ones, shown above: a somewhat Native American-looking Elvis and an even less Native American-looking woman. (Elvis fans will recognize it as an allusion to Flaming Star, a 1960 western in which Presley played the son of a Native American woman and a white man.)

Occasionally some real one-of-a-kind things showed up in the search results, and the serious collectors quickly drove the price out of my reach. My favorite was an old black velvet Elvis paint-by-numbers kit that hadn't been used yet — imagine white lines dividing up his black velvet face into tiny numbered sections. That would have been a highlight of our collection, right up there with D.J. Fontana's autograph and the Thai bamboo curtain of Warhol's gunslinger Elvis portrait that we found on London's Portobello Road.

Several libraries and services are available to convert email to RSS or Atom. Evaluating each of these, picking one, installing it, and configuring it on my host provider's system seemed like more trouble than just creating something myself. Doing it myself also lets me customize the input and output as much as I like. For example, the Mailbucket service, which lets you create an email address on their server and then creates an RSS feed of mail sent to that address, won't work for an eBay favorite search because eBay sends the notification emails to the email address they have on file for you, not to any address that you want to specify when you create the search. Services like Mailbucket also convert each email into a single RSS or Atom entry, but I wanted to see a single eBay email about six hits for the search on "Elvis black velvet" converted to six Atom entries.

To convert the formats, I knew that tools like Perl and XSLT make it easy to convert between plain text, HTML, and XML, but I needed a way to have the arrival of the mail trigger the conversion. This turned out to be relatively easy once I found out about procmail.

procmail

procmail is a venerable Unix utility for automating the processing of mail. (I'm not running my own mail server, but my host provider lets me create procmail configurations.) It first became popular for automated sorting of mail into different folders, but now most popular mail clients include built-in features to do that. procmail then became popular for spam detection, but mail client programs usually do that now as well. As a general-purpose tool for redirecting emails meeting certain criteria to specific locations or, better yet, for routing them to be processed by particular programs, procmail is still a great tool for adapting email workflows to newer technologies such as Atom.

Before procmail can do anything with your mail, you have to route your mail through it. This may mean creating a .forward file; in my case, I had to fill out a mail account configuration screen on my host provider's mail administration page. Mail then gets piped through the procmail program, which checks a .procmailrc file to see if anything special should be done to the email. If not, the email is passed along to your inbox untouched.

For the syntax of this .procmailrc file, a tutorial at the Ohio State University math department provides a good start, and the procmail quickstart at Infinite Ink is far more detailed than its name implies.

A typical .procmailrc file begins with the setting of some environment variables and then has a series of rules that each specify which mail they apply to and what to do to those emails. Most rules start with :0: on its own line. The following rule tells procmail to route mail that has the string sales in its subject line to the sales folder:

:0:
* ^Subject:.*sales
sales

A regular expression on the rule's second line describes the mail to look for, usually by specifying a line in the mail header to check, and the third line indicates where to send the email message. Beginning the third line with a pipe symbol tells procmail to pass the email message as input to a program, as in this procmail rule:

:0:
* ^From:.*savedsearches@ebay.com
| /usr/www/users/bobd/rss/bin/ebaymail2atom.sh

This rule tells procmail that when the "From" line includes the string savedsearches@ebay.com, the contents of that message should be piped to the ebaymail2atom.sh script in the specified directory.
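A condition like this can be tried out on a saved message before the rule goes live. The following is a rough shell sketch, not procmail itself: the sample message and the handler function are hypothetical stand-ins, and the sketch mimics procmail's default behavior of matching the regular expression case-insensitively against the header lines only.

```shell
#!/bin/sh
# Hypothetical sample message; a real test would use a saved eBay email.
message='From: eBay <savedsearches@ebay.com>
Subject: New items for Elvis black velvet

Body text that mentions From: someone else'

# Stand-in for ebaymail2atom.sh: just reports how many lines it was given.
handler() {
  wc -l | tr -d ' '
}

# Keep only the header block (everything up to the first blank line).
headers=$(printf '%s\n' "$message" | sed '/^$/q')

# The rule's regular expression (with the dots escaped), matched
# case-insensitively as procmail does by default.
if printf '%s\n' "$headers" | grep -qi '^From:.*savedsearches@ebay\.com'; then
  matched=yes
  lines_piped=$(printf '%s\n' "$message" | handler)
else
  matched=no
  lines_piped=0
fi

echo "matched=$matched lines_piped=$lines_piped"
```

Note that the whole message, not just the header block, is what gets piped to the handler when the condition matches.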

Converting Email to Atom

How should you convert your email to Atom? It depends on the format of the mail, the tools provided on the system you're using, and which of those tools you're most comfortable with. eBay emails include both text and HTML versions of the message, so I wrote a Perl script to strip out the plain-text version and add a bit of metadata, and then I used the libxslt xsltproc command-line XSLT processor to convert the Perl script's output to Atom. The xsltproc -html parameter lets you process ill-formed HTML as if it were well-formed XML so that you can apply an XSLT stylesheet to it.

The following shows my ebaymail2atom.sh shell script. A backslash ending a shell script line is a continuation character, so treat the line that follows it as more of the same line. I'll discuss the two conversion steps in the middle of the script first and then come back to the beginning and ending parts.

#! /bin/csh
set rssdir=/usr/www/users/bobd/rss

# setup for update of newAtomFeeds.atom: backup old one
# and check dir contents before creating ebay atom file.
cp $rssdir/newAtomFeeds.atom $rssdir/newAtomFeeds.atom.bkp
# list the feed directory by full pathname, because the current
# directory is unknown when procmail runs this script
ls -1tr $rssdir > /tmp/temp1.txt

# convert input to html
$rssdir/bin/ebaymail2html.pl > /tmp/ebaytemp.html
# convert html to atom
xsltproc -html --stringparam RSSFilePath "$rssdir" \
  $rssdir/bin/ebayhtml2atom.xsl /tmp/ebaytemp.html
rm /tmp/ebaytemp.html

# In case a new file (whose name we won't know) got created
chmod 644 $rssdir/*.atom

# finish update of newAtomFeeds.atom
ls -1tr $rssdir > /tmp/temp2.txt
diff /tmp/temp1.txt /tmp/temp2.txt | \
  $rssdir/bin/newentryatom.pl > $rssdir/newAtomFeeds.atom

(Links to the complete XSLT style sheets and Perl scripts are provided at the end of this article.) In addition to extracting the HTML from the email, the Perl script adds the following metadata to the email in HTML meta elements (when I tried adding it in non-HTML elements, libxslt complained, because I had told it with the -html switch to expect HTML):

  • An ID value for each item based on the eBay ID assigned to it. While Atom doesn't require much, it does require an ID for each entry. The Perl script pulls this from the URLs that link to the eBay items.

  • The search query string used to generate the result set. This is also pulled from a URL in the email HTML, and the XSLT stylesheet uses it to create the subtitle of the Atom feed.

  • The filename of the output Atom file, created from the name assigned to the search when it was added to the Favorite Searches list.
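The Perl script itself is not reproduced here, but the three pieces of metadata in the list above can be sketched in shell. Everything specific in this sketch is an assumption made for illustration: the URL shapes, the parameter names item and query, the rule for turning a search name into a filename, and the meta names itemid and query. Only the meta name filename comes from this article.

```shell
#!/bin/sh
# Hypothetical inputs; real values would be pulled from the email's HTML.
item_url='http://example.com/item?ViewItem&item=1234567890'
search_url='http://example.com/search?query=Elvis+black+velvet'
search_name='Elvis Black Velvet'

# Item ID: the run of digits after "item=" in the item link.
item_id=$(printf '%s\n' "$item_url" |
  sed -n 's/.*[?&]item=\([0-9][0-9]*\).*/\1/p')

# Query string: the value after "query=", with "+" turned back into spaces.
query=$(printf '%s\n' "$search_url" |
  sed -n 's/.*[?&]query=\([^&]*\).*/\1/p' | tr '+' ' ')

# Output filename: the search name, letters and digits only, lowercased.
feed_file=$(printf '%s' "$search_name" |
  tr -cd '[:alnum:]' | tr '[:upper:]' '[:lower:]').atom

# Emit the values as HTML meta elements for the style sheet to read.
printf '<meta name="itemid" content="%s">\n' "$item_id"
printf '<meta name="query" content="%s">\n' "$query"
printf '<meta name="filename" content="%s">\n' "$feed_file"
```

Keeping the derived values in meta elements means the later XSLT step needs no extra command-line parameters to find them.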

Because the output filename is stored within the data itself, there's no simple way to know at shell script execution time what that name will be. So, although a command-line execution of an XSLT processor (or of most other text processing programs) typically names the output file along with other execution parameters, the ebayhtml2atom.xsl program called by ebaymail2atom.sh uses a different technique: the XSLT 1.0 document extension element defined as part of the EXSLT project. (XSLT 2.0 has a comparable instruction built in as part of the base spec.) The style sheet reads the filename that the Perl script stored in the meta element with a name attribute value of filename and uses that filename to assemble the output path used by the exsl:document element that builds the output file.

The style sheet has one more bit of tricky I/O to implement. Let's say you want a minimum of the eight most recent "Elvis black velvet" entries in your Atom feed, and two new ones arrive. You'll want to output those two and then the six most recent ones from the existing file, so the stylesheet uses the XSLT document() function to read those from the disk file.
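The padding arithmetic is easy to get wrong, so here is a small shell sketch of it. It is not the style sheet's XSLT: it treats each entry as one line of text, newest first, and the entry names are made up.

```shell
#!/bin/sh
min_entries=8

# Hypothetical entries, one per line, newest first.
new_entries=$(printf 'new-%s\n' 1 2)
old_entries=$(printf 'old-%s\n' 1 2 3 4 5 6 7 8 9)

# How many old entries are needed to reach the minimum?
new_count=$(printf '%s\n' "$new_entries" | wc -l)
pad=$((min_entries - new_count))
if [ "$pad" -lt 0 ]; then
  pad=0
fi

# All of the new entries, then only as many old ones as needed.
feed=$(
  printf '%s\n' "$new_entries"
  printf '%s\n' "$old_entries" | awk -v n="$pad" 'NR <= n'
)

printf '%s\n' "$feed"
```

With two new entries and a minimum of eight, the sketch keeps the six most recent old entries and drops the rest.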

Notification of the Notification

Once you have this all set up, the true test of whether it works is whether procmail and all the shell script pieces do their jobs when eBay sends an alert to your mailbox. If you point your RSS/Atom reader client at a file that doesn't exist yet, it will give you an error, so I was grumbling to myself about the inconvenience of waiting for that first eBay email about a given search to trigger the creation of the Atom file before I could add the feed to a reader and make sure that it worked. I thought it would be much easier if some automated system would notify me when a new feed was ready, so I wrote a Perl script to create a newAtomFeeds.atom feed!

Near the beginning of my ebaymail2atom.sh shell script, an ls command saves a one-column list of the files in the directory where I store Atom and RSS files into a file in the /tmp directory. After the conversion steps in the middle of the shell script shown above create an Atom version of the email, a similar ls command creates a second list, and the Unix diff command then compares the two lists, sending the result to a newentryatom.pl Perl script. This script creates a new entry in the newAtomFeeds.atom file if a new file showed up in that directory in between the creation of the /tmp/temp1.txt and /tmp/temp2.txt files.
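The before-and-after listing technique can be tried in isolation. This is a minimal sketch that assumes a scratch directory and made-up filenames in place of the real feed directory. It lists by name rather than by modification time, so that only added files show up in the diff output.

```shell
#!/bin/sh
# Scratch directory and listing files standing in for the real ones.
feeddir=$(mktemp -d)
before=$(mktemp)
after=$(mktemp)

# One feed already exists when the first listing is taken.
touch "$feeddir/existing.atom"
ls -1 "$feeddir" > "$before"

# Simulate the conversion step creating a feed with a new name.
touch "$feeddir/elvis-black-velvet.atom"
ls -1 "$feeddir" > "$after"

# diff marks lines found only in the second listing with "> ".
# It exits non-zero when the listings differ, which is expected here.
new_files=$( { diff "$before" "$after" || true; } | sed -n 's/^> //p')

echo "$new_files"
rm -rf "$feeddir" "$before" "$after"
```

A script reading this output would create one notification entry per filename that it finds.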

As with the eBay item Atom file created by the ebayhtml2atom.xsl style sheet, we want the newentryatom.pl Perl script to pad the Atom file with the most recent existing entries if there aren't many new entries. As we saw earlier, a style sheet run by xsltproc can use the XSLT document() function to read from an existing disk file that has the same name as the output file that it's going to write to. A Perl script whose output is being redirected to a given file, however, can't read from the version of that file that was sitting on the disk just before the Perl script was run, because the shell empties the file when it sets up the redirection. So, the ebaymail2atom.sh shell script creates a copy of newAtomFeeds.atom called newAtomFeeds.atom.bkp, and that's what newentryatom.pl reads for entries to pad the file it creates. With my RSS/Atom client pointing at the newAtomFeeds.atom file, I'll always know when a new eBay feed is ready to test.
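The copy-first pattern looks like this in a minimal shell sketch. The scratch directory and the one-line "entries" are hypothetical; the real script works with Atom markup.

```shell
#!/bin/sh
# Hypothetical feed file in a scratch directory.
workdir=$(mktemp -d)
feed="$workdir/newAtomFeeds.atom"
printf '%s\n' 'old-entry-1' 'old-entry-2' > "$feed"

# Save the old contents first: as soon as the shell sets up "> $feed"
# below, the file is truncated, before any command gets to read it.
cp "$feed" "$feed.bkp"

# Write the new entry, then pad with entries from the backup copy.
{
  echo 'new-entry'
  cat "$feed.bkp"
} > "$feed"

result=$(cat "$feed")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Reading from the backup rather than from the feed itself is what keeps the old entries available after the redirection has emptied the original.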

Testing and Setting Up Your Script

Before you run a shell script like this, make sure that all the pieces work properly. Find out where the mail files are stored on your system, store a few messages as individual files, and run the Perl script and the XSLT style sheet on them to make sure that your script and style sheet do what you want.

The same email files will be useful to do your integration testing of the shell script that ties everything together. Send their contents to the shell script with a command line like this:

./ebaymail2atom.sh < ~/temp/ebaytest01.mail

Use an RSS/Atom reader to look at the file that gets created and make sure that it looks OK. For additional reassurance, validate it against an Atom 1.0 schema.

Watch out for another trap that lies in wait for shell script dabblers like myself: remember that even if your scripts run properly when you send files to them as shown above, you don't know what the current directory will be when procmail runs the scripts, so you need to qualify the names of all files and scripts in your shell script with full pathnames. As you can see, I put the pathname in a variable and used that throughout.
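The effect is easy to reproduce. In this sketch, a scratch directory stands in for the rss directory (an assumption for illustration), and a change to the root directory simulates procmail starting the script somewhere unexpected.

```shell
#!/bin/sh
# Hypothetical layout: a scratch directory standing in for the rss directory.
rssdir=$(mktemp -d)
mkdir "$rssdir/bin"
echo 'placeholder' > "$rssdir/bin/ebaymail2html.pl"

# Look for the script by relative name from another directory.
relative=$(cd / && if [ -f bin/ebaymail2html.pl ]; then
  echo found; else echo missing; fi)

# Look for it again by full pathname from that same directory.
absolute=$(cd / && if [ -f "$rssdir/bin/ebaymail2html.pl" ]; then
  echo found; else echo missing; fi)

echo "relative path: $relative"
echo "full path: $absolute"
rm -rf "$rssdir"
```

The relative lookup fails as soon as the current directory changes, while the lookup through the variable keeps working.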

More Atom, More Convenience

eBay will eventually understand the value of offering an Atom delivery option in addition to an email delivery option, and you won't need to use procmail and scripting to redirect emails about new Elvis black velvet items to your RSS/Atom reader. Scripting on top of the mail infrastructure is not the leanest solution to use until then, but as a proof of concept that you can actually use (as opposed to simply being a demo) it points the way for people who don't yet understand the value of this relatively new form of communication. The Unix philosophy of connecting the input and output of different tools to create new applications (a philosophy that predates the "mashup" Web 2.0 hype by several generations) makes it easy to incorporate Atom technology into existing infrastructures, so you should be able to add Atom notification features to more than just email applications. Check which scripting tools your server or host provider offers, think about which of the server's applications would benefit from Atom notification, and build it in yourself — you might surprise some people.

Links to Files Mentioned in This Article