Hacking eBay: Turning Email Alerts into Atom
November 23, 2005
From our geeky perspective, Atom and RSS seem to be sweeping through the internet, changing the way people and systems get notified about events. From a broader perspective, though, they've got a long way to go — we all have plenty of computer-literate friends who've never heard of either.
This means that plenty of opportunities remain to improve systems and applications using RSS or Atom. (Because Atom is the latest and greatest in the history of RSS formats, with endorsements from key representatives of the earlier formats, I'm going to focus on using Atom, but the basic ideas here would work for any flavor of RSS.) I see two basic categories of such opportunities: as demos to show how a given system can benefit from an Atom delivery option, and as personal utilities to make your own life easier. A given application can fall into both categories; I wrote something to convert eBay saved search notifications into an Atom feed to make my own life easier, but if I were an eBay employee I'd be showing it to my boss and saying "Hey! This is easy to implement and a great new way to spread the use of our product!"
An Atom version of an email notification system makes a great demo of Atom's power for several reasons. First, a popular incentive for Atom use is that it cuts down on email in a controlled fashion, reducing not only the overall email load but also the chance of spam-checker false positives. Second, the difficult part has probably already been done for you: many online products and systems already have elaborate infrastructures in place to track who wants email notification of what. These systems may let recipients specify what they want to be notified about, the email address that should receive the notifications, the frequency of the emails, the format of the emails, and other details.
For example, when you perform a search such as "Elvis black velvet" on eBay, the results screen includes an Add to Favorite Searches link that lets you name the search and add it to a list accessible from your My eBay screen. In addition to naming the search, you can tell the eBay system to send you an email when relevant new items appear.
Figure 1. Black velvet Elvis portrait
My wife and I have a room in the basement with a lot of Elvis stuff. The room has its own bathroom with more Elvis stuff, so we refer to the combination as the "Presley Suite." We're not fanatical Elvis fans — I don't think we have more than a dozen CDs and vinyl Presley albums — but he was responsible for a lot of great rock-and-roll and a lot of silly kitsch that's nearly as entertaining.
Our Presley Suite lacked any painted black velvet Elvis pictures, a serious omission in a collection of Elvis kitsch, so I created an eBay saved search on the phrase "Elvis black velvet." As with a lot of eBay searches, you find some great deals that aren't as great once you consider shipping costs, and you see the same people selling the same things over and over. I eventually bought one of the recurring ones, shown above: a somewhat Native American-looking Elvis and an even less Native American-looking woman. (Elvis fans will recognize it as an allusion to Flaming Star, a 1960 western in which Presley played the son of a Native American woman and a white man.)
Occasionally some real one-of-a-kind things showed up in the search results, and the serious collectors quickly drove the price out of my reach. My favorite was an old black velvet Elvis paint-by-numbers kit that hadn't been used yet — imagine white lines dividing up his black velvet face into tiny numbered sections. That would have been a highlight of our collection, right up there with D.J. Fontana's autograph and the Thai bamboo curtain of Warhol's gunslinger Elvis portrait that we found on London's Portobello Road.
Several libraries and services are available to convert email to RSS or Atom. Evaluating each of these, picking one, installing it, and configuring it on my host provider's system seemed like more trouble than just creating something myself. Doing it myself also lets me customize the input and output as much as I like. For example, the Mailbucket service, which lets you create an email address on their server and then creates an RSS feed of mail sent to that address, won't work for an eBay favorite search because eBay sends the notification emails to the email address they have on file for you, not to any address that you specify when you create the search. Services like Mailbucket also convert each email into a single RSS or Atom entry, but I wanted a single eBay email about six hits for the search on "Elvis black velvet" converted to six Atom entries.
To convert the formats, I knew that tools like Perl and XSLT make it easy to convert between plain text, HTML, and XML, but I needed a way to have the arrival of the mail trigger the conversion. This turned out to be relatively easy once I found out about procmail.
procmail is a venerable Unix utility for automating the processing of mail. (I'm not running my own mail server, but my host provider lets me create procmail configurations.) It first became popular for automated sorting of mail into different folders, but now most popular mail clients include built-in features to do that. procmail then became popular for spam detection, but mail client programs usually do that now as well. As a general-purpose tool for redirecting emails meeting certain criteria to specific locations or, better yet, for routing them to be processed by particular programs, procmail is still a great tool for adapting email workflows to newer technologies such as Atom.
Before procmail can do anything with your mail, you have to route your mail through it. This may mean creating a .forward file; in my case, I had to fill out a mail account configuration screen on my host provider's mail administration page. Mail is then piped through the procmail program, which checks a .procmailrc file to see if anything special should be done to the email. If not, the email is passed along to your regular mailbox.
For the syntax of this .procmailrc file, a tutorial at the Ohio State University math department provides a good start, and the procmail quickstart at Infinite Ink is far more detailed than its name implies.
A .procmailrc file begins with the setting of some environment variables and then has a series of rules that each specify which mail they apply to and what to do with those emails. Most rules start with :0: on its own line (the second colon tells procmail to use a lock file while delivering the message). The following rule tells procmail to route mail that has the string sales in its subject line to the sales folder:
:0:
* ^Subject:.*sales
sales
A regular expression on the rule's second line (after the asterisk) describes the mail to look for, usually by specifying a line in the mail header to check, and the third line indicates where to send the email message. Beginning the third line with a pipe symbol tells procmail to send the email message to a program as its input, as in this procmail rule:
:0:
* ^From:.*email@example.com
| /usr/www/users/bobd/rss/bin/ebaymail2atom.sh
This rule tells procmail that when the "From" line includes the string email@example.com, the contents of that message should be piped to the ebaymail2atom.sh script in the specified directory.
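Before trusting a rule like this with live mail, it can help to check which saved messages it would match. The following POSIX shell sketch is not part of the article's setup, and the function names are hypothetical; it only approximates procmail, which by default tests conditions against the message header and ignores case unless the D flag is given.

```shell
#!/bin/sh
# Sketch: approximate how procmail evaluates a condition such as
# "^From:.*email@example.com" against a saved message.
# Approximation only: procmail tests conditions against the header
# (the lines before the first blank line) and ignores case by default;
# grep -E syntax differs from procmail's in some details.
# Assumes Unix (LF) line endings in the saved message.

# Print the header of the message on standard input,
# up to and including the first empty line.
mail_header() {
    sed '/^$/q'
}

# Succeed if any header line matches the regular expression in $1,
# ignoring case as procmail does unless the D flag is set.
header_matches() {
    mail_header | grep -E -i -q -- "$1"
}
```

For example, `header_matches '^From:.*email@example.com' < ~/temp/ebaytest01.mail && echo match` prints match only if the saved message's From header would trigger the rule; a matching line in the message body is ignored.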
Converting Email to Atom
How should you convert your email to Atom? It depends on the format of the mail, the tools provided on the system you're using, and which of those tools you're most comfortable with. eBay's notification emails include both text and HTML versions of the message, so I wrote a Perl script to strip out the text version and add a bit of metadata, and then I used the libxslt xsltproc command-line XSLT processor to convert the Perl script's output to Atom. The xsltproc -html parameter lets you process ill-formed HTML as if it were well-formed XML so that you can apply an XSLT stylesheet to it.
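The Perl script itself isn't shown here, but the core of stripping out the text version can be sketched in a few lines of shell. This is an illustration under a strong assumption: the HTML part of the message is neither base64- nor quoted-printable-encoded. The name extract_html is hypothetical, and a real MIME parser would handle many more cases.

```shell
#!/bin/sh
# Sketch: keep only the HTML version of a text+HTML message read on
# standard input. ASSUMPTION: the HTML part is not base64- or
# quoted-printable-encoded, and "<html" and "</html>" each appear
# intact on a single line.

# Print from the first line containing "<html" through the next line
# containing "</html>", matching case-insensitively; later HTML is ignored.
extract_html() {
    awk '
        { lower = tolower($0) }
        !found && index(lower, "<html") > 0 { inhtml = 1; found = 1 }
        inhtml { print }
        inhtml && index(lower, "</html>") > 0 { inhtml = 0 }
    '
}
```

Running `extract_html < ~/temp/ebaytest01.mail > /tmp/ebaytemp.html` would produce a file that `xsltproc -html` can then read; the metadata step described below would still be needed.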
The following shows my ebaymail2atom.sh shell script. A backslash ending a shell script line is a continuation character, so treat the line that follows it as part of the same line. I'll discuss the two conversion steps in the middle (the lines after the "convert input to html" and "convert html to atom" comments) first and then come back to the beginning and end.
#! /bin/csh

set rssdir=/usr/www/users/bobd/rss

# setup for update of newAtomFeeds.atom: backup old one
# and check dir contents before creating ebay atom file.
cp $rssdir/newAtomFeeds.atom $rssdir/newAtomFeeds.atom.bkp
ls -1tr $rssdir > /tmp/temp1.txt

# convert input to html
$rssdir/bin/ebaymail2html.pl > /tmp/ebaytemp.html

# convert html to atom
xsltproc -html --stringparam RSSFilePath "$rssdir" \
    $rssdir/bin/ebayhtml2atom.xsl /tmp/ebaytemp.html
rm /tmp/ebaytemp.html

# In case a new file (whose name we won't know) got created
chmod 644 $rssdir/*.atom

# finish update of newAtomFeeds.atom
ls -1tr $rssdir > /tmp/temp2.txt
diff /tmp/temp1.txt /tmp/temp2.txt | \
    $rssdir/bin/newentryatom.pl > $rssdir/newAtomFeeds.atom
(Links to the complete XSLT style sheets and Perl scripts are provided at the end of the article.) In addition to extracting the HTML from the email, the Perl script adds the following metadata to the email in HTML meta elements (when I tried adding it in non-HTML elements, libxslt complained, because I had told it with the -html switch to expect HTML):
An ID value for each item based on the eBay ID assigned to it. While Atom doesn't require much, it does require an ID for each entry. The Perl script pulls this from the URLs that link to the eBay items.
The search query string used to generate the result set. This is also pulled from a URL in the email HTML, and the XSLT stylesheet uses it to create the subtitle of the Atom feed.
The filename of the output Atom file, created from the name assigned to the search when it was added to the Favorite Searches list.
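Pulling the IDs out of the item links is mostly pattern matching. Here is a hedged shell sketch of that first step. It assumes eBay's item URLs carry the number as an item=NNN query parameter (check this against your own messages), and the tag: URI built by the hypothetical atom_id function uses example.com as a placeholder for a domain you control. Note that grep -o is a common GNU and BSD extension rather than POSIX.

```shell
#!/bin/sh
# Sketch: derive entry IDs from eBay item links on standard input.
# ASSUMPTION: each item URL carries its number as an "item=NNN" query
# parameter (e.g. ...eBayISAPI.dll?ViewItem&item=NNN).
# grep -o is a GNU/BSD extension, not POSIX.

# Print each distinct item number once, in order of first appearance.
item_ids() {
    grep -o 'item=[0-9][0-9]*' | sed 's/^item=//' | awk '!seen[$0]++'
}

# Build an Atom entry ID from an item number. HYPOTHETICAL tag: URI;
# replace example.com with a domain you control.
atom_id() {
    printf 'tag:example.com,2005:ebay-item-%s\n' "$1"
}
```

The deduplication matters because a notification email typically links to the same item more than once (title, picture, and so on), and each item should become exactly one Atom entry.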
Because the output filename is stored within the data itself, there's no simple way to know at shell script execution time what that name will be. So, although a command-line invocation of an XSLT processor (or of most other text processing programs) typically names the output file along with other execution parameters, the style sheet called by ebaymail2atom.sh uses a different technique: the XSLT 1.0 document extension element defined as part of the EXSLT project. (XSLT 2.0 has a comparable instruction, xsl:result-document, built into the base spec.) The style sheet reads the filename that the Perl script stored in the meta element with a name attribute value of filename and uses that filename to assemble the output path used by the exsl:document element that builds the Atom file.
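For comparison, here is the more conventional route sketched in shell: read the filename out of the meta element first, then let the shell redirect the processor's output. This is an alternative, not what the article's style sheet does, and it assumes the Perl script writes the element on one line with the name attribute before content; meta_filename is a hypothetical name.

```shell
#!/bin/sh
# Sketch of an ALTERNATIVE to exsl:document (not the article's method):
# extract the output filename in the shell, then redirect the XSLT
# processor's output to it. ASSUMPTION: the Perl script writes
# <meta name="filename" content="..."> on one line, name before content.

# Print the content value of the first meta element named "filename".
meta_filename() {
    sed -n 's/.*<meta name="filename" content="\([^"]*\)".*/\1/p' | head -n 1
}
```

Used this way, `fname=$(meta_filename < /tmp/ebaytemp.html)` followed by a redirect to "$rssdir/$fname" would need a style sheet that writes to standard output. One reason exsl:document suits the article's design better: the style sheet also reads the existing feed with document(), and a shell redirect to that same file would empty it before xsltproc even started.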
The style sheet has one more bit of tricky I/O to implement. Let's say you want a maximum of the eight most recent "Elvis black velvet" entries in your Atom feed, and two new ones arrive. You'll want to output those two and then the six most recent ones from the existing file, so the style sheet uses the XSLT document() function to read those from the disk file.
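The arithmetic behind that padding is simple but easy to get wrong at the edges: more new entries than the maximum, or fewer old entries than open slots. A small shell sketch of it, with a hypothetical function name; in XSLT the same limit would typically be a position() test on the old entries.

```shell
#!/bin/sh
# Sketch: how many entries to carry over from the existing feed so it
# holds at most $1 entries, given $2 new entries and $3 entries
# available in the existing file. Result is clamped to 0..$3.
entries_to_keep() {
    keep=$(( $1 - $2 ))
    if [ "$keep" -lt 0 ]; then keep=0; fi
    if [ "$keep" -gt "$3" ]; then keep=$3; fi
    echo "$keep"
}
```

With the numbers from the example, `entries_to_keep 8 2 20` prints 6.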
Notification of the Notification
Once you have this all set up, the true test of whether it works is whether procmail and all the shell script pieces do their jobs when eBay sends an alert to your mailbox. If you point your RSS/Atom reader client at a file that doesn't exist yet, it will give you an error, so I was grumbling to myself about the inconvenience of waiting for the first email about a given search to trigger the creation of the Atom file before I could add the feed to a reader and make sure that it worked. I thought it would be much easier if an automated system notified me when a new feed was ready, so I wrote a Perl script to do that.
Near the beginning of my ebaymail2atom.sh shell script, an ls command saves a one-column list of the files in the directory where I store Atom and RSS files into a file in the /tmp directory. After the two conversion steps of the shell script shown above create an Atom version of the email, a similar ls command creates a second list, and the Unix diff command then compares the two lists, sending the result to the newentryatom.pl Perl script. This script creates a new entry in the newAtomFeeds.atom file if a new file showed up in that directory in between the creation of the two lists.
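The "what's new in this directory?" step can be sketched on its own. This shell version is an illustration, not the article's newentryatom.pl, and new_files is a hypothetical name. It compares names rather than positions, because with ls -t ordering a feed file that was just rewritten can move to the end of the listing, so a plain diff of the two listings can show ">" lines even when no file is new.

```shell
#!/bin/sh
# Sketch: list names that appear in the "after" listing ($2) but not in
# the "before" listing ($1), e.g. two saved "ls -1tr" outputs.
# Compares names rather than positions, so a rewritten file that moved
# within the time-ordered listing is not reported as new.
new_files() {
    # ARGV[1] test instead of the NR == FNR idiom, so an empty
    # "before" listing still works.
    awk 'FILENAME == ARGV[1] { seen[$0] = 1; next } !($0 in seen)' "$1" "$2"
}
```

In the flow above, `new_files /tmp/temp1.txt /tmp/temp2.txt` would print only the names of feeds created by this run.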
As with the eBay item Atom file created by the ebayhtml2atom.xsl style sheet, we want the newentryatom.pl Perl script to pad the Atom file with the most recent existing entries if there aren't many new entries. When the style sheet did this, we saw that the xsltproc XSLT processor can use the XSLT document() function to read from an existing disk file that has the same name as the output file that it's going to write. A Perl script whose output is being redirected to a given file, however, can't read the version of that file that was sitting on the disk just before the Perl script was started, because the shell empties the file when it sets up the redirection. So the ebaymail2atom.sh shell script first copies newAtomFeeds.atom to newAtomFeeds.atom.bkp, and that's what newentryatom.pl reads for entries to pad the file it creates. With my RSS/Atom client pointing at the newAtomFeeds.atom file, I'll always know when a new eBay feed is ready to test.
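The truncation problem is easy to demonstrate. In this sketch, tr stands in for a script such as newentryatom.pl that reads old entries and writes a new version of the file; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: why a program can't read the file its own output is redirected to.
# tr stands in for a script that reads old entries and writes a new file.

# Naive version: the shell truncates "$1" while setting up "> $1",
# before tr starts, so tr reads an empty file and the old content is lost.
rewrite_in_place() {
    tr '[:lower:]' '[:upper:]' < "$1" > "$1"
}

# Backup-first version, the approach described above: copy the file,
# then read the copy while writing the original name.
rewrite_via_backup() {
    cp "$1" "$1.bkp"
    tr '[:lower:]' '[:upper:]' < "$1.bkp" > "$1"
}
```

After rewrite_in_place, the file is empty; after rewrite_via_backup, it holds the transformed old content, and the .bkp copy remains as a bonus for recovering from a bad run.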
Testing and Setting Up Your Script
Before you run a shell script like this, make sure that all the pieces work properly. Find out where the mail files are stored on your system, store a few messages as individual files, and run the Perl script and the XSLT style sheet on them to make sure that your script and style sheet do what you want.
The same email files will be useful to do your integration testing of the shell script that ties everything together. Send their contents to the shell script with a command line like this:
./ebaymail2atom.sh < ~/temp/ebaytest01.mail
Use an RSS/Atom reader to look at the file that gets created and make sure that it looks OK. For additional reassurance, validate it against an Atom 1.0 schema.
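Repeating those checks by hand gets tedious once you have more than a couple of saved messages. A small driver like the following, a sketch with hypothetical names, runs one stage over every sample and reports which inputs made it fail.

```shell
#!/bin/sh
# Sketch: run one pipeline stage over every saved test message.
# $1 is a directory of saved messages named *.mail (a HYPOTHETICAL
# naming convention); the remaining arguments are the command to run,
# which reads each message on standard input. Each output is written
# to NAME.out next to NAME.mail; failures are reported on stderr.
run_on_samples() {
    samples_dir=$1
    shift
    failures=0
    for f in "$samples_dir"/*.mail; do
        [ -e "$f" ] || continue   # no matches: the glob is left unexpanded
        if ! "$@" < "$f" > "${f%.mail}.out"; then
            echo "FAILED: $f" >&2
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]
}
```

For example, `run_on_samples ~/temp /usr/www/users/bobd/rss/bin/ebaymail2html.pl` would leave one .out file of HTML per saved message to inspect or feed to xsltproc.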
Watch out for another trap that lies in wait for shell script dabblers like myself: remember that even if your scripts run properly when you send files to them as shown above, you don't know what the current directory will be when procmail runs the scripts, so you need to qualify the names of all files and scripts in your shell script with full pathnames. As you can see, I put the pathname in a variable and used that throughout.
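The pathname problem can also be caught early instead of surfacing as a silent failure when procmail runs the script. A minimal Bourne shell sketch (the article's script is csh; the function name is hypothetical, and the paths are the ones used above):

```shell
#!/bin/sh
# Sketch: the working directory under procmail is not something the
# script can rely on, so refuse to continue if a configured path is
# relative.
rssdir=/usr/www/users/bobd/rss    # the article's directory; adjust for your host

# Fail (with a message on stderr) if any argument isn't an absolute path.
require_absolute() {
    for p in "$@"; do
        case $p in
            /*) ;;
            *) echo "not an absolute path: $p" >&2; return 1 ;;
        esac
    done
    return 0
}

require_absolute "$rssdir" "$rssdir/bin/ebaymail2html.pl" || exit 1
```

Keeping every path derived from one absolute variable, as the article does with rssdir, means this check only has to guard a single setting.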
More Atom, More Convenience
eBay will eventually understand the value of offering an Atom delivery option in addition to an email delivery option, and you won't need to use procmail and scripting to redirect emails about new Elvis black velvet items to your RSS/Atom reader. Scripting on top of the mail infrastructure is not the leanest solution to use until then, but as a proof of concept that you can actually use (as opposed to simply being a demo) it points the way for people who don't yet understand the value of this relatively new form of communication. The Unix philosophy of connecting the input and output of different tools to create new applications (a philosophy that predates the "mashup" Web 2.0 hype by several generations) makes it easy to incorporate Atom technology into existing infrastructures, so you should be able to add Atom notification features to more than just email applications. Check which scripting tools your server or host provider offers, think about which of the server's applications would benefit from Atom notification, and build it in yourself — you might surprise some people.