Hacking eBay: Turning Email Alerts into Atom
by Bob DuCharme
Converting Email to Atom
How should you convert your email to Atom? It depends on the format of the mail, the tools provided on the system you're using, and which of those tools you're most comfortable with. eBay emails include both text and HTML versions of the message, so I wrote a Perl script to strip out the text version and add a bit of metadata to the remaining HTML, and then I used the libxslt xsltproc command-line XSLT processor to convert the Perl script's output to Atom. The xsltproc -html parameter lets you process ill-formed HTML as if it were well-formed XML so that you can apply an XSLT stylesheet to it.
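As a rough illustration of that first step (keeping only the HTML alternative of a multipart message), here is a minimal Python sketch. It is not the ebaymail2html.pl script: the function names and the sample message are made up for the example, and it uses only Python's standard email package.

```python
import email
from email.message import EmailMessage


def extract_html_part(raw_message):
    """Return the first text/html part of a raw email, or '' if there is none."""
    msg = email.message_from_string(raw_message)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def build_sample_message():
    """Build a hypothetical two-part alert (plain text plus HTML) to try it on."""
    msg = EmailMessage()
    msg["Subject"] = "Favorite Search: Elvis black velvet"
    msg.set_content("Plain-text version of the alert")
    msg.add_alternative(
        "<html><body><p>HTML version</p></body></html>", subtype="html"
    )
    return msg.as_string()


if __name__ == "__main__":
    print(extract_html_part(build_sample_message()))
```

A real alert would arrive on standard input from procmail rather than from a helper like build_sample_message().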
The following shows my ebaymail2atom.sh shell script. A backslash ending a shell script line is a continuation character, so treat the line that follows it as more of the same line. I'll discuss the two conversion steps in the middle of the script first (the "convert input to html" and "convert html to atom" parts) and then come back to the beginning and ending parts.
#! /bin/csh
set rssdir=/usr/www/users/bobd/rss
# setup for update of newAtomFeeds.atom: backup old one
# and check dir contents before creating ebay atom file.
cp $rssdir/newAtomFeeds.atom $rssdir/newAtomFeeds.atom.bkp
ls -1tr $rssdir > /tmp/temp1.txt
# convert input to html
$rssdir/bin/ebaymail2html.pl > /tmp/ebaytemp.html
# convert html to atom
xsltproc -html --stringparam RSSFilePath "$rssdir" \
$rssdir/bin/ebayhtml2atom.xsl /tmp/ebaytemp.html
rm /tmp/ebaytemp.html
# In case a new file (whose name we won't know) got created
chmod 644 $rssdir/*.atom
# finish update of newAtomFeeds.atom
ls -1tr $rssdir > /tmp/temp2.txt
diff /tmp/temp1.txt /tmp/temp2.txt | \
$rssdir/bin/newentryatom.pl > $rssdir/newAtomFeeds.atom
(Links to the complete XSLT style sheets and Perl scripts are provided at the end of this article.) In addition to extracting the HTML from the email, the Perl script adds the following metadata to the email in HTML meta elements (when I tried adding it in non-HTML elements, libxslt complained, because I had told it with the -html switch to expect HTML):
- An ID value for each item based on the eBay ID assigned to it. While Atom doesn't require much, it does require an ID for each entry. The Perl script pulls this from the URLs that link to the eBay items.
- The search query string used to generate the result set. This is also pulled from a URL in the email HTML, and the XSLT stylesheet uses it to create the subtitle of the Atom feed.
- The filename of the output Atom file, created from the name assigned to the search when it was added to the Favorite Searches list.
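The three kinds of metadata in this list could be gathered along the following lines. This is a Python sketch, not the Perl script itself. The URL shapes, the item= and satitle= parameter names, the itemid and query meta names, and the rule for turning a search name into a filename are all assumptions made for the example. Only the filename meta name follows the article.

```python
import re
from html import escape
from urllib.parse import unquote_plus

ITEM_ID_RE = re.compile(r"\bitem=(\d+)")                 # assumed item-link parameter
QUERY_RE = re.compile(r"\bsatitle=([^&\"'<>\s]+)")      # assumed search-link parameter


def meta(name, value):
    """Format one HTML meta element."""
    return '<meta name="%s" content="%s">' % (name, escape(value, quote=True))


def build_meta_elements(html_text, search_name):
    """Return meta elements for item IDs, the query string, and the filename."""
    metas = []
    seen = set()
    for item_id in ITEM_ID_RE.findall(html_text):
        if item_id not in seen:          # an item may be linked more than once
            seen.add(item_id)
            metas.append(meta("itemid", item_id))
    match = QUERY_RE.search(html_text)
    if match:
        metas.append(meta("query", unquote_plus(match.group(1))))
    # Hypothetical naming rule: lowercase words joined by hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", search_name.lower()).strip("-")
    metas.append(meta("filename", slug + ".atom"))
    return metas
```

Putting the values in meta elements keeps the intermediate file plain HTML, which is what the -html switch tells the parser to expect.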
Because the output filename is stored within the data itself, there's no simple way to know at shell script execution time what that name will be. So, although a command-line execution of an XSLT processor (or of most other text processing programs) typically names the output file along with other execution parameters, the ebayhtml2atom.xsl program called by ebaymail2atom.sh uses a different technique: the XSLT 1.0 document extension element defined as part of the EXSLT project. (XSLT 2.0 has a comparable instruction built in as part of the base spec.) The style sheet reads the filename that the Perl script stored in the meta element with a name attribute value of filename and uses that filename to assemble the output path used by the exsl:document element that builds the output file.
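The idea that the output name comes from the data rather than from the command line can be sketched in Python as follows. In the stylesheet this job belongs to exsl:document, so the code below is only an analogy. The class and function names are invented for the example, and the use of basename() is an added precaution, not something the article describes.

```python
import posixpath
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from meta elements, tolerating sloppy HTML."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            if name is not None:
                self.meta[name] = attributes.get("content") or ""


def output_path_from_meta(html_text, rss_dir):
    """Assemble the output path from the filename stored in the HTML itself."""
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    filename = collector.meta.get("filename")
    if not filename:
        raise ValueError("no filename meta element found")
    # basename() guards against a stored value that points outside rss_dir.
    return posixpath.join(rss_dir, posixpath.basename(filename))
```

Here rss_dir plays the role of the RSSFilePath parameter that the shell script passes to xsltproc.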
The style sheet has one more bit of tricky I/O to implement. Let's say you want a minimum of the eight most recent "Elvis black velvet" entries in your Atom feed, and two new ones arrive. You'll want to output those two and then the six most recent ones from the existing file, so the stylesheet uses the XSLT document() function to read those from the disk file.
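The padding rule is simple enough to state in a few lines of Python. This is a sketch of the logic only, not of the stylesheet. It assumes that each entry is a dictionary with an "id" key, that both lists are ordered newest first, and that an existing entry whose ID reappears among the new ones should be dropped. The last point is an assumption rather than something the article specifies.

```python
def pad_entries(new_entries, existing_entries, minimum=8):
    """Return the new entries followed by enough older ones to reach `minimum`."""
    new_ids = {entry["id"] for entry in new_entries}
    # Skip existing entries that are being re-announced, to avoid duplicates.
    older = [entry for entry in existing_entries if entry["id"] not in new_ids]
    needed = max(0, minimum - len(new_entries))
    return list(new_entries) + older[:needed]


if __name__ == "__main__":
    new = [{"id": "a"}, {"id": "b"}]
    old = [{"id": str(n)} for n in range(10)]
    print([entry["id"] for entry in pad_entries(new, old)])
```

With two new entries and a minimum of eight, the result holds the two new entries plus the six most recent old ones. When more than eight new entries arrive, all of them are kept and nothing is padded.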
Notification of the Notification
Once you have this all set up, the true test of whether it works is whether procmail and all the shell script pieces do their jobs when eBay sends an alert to your mailbox. If you point your RSS/Atom reader client at a file that doesn't exist yet, it will give you an error, so I was grumbling to myself about the inconvenience of waiting for that first eBay email about a given search to trigger the creation of the Atom file before I could add the feed to a reader and make sure that it worked. I thought it would be much easier if some automated system would notify me when a new feed was ready, so I wrote a Perl script to create a newAtomFeeds.atom feed!
Near the beginning of my ebaymail2atom.sh shell script, an ls command saves a one-column list of the files in the directory where I store Atom and RSS files into a file in the /tmp directory. After the conversion steps in the middle of the shell script create an Atom version of the email, a similar ls command creates a second list, and the Unix diff command then compares the two lists, sending the result to a newentryatom.pl Perl script. This script creates a new entry in the newAtomFeeds.atom file if a new file showed up in that directory in between the creation of the /tmp/temp1.txt and /tmp/temp2.txt files.
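To make the comparison concrete, here is a Python sketch of how a script on the receiving end of that pipe might pick new filenames out of diff's default output. It is not newentryatom.pl, and the function name is invented. One detail is an inference rather than something the article states: because ls -1tr sorts by modification time, a feed that was merely rewritten changes position and shows up in both a "<" line and a ">" line, so the sketch ignores names that appear on both sides.

```python
def new_files_from_diff(diff_output):
    """Return names that appear only in the second listing of a normal diff."""
    removed, added = [], []
    for line in diff_output.splitlines():
        if line.startswith("< "):        # line present only in the first listing
            removed.append(line[2:])
        elif line.startswith("> "):      # line present only in the second listing
            added.append(line[2:])
    moved = set(removed)
    # A name on both sides merely changed position; it is not a new file.
    return [name for name in added if name not in moved]


if __name__ == "__main__":
    sample = "2d1\n< elvis.atom\n3a3,4\n> elvis.atom\n> tiki-mugs.atom\n"
    print(new_files_from_diff(sample))
```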
As with the eBay item Atom file created by the ebayhtml2atom.xsl style sheet, we want the newentryatom.pl Perl script to pad the Atom file with the most recent existing entries if there aren't many new entries. We saw earlier that a stylesheet run by xsltproc can use the XSLT document() function to read from an existing disk file that has the same name as the output file that it's going to write. A Perl script whose output is being redirected to a given file, however, can't read the version of that file that was sitting on the disk just before the script was run, because the shell empties the redirection target before the script starts. So, the ebaymail2atom.sh shell script first creates a copy of newAtomFeeds.atom called newAtomFeeds.atom.bkp, and that's what newentryatom.pl reads for entries to pad the file it creates. With my RSS/Atom client pointing at the newAtomFeeds.atom file, I'll always know when a new eBay feed is ready to test.
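The reason for the backup copy can be demonstrated in a few lines of Python, where opening a file for writing empties it in much the same way that the shell's > redirection does. The sketch below is an illustration under simplifying assumptions: the function names are invented, and the feed is treated as plain lines of text rather than as Atom XML.

```python
import os
import shutil
import tempfile


def rebuild_with_backup(feed_path, new_text):
    """Rewrite feed_path as new_text followed by the feed's previous content."""
    backup_path = feed_path + ".bkp"
    shutil.copyfile(feed_path, backup_path)      # like the script's cp step
    with open(feed_path, "w", encoding="utf-8") as out:
        # Opening for writing has already emptied feed_path on disk,
        # so the old content is only available from the backup.
        size_seen_by_writer = os.path.getsize(feed_path)
        with open(backup_path, encoding="utf-8") as old:
            out.write(new_text + old.read())
    return size_seen_by_writer


def demo():
    """Run the rewrite in a scratch directory and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        feed = os.path.join(tmp, "newAtomFeeds.atom")
        with open(feed, "w", encoding="utf-8") as handle:
            handle.write("old entry\n")
        seen = rebuild_with_backup(feed, "new entry\n")
        with open(feed, encoding="utf-8") as handle:
            return seen, handle.read()
```

The first value returned by demo() is the size of the feed file as seen from inside the writer, and the second is the rebuilt content.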
Testing and Setting Up Your Script
Before you run a shell script like this, make sure that all the pieces work properly. Find out where the mail files are stored on your system, store a few messages as individual files, and run the Perl script and the XSLT style sheet on them to make sure that your script and style sheet do what you want.
The same email files will be useful to do your integration testing of the shell script that ties everything together. Send their contents to the shell script with a command line like this:
./ebaymail2atom.sh < ~/temp/ebaytest01.mail
Use an RSS/Atom reader to look at the file that gets created and make sure that it looks OK. For additional reassurance, validate it against an Atom 1.0 schema.
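Short of full schema validation, a quick sanity check can catch the most common omissions. The Python sketch below is an assumption-laden stand-in, not a validator. It only confirms that the file is well-formed XML and that the feed and each entry carry the id, title, and updated elements that Atom 1.0 requires. The function name is invented for the example.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
REQUIRED = ("id", "title", "updated")


def missing_required(atom_text):
    """List required Atom elements that are absent from the feed or its entries."""
    root = ET.fromstring(atom_text)      # raises ParseError if not well-formed
    if root.tag != "{%s}feed" % ATOM_NS:
        return ["root element is not atom:feed"]
    problems = []
    for name in REQUIRED:
        if root.find("{%s}%s" % (ATOM_NS, name)) is None:
            problems.append("feed: missing " + name)
    entries = root.findall("{%s}entry" % ATOM_NS)
    for index, entry in enumerate(entries, start=1):
        for name in REQUIRED:
            if entry.find("{%s}%s" % (ATOM_NS, name)) is None:
                problems.append("entry %d: missing %s" % (index, name))
    return problems
```

An empty list means only that these basic checks passed. A schema validator or a feed validation service will catch far more.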
Watch out for another trap that lies in wait for shell script dabblers like myself: remember that even if your scripts run properly when you send files to them as shown above, you don't know what the current directory will be when procmail runs the scripts, so you need to qualify the names of all files and scripts in your shell script with full pathnames. As you can see, I put the pathname in a variable and used that throughout.
More Atom, More Convenience
eBay will eventually understand the value of offering an Atom delivery option in addition to an email delivery option, and you won't need to use procmail and scripting to redirect emails about new Elvis black velvet items to your RSS/Atom reader. Scripting on top of the mail infrastructure is not the leanest solution to use until then, but as a proof of concept that you can actually use (as opposed to simply being a demo) it points the way for people who don't yet understand the value of this relatively new form of communication. The Unix philosophy of connecting the input and output of different tools to create new applications (a philosophy that predates the "mashup" Web 2.0 hype by several generations) makes it easy to incorporate Atom technology into existing infrastructures, so you should be able to add Atom notification features to more than just email applications. Check which scripting tools your server or host provider offers, think about which of the server's applications would benefit from Atom notification, and build it in yourself — you might surprise some people.
Links to Files Mentioned in This Article
- ebaymail2html.pl (renamed)
- newentryatom.pl (renamed)