Amazon's Web Services and XSLT
August 4, 2004
Amazon Web Services (AWS) provide two ways to get XML versions of the information that Amazon's customers ordinarily get from HTML web pages: a SOAP interface and a REST interface. It's nice to see the pains that Amazon takes to make it clear that, when it says "web services," it doesn't just mean SOAP-based web services, but REST too. According to Jeff Barr, Amazon's web services evangelist, 80% of the developers using AWS prefer the REST interface.
The AWS introductory page (see also the FAQ) describes the three steps of using the service: download the free developer's kit, get a developer's token (fill out a form and it gets emailed to you), and then write your application. When using the REST interface, it can be even simpler: get a developer's token, go to the Developer Scratch Pad page, fill out the form showing the parameters of your search, and the page will show you the URL that retrieves the XML version of the information you request.
The page does recommend reading through the SDK, and understanding the parameters that get added to the generated URL gives you better control over what you retrieve. Still, the scratch pad is a great way to jump right in to AWS.
In addition to a developer token, you'll also need an associate ID. When you add this to a REST URL or a traditional link to an Amazon page, if someone follows that link and buys something, you get a commission. For example, in the "Order online from amazon.com" link on my page for my XSLT book, the "/bobducharmeA/" part at the end of the link URL is my associate ID. Feel free to use my associate ID in your URLs; that will give me a commission for sales that you help Amazon make! Otherwise, it's simple to get your own.
Most of the parameters passed in your AWS REST URL specify the details of the search to execute. The f ("format") parameter has a default value of "xml," but it can also take a kind of value that I haven't seen in any other web service: the URL of an XSLT stylesheet to run against your search results. (The stylesheet must be stored on a publicly accessible server so that the AWS XSLT processor can get at it.) This gives you server-side transformation of the XML that you retrieve from Amazon into anything you want.
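The following minimal Python sketch shows how such a URL might be assembled. The endpoint http://xml.amazon.com/onca/xml3 is an assumption about the AWS 3.0 REST address, and the helper name build_aws_url is hypothetical. The parameter names t, dev-t, ArtistSearch, mode, type, and f are the ones that appear in the Request element shown later in this article. Note that the stylesheet URL has to be percent-encoded because it is itself a query-string value.

```python
from urllib.parse import urlencode, quote

# ASSUMPTION: the AWS 3.0 REST endpoint; check it against the AWS docs.
AWS_ENDPOINT = "http://xml.amazon.com/onca/xml3"


def build_aws_url(dev_token, associate_id, stylesheet_url=None, **search):
    """Build a (hypothetical) AWS REST URL.

    search holds the search parameters, e.g. ArtistSearch, mode, type.
    If stylesheet_url is given it becomes the value of f, asking AWS to
    run that XSLT stylesheet on the results; otherwise f is "xml".
    """
    params = {"t": associate_id, "dev-t": dev_token}
    params.update(search)
    params["f"] = stylesheet_url or "xml"
    # quote (rather than the default quote_plus) encodes spaces as %20,
    # and urlencode passes safe="" so ":" and "/" in f get encoded too.
    return AWS_ENDPOINT + "?" + urlencode(params, quote_via=quote)


url = build_aws_url(
    "dev-ID-here", "bobducharmeA",
    stylesheet_url="http://www.snee.com/xsl/processorstats.xsl",
    ArtistSearch="The Velvet Underground", mode="music", type="lite")
print(url)
```

Swapping in your own developer token (and, if you like, your own associate ID) is the only change a real request would need under these assumptions.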
Which XSLT Processor?
Which XSLT processor is AWS running? I put the following stylesheet at http://www.snee.com/xsl/processorstats.xsl to find out:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <xsl:template match="/">
    <html><body>
      <p>xsl:version:
        <xsl:value-of select="system-property('xsl:version')"/></p>
      <p>xsl:vendor:
        <xsl:value-of select="system-property('xsl:vendor')"/></p>
      <p>xsl:vendor-url:
        <xsl:value-of select="system-property('xsl:vendor-url')"/></p>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
Upon finding the root of the source tree (any source tree), the stylesheet ignores the source tree's contents and outputs a simple HTML file with three paragraphs. Each contains a call to the XSLT system-property() function (in the XSLT spec, see the Miscellaneous Additional Functions section and scroll down a bit) to retrieve some information about the XSLT processor being used.
When I pass the http://www.snee.com/xsl/processorstats.xsl URL as the value of the f parameter in an AWS REST URL and send that URL from a browser, the browser displays the following output:
xsl:version: 1
xsl:vendor: Apache Software Foundation
xsl:vendor-url: http://xml.apache.org/xalan-c
The use of Xalan C is a sensible choice. It's fast, it's free, and it's open source, in case they need to tune it for their particular installation.
When you develop your own stylesheets for use with AWS, don't use its XSLT processor for development, because the errors you get will be too cryptic. Use the Developer Scratch Pad or build your own URLs to retrieve some sample XML without specifying any stylesheet, save that XML as files on your computer, and then develop and debug your stylesheets using those files as sample input and a locally installed copy of Xalan C, Xalan J, Saxon, or whatever XSLT engine you prefer. Once it's doing what you want, you'll be ready to copy the stylesheet to a web server and reference it from your AWS URLs.
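If you already have an AWS URL that names a stylesheet, one way to fetch the sample input described above is to rewrite that URL so that f is "xml" again. The following is a minimal Python sketch. The function name raw_xml_url is hypothetical, and it assumes, as described earlier, that the stylesheet is selected only through the f parameter.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode, quote


def raw_xml_url(aws_url):
    """Return a copy of aws_url whose f parameter is "xml", so that AWS
    sends back untransformed XML to save and use as local test input."""
    parts = urlsplit(aws_url)
    # Decode the query into (name, value) pairs, keeping empty values,
    # and replace whatever stylesheet URL f held with "xml".
    params = [(name, "xml" if name == "f" else value)
              for name, value in parse_qsl(parts.query, keep_blank_values=True)]
    if "f" not in (name for name, _ in params):
        params.append(("f", "xml"))
    # Re-encode with %20 for spaces and rebuild the full URL.
    return urlunsplit(parts._replace(query=urlencode(params, quote_via=quote)))
```

You could then save the response from the returned URL to a file and point your local XSLT processor at that file.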
Passing Additional Parameters to your AWS Stylesheet
The XML returned by AWS is a ProductInfo document element whose first child is a Request element that lists the parameters that were passed to it. After I used the following URL (which substitutes a fake developer ID and includes a single carriage return for display purposes) to perform a search with an artist name of "The Velvet Underground,"
the Request element looked like this (without the carriage returns):
<Request>
  <Args>
    <Arg value="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113"
         name="UserAgent"></Arg>
    <Arg value="0RENZAAKRCATSAX19QPX" name="RequestID"></Arg>
    <Arg value="us" name="locale"></Arg>
    <Arg value="dev-ID-here" name="dev-t"></Arg>
    <Arg value="bobducharmeA" name="t"></Arg>
    <Arg value="The Velvet Underground" name="ArtistSearch"></Arg>
    <Arg value="xml" name="f"></Arg>
    <Arg value="music" name="mode"></Arg>
    <Arg value="lite" name="type"></Arg>
  </Args>
</Request>
(The "type" value of "lite" indicates that I want XML conforming to the leaner "lite" version of their DTD/Schema, which contains less information than the "heavy" version.)
Most interfaces, when passed an unrecognized parameter, either report an error or ignore it. AWS does something much better: it adds unrecognized name/value pairs to the list of name/value pairs in the Request element.
For example, when I added the non-AWS parameter "&flavor=vanilla" to the URL listing Velvet Underground albums, AWS added the following to the Args element in the Request element:
<Arg value="vanilla" name="flavor"></Arg>
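You can see this pass-through behavior outside of XSLT by collecting every Arg into a dictionary. The following Python sketch uses the standard library's xml.etree.ElementTree. It runs on an abbreviated copy of the Request element above plus the echoed flavor argument; a real response also contains the search results. The helper name request_args is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated ProductInfo document: most of the Request element shown
# above, plus the extra "flavor" Arg that AWS echoed back.
SAMPLE = """<ProductInfo>
  <Request>
    <Args>
      <Arg value="dev-ID-here" name="dev-t"></Arg>
      <Arg value="bobducharmeA" name="t"></Arg>
      <Arg value="The Velvet Underground" name="ArtistSearch"></Arg>
      <Arg value="xml" name="f"></Arg>
      <Arg value="music" name="mode"></Arg>
      <Arg value="lite" name="type"></Arg>
      <Arg value="vanilla" name="flavor"></Arg>
    </Args>
  </Request>
</ProductInfo>"""


def request_args(product_info_xml):
    """Map each Arg's name attribute to its value attribute."""
    root = ET.fromstring(product_info_xml)
    # Paths are relative to the root element, which is ProductInfo.
    return {arg.get("name"): arg.get("value")
            for arg in root.iterfind("Request/Args/Arg")}


args = request_args(SAMPLE)
print(args["flavor"])  # vanilla
```

The unrecognized flavor parameter comes back alongside the ones AWS understands, which is what makes it available to a stylesheet.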
Because the AWS server adds the passed value to the XML that your stylesheet acts on, your stylesheet can get and use that value. For example, if I want to have a flavor variable in my stylesheet that is set to a different value each time I run the stylesheet, I could pass the flavor value as described above and then set a flavor variable in the stylesheet to the passed value like this:
<xsl:variable name="flavor">
  <xsl:value-of
    select="/ProductInfo/Request/Args/Arg[@name='flavor']/@value"/>
</xsl:variable>
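ElementTree's limited XPath support includes the [@name='...'] predicate, so you can check the same lookup outside XSLT. The following Python sketch mirrors the select expression above; the function name arg_value is hypothetical. Like xsl:value-of applied to an empty node set, it returns an empty string when the parameter wasn't passed. It assumes the parameter name contains no apostrophes, since the name is spliced into the predicate.

```python
import xml.etree.ElementTree as ET


def arg_value(product_info_xml, name):
    """Mirror /ProductInfo/Request/Args/Arg[@name=name]/@value.

    Returns "" when there is no matching Arg, as xsl:value-of would.
    """
    root = ET.fromstring(product_info_xml)
    if root.tag != "ProductInfo":
        return ""
    # ElementTree paths are relative, starting from ProductInfo itself;
    # attribute selection (/@value) isn't supported, so read it with get().
    arg = root.find(f"Request/Args/Arg[@name='{name}']")
    return arg.get("value", "") if arg is not None else ""


doc = ('<ProductInfo><Request><Args>'
       '<Arg value="vanilla" name="flavor"></Arg>'
       '</Args></Request></ProductInfo>')
print(arg_value(doc, "flavor"))         # vanilla
print(repr(arg_value(doc, "headers")))  # ''
```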
Putting it All Together
Let's say you're building an application to query Amazon from a wristwatch with a wireless Internet connection. The wireless part is great, but the user interface is so primitive that you're limited to displaying plain text. The following XSLT stylesheet converts the XML returned by a request for lite XML from AWS into plain text.
<!-- awslite2txt.xsl: convert XML returned from Amazon Web Services
     that conforms to their lite DTD into plain text. If headers=yes
     add element names as headers. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <xsl:variable name="headers">
    <xsl:value-of
      select="/ProductInfo/Request/Args/Arg[@name='headers']/@value"/>
  </xsl:variable>

  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="Details">
    <!-- Add a carriage return -->
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="*">
    <xsl:if test="$headers = 'yes'">
      <xsl:value-of select="name()"/>
      <xsl:text>: </xsl:text>
    </xsl:if>
    <xsl:apply-templates/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Just pass along contents.-->
  <xsl:template match="ProductInfo">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="Artist | Author">
    <xsl:apply-templates/>
    <xsl:if test="position() != last()">
      <xsl:text>, </xsl:text>
    </xsl:if>
  </xsl:template>

  <!-- Suppress -->
  <xsl:template match="Request | ImageUrlSmall | ImageUrlMedium |
                       ImageUrlLarge | ListPrice | Asin | UsedPrice |
                       TotalResults | TotalPages | Mode | RelevanceRank"/>

</xsl:stylesheet>
The "headers" variable gets its value from the passed parameters the same way as the "flavor" variable described above. If it's set to "yes," the contents of most elements added to the result tree will be prefixed with their element names. We don't want an XML declaration, so xsl:output sets method to "text." The xsl:strip-space element strips the whitespace-only text nodes from the source so that we can add whitespace back in exactly where we want it. One example is the carriage return that the first template rule adds before each Details element to create a blank line between entries.
The second and most complicated template rule first checks whether the $headers variable equals "yes." If so, the template rule adds the element name, a colon, and a space to the result tree before applying any templates to that element's children. It also adds a carriage return after the element's contents. This template rule's match condition is "*", so it gets applied to any element that isn't matched by a more specific template rule.
Some of the remaining template rules may look specialized for particular elements, but they really cover categories of elements. For example, if you retrieve some AWS XML and find a foobar element that this stylesheet doesn't handle properly, you'll probably just need to add "| foobar" to the match condition of one of the other template rules to show that foobar falls into that category.
The third template rule passes along the contents of the ProductInfo element and nothing else. I didn't want this element's name showing up before its content even if $headers was equal to "yes," because its child elements have all the names we need for the output. This is another good example of a template rule where you may want to add more element names to the match condition.
The fourth template rule is for elements that may show up in multiples. It outputs a comma after each one except the last; the carriage return at the end of the list comes from the "*" rule applied to the containing element. This way, if a book has a single author, the name shows up normally, but if there are multiple authors their names show up as a comma-delimited list, as shown on the third line of this output example.
The last template rule is the simplest, listing the element types to suppress. Remember that suppressing an element suppresses its children as well, because the XSLT engine will never be told to "apply templates" to those children.
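The following rough Python transliteration of awslite2txt.xsl's template rules uses xml.etree.ElementTree. It shows the same category-based design in another form and can be handy for experimenting with saved AWS XML. It is a hypothetical sketch, not a replacement for the stylesheet. The function names are made up, and it assumes, as the stylesheet effectively does, that Artist and Author elements are the only children of their parent, so their position among element children matches XSLT's position(). The sample Details fragment is invented and uses only a few of the lite elements.

```python
import xml.etree.ElementTree as ET

SUPPRESS = {"Request", "ImageUrlSmall", "ImageUrlMedium", "ImageUrlLarge",
            "ListPrice", "Asin", "UsedPrice", "TotalResults", "TotalPages",
            "Mode", "RelevanceRank"}
LIST_ITEMS = {"Artist", "Author"}


def apply_templates(elem, headers):
    """Process text and child elements in document order, dropping
    whitespace-only text as xsl:strip-space elements="*" does."""
    out = []
    if elem.text and elem.text.strip():
        out.append(elem.text)
    kids = list(elem)
    for i, child in enumerate(kids):
        out.append(render(child, headers, is_last=(i == len(kids) - 1)))
        if child.tail and child.tail.strip():
            out.append(child.tail)
    return "".join(out)


def render(elem, headers, is_last=True):
    tag = elem.tag
    if tag in SUPPRESS:            # suppress the element and its subtree
        return ""
    if tag == "ProductInfo":       # just pass along contents
        return apply_templates(elem, headers)
    if tag == "Details":           # carriage return before each entry
        return "\n" + apply_templates(elem, headers)
    if tag in LIST_ITEMS:          # comma after all but the last
        return apply_templates(elem, headers) + ("" if is_last else ", ")
    # The "*" rule: optional "name: " prefix, contents, carriage return.
    prefix = tag + ": " if headers == "yes" else ""
    return prefix + apply_templates(elem, headers) + "\n"


def aws_lite_to_text(xml_string):
    root = ET.fromstring(xml_string)
    arg = root.find("Request/Args/Arg[@name='headers']")
    headers = arg.get("value", "") if arg is not None else ""
    return render(root, headers)


# Invented sample input, for illustration only.
SAMPLE = """<ProductInfo>
  <Request><Args><Arg value="yes" name="headers"></Arg></Args></Request>
  <Details>
    <ProductName>Sample Album</ProductName>
    <Artists><Artist>First Artist</Artist><Artist>Second Artist</Artist></Artists>
    <Asin>0000000000</Asin>
  </Details>
</ProductInfo>"""

print(aws_lite_to_text(SAMPLE))
```

As in the stylesheet, putting Asin in SUPPRESS is enough to keep its contents out of the output, because render never descends into it.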
Running the Stylesheet
Once you have a developer ID, replace the dev-ID-here part in the following three URLs (and, if you like, the "bobducharmeA" part), remove the carriage returns that I added for display purposes, try them out, and compare their results. The second and third URLs use the stylesheet above, so their results are plain text, and if you call them from a web browser, each may display as one dense paragraph of text. In those cases, do a View Source to see what was really returned. Then imagine an application that uses these URLs to retrieve this text and passes it along to a device that can only handle plain text, such as our hypothetical wristwatch.
Plain XML (sample result here):
XML+stylesheet (sample result here; using a browser to invoke the web service may require a View Source as described above):
XML+stylesheet with "headers" value of "yes" passed (sample result here; using a browser to invoke the web service may require a View Source as described above):
Of course, instead of stripping all markup down to plain text, your stylesheet can also add new markup to make the data more useful to a particular application. Next month we'll see how to convert the AWS XML to RDF, and how the principle of using a few template rules to cover some basic processing categories still holds.