Generating SOAP
Last month we used the Google web services API to point out some warts in WSDL. This month we'll use the same API to walk through the steps involved in building an application which uses Google.
We'll do the implementation in Python. Python is open source and runs on all the popular platforms. It is well suited to SOAP and XML processing: it's object-oriented, so you can build large-scale programs; it allows rapid development cycles; and it has powerful text-manipulation primitives and libraries, including comprehensive Unicode support. It also provides automatic memory management and good support for introspection (i.e., a program can examine its own code and datatypes), and it has an active XML community.
There are a couple of choices for the SOAP stack, each bringing its own set of features.
We'll use ZSI because of the emphasis it places on typing, and because I wrote it. ZSI is a pure-Python open source SOAP implementation, available at pywebsvcs, a meta-project which serves as an umbrella for several Python web services projects.
Our approach will be to create a local object with fields (Python calls them
attributes) which map to the Google Search request. Recall that the message
definition from the GoogleSearch.wsdl looks like this:
<message name="doGoogleSearch">
  <part name="key" type="xsd:string"/>
  <part name="q" type="xsd:string"/>
  <part name="start" type="xsd:int"/>
  <part name="maxResults" type="xsd:int"/>
  <part name="filter" type="xsd:boolean"/>
  <part name="restrict" type="xsd:string"/>
  <part name="safeSearch" type="xsd:boolean"/>
  <part name="lr" type="xsd:string"/>
  <part name="ie" type="xsd:string"/>
  <part name="oe" type="xsd:string"/>
</message>
The key is a Google-provided authentication token. It serves
several purposes, and we'll return to it below. The q is the query
string, which is basically the usual URL-encoded query. The search only returns
a subset of the results; thus, start and maxResults
can be used as a cursor to walk through the results a section at a time. The
default is to return the first ten results. The filter,
restrict, safeSearch, and lr (language
restriction) fields are used to specify whether and how results should be
filtered; the ie and oe fields specify input and output
character set encodings respectively.
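To make the cursor idea concrete, here is a minimal sketch (in modern Python 3, whereas the article's own listings are Python 2) that computes the successive start and maxResults values needed to walk a result set a page at a time. The helper name page_cursors is hypothetical, and it assumes the total number of results is already known, for example from the estimatedTotalResultsCount of an earlier response.

```python
def page_cursors(total, page_size=10):
    """Yield (start, maxResults) pairs that together cover `total` results.

    Hypothetical helper: `total` is assumed to be known in advance,
    e.g. taken from a previous response's estimatedTotalResultsCount.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    start = 0
    while start < total:
        # The final page may be shorter than page_size.
        yield start, min(page_size, total - start)
        start += page_size


# Walk 25 results ten at a time (the default page size described above).
for start, max_results in page_cursors(25):
    print(start, max_results)
```

Each pair would be copied into the start and maxResults fields of a new request before it is sent.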
Defining a Python object which has a constructor that sets the defaults is fairly straightforward.
## Pound sign introduces a comment.
## Blocks are identified by indentation.
class Search:
    ## tcGoogleSearch is the typecode describing the request
    ## message; it is defined separately.
    typecode = tcGoogleSearch('g:doGoogleSearch', typed=0)

    ## __init__ is the constructor; self is like C++'s this
    def __init__(self, query, key):
        self.key = key
        self.q = query
        self.start = 0
        self.maxResults = 10
        self.filter = 1
        self.restrict = ''
        self.safeSearch = 0
        self.lr = ''
        self.ie = 'latin1'
        self.oe = 'latin1'
Once we have a search object, we'll use ZSI to make a SOAP message and serialize that into a string.
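import StringIO
import ZSI

s = Search('rich+salz', 'No,I.am.not.going.to.give.my.key')
## Serialize into an in-memory buffer, binding the prefix g
## to the Google namespace used in the typecode.
buff = StringIO.StringIO()
sw = ZSI.SoapWriter(buff, nsdict={'g': 'urn:GoogleSearch'})
sw.serialize(s, oname='doGoogleSearch')
request = buff.getvalue()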
Making an HTTP request out of the SOAP message is straightforward. We get
the target host and URL from the WSDL service element.
<!-- Endpoint for Google Web APIs -->
<service name="GoogleSearchService">
<port name="GoogleSearchPort" binding="typens:GoogleSearchBinding">
<soap:address location="http://api.google.com/search/beta2"/>
</port>
</service>
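Since the host and path come straight from the service element, a program can read them out of the WSDL rather than hard-coding them. The sketch below does that with Python 3's standard ElementTree and urllib.parse; the function name soap_endpoint is hypothetical, and the minimal definitions wrapper (with an assumed typens namespace) exists only so that the fragment's prefixes are declared.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
SOAP_NS = "http://schemas.xmlsoap.org/wsdl/soap/"


def soap_endpoint(wsdl_text, service_name):
    """Return (host, port, path) from the soap:address of a WSDL service."""
    root = ET.fromstring(wsdl_text)
    for service in root.iter("{%s}service" % WSDL_NS):
        if service.get("name") != service_name:
            continue
        for address in service.iter("{%s}address" % SOAP_NS):
            url = urlsplit(address.get("location"))
            # Use the scheme's default port when the URL does not name one.
            port = url.port or (443 if url.scheme == "https" else 80)
            return url.hostname, port, url.path
    raise LookupError("no soap:address for service %r" % service_name)


# The service element above, wrapped in a minimal <definitions> element
# (the typens namespace here is an assumption) so the prefixes are declared.
WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:typens="urn:GoogleSearch">
  <service name="GoogleSearchService">
    <port name="GoogleSearchPort" binding="typens:GoogleSearchBinding">
      <soap:address location="http://api.google.com/search/beta2"/>
    </port>
  </service>
</definitions>"""

print(soap_endpoint(WSDL, "GoogleSearchService"))
# ('api.google.com', 80, '/search/beta2')
```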
The value of the SOAPAction header comes from the definition for
the doGoogleSearch operation; while the WSDL file specifies a
value, it appears that Google doesn't check it. That is just as well, since SOAP
1.2 deprecates the use of the SOAPAction header.
Next, we need to put those items together and make an HTTP POST. The only
nuisance is that we have to build the complete SOAP message before sending
anything, so that we can compute the Content-Length header:
import httplib

conn = httplib.HTTPConnection('api.google.com', 80)
conn.connect()
conn.putrequest('POST', '/search/beta2')
## Content-Length is why the whole message had to be built first.
conn.putheader('Content-Length', str(len(request)))
conn.putheader('Content-Type', 'text/xml; charset="utf-8"')
conn.putheader('SOAPAction', 'urn:GoogleSearchAction')
conn.endheaders()
conn.send(request)

## Read back the SOAP response envelope.
response = conn.getresponse()
reply = response.read()
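One subtlety in computing Content-Length is that it counts bytes on the wire, not characters, so it has to be taken from the encoded message. The Python 3 sketch below (function names hypothetical) makes that explicit and uses the standard http.client module's request method to send everything in one call. post_soap is shown but not called, since it needs network access.

```python
import http.client


def soap_post_headers(body, soap_action):
    """Encode a SOAP 1.1 request body and build its HTTP headers."""
    payload = body.encode("utf-8")
    headers = {
        # Content-Length counts bytes, not characters, so it must be
        # computed from the encoded payload.
        "Content-Length": str(len(payload)),
        "Content-Type": 'text/xml; charset="utf-8"',
        "SOAPAction": soap_action,
    }
    return headers, payload


def post_soap(host, port, path, body, soap_action):
    """Send the request and return (status, raw response bytes)."""
    headers, payload = soap_post_headers(body, soap_action)
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.request("POST", path, payload, headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


headers, payload = soap_post_headers("<q>café</q>", "urn:GoogleSearchAction")
print(headers["Content-Length"], len("<q>café</q>"))  # 12 11
```

The non-ASCII é takes two bytes in UTF-8, which is why the header says 12 even though the string has 11 characters.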
It's easy to see that nearly all of this is boilerplate: everything from the local datatypes up to and out onto the network can be generated from a single WSDL file.
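As a small illustration of that point, the following Python 3 sketch generates a request class like Search directly from a WSDL message element. The function make_request_class and its per-type defaults are hypothetical. Note that WSDL parts carry no default values, so a real generator would need to get the article's choices (maxResults of 10, latin1 encodings, and so on) from somewhere else. The type attribute is compared as written, so the sketch assumes the WSDL uses the xsd prefix.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Assumed defaults per XML Schema type, keyed by the prefixed name
# exactly as it appears in the WSDL (this assumes the "xsd" prefix).
XSD_DEFAULTS = {"xsd:string": "", "xsd:int": 0, "xsd:boolean": False}


def make_request_class(wsdl_text, message_name):
    """Build a Python class whose attributes are the parts of a WSDL message."""
    root = ET.fromstring(wsdl_text)
    for message in root.iter("{%s}message" % WSDL_NS):
        if message.get("name") == message_name:
            parts = [(part.get("name"), part.get("type"))
                     for part in message.findall("{%s}part" % WSDL_NS)]
            break
    else:
        raise LookupError("no message named %r" % message_name)

    def __init__(self, **kw):
        # Every part becomes an attribute; unspecified parts get a default.
        for name, xsd_type in parts:
            setattr(self, name, kw.pop(name, XSD_DEFAULTS.get(xsd_type)))
        if kw:
            raise TypeError("unknown parts: " + ", ".join(sorted(kw)))

    return type(message_name, (object,), {"__init__": __init__, "parts": parts})


# An abbreviated copy of the doGoogleSearch message.
WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <message name="doGoogleSearch">
    <part name="key" type="xsd:string"/>
    <part name="q" type="xsd:string"/>
    <part name="start" type="xsd:int"/>
    <part name="maxResults" type="xsd:int"/>
    <part name="safeSearch" type="xsd:boolean"/>
  </message>
</definitions>"""

DoGoogleSearch = make_request_class(WSDL, "doGoogleSearch")
req = DoGoogleSearch(q="rich+salz", maxResults=10)
print(req.q, req.start, req.maxResults, req.safeSearch)  # rich+salz 0 10 False
```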
The careful reader may realize that we've glossed over how the
serialize function works. Most SOAP toolkits require access to the
data definition -- in this case, the XML Schema defined in the WSDL -- in order
to generate serialization code. How do we get from the Python
Search object to the SOAP message shown in listing 1? While we don't want to get bogged down in the details of a particular
SOAP implementation, we'll take a brief look at ZSI's mechanism, in order to get
an understanding of some of the issues involved.
ZSI uses typecodes to describe the data. There are primitives
for all the standard XML Schema primitive types, including dates, integers,
strings, and so on, as well as constructors to build aggregated types such as
complexTypes, which can often map directly into something like a
classic C struct.
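ZSI's real typecode classes do much more (parsing, aliasing, type attributes), but the core idea can be sketched in a few lines of Python 3: each typecode knows an element name and how to render a Python value as XML text, and a struct typecode is simply a list of member typecodes applied to the same-named attributes of an object. The class names below merely echo ZSI's naming; this is a hypothetical toy built on ElementTree, not ZSI's API.

```python
import xml.etree.ElementTree as ET


class Primitive:
    """Toy typecode for a simple value: an element name plus a text rendering."""

    def __init__(self, pname):
        self.pname = pname

    def to_text(self, value):
        return str(value)

    def serialize(self, parent, value):
        ET.SubElement(parent, self.pname).text = self.to_text(value)


class String(Primitive):
    pass


class Integer(Primitive):
    def to_text(self, value):
        return str(int(value))


class Boolean(Primitive):
    def to_text(self, value):
        # XML Schema spells booleans "true"/"false" (or "1"/"0").
        return "true" if value else "false"


class Struct:
    """Toy aggregate typecode: serializes each member from obj's attributes."""

    def __init__(self, pname, members):
        self.pname = pname
        self.members = members

    def serialize(self, parent, obj):
        elt = ET.SubElement(parent, self.pname)
        for tc in self.members:
            tc.serialize(elt, getattr(obj, tc.pname))
        return elt


class Query:
    """Hypothetical stand-in for a few fields of the article's Search class."""

    def __init__(self, q):
        self.q = q
        self.start = 0
        self.safeSearch = False


tc = Struct("doGoogleSearch", [String("q"), Integer("start"), Boolean("safeSearch")])
root = ET.Element("Body")
tc.serialize(root, Query("rich+salz"))
print(ET.tostring(root, encoding="unicode"))
```

Because the typecode, not the object, drives serialization, the Python class needs no knowledge of XML; that separation is what lets a tool generate the typecodes from the schema.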
Let's look at an individual search result, which has the following schema definition:
<xsd:complexType name="GoogleSearchResult">
  <xsd:all>
    <xsd:element name="documentFiltering" type="xsd:boolean"/>
    <xsd:element name="searchComments" type="xsd:string"/>
    <xsd:element name="estimatedTotalResultsCount" type="xsd:int"/>
    <xsd:element name="estimateIsExact" type="xsd:boolean"/>
    <xsd:element name="resultElements" type="typens:ResultElementArray"/>
    <xsd:element name="searchQuery" type="xsd:string"/>
    <xsd:element name="startIndex" type="xsd:int"/>
    <xsd:element name="endIndex" type="xsd:int"/>
    <xsd:element name="searchTips" type="xsd:string"/>
    <xsd:element name="directoryCategories" type="typens:DirectoryCategoryArray"/>
    <xsd:element name="searchTime" type="xsd:double"/>
  </xsd:all>
</xsd:complexType>
The all says that the sub-elements can appear in any order. Except for
resultElements and directoryCategories, which are arrays of other complex
types, they are all basic primitive types.
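In ZSI we define a new class, tcSearchResult, which is derived from
ZSI's Struct class. (The members listed below are those of each individual
entry in resultElements, such as title, URL, and snippet, rather than those of
the enclosing GoogleSearchResult.) Generic is the name of the local
class that ZSI will create when it parses a search result message. This class
lets you set arbitrary attributes on its instances. More complicated uses would
likely need special classes that set defaults, enforce additional validity
constraints, and so on.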
import ZSI.TC

## Generic (the local result class) and tcDirCat (the typecode
## for a directory category) are defined elsewhere in the program.
class tcSearchResult(ZSI.TC.Struct):
    def __init__(self, pname=None, **kw):
        ZSI.TC.Struct.__init__(self, Generic,
            [
                ZSI.TC.String('summary', unique=1),
                ZSI.TC.String('URL', unique=1),
                ZSI.TC.String('snippet', unique=1),
                ZSI.TC.String('title', unique=1),
                ZSI.TC.String('cachedSize', unique=1),
                ZSI.TC.Boolean('relatedInformationPresent'),
                ZSI.TC.String('hostName', unique=1),
                tcDirCat('directoryCategory'),
                ZSI.TC.String('directoryTitle', unique=1),
            ],
            pname, inorder=0, **kw)
The pname is used to specify the parameter name, which is
basically the name the element will have. As you can see, the bulk of the code
creates a list (indicated by the square brackets) of typecodes for the items
appearing within the search result element. The inorder=0
parameter specifies that the ZSI parser should not require the elements to
appear in any specific order, analogous to the XML Schema all
element.
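The effect of inorder=0 can be illustrated without ZSI. The Python 3 sketch below parses the children of a struct element into a Generic-style object, accepting them in any order as xsd:all allows, and rejecting reordered input only when inorder is set. The names parse_struct, to_bool, and this Generic class are hypothetical stand-ins, not ZSI code.

```python
import xml.etree.ElementTree as ET


class Generic:
    """Hypothetical stand-in for the article's Generic: any attributes allowed."""


def to_bool(text):
    # XML Schema allows "true"/"false" and "1"/"0".
    return text.strip() in ("true", "1")


def parse_struct(elt, members, inorder=False):
    """Parse the children of elt into a Generic, one attribute per member.

    members maps element names to converter functions. Each member must
    appear exactly once, as in xsd:all. With inorder=False any order is
    accepted; with inorder=True the children must follow members' order.
    """
    names = [child.tag for child in elt]
    if sorted(names) != sorted(members):
        raise ValueError("expected %s, got %s" % (sorted(members), names))
    if inorder and names != list(members):
        raise ValueError("elements out of order: %s" % names)
    obj = Generic()
    for child in elt:
        setattr(obj, child.tag, members[child.tag](child.text or ""))
    return obj


MEMBERS = {"title": str, "URL": str, "relatedInformationPresent": to_bool}
XML = ("<item><relatedInformationPresent>true</relatedInformationPresent>"
       "<URL>http://example.com/</URL><title>Example</title></item>")

result = parse_struct(ET.fromstring(XML), MEMBERS)
print(result.title, result.URL, result.relatedInformationPresent)
# Example http://example.com/ True
```

Calling parse_struct on the same input with inorder=True raises ValueError, because the elements arrive in a different order from the member list.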
But what about those unique=1 parameters? They are additional
metadata which tell ZSI that pointer aliasing is not important. As
part of its support for local datatypes and legacy RPC systems (DCE/DCOM in
particular), SOAP RPC encoding defines mechanisms used to preserve aliased
pointers -- those pointing to the same block of memory, as opposed to having the
same value.
For example, if p and q are C character pointers,
then the following fragments all have different semantics:
/* Different pointers, same value. */
p = strdup("hello");
q = strdup("hello");
/* Different pointers, different value. */
p = strdup("hello");
q = strdup("hangup");
/* Aliased pointers. */
p = strdup("hello");
q = p;
Suppose we now invoke the following subroutine on p in each case. What
would the value of q be?
void up1(char* s)
{
s[0] = 'H';
}
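Python has no raw pointers, but object references show exactly the same distinction. In this sketch a bytearray stands in for the strdup'd buffer and up1 is a hypothetical Python transliteration of the C subroutine:

```python
def up1(s):
    """Python transliteration of the C up1(): mutate the buffer in place."""
    s[0] = ord("H")


# Different objects, same value: changing p leaves q alone.
p = bytearray(b"hello")
q = bytearray(b"hello")
up1(p)
print(p.decode(), q.decode())  # Hello hello

# Aliased: q names the very same object as p, so q "changes" too.
p = bytearray(b"hello")
q = p
up1(p)
print(p.decode(), q.decode())  # Hello Hello
```

A serializer that only looked at values would send the same bytes in both cases, and the receiving side could not tell which behavior to reproduce.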
Using SOAP RPC encoding, it's possible to preserve this behavior even if
up1 is invoked on a remote machine. To do this, you can tag an
instance of the data with an XML id attribute, and aliased instances use the
href attribute to point to the other instance.
Normally, the shared values must appear after the main element of the SOAP body, that is, as succeeding elements in the SOAP body:
<soap-env:Body>
  <tns:pandq>
    <p href="#pval"/>
    <q href="#pval"/>
  </tns:pandq>
  <tns:node1 id="pval">
    hello
  </tns:node1>
</soap-env:Body>
All of which opens a can of worms known as "serialization roots", which we happily ignore.
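On the receiving side, preserving the aliasing amounts to building an index of id attributes and following each href back to the element it names. The Python 3 sketch below does this with ElementTree for same-document references only; the function name resolve_hrefs and the urn:example namespace are hypothetical, and serialization roots are ignored here too.

```python
import xml.etree.ElementTree as ET


def resolve_hrefs(body):
    """Map each accessor of the first body element to the element holding its value.

    An accessor carrying href="#name" resolves to the element whose id is
    "name", so aliased accessors resolve to the very same element object.
    """
    by_id = {e.get("id"): e for e in body.iter() if e.get("id") is not None}
    resolved = {}
    for accessor in body[0]:
        href = accessor.get("href")
        if href is None:
            resolved[accessor.tag] = accessor
        elif href.startswith("#"):
            resolved[accessor.tag] = by_id[href[1:]]
        else:
            raise ValueError("only same-document references are handled: %r" % href)
    return resolved


BODY = """<soap-env:Body xmlns:soap-env="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:tns="urn:example">
  <tns:pandq>
    <p href="#pval"/>
    <q href="#pval"/>
  </tns:pandq>
  <tns:node1 id="pval">hello</tns:node1>
</soap-env:Body>"""

values = resolve_hrefs(ET.fromstring(BODY))
print(values["p"] is values["q"], values["p"].text)  # True hello
```

A deserializer built on this would create one Python object per resolved element, so p and q would end up as two references to a single object, just as in the original C program.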
By special dispensation, however, a string may be written inline at one of
its uses and referenced by href at the others. This gives us the following,
more common, way of encoding p and q:
<tns:pandq>
  <p id="pval">hello</p>
  <q href="#pval"/>
</tns:pandq>
As you might expect, Google doesn't care if any of the strings are aliased, if only because they are input parameters. We direct ZSI to avoid the aliasing by indicating that each string is unique. That should probably be the default; if not a bug, it's at least a misfeature.
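The serializer's side of that choice can be sketched in Python 3 as follows. Aliasing is detected by object identity; unless unique is set, the first use of a shared buffer is written inline with an id and later uses become hrefs, giving the inlined form shown above. The function serialize_call and the "ref0"-style id labels are hypothetical, and bytearray buffers stand in for C strings.

```python
import xml.etree.ElementTree as ET


def serialize_call(name, args, unique=False):
    """Serialize (accessor, buffer) pairs as children of a call element.

    Values are bytearray buffers standing in for C char* strings. Aliasing
    is detected by object identity; unless unique is set, the first use of
    a shared buffer is written inline with an id and later uses are hrefs.
    """
    uses = {}
    for _, value in args:
        uses[id(value)] = uses.get(id(value), 0) + 1

    call = ET.Element(name)
    labels = {}  # id(value) -> id attribute already emitted for it
    for accessor, value in args:
        elt = ET.SubElement(call, accessor)
        key = id(value)
        if unique or uses[key] == 1:
            elt.text = value.decode()
        elif key in labels:
            elt.set("href", "#" + labels[key])
        else:
            labels[key] = "ref%d" % len(labels)
            elt.set("id", labels[key])
            elt.text = value.decode()
    return call


p = bytearray(b"hello")
aliased = serialize_call("pandq", [("p", p), ("q", p)])
copied = serialize_call("pandq", [("p", p), ("q", bytearray(b"hello"))])
flat = serialize_call("pandq", [("p", p), ("q", p)], unique=True)
for call in (aliased, copied, flat):
    print(ET.tostring(call, encoding="unicode"))
```

The aliased call writes p inline with an id and q as an href back to it; the copied call and the unique=True call both write two independent "hello" strings, which is all Google needs.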
Having taken a brief tour through all the automation possible with WSDL-defined SOAP RPC messages, next month we'll show how to use SOAP headers to build our own value-added services, moving from "wizards generating code" to "interesting distributed application design."
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.