Generating SOAP

June 12, 2002

Rich Salz

Introduction

Last month we used the Google web services API to point out some warts in WSDL. This month we'll use the same API to walk through the steps involved in building an application which uses Google.

We'll do the implementation in Python. Python is open source and runs on all the popular platforms. Python is the kind of language that's very well-suited to SOAP and XML processing: it's object-oriented, so you can build large-scale programs; it allows rapid development cycles; and it has powerful text manipulation primitives and libraries, including comprehensive Unicode support. It also provides automatic memory management and good support for introspection (i.e., a program can examine its code and datatypes), and it has an active XML community.

We have a couple of choices for the SOAP stack, each choice bringing its own set of features:

  • SOAP.py -- a small, streaming (SAX-based) parser
  • SOAPy -- includes basic WSDL (and Schema) support
  • ZSI -- emphasis on native datatype support; DOM based

We'll use ZSI because of the emphasis it places on typing, and because I wrote it. ZSI is a pure-Python open source SOAP implementation, available at pywebsvcs, a meta-project which serves as an umbrella for several Python web services projects.

Implementing a Google and SOAP Application with Python

Our approach will be to create a local object with fields (Python calls them attributes) which map to the Google Search request. Recall that the message definition from the GoogleSearch.wsdl looks like this:


<message name="doGoogleSearch">
  <part name="key"        type="xsd:string"/>
  <part name="q"          type="xsd:string"/>
  <part name="start"      type="xsd:int"/>
  <part name="maxResults" type="xsd:int"/>
  <part name="filter"     type="xsd:boolean"/>
  <part name="restrict"   type="xsd:string"/>
  <part name="safeSearch" type="xsd:boolean"/>
  <part name="lr"         type="xsd:string"/>
  <part name="ie"         type="xsd:string"/>
  <part name="oe"         type="xsd:string"/>
</message>

The key is a Google-provided authentication token. It serves several purposes, and we'll return to it below. The q is the query string, which is basically the usual URL-encoded query. The search only returns a subset of the results; thus, start and maxResults can be used as a cursor to walk through the results a section at a time. The default is to return the first ten results. The filter, restrict, safeSearch, and lr (language restriction) fields are used to specify whether and how results should be filtered; ie and oe fields specify input and output character set encodings respectively.
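
Since each call returns only a slice of the results, a client that wants more than the first ten has to advance start itself. Here is a minimal sketch of that cursor arithmetic; the page_windows helper and its arguments are hypothetical names used for illustration, not part of the Google API or of any SOAP toolkit.

```python
def page_windows(total_wanted, page_size=10):
    """Yield (start, maxResults) pairs covering the first total_wanted results."""
    start = 0
    while start < total_wanted:
        # The last window may be shorter than a full page.
        yield start, min(page_size, total_wanted - start)
        start += page_size

# Each pair would be copied into the start and maxResults parts of a request.
windows = list(page_windows(25))
```

Each pair feeds one doGoogleSearch call, so walking through 25 results takes three requests.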

Defining a Python object which has a constructor that sets the defaults is fairly straightforward.


##  Pound sign introduces a comment.
##  Blocks are identified by indentation
class Search:
    typecode = tcGoogleSearch('g:doGoogleSearch', typed=0)

    ##  __init__ is the constructor; self is like C++'s this
    def __init__(self, query, key):
        self.key = key
        self.q = query
        self.start = 0
        self.maxResults = 10
        self.filter = 1
        self.restrict = ''
        self.safeSearch = 0
        self.lr = ''
        self.ie = 'latin1'
        self.oe = 'latin1'

Once we have a search object, we'll use ZSI to make a SOAP message and serialize that into a string.




import StringIO
import ZSI

s = Search('rich+salz', 'No,I.am.not.going.to.give.my.key')
##  The SoapWriter accumulates the envelope in an in-memory buffer.
buff = StringIO.StringIO()
sw = ZSI.SoapWriter(buff,  nsdict={'g': 'urn:GoogleSearch'})
sw.serialize(s, oname='doGoogleSearch')
request = buff.getvalue()
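
To get a feel for what the serializer has to produce, here is a rough sketch that builds a comparable envelope by hand, using only xml.etree.ElementTree from the standard library of current Python rather than ZSI. The build_request and to_text helpers are hypothetical, and the output will differ in detail from what ZSI writes (prefixes, type attributes, and so on); treat it as an illustration of the message shape, not of ZSI's behavior.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = 'http://schemas.xmlsoap.org/soap/envelope/'
GOOGLE_NS = 'urn:GoogleSearch'

# Order of the parts, as listed in the WSDL message definition.
PARTS = ['key', 'q', 'start', 'maxResults', 'filter',
         'restrict', 'safeSearch', 'lr', 'ie', 'oe']

def to_text(value):
    # xsd:boolean is written as lowercase true/false.
    if isinstance(value, bool):
        return 'true' if value else 'false'
    return str(value)

def build_request(params):
    """Return a SOAP envelope (as a string) wrapping a doGoogleSearch call."""
    ET.register_namespace('SOAP-ENV', SOAP_ENV)
    ET.register_namespace('g', GOOGLE_NS)
    envelope = ET.Element('{%s}Envelope' % SOAP_ENV)
    body = ET.SubElement(envelope, '{%s}Body' % SOAP_ENV)
    call = ET.SubElement(body, '{%s}doGoogleSearch' % GOOGLE_NS)
    for name in PARTS:
        ET.SubElement(call, name).text = to_text(params[name])
    return ET.tostring(envelope, encoding='unicode')

params = {'key': 'your-key-here', 'q': 'rich+salz', 'start': 0,
          'maxResults': 10, 'filter': True, 'restrict': '',
          'safeSearch': False, 'lr': '', 'ie': 'latin1', 'oe': 'latin1'}
request = build_request(params)
```

The point of a typed toolkit is that you never write this loop yourself; the typecode carries the same information as the PARTS list and the to_text conversions.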



Making an HTTP request out of the SOAP message is straightforward. We get the target host and URL from the WSDL service element.


<!-- Endpoint for Google Web APIs -->
<service name="GoogleSearchService">
  <port name="GoogleSearchPort" binding="typens:GoogleSearchBinding">
    <soap:address location="http://api.google.com/search/beta2"/>
  </port>
</service>
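
Pulling the host and path out of that element is mechanical. The following sketch does it with the standard library; the find_endpoint helper is a hypothetical name, and the fragment is wrapped in a definitions element that declares the prefixes, on the assumption that the full WSDL file declares them the same way at the top.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace of the WSDL SOAP binding extension elements.
WSDL_SOAP_NS = 'http://schemas.xmlsoap.org/wsdl/soap/'

WSDL_FRAGMENT = '''
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
             xmlns:typens="urn:GoogleSearch">
  <service name="GoogleSearchService">
    <port name="GoogleSearchPort" binding="typens:GoogleSearchBinding">
      <soap:address location="http://api.google.com/search/beta2"/>
    </port>
  </service>
</definitions>
'''

def find_endpoint(wsdl_text):
    """Return (host, port, path) from the first soap:address element."""
    root = ET.fromstring(wsdl_text)
    address = root.find('.//{%s}address' % WSDL_SOAP_NS)
    url = urlsplit(address.get('location'))
    # No explicit port in the URL: assume plain HTTP on port 80.
    return url.hostname, url.port or 80, url.path

host, port, path = find_endpoint(WSDL_FRAGMENT)
```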

The value of the SOAPAction header comes from the definition for the doGoogleSearch operation; while the WSDL file specifies a value, it appears that Google doesn't check. Which is a good thing, since SOAP 1.2 deprecates the use of the SOAPAction header.

Next, we need to put those items together and make an HTTP POST. The only nuisance is that we had to create the whole SOAP message first, so that we could compute its length for the Content-Length header:




import httplib
conn = httplib.HTTPConnection('api.google.com', 80)
conn.connect()
conn.putrequest('POST', '/search/beta2')
conn.putheader('Content-Length', str(len(request)))
conn.putheader('Content-type', 'text/xml; charset="utf-8"')
conn.putheader('SOAPAction', 'urn:GoogleSearchAction')
conn.endheaders()
conn.send(request)
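
For readers on current Python, httplib is now http.client, and one detail deserves care: Content-Length counts bytes, not characters, so the message should be encoded before it is measured. The sketch below assembles the same request without sending it, so that it runs offline; the build_post helper is a hypothetical name, and the tiny XML string stands in for a real envelope.

```python
import http.client

def build_post(path, soap_xml, soap_action):
    """Return (method, url, body, headers) for HTTPConnection.request()."""
    # Encode first: Content-Length counts bytes, not characters.
    body = soap_xml.encode('utf-8')
    headers = {
        'Content-Type': 'text/xml; charset="utf-8"',
        'Content-Length': str(len(body)),
        'SOAPAction': soap_action,
    }
    return 'POST', path, body, headers

method, url, body, headers = build_post(
    '/search/beta2', '<q>caf\u00e9</q>', 'urn:GoogleSearchAction')

# Sending would look like this (commented out so the sketch runs offline):
# conn = http.client.HTTPConnection('api.google.com', 80)
# conn.request(method, url, body, headers)
# response = conn.getresponse()
```

The sample string is 11 characters long but 12 bytes once encoded, which is exactly the sort of off-by-a-few error that a hand-built header invites.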



It's not hard to see how almost everything is boilerplate; almost everything can be generated from a single WSDL file, from the local datatypes, up to and out onto the network.

The careful reader may realize that we've glossed over how the serialize function works. Most SOAP toolkits require access to the data definition -- in this case, the XML Schema defined in the WSDL -- in order to generate serialization code. How do we get from the Python Search object to the SOAP message shown in listing 1? While we don't want to get bogged down in the details of a particular SOAP implementation, we'll take a brief look at ZSI's mechanism, in order to get an understanding of some of the issues involved.

SOAP and Serialization

ZSI uses typecodes to describe the data. There are primitives for all the standard XML Schema primitive types, including dates, integers, strings, and so on, as well as constructors to build aggregated types such as complexTypes, which can often map directly into something like a classic C struct.

Let's look at an individual search result, which has the following schema definition:


<xsd:complexType name="GoogleSearchResult">
  <xsd:all>
    <xsd:element name="documentFiltering"           type="xsd:boolean"/>
    <xsd:element name="searchComments"              type="xsd:string"/>
    <xsd:element name="estimatedTotalResultsCount"  type="xsd:int"/>
    <xsd:element name="estimateIsExact"             type="xsd:boolean"/>
    <xsd:element name="resultElements"              type="typens:ResultElementArray"/>
    <xsd:element name="searchQuery"                 type="xsd:string"/>
    <xsd:element name="startIndex"                  type="xsd:int"/>
    <xsd:element name="endIndex"                    type="xsd:int"/>
    <xsd:element name="searchTips"                  type="xsd:string"/>
    <xsd:element name="directoryCategories"         type="typens:DirectoryCategoryArray"/>
    <xsd:element name="searchTime"                  type="xsd:double"/>
  </xsd:all>
</xsd:complexType>

The all says that the sub-elements can be in any order; except for resultElements and directoryCategories, they are all basic primitive types. In ZSI we define a new class, tcSearchResult, which is derived from ZSI's Struct class. Generic is the name of the local class that ZSI will create when it parses a search result message. This class lets you set any class attributes. More complicated uses would likely need special classes which set defaults, enforce additional validity constraints, and so on.


class tcSearchResult(ZSI.TC.Struct):
    def __init__(self, pname=None, **kw):
        ZSI.TC.Struct.__init__(self, Generic,
            [
                ZSI.TC.String('summary', unique=1),
                ZSI.TC.String('URL', unique=1),
                ZSI.TC.String('snippet', unique=1),
                ZSI.TC.String('title', unique=1),
                ZSI.TC.String('cachedSize', unique=1),
                ZSI.TC.Boolean('relatedInformationPresent'),
                ZSI.TC.String('hostName', unique=1),
                tcDirCat('directoryCategory'),
                ZSI.TC.String('directoryTitle', unique=1),
            ],
        pname, inorder=0, **kw)

The pname is used to specify the parameter name, which is basically what name the element will have. As you can see, the bulk of the code is creating a list -- indicated by the square brackets -- which defines the items appearing within the search results element. The inorder=0 parameter specifies that the ZSI parser should not require the elements to appear in any specific order, analogous to the XML Schema all element.
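
The idea of driving a parser from a declarative description is easy to show in miniature. The sketch below is not ZSI's implementation; it is a standard-library illustration in which the RESULT_FIELDS table plays the role of the typecode list, and parse_struct (a hypothetical name) accepts the child elements in any order, the way inorder=0 does.

```python
import xml.etree.ElementTree as ET

def parse_bool(text):
    # xsd:boolean allows true/false as well as 1/0.
    return text.strip() in ('true', '1')

# A typecode-like description: element name -> converter.
RESULT_FIELDS = {
    'URL': str,
    'title': str,
    'relatedInformationPresent': parse_bool,
}

def parse_struct(element, fields):
    """Build a dict from child elements, accepting them in any order."""
    result = {}
    for child in element:
        if child.tag not in fields:
            raise ValueError('unexpected element: %s' % child.tag)
        if child.tag in result:
            raise ValueError('duplicate element: %s' % child.tag)
        result[child.tag] = fields[child.tag](child.text or '')
    missing = set(fields) - set(result)
    if missing:
        raise ValueError('missing elements: %s' % ', '.join(sorted(missing)))
    return result

# The children appear in a different order than in RESULT_FIELDS.
item = ET.fromstring(
    '<item><title>XML.com</title>'
    '<relatedInformationPresent>true</relatedInformationPresent>'
    '<URL>http://www.xml.com/</URL></item>')
parsed = parse_struct(item, RESULT_FIELDS)
```

Requiring a fixed order instead would mean walking the field list and the children in step, which is simpler to code but rejects messages that the schema's all group permits.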

But what about those unique=1 parameters? They are additional metadata which tell ZSI that pointer aliasing is not important. As part of its support for local datatypes and legacy RPC systems (DCE/DCOM in particular), SOAP RPC encoding defines mechanisms used to preserve aliased pointers -- those pointing to the same block of memory, as opposed to having the same value.

For example, if p and q are C character pointers, then the following fragments all have different semantics:


/* Different pointers, same value. */
p = strdup("hello");
q = strdup("hello");

/* Different pointers, different value. */
p = strdup("hello");
q = strdup("hangup");

/* Aliased pointers. */
p = strdup("hello");
q = p;

Suppose we now invoke the following subroutine on the different values of p -- what would the value of q be?


void up1(char* s)
{
  s[0] = 'H';
}
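
Since the rest of this article is in Python, here is the same experiment as a sketch using bytearray, which is mutable in the way a C character buffer is. It is only an analogy for the C fragments above, with up1 rewritten to match.

```python
def up1(s):
    # Same effect as the C routine: overwrite the first byte in place.
    s[0] = ord('H')

# Different objects, same value: changing p leaves q alone.
p = bytearray(b'hello')
q = bytearray(b'hello')
up1(p)
same_value = (bytes(p), bytes(q))

# Aliased: p and q name the same buffer, so q changes too.
p = bytearray(b'hello')
q = p
up1(p)
aliased = (bytes(p), bytes(q))
```

In the first case q still reads hello; in the aliased case it reads Hello, because there was only ever one buffer.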

Using SOAP RPC encoding, it's possible to preserve this behavior even if up1 is invoked on a remote machine. To do this, you can tag an instance of the data with an XML id attribute, and aliased instances use the href attribute to point to the other instance.

Normally, the values must appear after the proper body of the SOAP message, that is, as succeeding elements in the SOAP body:


<soap-env:Body>
  <tns:pandq>
    <p href="#pval"/>
    <q href="#pval"/>
  </tns:pandq>
  <tns:node1 id="pval">
hello
  </tns:node1>
</soap-env:Body>

All of which opens a can of worms known as "serialization roots", which we happily ignore.

By special dispensation, however, strings can be inlined at one of their uses and not inlined other times. This gives us the following more common ways of encoding p and q:


  <tns:pandq>
    <p id="pval">hello</p>
    <q href="#pval"/>
  </tns:pandq>
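
On the receiving side, preserving aliasing means that every accessor pointing at the same id must decode to the very same object. Below is a rough standard-library sketch of that bookkeeping, not a description of how ZSI or any other toolkit does it. The decode_struct helper is hypothetical, namespaces are left out for brevity, and only local references of the form #name are handled.

```python
import xml.etree.ElementTree as ET

def decode_struct(struct, root):
    """Decode the children of struct, sharing one object per id."""
    # First pass: one mutable holder for every element that carries an id.
    holders = {}
    for elem in root.iter():
        ident = elem.get('id')
        if ident is not None:
            holders[ident] = bytearray((elem.text or '').strip(), 'utf-8')
    # Second pass: hrefs and inlined ids resolve to the shared holder.
    values = {}
    for child in struct:
        href = child.get('href')
        if href is not None:
            values[child.tag] = holders[href.lstrip('#')]
        elif child.get('id') is not None:
            values[child.tag] = holders[child.get('id')]
        else:
            values[child.tag] = bytearray((child.text or '').strip(), 'utf-8')
    return values

# Value carried as a separate element after the struct.
SEPARATE = ('<Body><pandq><p href="#pval"/><q href="#pval"/></pandq>'
            '<node1 id="pval">hello</node1></Body>')
body = ET.fromstring(SEPARATE)
separate = decode_struct(body.find('pandq'), body)

# Value inlined at its first use.
INLINE = '<pandq><p id="pval">hello</p><q href="#pval"/></pandq>'
struct = ET.fromstring(INLINE)
inline = decode_struct(struct, struct)
inline['p'][0] = ord('H')   # a change through p is visible through q
```

A decoder that ignored id and href would hand back two independent copies, which is fine for Google's read-only inputs but wrong for a routine like up1.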

As you might expect, Google doesn't care if any of the strings are aliased, if only because they are input parameters. We direct ZSI to avoid the aliasing by indicating that each string is unique. That should probably be the default; if not a bug, it's at least a misfeature.

Having taken a brief tour through all the automation possible with WSDL-defined SOAP RPC messages, next month we'll show how to use SOAP headers to build our own value-added services, moving from "wizards generating code" to "interesting distributed application design."