XML.com: XML From the Inside Out

Google's Gaffe

April 24, 2002

Google's release of an API has been heralded as a bright moment for web services. It is an exciting development, but at the same time there is a subset of the programmer community that is disappointed. Google had a similar XML-based API a year ago, but neither documented nor publicized it. Merely by changing "/search" to "/xml" in any Google query, you could get back an XML representation of the query result. It became a pay-only service last fall, which, while unfortunate, was understandable as the service was a straight money-loser.

Imagine the surprise at Google reviving the service, adding new features, documenting and promoting it. But Google also moved it to an inferior technical platform, SOAP RPC running over HTTP POST. On many mailing lists, weblogs and discussion groups the reaction was mixed. It felt like one step forward and two steps back.

Google seems to be caught up in the hype of SOAP-based web services. By contrast, Amazon has recently released a program for its partners based on pure XML, HTTP and URIs. eBay has long had an XML/HTTP/URI-style service.

In this article I demonstrate that Google's choice was technologically poor, compared to that of eBay and Amazon. I will show that a Google API based on XML, HTTP and URIs can be simpler to use, more efficient, and more powerful.

Why Google Is Special

In previous articles, I have demonstrated how SOAP-based services can be re-engineered around pure HTTP. Even though the uptake of web services has been slow, I cannot propose a reinvention of each and every one of them as an XML/HTTP/URI-style service. The new Google API, however, is special for a variety of reasons.

  • Google has had an XML/HTTP/URI-style API in the past. The choice to use SOAP would seem to indicate a need to move beyond HTTP. My experiment demonstrates that this is not so.
  • Google's service seems almost unique in how simply and clearly one can define an XML/HTTP/URI interface. The new API is an obfuscating SOAP wrapper over a simple XML/HTTP/URI service.
  • Google the corporation has always been known for its technological acumen and general cluefulness. I think that it can be convinced to do the right thing and restore the HTTP interface (perhaps alongside the SOAP interface).

In addition, I think that the Google move is important symbolically. I take the SOAP-ifying of Google as a sign that the web services hype has now reached overdrive. I regularly hear customers say, "We have a working XML system but we know we'll have to move to SOAP soon." They are going to migrate their working systems to an unproven technology with a questionable design because they feel the dominance of SOAP is inevitable.

Google's SOAP API

Let's take a look first at Google's SOAP API. There are three methods.

doGoogleSearch searches the Google archive and returns an XML document with information about the query (e.g. number of hits) and with a list of result elements.

doSpellingSuggestion requests a spelling correction suggestion. For instance it would suggest that "britney speers" be corrected to "britney spears". It takes only a key and string as parameters and returns only a string for the suggestion.

doGetCachedPage requests a cached page from Google. It takes a key and a URI as parameters and returns base64-encoded data for the cached page.
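Because the cached page comes back base64-encoded, a client has to decode it before it can use the page. A minimal sketch of that step in Python follows; the function name and the sample payload are hypothetical stand-ins, not part of Google's API.

```python
import base64


def decode_cached_page(encoded):
    """Turn the base64 text carried in the response back into raw page bytes."""
    return base64.b64decode(encoded)


# Stand-in payload: a tiny page, encoded the way the response would carry it.
sample = base64.b64encode(b"<html><body>cached</body></html>").decode("ascii")

page = decode_cached_page(sample)
print(page.decode("utf-8"))
```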

Below is a complete message from client to server using the SOAP API. It searches for the query string "constantrevolution rules xml":

POST /search/beta2 HTTP/1.1
Host: api.google.com
Accept-Encoding: identity
Content-Length: 914
SOAPAction: urn:GoogleSearchAction
Content-Type: text/xml; charset=utf-8

<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope 
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" 
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" 
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
    <SOAP-ENV:Body>
        <ns1:doGoogleSearch xmlns:ns1="urn:GoogleSearch"
         SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
            <key xsi:type="xsd:string">0000</key>
            <q xsi:type="xsd:string">constantrevolution rules xml</q>
            <start xsi:type="xsd:int">0</start>
            <maxResults xsi:type="xsd:int">0</maxResults>
            <filter xsi:type="xsd:boolean">true</filter>
            <restrict xsi:type="xsd:string"></restrict>
            <safeSearch xsi:type="xsd:boolean">false</safeSearch>
            <lr xsi:type="xsd:string"></lr>
            <ie xsi:type="xsd:string">latin1</ie>
            <oe xsi:type="xsd:string">latin1</oe>
        </ns1:doGoogleSearch>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

The message starts with HTTP headers because Google runs SOAP over HTTP. The only SOAP-specific header is the SOAPAction header which is theoretically useful for filtering SOAP traffic. Practically speaking it is optional and thus not a good basis for filtering.

The XML part of the message starts with some namespace declarations and a SOAP encodingStyle declaration, which is more or less boilerplate. The SOAP Body element can also be treated as boilerplate. Inside it is the method name, "doGoogleSearch".

Next there's an ordered list of parameters. The allowed parameters for the doGoogleSearch method are "key" (userid/password), "q" (query text), "start" (where to start returning in the results), "maxResults" (number of allowed results), "filter" (filter out very similar results), "restrict" (country or topic restrictions), "safeSearch" (pornography filter), "lr" (language restrict), "ie" (input encoding) and "oe" (output encoding). None of the parameters are optional.
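To make the structure concrete, here is a rough sketch of how a receiver could pull the method name and the ordered parameters out of such a message, using Python's standard ElementTree module. The message is a shortened copy of the one above; the function name read_call is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
XSI = "http://www.w3.org/1999/XMLSchema-instance"

# Shortened copy of the request shown above (first five parameters only).
message = """<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doGoogleSearch xmlns:ns1="urn:GoogleSearch">
      <key xsi:type="xsd:string">0000</key>
      <q xsi:type="xsd:string">constantrevolution rules xml</q>
      <start xsi:type="xsd:int">0</start>
      <maxResults xsi:type="xsd:int">0</maxResults>
      <filter xsi:type="xsd:boolean">true</filter>
    </ns1:doGoogleSearch>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def read_call(xml_text):
    """Return the method name and an ordered list of (name, declared type, text)."""
    body = ET.fromstring(xml_text).find("{%s}Body" % SOAP_ENV)
    call = body[0]  # the first child of Body is the method element
    method = call.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
    params = [(p.tag, p.get("{%s}type" % XSI), p.text or "") for p in call]
    return method, params


method, params = read_call(message)
print(method)
for name, declared_type, text in params:
    print(name, declared_type, repr(text))
```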

The "key" parameter is special. It is more or less a password assigned to a particular software developer. The Google API seems not to support SSL, so keys always pass across the wire in cleartext.

For this particular query I set maxResults to 0 so that I won't actually get any results. Google returns a lot of XML metadata about the query itself so working with query "hits" would make my examples too long.

As you can see, parameters are strongly typed. Although the client and server know the types in advance, most SOAP toolkits will inline the types into the message. As you will see, this is entirely unnecessary.

HTTP-ifying the API

To reinvent this as an HTTP-based method, I merely have to translate the parameters into "query parameters" in a URI. A couple of hundred lines of Python serve to implement the mapping from these query parameters into Google SOAP (for all three methods). Of course one could have done the same thing in ASP, JSP, PHP, etc.
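The core of such a mapping might look roughly like the sketch below. It is not the script described here, only an illustration of the idea: the table name, the function name and the default values are assumptions. Most defaults are copied from the sample SOAP message above; the maxResults default of 10 is a guess.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

# (name, xsd type, default) in the order the SOAP call expects them.
# The defaults are assumptions made for this sketch.
PARAMS = [
    ("key", "string", ""),
    ("q", "string", ""),
    ("start", "int", "0"),
    ("maxResults", "int", "10"),
    ("filter", "boolean", "true"),
    ("restrict", "string", ""),
    ("safeSearch", "boolean", "false"),
    ("lr", "string", ""),
    ("ie", "string", "latin1"),
    ("oe", "string", "latin1"),
]

ENVELOPE = (
    '<SOAP-ENV:Envelope'
    ' xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"'
    ' xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"'
    ' xmlns:xsd="http://www.w3.org/1999/XMLSchema">'
    '<SOAP-ENV:Body>'
    '<ns1:doGoogleSearch xmlns:ns1="urn:GoogleSearch"'
    ' SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">'
    '%s'
    '</ns1:doGoogleSearch>'
    '</SOAP-ENV:Body>'
    '</SOAP-ENV:Envelope>'
)


def build_soap_body(query_string):
    """Translate a URI query string into a doGoogleSearch SOAP envelope."""
    given = parse_qs(query_string, keep_blank_values=True)
    parts = []
    for name, xsd_type, default in PARAMS:
        # Use the caller's value if present, otherwise the assumed default.
        value = given.get(name, [default])[0]
        parts.append('<%s xsi:type="xsd:%s">%s</%s>'
                     % (name, xsd_type, escape(value), name))
    return ENVELOPE % "".join(parts)


soap = build_soap_body("key=0000&q=constantrevolution+rules+xml&maxResults=0")
print(soap)
```

The caller supplies three parameters and the mapper fills in the other seven, which is where the URI form gets its brevity.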

Here is the sort of URI used to address my version of the service:

http://mymachine/cgi/search.py?key=0000&q=constantrevolution+rules+xml&maxResults=0

Nobody (even a programmer) ever needs to look at this URI. It can be autogenerated, just as the SOAP message is. For instance here is a complete Python program to generate the URI and fetch the result:

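import urllib

# Query parameters; anything omitted here is filled in by the server-side script.
params = {"key":"0000",
          "q":"constantrevolution rules xml",
          "maxResults":"0"}
# urlencode escapes the values and joins them as name=value pairs separated by "&".
encoded_params = urllib.urlencode(params)
url = "http://mymachine/cgi/search.py?" + encoded_params
# urlopen issues the HTTP GET and returns a file-like object for the response.
data = urllib.urlopen(url)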
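The listing above relies on the urllib module as it existed when this article was written. In Python 3 the same functions live in urllib.parse and urllib.request, so an equivalent sketch would be the following. The host name "mymachine" is a placeholder, which is why the fetch itself is left commented out.

```python
from urllib.parse import urlencode

params = {"key": "0000",
          "q": "constantrevolution rules xml",
          "maxResults": "0"}

# urlencode escapes each value and joins the pairs with "&";
# spaces become "+" in the query string.
url = "http://mymachine/cgi/search.py?" + urlencode(params)
print(url)

# To actually fetch the result from a real host:
# from urllib.request import urlopen
# data = urlopen(url).read()
```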

That's pretty simple, but I'll show later that by building on a WSDL service description we can make it even simpler.

HTTP deals with optional arguments much more gracefully than most WSDL-based SOAP toolkits, so I've left out any of the arguments that can be omitted. My script infers them just as the older XML/HTTP/URI Google API did.

Let's zoom in on the special key parameter in the URI. Remember that this is a form of authentication. HTTP has a built-in authentication mechanism, so really that would be a better way to handle it. SOAP does not support authentication yet, and authentication is (as of this writing) still an open issue for SOAP's RPC-over-HTTP binding. Of course, for ultimate privacy (at an increased performance cost), I would recommend SSL-encrypting the entire message. I will leave key as a parameter for simplicity and to parallel the SOAP version more closely.
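For comparison, here is a sketch of what the HTTP-authentication alternative could look like on the client side, using HTTP Basic authentication. Sending the key as the user name with an empty password is an assumption made for this illustration, as are the function name and the placeholder host. Note that Basic credentials are only base64-encoded, not encrypted, so this moves the key out of the URI but does not replace SSL.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_request(key, query):
    """Build a GET request that carries the key in an Authorization header."""
    url = "http://mymachine/cgi/search.py?" + urlencode(
        {"q": query, "maxResults": "0"})
    # Basic credentials are "user:password", base64-encoded.
    # Assumption: the key is the user name and the password is empty.
    token = base64.b64encode((key + ":").encode("ascii")).decode("ascii")
    return Request(url, headers={"Authorization": "Basic " + token})


req = build_request("0000", "constantrevolution rules xml")
print(req.full_url)
print(req.get_header("Authorization"))
```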

Here is the message that gets sent on the wire:

GET /cgi/search.py?maxResults=0&key=0000&q=constantrevolution+rules+xml HTTP/1.0
Host: mymachine
User-agent: Python-urllib

That's it.

Now some might complain that the URI query parameters are just strings, whereas the SOAP parameters were strongly typed. This is not necessarily true. On the wire, of course, the parameters are just strings, just as SOAP messages are just strings. Only at the client and the server are the strings interpreted as other types. The client must know the types of the parameters in order to have made the call. The server must know the types of the parameters in order to have implemented the service. Given that both parties already know the types, it is a waste of bandwidth to declare the types of the parameters in each and every message. Even SOAP does not require it; it's merely a common SOAP idiom.
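A server that already knows the parameter types can apply them as it reads the query string, with no type declarations on the wire. The sketch below illustrates the idea; the TYPES table and the function names are hypothetical, with the types taken from the SOAP message shown earlier.

```python
from urllib.parse import parse_qs


def to_bool(text):
    """Interpret the strings 'true' and 'false'; reject anything else."""
    if text not in ("true", "false"):
        raise ValueError("expected 'true' or 'false', got %r" % text)
    return text == "true"


# The server's own knowledge of each parameter's type.
TYPES = {
    "key": str, "q": str, "start": int, "maxResults": int,
    "filter": to_bool, "restrict": str, "safeSearch": to_bool,
    "lr": str, "ie": str, "oe": str,
}


def typed_params(query_string):
    """Parse a query string and convert each value to its known type."""
    raw = parse_qs(query_string, keep_blank_values=True)
    unknown = set(raw) - set(TYPES)
    if unknown:
        raise ValueError("unknown parameters: %s" % ", ".join(sorted(unknown)))
    # int() and to_bool() raise ValueError on malformed input.
    return {name: TYPES[name](values[0]) for name, values in raw.items()}


args = typed_params("maxResults=0&key=0000&q=constantrevolution+rules+xml")
print(args)
```

A malformed value such as maxResults=ten fails with a ValueError at the server boundary, which is the same point at which a SOAP toolkit would reject a badly typed element.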

Which leaves the question of how types are communicated from the service provider to the programmer writing the client. Later I will show a way to declare the types strongly and statically enough to satisfy the most ardent Java or C# masochist.
