The Impact of Site Finder on Web Services
Automated HTTP Tools
Matt Larson, Review of Technical Issues and VeriSign Response, p25, October 15, 2003
This quotation is from a presentation by VeriSign to ICANN, stating that their recent and temporarily suspended changes to the .com and .net DNS servers have had no reported effect on automated HTTP tools and, further, that we shouldn't be automating HTTP access anyway.
Unfortunately, the entire web service protocol stack that IBM, Microsoft, the W3C, Apache, and others have been busy working on for the past few years is effectively "automated processes using HTTP over TCP port 80". Thus the niche processes that are being so glibly inconvenienced by these changes happen to include what many people believe is the future of distributed systems.
This article shows how SOAP-based web service stacks do in fact suffer from VeriSign's changes and discusses what can be done to fix them. The simplest solution is to leave Site Finder turned off. If it comes back, regardless of what changes we make to the SOAP stacks, the process of identifying configuration defects will be made more complex.
A few years ago, when we were bringing up an early web service, we got a support call from the customers: our XML-RPC service was "sending back bad XML". Their client stack, written for an appropriately large fee by some consultancy group, was failing with SAX parser errors. Yet everything was working perfectly on our tests, so the fault had to be somewhere on their side. During the debugging session that ensued, we managed to get hold of the XML content that was causing the trouble. It was the HTML 404 page automatically generated by IIS. This led to a highly memorable conversation.
"We have found the problem: your client program is receiving an IIS error page and failing to parse it."
"I knew it -- there is a problem on your site."
"We aren't running IIS."
You see, we were running a Java Application server fronted by Apache 1.3. The client-side configuration file was wrong and the client system was pointing at some random server, an IIS server sending back its error page. Their client software was handing this page to an XML parser, with predictable consequences.
I learned a lot from that incident. I learned that a client-side XML parser error is often caused by HTML coming down the wire. I learned that home-rolled web service protocol stacks often neglect to test for HTTP error codes. And I learned that the first thing to do with any problem that you don't see yourself is to figure out which URL you are trying to talk to.
This is a question that everyone building a SOAP, XML-RPC, or REST web service should be prepared to ask more often as a result of the new Site Finder service.
On September 15, 2003, VeriSign tweaked the .com and .net DNS registries so that every lookup for an unknown host resolved to a search service web site, Site Finder, rather than return the NXDOMAIN response traditionally associated with DNS lookup failures.
This led many users' web browsers to the service, which VeriSign hoped would lead to the users clicking through on the paid links in the search service, thus bringing revenue to the company. Unfortunately, Site Finder also happens to break many existing programs: all those that assume a missing hostname maps to an immediate error. These programs will get back an IP address, but when they connect for a conversation, they will get back a "connection refused" error, wrapped into the language- and toolkit-specific exception, fault, or error code the client program expects. All such programs are now going to have to have their documentation rewritten so that people know that a connection refused error may mean the hostname is wrong.
An interesting question is what impact the changes will have on web services -- anything using XML over HTTP as the means of coupling computers. One assumption of VeriSign's is mostly valid: such applications do use HTTP, albeit often on a different port. The other assumption -- that whoever is making the request would be grateful to see a search page -- is clearly false.
Here is what used to happen on a SOAP request to an invalid endpoint hostname, such as http://nosuchhost.com/endpoint:
Caller does DNS lookup.
DNS returns an error.
The protocol stack returns something like java.net.UnknownHostException.
If the application is smart, it maps this to a meaningful error, such as a message that the hostname may be incorrect.
If the application is simple, it shows the framework's error and assumes the end user is smart enough to understand it.
If a person is at the end of the application, they see the error and either fix their endpoint or phone up support.
If it is unattended operation, the machine ought to retry later. Applications aren't meant to cache failed lookups, but Java is naughty: some versions do exactly that unless told not to.
If the host comes back later, all is well. If not, then the application should have a recovery policy.
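The pre-Site Finder flow above can be sketched in Java. This is a minimal illustration, not taken from any particular SOAP stack: the class and method names (EndpointLookup, describeLookupFailure, checkHost) are hypothetical, and the wording of the message is only an example. The one real setting used is the JVM security property networkaddress.cache.negative.ttl, where a value of 0 asks the JVM not to cache failed lookups.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

// Hypothetical helper, for illustration only.
final class EndpointLookup {

    // Ask the JVM not to cache failed DNS lookups, so that a later retry
    // really queries DNS again. Best called before the first lookup.
    static void disableNegativeDnsCache() {
        Security.setProperty("networkaddress.cache.negative.ttl", "0");
    }

    // Map the low-level exception to a message a user or support desk can act on.
    static String describeLookupFailure(String host, UnknownHostException e) {
        return "Cannot resolve host '" + host + "'. The endpoint hostname may be"
                + " incorrect; check the configured URL, or retry later if the"
                + " name is known to be valid.";
    }

    // Returns null if the host resolves, otherwise the meaningful error text.
    static String checkHost(String host) {
        try {
            InetAddress.getByName(host);
            return null;
        } catch (UnknownHostException e) {
            return describeLookupFailure(host, e);
        }
    }
}
```

An unattended client would call disableNegativeDnsCache() once at startup and then retry checkHost() according to its own recovery policy.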
Now let's look at how things would be expected to change with Site Finder intervening:
Caller does DNS lookup.
DNS returns the IP address of something.
Caller creates a TCP link to port 80 on that machine, then sends its SOAP request; usually a POST, although SOAP 1.2 adds GET.
The endpoint returns 302, "moved temporarily", redirecting the caller to a URL under http://sitefinder.verisign.com.
If the client handles 302 responses, then it resends the request to Site Finder.
Site Finder returns 200, "OK", and an HTML search page.
A SOAP client would normally POST its SOAP request, expecting an XML formatted SOAP response and a 200 code on success, 500 on a fault. Only now it would get a 200 response with text/html content. What is it going to do?
Either it is going to test the MIME type and bail out when that is not XML; or, as in the example cited above, it will hand it off to the XML parser, which will then break as the content is not valid XML. Even if it were valid XHTML, as per the W3C, the parsing would quite probably fail messily when the application tried to make sense of the data.
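The first option, testing the status code and MIME type before parsing, might look something like the following Java sketch. The class and method names are hypothetical, and the list of accepted content types is an assumption (text/xml for SOAP 1.1 and XML-RPC, application/soap+xml for SOAP 1.2, application/xml for plain XML over HTTP); a real stack may need a different list.

```java
import java.util.Locale;

// Hypothetical pre-parse check, for illustration only.
final class SoapResponseCheck {

    // True if the Content-Type header names a type worth handing to an XML parser.
    static boolean isXmlContentType(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=utf-8" before comparing.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.equals("text/xml")
                || mime.equals("application/xml")
                || mime.equals("application/soap+xml");
    }

    // Returns null when the response looks parseable, otherwise a diagnostic
    // that points at the endpoint rather than at the XML parser.
    static String diagnose(String endpoint, int status, String contentType) {
        // A SOAP client expects 200 on success and 500 on a fault.
        if (status != 200 && status != 500) {
            return "Unexpected HTTP status " + status + " from " + endpoint
                    + "; check the endpoint URL.";
        }
        if (!isXmlContentType(contentType)) {
            return "Expected XML from " + endpoint + " but received "
                    + contentType + "; the endpoint URL or hostname is"
                    + " probably wrong.";
        }
        return null;
    }
}
```

With a check like this, a Site Finder search page (200 with text/html) is reported as a probable endpoint problem instead of surfacing as a parser error.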
The result of this is that the VeriSign response does not parse. The client application is going to give some kind of error, perhaps an XML parser error, and that is going to lead to a support call.
The result of the change, therefore, is that if 302 redirects are handled in the web service client, then you are going to get more support calls. What about frameworks that don't? Well, they will report it somehow. Again, it is a more subtle error than Unknown Host, which means support get a call.
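For a client that chooses not to follow redirects, one way to make the report less subtle is to treat any 3xx answer to a SOAP request as a likely configuration error and say so. The sketch below assumes the client is built on java.net.HttpURLConnection; the RedirectPolicy class and its methods are hypothetical names, and the message text is only an example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical redirect handling, for illustration only.
final class RedirectPolicy {

    // Open a connection that will not silently follow 3xx responses.
    // No network traffic happens until the caller connects or reads.
    static HttpURLConnection open(String endpoint) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
            conn.setInstanceFollowRedirects(false);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static boolean isRedirect(int status) {
        return status >= 300 && status < 400;
    }

    // Build an error that names both the configured endpoint and the target.
    static String describeRedirect(String endpoint, int status, String location) {
        return "Endpoint " + endpoint + " answered " + status
                + " and redirected to " + location
                + "; the endpoint URL or hostname may be wrong.";
    }
}
```

A caller would check isRedirect(conn.getResponseCode()) and pass conn.getHeaderField("Location") to describeRedirect, so that the error names the redirect target, which for Site Finder would be a URL under http://sitefinder.verisign.com.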
Not only is the 302 or search page going to result in meaningless errors, it is also going to arrive late. Because the response is only sent after the request has been sent, a big request -- such as a POST of binary data or a SOAP with Attachments message -- will only fail after the upload. This will waste time and bandwidth. Requests made from a device that pays by the second or by the byte -- such as a cellphone -- will cost the user even more money than before.