XML.com: XML From the Inside Out


The Impact of Site Finder on Web Services
by Steve Loughran

What Can a Web Service Developer Do?

Provided all your callers are running well-configured applications, this DNS change will not have any visible effect. It only becomes an issue when the caller has an incorrect URL. In that situation, the change may result in misleading error messages, such as "302" or "wrong MIME type", instead of the simpler "No such host".

All you can do is document this in both the end-user documentation and the support documentation. There are many other error messages related to connectivity, all of which need to be incorporated into a troubleshooting guide. You need such a guide whether or not the DNS changes stick; their impact is that the matrix mapping error messages to underlying causes needs to be updated, with some possible new causes for existing messages:

Connection refused

The host exists, but nothing is listening for connections on that port.
Site Finder: the URL is using a port other than 80, and the .com or .net address is invalid

Unknown host

The hostname component of the URL is invalid.

404: Not Found

There is a web server there, but nothing at the exact URL. Proxy servers can also generate 404 pages for unknown hosts.

302: Moved Temporarily

The content at the end of the URL has moved, and the client application does not follow the redirect.

Site Finder: the .com or .net address is invalid, and the port is explicitly, or by default, port 80.

Other 3xx response

The content at the end of the URL has moved, and the client application does not follow the redirect.

Wrong content type/MIME type

The URL may be incorrect, or the server application is not returning XML.
Site Finder: a 302 response is being returned because the host is unknown.

XML parser error

This can be caused when the content is not XML, but the client application assumes it is.
Site Finder: this may be the body of a 302 response due to an unknown host; the client application should check return codes and the Content-Type header.

500: Internal Error

SOAP uses this as a cue that a SOAPFault has been returned, but it can also mean the server is not working due to some internal fault.

Connection timed out / NoRouteToHost

The hostname can be resolved, but not reached. Either the host is missing (potentially a transient fault), or network/firewall issues are preventing access. The client may need to be configured for its proxy server.

GUI hangs/long pauses

The client application may be timing out on DNS lookups or connection attempts.
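The matrix above could be folded into client code as a small lookup, so that support staff see a diagnosis rather than a raw status code. A minimal sketch — the class name and message strings are illustrative, not part of any framework:

```java
// Illustrative helper: translate HTTP status codes into the diagnoses
// from the troubleshooting matrix above. The wording is an assumption;
// adapt it to your own support documentation.
public class FaultMatrix {

    /**
     * Diagnose an HTTP status code.
     *
     * @param status      the HTTP response code received
     * @param defaultPort true if the request went to port 80
     *                    (explicitly or by default)
     */
    public static String diagnose(int status, boolean defaultPort) {
        switch (status) {
            case 302:
                // With Site Finder active, a 302 on port 80 may really mean
                // "no such .com/.net host", not moved content.
                return defaultPort
                    ? "Content moved, or (Site Finder) the .com/.net host does not exist"
                    : "Content moved; the client does not follow redirects";
            case 404:
                return "A web server is there, but nothing at this exact URL"
                     + " (proxies can also generate 404 for unknown hosts)";
            case 500:
                return "A SOAPFault was returned, or the server has an internal fault";
            default:
                if (status >= 300 && status < 400) {
                    return "Content moved; the client does not follow redirects";
                }
                return "Unexpected HTTP status " + status;
        }
    }
}
```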

The support line's response to all such messages should be the same:

When a connectivity problem is suspected, get the URL that is at fault; ask the caller to view it in their web browser, and see if you can view it yourself.

This is where you can take advantage of the fact that web service protocols are built on top of, or just are, HTTP, and use the common underlying notion of URLs defining services. Provided those same URLs generate some human-readable content, even if that is an XML message, then the end user and support contact can both bring it up in their web browser. This action is the core technique for diagnosing connectivity problems, primarily because the HTTP infrastructure -- servers, proxies and clients -- is designed to support this diagnosis process.

As a web service provider, you can simplify the process by

  • Having human-readable content at every URL used in the service. Specifically, you should support GET requests, even if it is only to return a message such as "There is a SOAP endpoint here".

  • Using URLs that are human readable; short URLs that can be communicated over the telephone are the ideal.

  • Having support-accessible logging to provide an escalation path should the problem turn out to be server side.

  • Always setting the content type to text/xml or a MIME type specific to the XML returned by the service.
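The first and last points above can be sketched with the lightweight HTTP server bundled with modern JDKs (com.sun.net.httpserver); the /soap path and the message text are assumptions for illustration:

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Sketch: answer GET requests on a service endpoint with a short,
// human-readable XML message, so that a user pasting the URL into a
// browser sees something meaningful instead of an error.
public class EndpointInfo {

    // Hypothetical message; a real service would describe its operations.
    static final String MESSAGE =
        "<info>There is a SOAP endpoint here. POST SOAP requests to this URL.</info>";

    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/soap", new HttpHandler() {
            public void handle(HttpExchange exchange) throws IOException {
                byte[] body = MESSAGE.getBytes(StandardCharsets.UTF_8);
                // Always set an explicit XML content type, as recommended above.
                exchange.getResponseHeaders().set("Content-Type", "text/xml");
                exchange.sendResponseHeaders(200, body.length);
                OutputStream out = exchange.getResponseBody();
                out.write(body);
                out.close();
            }
        });
        server.start();
        return server;
    }
}
```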

Another useful technique is for the service to implement the ping design pattern. The service needs to support a simple ping operation which immediately returns. This operation can be used by clients to probe for the presence of the service, without any side effects or even placing much load on the server. Client applications should initiate communications with a server -- uploads, complex requests, etc -- by pinging it first. This detects failure early on, often at a lower cost.
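On the client side, a ping probe might look like the following sketch. The timeout value is an assumption, and the probe treats any 2xx reply as "service present" — so a Site Finder 302 or a proxy-generated 404 does not count as success:

```java
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: probe a service before starting a long upload or complex
// request, so that failure is detected early and cheaply.
public class PingClient {

    /**
     * Probe the service's ping URL. Returns true only if something
     * answered with a 2xx status within the timeout.
     */
    public static boolean ping(String pingUrl, int timeoutMillis) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(pingUrl).openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestMethod("GET");
            int status = conn.getResponseCode();
            // A Site Finder 302 or a proxy 404 is not a live service.
            return status >= 200 && status < 300;
        } catch (Exception e) {
            // Unknown host, connection refused, or timeout:
            // the service is not reachable.
            return false;
        }
    }
}
```

Run the probe in a background thread or asynchronous call, as point 5 in the next section recommends, so the GUI does not block while it waits.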

What Can the Developer of a Web Service Client Application Do?

Developers of web service client applications are on the front line here. Even if they use a WSDL-based code generation process that hides underlying URLs, or discover services using UDDI, Rendezvous, or some other mechanism, their program will still encounter connectivity problems. Networks are fundamentally unreliable; laptops move around and go offline; services get switched off.

They need to handle the connectivity problems and fail in a way that allows the problem to be diagnosed and corrected.

  1. It is good to translate framework errors/exceptions into error messages that are comprehensible by end users. XML parser errors, HTTP error codes, and complaints about MIME types are not suitable for average end users, though the support organization may need these.

  2. The target URL that failed needs to be disclosed to the end user, so that they can test it by hand.

  3. For any error, the response body needs to be preserved for the benefit of support.

  4. The fault diagnosis matrix listed above needs to be adapted to the client and included in the documentation.

  5. If the service implements a ping operation, use it to probe for service existence, preferably in a background thread or asynchronous call, so that the GUI does not block.

  6. Clients need to be tested over slow and unreliable networks. The Axis tcpmon SOAP monitor/HTTP proxy can be used to simulate slow HTTP connections.

  7. Always verify that the MIME type of received content is exactly that documented.

  8. Test the client's handling of HTTP response codes, and of HTML responses when XML is expected.

  9. Java developers should look at "Address Caching" under java.net.InetAddress. Applications need to be configured to cache DNS lookups, both successful and unsuccessful, only for a short period of time.
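Those cache lifetimes are controlled by two Java security properties, which must be set before the first address lookup. A sketch — the TTL values here are illustrative, not recommendations:

```java
import java.security.Security;

// Sketch: limit how long the JVM caches DNS lookups. By default,
// successful lookups may be cached for the lifetime of the process,
// so a service that moves to a new address is never re-resolved.
public class DnsCacheConfig {

    public static void configure() {
        // Cache successful lookups for 30 seconds (value is illustrative).
        Security.setProperty("networkaddress.cache.ttl", "30");
        // Cache failed lookups only briefly, so a transient DNS outage
        // does not keep reporting "unknown host" after it clears.
        Security.setProperty("networkaddress.cache.negative.ttl", "5");
    }
}
```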

One question is whether or not to follow 302 redirects. While doing so is ordinarily useful, the new DNS behavior means that it could be troublesome: follow a 302 and you may end up at Site Finder, trying to parse HTML in an XML parser. This is a good place to insert Site Finder recognition into the application; redirects to that site can be mapped to an unknown-host error, while all other redirects are followed as usual.
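Such recognition could be a small filter in front of the client's redirect handling. A sketch, assuming the redirect target stays at sitefinder.verisign.com — a hard-coded workaround of exactly the fragile kind discussed in the conclusions below:

```java
import java.net.URI;
import java.net.UnknownHostException;

// Sketch: decide whether a 3xx response should be followed, mapping
// Site Finder redirects back to the unknown-host error they replaced.
public class RedirectFilter {

    // Hostname used by Site Finder redirects at the time of writing;
    // this is an assumption and could change.
    static final String SITE_FINDER_HOST = "sitefinder.verisign.com";

    /**
     * @return true if the redirect should be followed
     * @throws UnknownHostException if the redirect is really a
     *         disguised "no such host"
     */
    public static boolean shouldFollow(int status, String location)
            throws UnknownHostException {
        if (status < 300 || status >= 400 || location == null) {
            return false; // not a redirect at all
        }
        String host = URI.create(location).getHost();
        if (SITE_FINDER_HOST.equalsIgnoreCase(host)) {
            // Map the Site Finder redirect back to the original error.
            throw new UnknownHostException(
                "Redirected to Site Finder: target host does not exist");
        }
        return true; // an ordinary redirect; safe to follow
    }
}
```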

Conclusions

The changes that VeriSign made to the .com and .net domains will make it harder to diagnose errors in the URLs used by programs to access web services. However, there was always a chance that an incorrect URL would lead to a confusing error message from the underlying protocol stack. Protocol stacks and client-side applications can be written to handle such errors in a way that makes diagnosis easier; doing so has broader benefits than just addressing the recent DNS changes.

These changes have not helped web services, or any other distributed application protocol. If they return, we are going to have to get used to "Connection refused" and HTTP error code 302 responses as cues for nonexistent hosts. This is going to lead to more support calls, and perhaps some coding to translate these cues into end user messages. Needless to say, VeriSign is not offering to pay for these costs incurred by its actions.

On October 3, ICANN got VeriSign to "temporarily suspend" the Site Finder service, under the threat of legal action for breach of contract. With any luck, it will stay suspended, though as review boards and lawyers get involved, it will be hard to be sure.

The amount of traffic to the Site Finder site has propelled it to a top ten Internet site, so the kickback from funded links in the search terms is potentially huge. The thought of all that found money will deafen VeriSign's ears to complaints from the developer and networking community. Unfortunately that money probably used to go to AOL, MSN, Earthlink, and Microsoft. I do not see these organizations quietly giving up all this money. As well as the legal path, they have some technical options:

  1. Patch their DNS servers to ignore wildcards on the .net and .com domains.

  2. Patch their DNS servers to forward to ISP-specific search engines

  3. Patch their web browsers or proxies to recognize a Site Finder redirect and redirect it to their own search engines.

VeriSign could do nothing about options (1) and (2). Option three is most easily achieved by the web browser vendor, which means Microsoft. MS could patch IE over the Windows Update mechanism and effectively deny VeriSign 90% of their potential audience. This would destroy VeriSign's justification for the changes and encourage it to revert to an RFC-compliant implementation of DNS. More likely, VeriSign would try to change their redirect URLs to get past the patch, leading to an ongoing patch war between the effective owners of DNS and the effective owners of the web browser. Anyone hard-coding Site Finder workarounds into their own programs would be a victim of such a battle.

If there is one saving grace with web services, it is that users have the option of pasting the target URL into a browser to see what is going wrong. To make this simple, developers of web service protocol stacks and applications need to ensure that the target URL is included in all error reports, and that GET queries of all endpoints return some meaningful information.