XML.com: XML From the Inside Out

The Impact of Site Finder on Web Services
by Steve Loughran

Testing on .NET WSE2.0

What does .NET1.1 with the preview release WSE2.0 do? I chose this stack as it is the latest version of one of the leading SOAP stacks, and I had a client program that I had written with it ready to hand.

As this is the most recent SOAP implementation from Microsoft, one would expect it to have incorporated all the feedback from users of the previous implementations, and to handle errors gracefully and in a way that can be reported well. It certainly does this with the classic failure mode, but not with the VeriSign-introduced errors.

Before:

C:> DotNetClient doc.xml http://nosuchhost.com/axis/endpoint
uploading doc.xml to http://nosuchhost.com/axis/endpoint
Exception:
System.Net.WebException: The underlying connection was closed: 
 The remote name could not be resolved.
   at System.Net.HttpWebRequest.CheckFinalStatus()
   at System.Net.HttpWebRequest.EndGetRequestStream(IAsyncResult asyncResult)
   at System.Net.HttpWebRequest.GetRequestStream()
   at Microsoft.Web.Services.SoapWebRequest.GetRequestStream()
   at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(
      String methodName, Object[] parameters)

This is what we expect: an error message that indicates the true, underlying cause of the problem.

After:

With Site Finder running, this stack bails out at the end of the first POST with a MIME type error:

C:> DotNetClient doc.xml http://nosuchhost.com/axis/endpoint
uploading doc.xml to http://nosuchhost.com/axis/endpoint
Exception:
System.InvalidOperationException: Client found response 
content type of 'text/html; charset=iso-8859-1', but expected 'text/xml'.
The request failed with the error message:
--
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>302 Found</TITLE>
</HEAD><BODY>
<H1>Found</H1>
The document has moved 
<A HREF="http://sitefinder.verisign.com/lpc
    ?url=nosuchhost.comPOST%20/axis/endpoint&amp;host=nosuchhost.com">
    here</A>.<P>
</BODY></HTML>
   at System.Web.Services.Protocols.SoapHttpClientProtocol
         .ReadResponse(SoapClientMessage message, WebResponse response, 
                 Stream responseStream, Boolean asyncCall)
   at System.Web.Services.Protocols.SoapHttpClientProtocol
         .Invoke(String methodName, Object[] parameters)

The stack is trying to parse the body of the 302 response, instead of checking the response's status code first and failing on that.

Provided the client application presents all the data in the exception, whoever ends up fielding the support call will be able to diagnose the problem. Assuming, that is, that they know that a redirect to Site Finder appears whenever the client application tries to connect to port 80 on an unknown host. If only the exception's type (System.InvalidOperationException) were displayed, and not its text, there would not be enough information to diagnose a cause.
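A minimal sketch of the ordering argued for here, written in Java: look at the status code first, and only then compare the content type. The class and method names are hypothetical and belong to neither .NET WSE nor Axis; the expected type of text/xml is taken from the exception text above.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of .NET WSE or Axis. */
final class ResponseCheck {

    /** Returns null when the response looks like a SOAP reply, else a diagnostic message. */
    static String diagnose(String endpoint, int status, String contentType) {
        // Check the status code first: a redirect is not a SOAP reply,
        // whatever its body happens to contain.
        if (status >= 300 && status < 400) {
            return "Endpoint " + endpoint + " answered with HTTP redirect " + status
                    + "; the host name may be wrong or unregistered";
        }
        // Only then compare the media type, ignoring parameters such as charset.
        String type = contentType == null ? "" : contentType;
        int semicolon = type.indexOf(';');
        if (semicolon >= 0) {
            type = type.substring(0, semicolon);
        }
        type = type.trim().toLowerCase(Locale.ROOT);
        if (!type.equals("text/xml")) {
            return "Endpoint " + endpoint + " returned content type '" + contentType
                    + "' (HTTP " + status + "), expected 'text/xml'";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(diagnose("http://nosuchhost.com/axis/endpoint", 302,
                "text/html; charset=iso-8859-1"));
    }
}
```

Because the endpoint and the status code both appear in the message, a support engineer who sees only this one line still has something to work with.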

Java: Apache Axis

On the Java side, I am going to look at Apache Axis. The trace here is from the CVS_HEAD version of Axis from September 27, 2003.

Before:

A classic DNS failure results in a Java UnknownHostException being thrown and then wrapped in the generic AxisFault Exception, which adds SOAP1.1/1.2 attributes such as actor, node and detail:

AxisFault
 faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
 faultSubcode: 
 faultString: java.net.UnknownHostException: nosuchhost.com
 faultActor: 
 faultNode: 
 faultDetail: 

After:

With Site Finder operational, the error becomes more complex. The core text of the fault is the response from the server; the fault detail incorporates the text of the response.

     
AxisFault
 faultCode: {http://xml.apache.org/axis/}HTTP
 faultSubcode: 
 faultString: (302)Found
 faultActor: 
 faultNode: 
 faultDetail: 
        {}:return code:  302
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>302 Found</TITLE>
</HEAD><BODY>
<H1>Found</H1>
The document has moved 
<A  HREF="http://sitefinder.verisign.com/lpc?url=nosuchhost.comPOST%20/axis/&amp;
    host=nosuchhost.com">here</A>.<P>
</BODY></HTML> 
(302)Found
at org.apache.axis.transport.http.HTTPSender.readFromSocket(HTTPSender.java:630)
at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSender.java:128)
at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:71)

So it's the same thing: the client stack reads the 302-coded redirect page and bails out. Rather than choke on the text/html content, Axis rejects the response because the code is not supported. Supporting redirects is actually something we have discussed in the past -- I don't think we want to do that any more.

As with .NET WSE2.0, the result is that a misspelled endpoint could produce different errors than before. Whereas an unknown host message was probably going to spur a couple of the end user's neurons into inferring a cause, a 302 may not. Indeed, judging by a fair few postings on the axis-user mailing list, I suspect that many of the people writing web services do not themselves know what a connection refused error message implies, let alone a 302 response code. If the people writing web services do not understand error codes from layers further down the stack, I do not have high hopes for end users.

Other Implications

Here are some other unexpected consequences of the change that are hard to describe as positive for web services.

  • Anything retrieving WSDL will also have to deal with the 302 redirect. Again, Axis will probably fail with some moderately uninformative error.

  • XML processors need to resolve hostnames to import remote DTDs and schemas. Hopefully nobody has been using invalid domains for their URIs.

  • Unless you properly configure the Java runtime, Java applications, including application servers, cache successful DNS lookups forever. If a hostname resolves to the VeriSign spoof service once, it will resolve there until the application is restarted. Java 1.3 and earlier cached negative responses, in violation of the DNS standards, and were roundly vilified for the practice. The VeriSign change reverts Java to the old behavior in some instances.
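As a sketch of what configuring the runtime can look like for the DNS caching point: the JDK reads the security properties networkaddress.cache.ttl and networkaddress.cache.negative.ttl (both in seconds) to bound how long lookups are cached. The class name and the values of 60 and 10 seconds below are illustrative assumptions, not recommendations from the article, and the properties need to be set before the first lookup to take effect.

```java
import java.security.Security;

/** Hypothetical startup helper; the TTL values passed in are only examples. */
final class DnsCacheConfig {

    /** Bounds how long the JVM caches successful and failed DNS lookups. */
    static void limitDnsCaching(int positiveTtlSeconds, int negativeTtlSeconds) {
        // Successful lookups: without a limit, a name that once resolved to the
        // wildcard address could stay cached for the life of the process.
        Security.setProperty("networkaddress.cache.ttl",
                Integer.toString(positiveTtlSeconds));
        // Failed lookups: keep this short so a newly registered host is picked up.
        Security.setProperty("networkaddress.cache.negative.ttl",
                Integer.toString(negativeTtlSeconds));
    }

    public static void main(String[] args) {
        // Call this early, before any code resolves a host name.
        limitDnsCaching(60, 10);
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

The same properties can also be set in the runtime's java.security file, which suits application servers where you do not control startup code.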

What Do the Standards Say?

The most relevant specification here, RFC 2616 (HTTP/1.1), says of 302 and 307 redirects:

If the [302 or 307] status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.

Of course, this document assumes a user behind the User Agent, the latter being a web browser of some sort. Web service clients may or may not have an end user to hand, and they send many more POSTs, PUTs, and other operations to the destination: a very different interaction model from traditional web browsing. It is unrealistic to ask the user what to do after every redirection response is received.
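For a client with no user to ask, the quoted rule reduces to a simple predicate. The sketch below is one hedged reading of it in Java; the class and method names are hypothetical, and it deliberately covers only the 302 and 307 codes mentioned in the quote, refusing everything else.

```java
/** Hypothetical policy class; a sketch of the RFC 2616 rule quoted above. */
final class RedirectPolicy {

    /**
     * Returns true only when an unattended client may follow the redirect
     * without confirmation: a 302 or 307 in response to GET or HEAD.
     */
    static boolean mayFollowAutomatically(String method, int status) {
        if (status != 302 && status != 307) {
            // Other status codes are outside the quoted rule; this sketch refuses them.
            return false;
        }
        // HTTP method names are case-sensitive, so compare them exactly.
        return "GET".equals(method) || "HEAD".equals(method);
    }

    public static void main(String[] args) {
        System.out.println(mayFollowAutomatically("POST", 302));
        System.out.println(mayFollowAutomatically("GET", 302));
    }
}
```

A SOAP call is a POST, so under this reading an unattended client never follows the Site Finder redirect; with java.net.HttpURLConnection that means calling setInstanceFollowRedirects(false) and handling the status code yourself.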

Alongside the official W3C specifications and submissions, the main body dealing with defining what is a good SOAP-based web service is the WS-I. VeriSign is a member of this organization and has contributed a lot to the security aspects of the SOAP-based web services specification suite.

The WS-I Basic Profile 1.0 says implementations may use 307 as the redirect code, not the 302 code. So the fact that both .NET and Axis decline to follow the 302 is correct. At the same time, neither responds very well to the message. Axis should produce a user-comprehensible message; .NET WSE2.0 should look at the response code before complaining that the response was in HTML.

The final reference document is the one that VeriSign refers to in its "you should not be automating HTTP" slide. Best Current Practices #56 is a declaration of what constitutes good and bad behavior for anything attempting to layer itself over HTTP. It states that port 80 should not be used for new protocols, that no error codes other than 200 and 500 should be recognized, and that new URI schemes may be appropriate to identify new protocols. These recommendations are certainly valid for protocols such as RMI-over-HTTP and DCOM-over-HTTP, which use HTTP purely as a means to break through the firewall with their distributed object protocols. Some people view SOAP as a similar abuse of HTTP, though SOAP 1.2, with its GET support, integrates better with classic HTTP use than ever before.

In the web services I have developed, we have often mixed user-visible code with the SOAP services within the same Java web application. It is the only way to maintain context between the two without having to resort to back-end services such as a database or EJB server.

Furthermore, BCP 56 only covers layering of protocols above HTTP, not automated use of those protocols with machine-readable content. The REST paradigm is pure HTTP, making full use of the verb set and (typically) passing XML data in both directions. If some of the state of the remote objects includes HTTP documents, then the integration of REST with the rest of the Web is complete.

Finally, WebDAV is an HTTP extension that is designed to treat a web server as a read-write file repository. While it can be used for a pure remote filesystem, its core role is to give HTTP editing tools the ability to upload content to a public site. In this role, it is only useful if that public site contains human-readable content, such as HTML pages served up from the default port of the HTTP protocol. BCP 56 cannot apply to such a use case.

I must conclude that, while BCP 56 does contain valid recommendations, they do not preclude an HTTP server on port 80 supporting automated clients, be they SOAP, REST, WebDAV or something else. Saying that unusual failure modes introduced by Site Finder are our fault for ignoring BCP 56 is a specious argument.

Framework Changes

What can be done in SOAP, XML-RPC and REST frameworks?

  1. Reject 302 redirects with meaningful errors.

  2. Verify MIME types before parsing the contents, again reporting errors in a way that enables the problem to be resolved.

  3. Recognize a Site Finder redirect and translate that into a no-such-host error.

  4. Hope that everyone patches their DNS servers to ignore the wildcards.

  5. Always include the endpoint URL in any connectivity or parser fault, that is, in any case where the destination did not reply in the expected format.

Hard-coded handling of the Site Finder redirect is an ugly hack that is hard to countenance. It is also very brittle. This leaves better reporting of connectivity faults to end users as the most fundamental improvement.
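To make the brittleness concrete, here is a sketch of what hard-coded recognition might look like in Java, combined with the rule of always naming the endpoint in the fault. The class and method names are hypothetical; the only thing taken from the traces above is the sitefinder.verisign.com host name, and the check stops working the moment that name changes.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.UnknownHostException;

/** Hypothetical translator; illustrates the hard-coded approach, not an Axis API. */
final class SiteFinderCheck {

    // Host name copied from the redirect in the traces. Hard-coding it is the brittle part.
    private static final String SITE_FINDER_HOST = "sitefinder.verisign.com";

    /** Maps a non-SOAP HTTP response to a fault that always names the endpoint. */
    static IOException toFault(URI endpoint, int status, String location) {
        if (status == 302 && location != null) {
            try {
                String target = new URI(location).getHost();
                if (SITE_FINDER_HOST.equalsIgnoreCase(target)) {
                    // Report what really happened: the name did not resolve.
                    return new UnknownHostException(
                            endpoint.getHost() + " (endpoint " + endpoint + ")");
                }
            } catch (URISyntaxException e) {
                // Unparseable Location header: fall through to the generic fault.
            }
        }
        return new IOException("HTTP " + status + " from endpoint " + endpoint
                + (location == null ? "" : ", redirect target " + location));
    }

    public static void main(String[] args) {
        IOException fault = toFault(URI.create("http://nosuchhost.com/axis/endpoint"), 302,
                "http://sitefinder.verisign.com/lpc?url=nosuchhost.com&host=nosuchhost.com");
        System.out.println(fault);
    }
}
```

Even if the special case is dropped, the generic branch is worth keeping: it carries the status code, the endpoint, and the redirect target.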

As of October 1, Axis saves the HTTP error code as one of the elements in the fault details (http://xml.apache.org/axis/ : HttpErrorCode). We also plan to save the target URL and the headers from HTTP responses. This will help Axis users, but does little for anyone coding a client intended to work with any implementation of the JAX-RPC specification.

One other area for improvement in web service stacks is simply to test against HTTP response codes, and make sure the error messages are meaningful. Even the mainstream toolkits, Axis and MS WSE, are clearly weak here; home-rolled implementations are likely to be as bad or even worse.
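One way to run such a test without touching real DNS is to point the client at a local stub that imitates the redirect. The sketch below uses the JDK's built-in com.sun.net.httpserver.HttpServer and java.net.HttpURLConnection; the class name, the stub's HTML body, and the sitefinder.example Location value are all made up for illustration, and a real test would drive the SOAP stack under test rather than a bare connection.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical test harness: a local stub that answers every request with a 302. */
final class RedirectStubTest {

    /** POSTs a small document to the stub and returns the status code the client sees. */
    static int postToRedirectingStub() throws Exception {
        // Port 0 lets the operating system pick a free port on the loopback interface.
        HttpServer stub = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        stub.createContext("/", exchange -> {
            byte[] html = "<HTML><BODY>The document has moved</BODY></HTML>"
                    .getBytes(StandardCharsets.ISO_8859_1);
            exchange.getResponseHeaders().add("Location", "http://sitefinder.example/lpc");
            exchange.getResponseHeaders().add("Content-Type", "text/html; charset=iso-8859-1");
            exchange.sendResponseHeaders(302, html.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(html);
            }
        });
        stub.start();
        try {
            URL endpoint = new URL("http://127.0.0.1:" + stub.getAddress().getPort()
                    + "/axis/endpoint");
            HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection(Proxy.NO_PROXY);
            // Do not let the connection follow the redirect behind our back.
            conn.setInstanceFollowRedirects(false);
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/xml");
            try (OutputStream out = conn.getOutputStream()) {
                out.write("<doc/>".getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } finally {
            stub.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("client saw HTTP " + postToRedirectingStub());
    }
}
```

The same stub can be parameterized over other status codes, such as 404, 500 with a non-SOAP body, or 503, to check that each one produces a readable error from the stack.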
