httplib2: HTTP Persistence and Authentication
by Joe Gregorio
In the past, people have asked me how to protect their web services and I've told them to just use HTTP authentication, by which I meant either Basic or Digest as defined in RFC 2617.
For most authentication requirements, using Basic alone isn't really an option since it transmits your name and password unencrypted. Yes, it encodes them as base64, but that's not encryption.
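To see how little protection base64 gives, here is a minimal Python 3 sketch. The username joe and password secret are made up for illustration; the point is only that anyone who sees the header can reverse it.

```python
import base64

def basic_auth_header(username, password):
    """Build the value of an Authorization header for HTTP Basic."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def decode_basic_auth(header_value):
    """Recover the username and password from a Basic header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    # The username ends at the first colon; the rest is the password.
    username, _, password = decoded.partition(":")
    return username, password

header = basic_auth_header("joe", "secret")
print(header)                     # what travels over the wire
print(decode_basic_auth(header))  # what an eavesdropper recovers
```

No key is involved anywhere, which is the difference between encoding and encryption.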
The other option is Digest, which tries to protect your password by not transferring it directly. Instead it uses challenges and hashes to let the client prove to the server that it knows a shared secret.
Here's the "executive summary" of HTTP Digest authentication:
- The server rejects an unauthenticated request with a challenge. That challenge contains a nonce, a random string generated by the server.
- The client responds with the same request again, but this time with an Authorization: header that contains a hash of the supplied nonce, the username, the password, the request URI, and the HTTP method.
The problem with Digest is that it suffers from too many options, which are implemented non-uniformly, and not always correctly. For example, there is an option to include the entity body in the calculation of the hash, called auth-int. There are also two different kinds of hashing, MD5 and MD5-sess. The server can return a fresh challenge nonce with every response, or the client can include a monotonically increasing nonce-count value with each request. The server also has the option of returning a digest of its own, which is a way the server can prove to the client that it also knows the shared secret.
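To make two of those options concrete, here is a hedged Python 3 sketch of how they change the two intermediate hashes that RFC 2617 calls A1 and A2. The helper names ha1 and ha2 are my own shorthand, and the sample values are invented.

```python
import hashlib

def md5_hex(data):
    """Hex MD5 of a str or bytes value."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    return hashlib.md5(data).hexdigest()

def ha1(username, realm, password, algorithm="MD5", nonce=None, cnonce=None):
    """Hash of the secret; MD5-sess folds in the server and client nonces."""
    base = md5_hex(f"{username}:{realm}:{password}")
    if algorithm == "MD5-sess":
        return md5_hex(f"{base}:{nonce}:{cnonce}")
    return base

def ha2(method, uri, qop="auth", body=b""):
    """Hash of the request; auth-int also covers the entity body."""
    if qop == "auth-int":
        return md5_hex(f"{method}:{uri}:{md5_hex(body)}")
    return md5_hex(f"{method}:{uri}")

# Invented sample values, just to show that each option changes the result.
print(ha1("joe", "example", "secret"))
print(ha1("joe", "example", "secret", "MD5-sess", nonce="abc", cnonce="xyz"))
print(ha2("POST", "/entries", qop="auth"))
print(ha2("POST", "/entries", qop="auth-int", body=b"<entry/>"))
```

Every one of these branches has to match on both ends of the connection, which is why a client and server that disagree on a single option simply fail to authenticate.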
With all those options it doesn't seem surprising that there are interop problems. For example, Apache 2.0 does not do auth-int in Digest. While Python's urllib2 claims to do MD5-sess, Apache does not implement it correctly. In addition, looking at the code of Python's urllib2, it appears to support the SHA hash in addition to the standard MD5 hash. The only problem is that there's no mention of SHA as an option in RFC 2617. And, of course, no mention of Digest is complete without mentioning Internet Explorer, which doesn't calculate the digest correctly for URIs that have query parameters.
Now in case it seems like we're trapped in a twisted Monty Python sketch, there are some bright spots: on Apache 2.0.51 or later you can get IE and Digest to work by using this directive:
BrowserMatch "MSIE" AuthDigestEnableQueryStringHack=On
OK, you know you're in trouble when a directive called AuthDigestEnableQueryStringHack is the bright spot.
Oh yeah, one last twist in implementing both Basic and Digest is that you should keep track of the URIs that you have authenticated because if you attempt to access a URI "below" an authenticated URI, then you can send authentication on the first request and not wait for a challenge. By "below," I mean based on the URI path. Also, be prepared because the authentication at a lower level in path depth may require a different set of credentials or use a different authentication scheme.
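One way to keep track of this is a small table keyed by host and path prefix, consulted before each request. The Python 3 sketch below is an illustration under my own assumptions, not how any particular library does it: the class name AuthScopeCache and its methods are hypothetical, and the longest matching prefix wins so that a deeper path can carry different credentials or a different scheme.

```python
from urllib.parse import urlsplit

class AuthScopeCache:
    """Remember which credentials worked for which part of a site."""

    def __init__(self):
        self._scopes = {}  # (host, path prefix) -> (scheme, credentials)

    def remember(self, uri, scheme, credentials):
        """Record that these credentials were accepted for this URI."""
        parts = urlsplit(uri)
        # Treat the URI's directory as the protected area.
        directory = parts.path.rsplit("/", 1)[0] + "/"
        self._scopes[(parts.netloc, directory)] = (scheme, credentials)

    def lookup(self, uri):
        """Return (scheme, credentials) for the deepest matching prefix."""
        parts = urlsplit(uri)
        path = parts.path or "/"
        best = None
        for (host, prefix), value in self._scopes.items():
            if host == parts.netloc and path.startswith(prefix):
                if best is None or len(prefix) > len(best[0]):
                    best = (prefix, value)
        return best[1] if best else None

cache = AuthScopeCache()
cache.remember("http://example.org/blog/admin/index", "Basic", ("joe", "secret"))
cache.remember("http://example.org/blog/admin/keys/list", "Digest", ("root", "hunter2"))
print(cache.lookup("http://example.org/blog/admin/posts/1"))
print(cache.lookup("http://example.org/blog/admin/keys/new"))
print(cache.lookup("http://example.org/blog/"))
```

A hit means the client can send authentication on the first request; a miss, or a fresh challenge in spite of a hit, means falling back to the normal challenge and response round trip.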
If you move outside of RFC 2617 you could use WSSE, but it isn't really specified for plain HTTP and it doesn't work in any known browsers. It was originally designed for WS-Security and was unofficially ported to work in HTTP headers instead of a SOAP envelope. The definitive reference is an XML.com article, and while XML.com is an august publication, it isn't the IETF or W3C.
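For the curious, here is roughly what the HTTP flavor of WSSE looks like, sketched in Python 3 from my reading of that article. The function name wsse_headers is my own, and since there is no formal specification, details such as whether the nonce is hashed raw or in its encoded form vary between implementations; treat this as one plausible variant.

```python
import base64
import hashlib
import os
from datetime import datetime, timezone

def wsse_headers(username, password, nonce=None, created=None):
    """Build the two request headers used by WSSE UsernameToken over HTTP."""
    if nonce is None:
        nonce = os.urandom(16)  # fresh random bytes for every request
    if created is None:
        created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # PasswordDigest = Base64(SHA1(nonce + created + password)).
    # Assumption: the raw nonce bytes are hashed, not their base64 form.
    raw = nonce + created.encode("utf-8") + password.encode("utf-8")
    password_digest = base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")
    nonce_b64 = base64.b64encode(nonce).decode("ascii")
    token = (f'UsernameToken Username="{username}", '
             f'PasswordDigest="{password_digest}", '
             f'Nonce="{nonce_b64}", Created="{created}"')
    return {"Authorization": 'WSSE profile="UsernameToken"', "X-WSSE": token}

for name, value in wsse_headers("bob", "taadtaadpstcsm").items():
    print(f"{name}: {value}")
```

Like Digest, the password never travels in the clear, but the server must store it in a form from which it can recompute the hash.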
Now you might think I could use TLS (HTTPS), which is what lots of web apps and services use in conjunction with HTTP Basic. But you should realize that I, like many other people, use a shared hosting account. Even if I wanted to shell out the money to buy a certificate, I wouldn't be able to set up TLS for my site: the server has to present its certificate before it learns which host name the client asked for, so in practice each certificate needs its own IP address, and a shared host doesn't give me one. This is really too bad since client-side support for TLS (HTTPS) seems pretty good.
The bad news is that the current state of security with HTTP is bad. The best interoperable solution is Basic over HTTPS. The good news is that everyone agrees the situation stinks and there are multiple efforts afoot to fix the problem. Just be warned that security is not a one-size-fits-all game and that the result of all this heat and smoke may be several new authentication schemes, each targeted at a different user community.
For further reading you may want to check out this W3C note from 1999 (!), User Agent Authentication Forms. In addition, the WHATWG's Web Applications 1.0 specification lists as a requirement "Better defined user authentication state handling. (Being able to 'log out' of sites reliably, for instance, or being able to integrate the HTTP authentication model into the Web page.)"
As I implemented 3xx redirects I came across a couple of things that were new to me, some of which could provide performance boosts. Now, in general, the status codes in the 3xx series are either for redirecting the client to a new location or for indicating that more work needs to be done by the client.
One of the things I learned is that 300, 301, 302, and 307 are all cacheable in some circumstances, either by default or in the presence of cache control headers. That means that if your client implements caching, it may avoid one or more round trips if it is able to cache those 3xx responses.
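As a rough illustration, the decision a caching client has to make might look like the Python 3 sketch below. It follows my simplified reading of RFC 2616, where 300 and 301 are cacheable by default while 302 and 307 need explicit freshness information; the function name is hypothetical and a real cache would weigh more directives than this.

```python
def redirect_is_cacheable(status, headers):
    """Decide whether a redirect response may be stored by a cache.

    `headers` is a plain dict of response header names to values.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    cache_control = lowered.get("cache-control", "").lower()
    # Reduce "max-age=3600, public" to the set {"max-age", "public"}.
    directives = {part.strip().split("=")[0]
                  for part in cache_control.split(",") if part.strip()}
    if "no-store" in directives:
        return False
    if status in (300, 301):
        # Cacheable unless the response says otherwise.
        return True
    if status in (302, 307):
        # Cacheable only when the response carries freshness information.
        return "expires" in lowered or "max-age" in directives
    return False

print(redirect_is_cacheable(301, {}))
print(redirect_is_cacheable(302, {}))
print(redirect_is_cacheable(307, {"Cache-Control": "max-age=3600"}))
```

When the answer is yes, the client can follow the stored Location on later requests without contacting the original URI at all.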
At the end of my last article I introduced httplib2, a Python client library that implemented all the caching covered in that article. So for those of you keeping track at home, httplib2 also handles many of the things here, such as HTTPS, Keep-Alive, Basic, Digest, WSSE, and both gzip and compress forms of compression. That's enough of libraries and specs for now; next article, we'll get back to writing code and putting all this infrastructure to work.