Doing HTTP Caching Right: Introducing httplib2
You need to understand HTTP caching. No, really, you do. I have mentioned repeatedly that you need to choose your HTTP methods carefully when building a web service, in part because you can get the performance benefits of caching with GET. Well, if you want to get the real advantages of GET then you need to understand caching and how you can use it effectively to improve the performance of your service.
This article will not explain how to set up caching for your particular web server, nor will it cover the different kinds of caches. If you want that kind of information I recommend Mark Nottingham's excellent tutorial on HTTP caching.
First you need to understand the goals of the HTTP caching model. One objective is to let both the client and the server have a say over when to return a cached entry. As you can imagine, letting both sides influence when a cached entry is considered stale introduces some complexity.
The HTTP caching model is based on validators, which are bits of data that a client can use to validate that a cached response is still valid. They are fundamental to the operation of caches since they allow a client or intermediary to query the status of a resource without having to transfer the entire response again: the server returns an entity body only if the validator indicates that the cache has a stale response.
One of the validators for HTTP is the ETag. An ETag is like a fingerprint for the bytes in the representation; if a single byte changes, the ETag also changes.
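To make the fingerprint idea concrete, here is a minimal sketch that derives an ETag from a hash of the representation's bytes. This is only one way to do it and is an assumption for illustration: HTTP does not say how a server must generate ETags, and many servers build them from file metadata instead. The helper name `make_etag` is hypothetical.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Return a quoted ETag derived from the bytes of a representation.

    Hashing the body is an illustrative choice, not a requirement of HTTP.
    """
    digest = hashlib.sha1(body).hexdigest()
    return f'"{digest}"'


original = b"hello world"
changed = b"hello worle"  # differs from the original by a single byte

print(make_etag(original) == make_etag(b"hello world"))  # True
print(make_etag(original) == make_etag(changed))         # False
```

Identical bytes always yield the same tag, and changing one byte yields a different one, which is the property the cache relies on.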
Using validators requires that you have already done a GET once on a resource. The cache stores the value of the ETag header if present and then uses the value of that header in later requests to that same URI.
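The bookkeeping described above can be sketched as a small class that remembers the ETag seen for each URI and hands it back as a request header later. The class and method names are hypothetical, and the sketch assumes response headers arrive as a plain dict keyed by the exact string "ETag"; a real client would need case-insensitive header lookup.

```python
from typing import Dict


class ValidatorCache:
    """Remember the last ETag seen for each URI (illustrative sketch)."""

    def __init__(self) -> None:
        self._etags: Dict[str, str] = {}

    def remember(self, uri: str, response_headers: Dict[str, str]) -> None:
        # Store the validator only if the server actually sent one.
        etag = response_headers.get("ETag")
        if etag is not None:
            self._etags[uri] = etag

    def conditional_headers(self, uri: str) -> Dict[str, str]:
        # Headers to add to the next GET for this URI, if a validator is known.
        etag = self._etags.get(uri)
        return {"If-None-Match": etag} if etag is not None else {}


cache = ValidatorCache()
cache.remember("http://example.org/", {"ETag": '"11c415a-8206-243aea40"'})
print(cache.conditional_headers("http://example.org/"))
print(cache.conditional_headers("http://example.org/other"))
```

A URI that has never been fetched produces no extra headers, so the first request to it is an ordinary unconditional GET.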
For example, if I send a request to example.org and get back this response:
HTTP/1.1 200 OK
Date: Fri, 30 Dec 2005 17:30:56 GMT
Server: Apache
ETag: "11c415a-8206-243aea40"
Accept-Ranges: bytes
Content-Length: 33286
Vary: Accept-Encoding,User-Agent
Cache-Control: max-age=7200
Expires: Fri, 30 Dec 2005 19:30:56 GMT
Content-Type: image/png

-- binary data --
Then the next time I do a GET I can add the validator in. Note that the value of the ETag is placed in the If-None-Match header.
GET / HTTP/1.1
Host: example.org
If-None-Match: "11c415a-8206-243aea40"
If there was no change in the representation then the server returns a 304 Not Modified.
HTTP/1.1 304 Not Modified
Date: Fri, 30 Dec 2005 17:32:47 GMT
If there was a change, the new representation is returned with a status code of 200 and a new ETag.
HTTP/1.1 200 OK
Date: Fri, 30 Dec 2005 17:32:47 GMT
Server: Apache
ETag: "0192384-9023-1a929893"
Accept-Ranges: bytes
Content-Length: 33286
Vary: Accept-Encoding,User-Agent
Cache-Control: max-age=7200
Expires: Fri, 30 Dec 2005 19:30:56 GMT
Content-Type: image/png

-- binary data --
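The whole exchange can be simulated without a network. The sketch below is a simplified model under stated assumptions: `serve` is a hypothetical stand-in for the origin server, `fetch` is a hypothetical client, and the comparison handles only a single ETag with exact string matching (real If-None-Match headers may carry a list of tags or `*`). The body strings are invented placeholders.

```python
from typing import Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]
Entry = Tuple[str, bytes]  # (etag, body)


def serve(request_headers: Dict[str, str], current: Entry) -> Response:
    """Stand-in origin server: send a body only when the validator is stale."""
    etag, body = current
    if request_headers.get("If-None-Match") == etag:
        return 304, {}, b""  # Not Modified: no entity body is transferred
    return 200, {"ETag": etag}, body


def fetch(cache: Dict[str, Entry], uri: str, origin: Dict[str, Entry]) -> Tuple[int, bytes]:
    """Stand-in client: revalidate a cached entry, or store a fresh one."""
    headers: Dict[str, str] = {}
    if uri in cache:
        headers["If-None-Match"] = cache[uri][0]
    status, response_headers, body = serve(headers, origin[uri])
    if status == 304:
        return status, cache[uri][1]  # reuse the stored representation
    cache[uri] = (response_headers["ETag"], body)
    return status, body


origin = {"/": ('"11c415a-8206-243aea40"', b"first version")}
cache: Dict[str, Entry] = {}

print(fetch(cache, "/", origin))  # (200, b'first version')
print(fetch(cache, "/", origin))  # (304, b'first version')

origin["/"] = ('"0192384-9023-1a929893"', b"second version")
print(fetch(cache, "/", origin))  # (200, b'second version')
```

The second call costs only a header exchange, while the third replaces both the stored body and the stored validator.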