Doing HTTP Caching Right: Introducing httplib2
by Joe Gregorio
Vary
You may be wondering about situations where a cache might get confused. For example, what if a server does content negotiation, where different representations can be returned from the same URI? For cases like this HTTP supplies the Vary: header. The Vary: header informs the cache of the names of all the request headers that might cause a resource's representation to change.
For example, if a server does content negotiation based on the Accept: header, then the Content-Type: header will differ between responses, depending on the type of content negotiated. In that case the server can add a Vary: Accept header, which causes the cache to consider the request's Accept: header when caching responses from that URI.
Date: Mon, 23 Jan 2006 15:37:34 GMT
Server: Apache
Accept-Ranges: bytes
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Cache-Control: max-age=7200
Expires: Mon, 23 Jan 2006 17:37:34 GMT
Content-Length: 5073
Content-Type: text/html; charset=utf-8
In this example the server is stating that responses can be cached for two hours, but that responses may vary based on the Accept-Encoding and User-Agent headers.
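As a rough illustration of how a cache might honor Vary:, the sketch below checks whether a stored response can be reused for a new request. The function name `vary_matches` and the dictionary representation of headers are assumptions made for this example, and comparing header values for exact string equality is a simplification of what a real cache would do.

```python
def vary_matches(vary_header, cached_request_headers, new_request_headers):
    """Return True if the new request agrees with the cached request
    on every header named in the response's Vary: header."""
    names = [n.strip().lower() for n in vary_header.split(",") if n.strip()]
    # "Vary: *" means the response varies on things the cache cannot see,
    # so a stored copy can never be matched to a new request.
    if "*" in names:
        return False
    # Header names are case-insensitive, so normalize them before comparing.
    old = {k.lower(): v for k, v in cached_request_headers.items()}
    new = {k.lower(): v for k, v in new_request_headers.items()}
    return all(old.get(n) == new.get(n) for n in names)


# Hypothetical request headers that produced the cached response.
cached_request = {"Accept-Encoding": "gzip", "User-Agent": "ExampleClient/1.0"}

same = vary_matches(
    "Accept-Encoding,User-Agent",
    cached_request,
    {"accept-encoding": "gzip", "user-agent": "ExampleClient/1.0"},
)
different = vary_matches(
    "Accept-Encoding,User-Agent",
    cached_request,
    {"Accept-Encoding": "identity", "User-Agent": "ExampleClient/1.0"},
)
print(same, different)
```

With the Vary: header from the example above, a request that asks for a different Accept-Encoding would not be served from the stored gzip response.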
Connection
When a server successfully validates a cached response, using for example the If-None-Match: header, then the server returns a status code of 304 Not Modified. So nothing much happens on a 304 Not Modified response, right? Well, not exactly. In fact, the server can send updated headers for the entity that have to be updated in the cache. The server can also send along a Connection: header that says which headers shouldn't be updated.
Some headers are excluded by default from the list of headers to update. These are called hop-by-hop headers and they are: Connection, Keep-Alive, Proxy-Authenticate, Proxy-Authorization, TE, Trailers, Transfer-Encoding, and Upgrade. All other headers are considered end-to-end headers.
HTTP/1.1 304 Not Modified
Content-Length: 647
Server: Apache
Connection: close
Date: Mon, 23 Jan 2006 16:10:52 GMT
Content-Type: text/html; charset=iso-8859-1
...
In the above example Date: is not a hop-by-hop header nor is it listed in the Connection: header, so the cache has to update the value of Date: in the cache.
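The update rule described above could be sketched as follows. This is only an illustration: `merge_304_headers` is a hypothetical helper, headers are assumed to be plain dictionaries, and the cached values shown are made up for the example.

```python
# Hop-by-hop headers, as listed above, stored in lowercase for comparison.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}


def merge_304_headers(cached_headers, response_headers):
    """Return the cached headers updated with the end-to-end headers
    from a 304 Not Modified response."""
    response = {k.lower(): v for k, v in response_headers.items()}
    # Anything named in Connection: is treated as hop-by-hop too.
    excluded = set(HOP_BY_HOP)
    connection = response.get("connection", "")
    excluded.update(t.strip().lower() for t in connection.split(",") if t.strip())

    merged = {k.lower(): v for k, v in cached_headers.items()}
    for name, value in response.items():
        if name not in excluded:
            merged[name] = value
    return merged


# Hypothetical stored headers and a trimmed-down 304 response.
cached = {
    "Date": "Mon, 23 Jan 2006 15:37:34 GMT",
    "Content-Type": "text/html; charset=utf-8",
}
not_modified = {
    "Connection": "close",
    "Date": "Mon, 23 Jan 2006 16:10:52 GMT",
    "Server": "Apache",
}
merged = merge_304_headers(cached, not_modified)
print(merged["date"])
```

Here Date: is refreshed from the 304 response, while Connection: never reaches the cache.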
If Only It Were That Easy
While a little complex, the above is at least conceptually nice. Of course, one of the problems is that we have to be able to work with HTTP 1.0 servers and caches, which use a different set of headers, all time-based, to do caching. Out of necessity those headers are carried forward into HTTP 1.1.
The older cache control model from HTTP 1.0 is based solely on time. The Last-Modified cache validator is just that, the last time that the resource was modified. The cache uses the Date:, Expires:, Last-Modified:, and If-Modified-Since: headers to detect changes in a resource.
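To make the time-based model concrete, here is a minimal sketch that derives a freshness lifetime from Date: and Expires:. The helper names are assumptions for this example, headers are assumed to be a dictionary with lowercase names, and estimating a response's age as the time elapsed since its Date: value is a simplification.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime_seconds(headers):
    """Seconds between Date: and Expires:, or None if either is missing."""
    if "date" not in headers or "expires" not in headers:
        return None
    date = parsedate_to_datetime(headers["date"])
    expires = parsedate_to_datetime(headers["expires"])
    return (expires - date).total_seconds()


def is_fresh(headers, now):
    """True if `now` (an aware datetime) is still before Expires:."""
    lifetime = freshness_lifetime_seconds(headers)
    if lifetime is None:
        return False
    age = (now - parsedate_to_datetime(headers["date"])).total_seconds()
    return age < lifetime


# Date: and Expires: values taken from the Vary example earlier in the article.
stored = {
    "date": "Mon, 23 Jan 2006 15:37:34 GMT",
    "expires": "Mon, 23 Jan 2006 17:37:34 GMT",
}
lifetime = freshness_lifetime_seconds(stored)
one_hour_later = parsedate_to_datetime("Mon, 23 Jan 2006 16:37:34 GMT")
three_hours_later = parsedate_to_datetime("Mon, 23 Jan 2006 18:37:34 GMT")
print(lifetime, is_fresh(stored, one_hour_later), is_fresh(stored, three_hours_later))
```

For these headers the lifetime works out to 7200 seconds, the same two hours that max-age=7200 expresses in the HTTP 1.1 model.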
If you are developing a client you should always use both validators, the entity tag and Last-Modified, if present; you never know when an HTTP 1.0 cache will pop up between you and a server. This is the protocol equivalent of wearing a belt and suspenders. HTTP 1.1 was published seven years ago, so you'd think that by this late date most things would have been updated, but you can't count on it.
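A client following this advice might build its conditional request along these lines. This is a sketch under stated assumptions: `conditional_headers` is a hypothetical helper, cached headers are assumed to be a dictionary with lowercase names, and the ETag value is invented for the example.

```python
def conditional_headers(cached_headers):
    """Build request headers that revalidate a cached response using
    every validator the cache has stored."""
    headers = {}
    etag = cached_headers.get("etag")
    if etag:
        # HTTP 1.1 validator: the entity tag.
        headers["If-None-Match"] = etag
    last_modified = cached_headers.get("last-modified")
    if last_modified:
        # HTTP 1.0 validator: the last modification time.
        headers["If-Modified-Since"] = last_modified
    return headers


# Hypothetical headers stored alongside a cached response.
cached_response_headers = {
    "etag": '"example-etag"',
    "last-modified": "Mon, 23 Jan 2006 12:00:00 GMT",
}
request_headers = conditional_headers(cached_response_headers)
print(request_headers)
```

Sending both headers lets an HTTP 1.1 server validate on the entity tag while an older intermediary can still fall back to the timestamp.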
Now that you understand caching you may be wondering if the client library in your favorite language even supports caching. I know the answer for Python, and sadly that answer is currently no. It pains me that my favorite language doesn't have one of the best HTTP client implementations around. That needs to change.