Implementing the Atom Publishing Protocol
by Joe Gregorio
Get an Entry
Let's look at the application that handles a GET on a member of the collection:
def member_get(environ, start_response):
    store = getstore(environ)
    # 1. Retrieve entry.
    body = store.get(environ['selector.vars']['id']).encode('utf-8')
    # 2. Send back to client.
    headers = [('Content-Type', 'application/atom+xml;charset=utf-8')]
    start_response("200 OK", headers)
    return [body]
This is rather simple since Selector pulls the id out of the request URI and places it in the environment as selector.vars. From the id we can retrieve the entry (1) from the Store and send it back to the client (2). Now I've talked in the past about using etags and the If-None-Match: header to speed up requests if a resource hasn't been updated since the last request. We will need to modify our application to calculate an etag, which in this case will just be an MD5 hash of the response body.
import hashlib  # replaces the old md5 module, which was removed in Python 3

def member_get(environ, start_response):
    store = getstore(environ)
    body = store.get(environ['selector.vars']['id']).encode('utf-8')
    etag = hashlib.md5(body).hexdigest()                   # 1
    incoming_etag = environ.get('HTTP_IF_NONE_MATCH', '')  # 2
    if etag == incoming_etag:                              # 3
        start_response("304 Not Modified", [])
        return []
    else:
        headers = [('Content-Type', 'application/atom+xml;charset=utf-8'),
                   ('ETag', etag)                          # 4
                  ]
        start_response("200 OK", headers)
        return [body]
We calculate the etag (1) for the response and return it via the ETag header (4). If the client has sent an old etag in the If-None-Match: header, we retrieve it (2) and compare it against the current etag (3). If they match, we return a status of 304 Not Modified and an empty response body; otherwise we return the entry as before. This means that if a client supports etags and the entry has not been updated since its last GET, then only the response headers pass over the wire.
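To watch the conditional GET work without a web server, the handler can be called in-process with a hand-built environ. This is a sketch under stated assumptions: a plain dict named ENTRIES stands in for the Store (the real getstore() and Store live elsewhere in the series), and the call() helper is hypothetical test scaffolding. It also differs from the article's version in one small way: it repeats the ETag on the 304, which HTTP expects a server to do.

```python
import hashlib

# Hypothetical stand-in for the Store: a dict of id -> Atom entry text.
ENTRIES = {'1': '<entry xmlns="http://www.w3.org/2005/Atom"><title>Hi</title></entry>'}

def member_get(environ, start_response):
    body = ENTRIES[environ['selector.vars']['id']].encode('utf-8')
    etag = hashlib.md5(body).hexdigest()  # MD5 of the body, as in the article
    if environ.get('HTTP_IF_NONE_MATCH', '') == etag:
        # Unlike the article's version, repeat the ETag on the 304.
        start_response("304 Not Modified", [('ETag', etag)])
        return []
    start_response("200 OK", [('Content-Type', 'application/atom+xml;charset=utf-8'),
                              ('ETag', etag)])
    return [body]

def call(app, environ):
    """Invoke a WSGI app in-process; returns (status, headers dict, body)."""
    out = {}
    def start_response(status, headers, exc_info=None):
        out['status'], out['headers'] = status, dict(headers)
    body = b''.join(app(environ, start_response))
    return out['status'], out['headers'], body

# First GET: no validator yet, so the full entry comes back with an ETag.
status, headers, body = call(member_get, {'selector.vars': {'id': '1'}})
# Second GET: the client echoes the ETag in If-None-Match and gets an empty 304.
status2, _, body2 = call(member_get, {'selector.vars': {'id': '1'},
                                      'HTTP_IF_NONE_MATCH': headers['ETag']})
print(status, '|', status2, '|', len(body2))  # 200 OK | 304 Not Modified | 0
```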
In this case we have built the etag handling right into the application to show how easy it can be, but that probably isn't the best way to handle it. A better approach would be to have our application compute the etag, and to wrap our applications in WSGI middleware that looks for the If-None-Match: header and generates the 304 response.
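A minimal sketch of such middleware follows. The names etag_middleware, entry_app, and call are hypothetical, and the sketch makes simplifying assumptions: the wrapped application returns its body as an iterable rather than using the legacy write() callable, and the ETag header is looked up with its exact casing. The application only has to emit an ETag and always answer 200; the middleware decides when a 304 is appropriate.

```python
import hashlib

def etag_middleware(app):
    """Hypothetical WSGI middleware: turns a 200 whose ETag matches the
    request's If-None-Match into a 304 with an empty body."""
    def wrapper(environ, start_response):
        captured = {}
        def capture(status, headers, exc_info=None):
            captured['status'], captured['headers'] = status, headers
            def write(data):
                raise NotImplementedError('this sketch does not support write()')
            return write
        result = app(environ, capture)
        try:
            body = b''.join(result)  # buffer the body so we can discard it
        finally:
            if hasattr(result, 'close'):
                result.close()
        status, headers = captured['status'], captured['headers']
        etag = dict(headers).get('ETag')  # simplification: case-sensitive lookup
        if (status.startswith('200') and etag is not None
                and environ.get('HTTP_IF_NONE_MATCH') == etag):
            start_response('304 Not Modified', [('ETag', etag)])
            return []
        start_response(status, headers)
        return [body]
    return wrapper

def entry_app(environ, start_response):
    """Application that computes an ETag but knows nothing about 304s."""
    body = b'<entry xmlns="http://www.w3.org/2005/Atom"/>'
    start_response('200 OK', [('Content-Type', 'application/atom+xml'),
                              ('ETag', hashlib.md5(body).hexdigest())])
    return [body]

app = etag_middleware(entry_app)

def call(app, environ):
    """Invoke a WSGI app in-process; returns (status, headers dict, body)."""
    out = {}
    def start_response(status, headers, exc_info=None):
        out['status'], out['headers'] = status, dict(headers)
    body = b''.join(app(environ, start_response))
    return out['status'], out['headers'], body

first = call(app, {})
second = call(app, {'HTTP_IF_NONE_MATCH': first[1]['ETag']})
print(first[0], '|', second[0])  # 200 OK | 304 Not Modified
```

Keeping the 304 logic in one wrapper means every handler that sets an ETag gets conditional GET support without repeating the comparison.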
Etag handling isn't the only way to speed up your responses; the response can also be gzipped. You have several choices when handling gzip. If you are running under Apache you can turn on mod_deflate and it will handle gzipping your content. An alternative is to add some WSGI middleware that handles it for you. Here is our startup code from earlier, with the addition of the gzipper middleware from Python Paste.
from wsgiref.handlers import CGIHandler
import paste.gzipper

# s is the Selector application assembled in the earlier startup code.
s = paste.gzipper.middleware(s, None)
CGIHandler().run(s)
Note that we didn't have to change our applications at all; the functionality is completely orthogonal to the existing applications.
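To show what gzip middleware does on the application's behalf, here is a minimal illustrative sketch; it is not Paste's gzipper code, and gzip_middleware and hello are hypothetical names. Simplifications to note: it checks Accept-Encoding with a plain substring test (ignoring q-values), buffers the whole body, and assumes the wrapped application does not use write(). Empty bodies such as a 304 are passed through untouched, since compressing them would add bytes to a response that must not have a body.

```python
import gzip

def gzip_middleware(app, compress_level=6):
    """Illustrative WSGI middleware that gzips responses for clients
    that advertise gzip support."""
    def wrapper(environ, start_response):
        if 'gzip' not in environ.get('HTTP_ACCEPT_ENCODING', ''):
            return app(environ, start_response)  # client can't take gzip
        captured = {}
        def capture(status, headers, exc_info=None):
            captured['status'], captured['headers'] = status, headers
            def write(data):
                raise NotImplementedError('this sketch does not support write()')
            return write
        result = app(environ, capture)
        try:
            body = b''.join(result)
        finally:
            if hasattr(result, 'close'):
                result.close()
        if not body or captured['status'].startswith('304'):
            start_response(captured['status'], captured['headers'])
            return [body]
        compressed = gzip.compress(body, compresslevel=compress_level)
        # Drop the old Content-Length; it described the uncompressed body.
        headers = [(k, v) for (k, v) in captured['headers']
                   if k.lower() != 'content-length']
        headers += [('Content-Encoding', 'gzip'),
                    ('Content-Length', str(len(compressed))),
                    ('Vary', 'Accept-Encoding')]
        start_response(captured['status'], headers)
        return [compressed]
    return wrapper

def hello(environ, start_response):
    body = b'<feed xmlns="http://www.w3.org/2005/Atom"/>' * 20
    start_response('200 OK', [('Content-Type', 'application/atom+xml'),
                              ('Content-Length', str(len(body)))])
    return [body]

app = gzip_middleware(hello)
```

As with Paste's version, the wrapped application is unchanged; only the startup code decides whether compression is applied.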
Delete an Entry
Deleting an entry is the mirror of GETting an entry -- we get the id of the entry from Selector's parsing of the request URI and we just pass the delete on down to the Store.
def member_delete(environ, start_response):
    store = getstore(environ)
    id = environ['selector.vars']['id']
    store.delete(id)
    start_response("200 OK", [])
    return []
Update an Entry
Updating an entry is equally simple in a naive implementation. We read in the sent entry (1) and after determining the id from the URI we put the entry into the store (2) at that location.
def member_update(environ, start_response):
    # 1. Read the entry
    length = int(environ['CONTENT_LENGTH'])
    content = environ['wsgi.input'].read(length)
    store = getstore(environ)
    id = environ['selector.vars']['id']
    # 2. Put the entry in the store.
    store.put(id, content)
    start_response("200 OK", [])
    return []
We can do better. One of the things we would like to protect against is lost updates. For example, two different clients request the same entry at the same time (that's not a problem), and both clients edit their copies (also not a problem); but then both clients PUT their modified entries back to the server -- now we have a problem! This is a race condition, and one of the clients' edits will be lost. HTTP has a minimal set of capabilities that allows a server to detect such a conflict and inform the client of it. The solution relies on etags, which we already used to optimize our GETs. In this case we rely on the GET response to include an etag, and then look for that etag in an If-Match: header on the PUT request. If the etag sent by the client matches the entry's current etag, we let the PUT proceed; otherwise we fail with a status code of 412 Precondition Failed.
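The lost update is easy to reproduce with the naive update logic. In this sketch a plain dict stands in for the Store, and the entry text and client names are made up for illustration.

```python
# Two clients read the same entry, edit their own copies, and PUT them back
# with no precondition check, exactly as the naive member_update does.
store = {'42': 'title=Draft'}

alice_copy = store['42']   # client A GETs the entry
bob_copy = store['42']     # client B GETs the same entry

alice_copy = alice_copy + ';tags=atom'          # A adds a tag
bob_copy = bob_copy.replace('Draft', 'Final')   # B changes the title

store['42'] = alice_copy   # A's PUT succeeds
store['42'] = bob_copy     # B's PUT silently overwrites A's edit

print(store['42'])  # prints: title=Final
# A's ';tags=atom' is gone, and neither client was told.
```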
import hashlib  # replaces the old md5 module, which was removed in Python 3

def member_update(environ, start_response):
    length = int(environ['CONTENT_LENGTH'])
    content = environ['wsgi.input'].read(length)
    store = getstore(environ)
    id = environ['selector.vars']['id']
    body = store.get(id).encode('utf-8')                   # 1
    etag = hashlib.md5(body).hexdigest()                   # 2
    # A request with no If-Match: header is treated as if it sent '*'.
    incoming_etag = environ.get('HTTP_IF_MATCH', '*')      # 3
    if (etag == incoming_etag) or ('*' == incoming_etag):  # 4
        store.put(id, content)
        start_response("200 OK", [])
        return []
    else:
        start_response("412 Precondition Failed", [])      # 5
        return []
We determine the etag for the current entry (1)(2) and then compare it to the etag sent in via the If-Match: header (3). If the two are equal (4), or if the value sent is '*', then the PUT request goes through as before. A value of '*' for If-Match: means that the client wishes the request to go through regardless of the resource's current etag value, which gives the client a way to forcibly overwrite the server's current value. Note that the code also treats a request with no If-Match: header as if it had sent '*'. If the etags don't match (5) we reject the request with a 412 status code.
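The three outcomes can be exercised in-process. This sketch assumes a dict named ENTRIES in place of the Store and a hypothetical put() helper that builds the WSGI environ; the handler keeps the article's logic but decodes the request body to text so it can be hashed the same way as the stored entry.

```python
import hashlib
import io

ENTRIES = {'1': 'original entry'}  # hypothetical stand-in for the Store

def etag_of(text):
    return hashlib.md5(text.encode('utf-8')).hexdigest()

def member_update(environ, start_response):
    length = int(environ['CONTENT_LENGTH'])
    content = environ['wsgi.input'].read(length).decode('utf-8')
    entry_id = environ['selector.vars']['id']
    incoming_etag = environ.get('HTTP_IF_MATCH', '*')
    if incoming_etag in (etag_of(ENTRIES[entry_id]), '*'):
        ENTRIES[entry_id] = content
        start_response("200 OK", [])
    else:
        start_response("412 Precondition Failed", [])
    return []

def put(entry_id, text, if_match=None):
    """Drive member_update directly and return the status line."""
    data = text.encode('utf-8')
    environ = {'CONTENT_LENGTH': str(len(data)), 'wsgi.input': io.BytesIO(data),
               'selector.vars': {'id': entry_id}}
    if if_match is not None:
        environ['HTTP_IF_MATCH'] = if_match
    statuses = []
    member_update(environ, lambda status, headers: statuses.append(status))
    return statuses[0]

seen = etag_of(ENTRIES['1'])      # the etag a client saw on its GET
r1 = put('1', 'edit A', seen)     # matches the current etag: accepted
r2 = put('1', 'edit B', seen)     # same etag, now stale: rejected
r3 = put('1', 'edit C', '*')      # '*' forces the overwrite
print(r1, '|', r2, '|', r3)  # 200 OK | 412 Precondition Failed | 200 OK
```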
This code isn't optimal since we do a get() to retrieve the entry to calculate the etag just to check it against the incoming If-Match: header. A faster way would be to calculate and store the etag for each entry instead of recalculating it every time we need it.
There is also a bug in this code: a request could come in from another client between the call to store.get() and store.put(). In reality we need to either have Store expose some sort of database locking or push the etag functionality down into Store.
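One way to address both the recomputation cost and the race is to push etags into the store itself. The EtagStore class below is a hypothetical sketch, not the Store from this series: it saves each entry's etag next to the entry and performs the If-Match comparison and the write under a single lock, so nothing can slip in between them. One caveat: a threading.Lock only serializes requests inside one process. Under CGI, where every request is a separate process, the same check-and-write would have to happen inside the database, for example in a transaction.

```python
import hashlib
import threading

class EtagStore:
    """Hypothetical store that keeps etags with entries and does
    compare-and-put atomically (within a single process)."""

    def __init__(self):
        self._entries = {}              # id -> (content, etag)
        self._lock = threading.Lock()

    @staticmethod
    def _etag(content):
        return hashlib.md5(content.encode('utf-8')).hexdigest()

    def add(self, entry_id, content):
        """Create an entry and return its etag."""
        with self._lock:
            etag = self._etag(content)
            self._entries[entry_id] = (content, etag)
            return etag

    def get(self, entry_id):
        """Return (content, etag); the etag is stored, not recomputed."""
        with self._lock:
            return self._entries[entry_id]

    def put_if_match(self, entry_id, content, if_match='*'):
        """Replace the entry only if if_match is '*' or its current etag.
        Returns the new etag, or None if the precondition failed."""
        with self._lock:
            current = self._entries.get(entry_id)
            if current is None:
                return None  # nothing to update; If-Match: * fails too
            if if_match not in ('*', current[1]):
                return None
            etag = self._etag(content)
            self._entries[entry_id] = (content, etag)
            return etag

store = EtagStore()
etag1 = store.add('1', 'first')
etag2 = store.put_if_match('1', 'second', etag1)  # accepted, new etag
stale = store.put_if_match('1', 'third', etag1)   # None: etag1 is stale
```

A handler built on this store would translate a None result into 412 Precondition Failed.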
This isn't the only way to avoid the lost update problem. Google's GData implementation of the Atom Publishing Protocol gives a unique edit URI to each version of an entry. Every time the entry is updated, the edit URI changes. If the client sends a PUT or DELETE to a stale edit URI, then the server responds with a status code of 409 Conflict. There are advantages to both approaches. With the ETag approach the edit URI never changes, thus allowing local and intermediate caches to work better. In addition, the ETag approach gives a defined mechanism, If-Match: *, to forcibly overwrite an entry. The GData approach has the advantage that even naive clients will be protected from accidental overwrites. The ETag approach requires the client to know about preserving the etags it sees in GET responses and using them in PUT requests back to the same URI, which is not required of clients of the GData implementation. On the other hand, both systems must be prepared to handle 4xx responses by doing a GET and applying the edits again, so on that account it's a wash.
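The versioned-edit-URI approach can be sketched in a few lines. This is a hypothetical illustration of the idea described above, not Google's GData code: the VersionedStore class and its /entries/{id}/{version} URI layout are assumptions, and it is single-threaded. Note how a client that simply PUTs to the edit URI it was given is protected without ever handling an etag.

```python
class VersionedStore:
    """Hypothetical store where every update mints a new edit URI and
    writes to a stale URI get 409 Conflict (single-threaded sketch)."""

    def __init__(self):
        self._entries = {}  # id -> (version, content)

    def edit_uri(self, entry_id):
        version, _ = self._entries[entry_id]
        return '/entries/%s/%d' % (entry_id, version)  # hypothetical layout

    def add(self, entry_id, content):
        self._entries[entry_id] = (1, content)
        return self.edit_uri(entry_id)

    def put(self, uri, content):
        """Return (status, edit URI the client should use next)."""
        _, _, entry_id, version = uri.split('/')
        current_version, _ = self._entries[entry_id]
        if int(version) != current_version:
            return '409 Conflict', self.edit_uri(entry_id)
        self._entries[entry_id] = (current_version + 1, content)
        return '200 OK', self.edit_uri(entry_id)

store = VersionedStore()
uri = store.add('1', 'first')                # '/entries/1/1'
status_a, uri_a = store.put(uri, 'edit A')   # ('200 OK', '/entries/1/2')
status_b, _ = store.put(uri, 'edit B')       # '409 Conflict': uri is stale
```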
A Cliff Hanger
Next time we will finish looking at the implementations for introspection and enumeration of the entries in a collection. That will require introducing a few more tools before we're done. After that we'll dig into the implementation of Store and start building some applications on top of our APP implementation. Now, you might be asking yourself how we are going to go straight into building applications when I've said nothing about the associated HTML pages for each entry in the collection. In a traditional weblog implementation of the APP, the collection is just an analogue of the web pages that make up the blog, but that doesn't mean those web pages have to exist, and our APP service can add plenty of value all on its own. For a flavor of what such a service can be used for, you can read the ACM Queue article "A Conversation with Werner Vogels".