Dispatching in a REST Protocol Application

August 17, 2005

In my last column I covered how to dispatch based on mime type. That's only part of the dispatching story. The other two major pieces of information to use when dispatching are the HTTP method and the URI. Let's look at the HTTP method first.

The first and easiest way to handle the HTTP method is to handle different methods within the same CGI application. When a request comes in, the web server passes the HTTP method in the REQUEST_METHOD environment variable, so the script only has to branch on its value:

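	#!/usr/bin/python
	import os

	# The web server passes the HTTP method to CGI scripts in REQUEST_METHOD.
	method = os.environ.get('REQUEST_METHOD', 'GET')

	# CGI output is the headers, a blank line, then the body.
	print("Content-type: text/plain")
	print("")

	if method in ['GET', 'HEAD']:
	    print(method.lower())
	elif method == 'POST':
	    print(method.lower())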
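As the number of methods grows, an if/elif chain gets unwieldy. Below is a minimal sketch of the lookup-table alternative. The handlers get_body() and post_body() and the helper build_response() are hypothetical names, and any method missing from the table is answered with 405 Method Not Allowed plus the Allow header that HTTP/1.1 requires with that status.

```python
import os
import sys


def get_body():
    # Hypothetical GET handler: returns the response body.
    return "get\n"


def post_body():
    # Hypothetical POST handler: returns the response body.
    return "post\n"


# Lookup table from HTTP method to handler. HEAD shares GET's handler.
HANDLERS = {"GET": get_body, "HEAD": get_body, "POST": post_body}


def build_response(method):
    """Return the full CGI output (headers, blank line, body) for `method`."""
    handler = HANDLERS.get(method)
    if handler is None:
        # Unsupported method: 405 plus an Allow header listing what we accept.
        return ("Status: 405 Method Not Allowed\n"
                "Allow: " + ", ".join(sorted(HANDLERS)) + "\n"
                "Content-type: text/plain\n\n")
    body = handler()
    headers = "Content-type: text/plain\n\n"
    # HEAD gets the same headers as GET but no body.
    return headers if method == "HEAD" else headers + body


if __name__ == "__main__":
    sys.stdout.write(build_response(os.environ.get("REQUEST_METHOD", "GET")))
```

Adding PUT or DELETE support then means adding one entry to HANDLERS, and the Allow header stays in sync automatically.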

That's not our only choice, though. Because we are being RESTful and use only a handful of methods with well-defined semantics, we can dispatch on the method at entirely different levels of the system.

We can dispatch requests to the same URI to different handlers based on the method. For example, let's pretend we have different CGI applications, one for each of the methods we use: get.cgi, post.cgi, put.cgi, delete.cgi.

And further assume that these CGI applications are located in the directory /myresource/. A GET request to /myresource/ needs to be dispatched to /myresource/get.cgi, and a POST request to /myresource/ needs to be dispatched to /myresource/post.cgi, etc. This is easy to do with Apache's mod_rewrite.

Ok, "easy to do" is a little misleading. Many things are easy in mod_rewrite because it is so powerful. But any powerful module can also be dangerous. Ask me sometime about that recursive mod_rewrite rule I wrote that brought my shared host to its knees.

The great thing about mod_rewrite is it gives you all the configurability and flexibility of Sendmail. The downside to mod_rewrite is that it gives you all the configurability and flexibility of Sendmail.

-- Brian Behlendorf, Apache Group

One of the many things that mod_rewrite can do is rewrite a URI based on those same environment variables we used in our CGI application. Here is a .htaccess file that does the above rewriting of URIs based on the request method:

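	RewriteEngine On
	RewriteBase /myresource/

	# Send GETs to get.cgi. The !\.cgi$ pattern skips requests that already
	# point at a .cgi script, so the rewritten URI cannot be rewritten again.
	RewriteCond %{REQUEST_METHOD} ^GET$
	RewriteRule !\.cgi$ get.cgi [L]

	# Send POSTs to post.cgi.
	RewriteCond %{REQUEST_METHOD} ^POST$
	RewriteRule !\.cgi$ post.cgi [L]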

Lather, rinse, and repeat for each method that you want to support. The rules aren't limited to CGI applications on the local server, either: a request can be sent on to a completely different server, which means we could parcel out requests across many servers and distribute the load. Of course, at that point this server is acting as a reverse proxy, and there's already an Apache module for that: mod_proxy (mod_rewrite's [P] flag hands the rewritten request to it). But we won't go there right now.
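The .htaccess rules boil down to a lookup from request method to script. The Python model below only illustrates that mapping, not how mod_rewrite works internally. METHOD_SCRIPTS and rewrite() are hypothetical names, and the table includes put.cgi and delete.cgi, the scripts you would target once the rules are repeated for PUT and DELETE.

```python
# Hypothetical table: one CGI script per HTTP method.
METHOD_SCRIPTS = {
    "GET": "get.cgi",
    "POST": "post.cgi",
    "PUT": "put.cgi",
    "DELETE": "delete.cgi",
}


def rewrite(method, path, base="/myresource/"):
    """Return the script path a request is sent to, or None if no rule applies.

    Only paths under `base` are rewritten, and a path that already names a
    .cgi script is left alone, so feeding the result back in changes nothing.
    """
    if not path.startswith(base) or path.endswith(".cgi"):
        return None
    script = METHOD_SCRIPTS.get(method)
    if script is None:
        return None  # no rule for this method; the request passes through
    return base + script
```

For example, rewrite("POST", "/myresource/") gives "/myresource/post.cgi", while rewrite("POST", "/myresource/post.cgi") gives None.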

The Simplest Thing that Could Possibly Work

When it comes to dispatching on the URI, we'll start with the simplest thing that can possibly work: a single CGI application that handles all of our requests. As our service grows wildly in popularity we may have to change how we dispatch, but more on that later.

Dispatching in Python

Let's build a simple Python module to help dispatch incoming requests. The design goals for this module are:

Not a framework, just a library.
Provide a simple function robustly.
Allow dispatching based on method and mime type.

The module dispatch.py defines a single class for dispatching: BaseHttpDispatch. To use the module just subclass BaseHttpDispatch and define your own member functions for the types of requests that you want to handle. Dispatching is a matter of instantiating the derived class and calling dispatch() with the requested method and media range. For example, if you wanted to handle POSTs with a mime type of application/xbel+xml and have those requests routed to a member function called POST_xbel(), here is the class you would define:

    class MyHandler(BaseHttpDispatch):
        def __init__(self):
            BaseHttpDispatch.__init__(self,
                {'application/xbel+xml':'xbel'})
        def POST_xbel(self):
            pass

Note how the __init__() function of BaseHttpDispatch takes a mapping from application/xbel+xml to xbel. That mapping is used when looking up the function name to call:

   handler = MyHandler()
   handler.dispatch('POST', 'application/xbel+xml')

This will call the POST_xbel() member function. You can handle any number of mime types and methods, and even create fallback functions, such as

    def GET(self):
        pass

This will get called if no other GET function with a mime-type specifier matches.
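To make the lookup order concrete, here is a self-contained sketch. The real module uses mimeparse.best_match() to pick a supported type; this sketch substitutes an exact dictionary lookup, and handler_names(), call_first_match(), and the ExampleHandler class are hypothetical stand-ins.

```python
def handler_names(method, mime_type, mime_types_supported):
    """Member-function names tried for a request, most specific first.

    Simplification: an exact dictionary lookup stands in for
    mimeparse.best_match(), so media ranges like "application/*" never match.
    """
    short_name = mime_types_supported.get(mime_type, "")
    if short_name:
        return [method + "_" + short_name, method]
    return [method]


class ExampleHandler(object):
    """Hypothetical stand-in for a BaseHttpDispatch subclass."""

    def POST_xbel(self):
        return "POST_xbel"

    def GET(self):
        # Fallback for GET requests with no more specific match.
        return "GET"


def call_first_match(obj, names):
    """Call the first name that is a method on obj; None means no match."""
    for name in names:
        fn = getattr(obj, name, None)
        if callable(fn):
            return fn()
    return None  # BaseHttpDispatch calls nomatch() at this point
```

A GET for application/xbel+xml tries GET_xbel first, finds nothing, and falls back to GET; a DELETE matches neither name and ends up in nomatch().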

Here is the dispatch.py module:

"""dispatch.py: dispatch HTTP requests on method and mime type."""
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO        # Python 3

import mimeparse  # provides best_match(supported, header)


class BaseHttpDispatch:
    """Dispatch HTTP events based on the method and requested mime-type"""
    def __init__(self, mime_types_supported=None):
        """mime_types_supported is a dictionary that maps
           supported mime-type names to the shortened names
           that are used in dispatching.
        """
        self.mime_types_supported = mime_types_supported or {}

    def nomatch(self, method, mime_type):
        """This is the default handler called if
           there is no match found. Override to add
           your own behaviour."""
        return ({"Status": "404 Not Found", "Content-type": "text/plain"},
                StringIO("The requested URL was not found on this server."))

    def exception(self, method, mime_type, exception):
        """This is the default handler called if an
        exception occurs while processing."""
        return ({"Status": "500 Internal Server Error",
                 "Content-type": "text/plain"},
                StringIO("The server encountered an unexpected condition"
                         " which prevented it from fulfilling the request."))

    def _call_fn(self, fun_name, method, mime_type):
        try:
            return getattr(self, fun_name)()
        except Exception as e:
            return self.exception(method, mime_type, e)

    def dispatch(self, method, mime_type):
        """Pass in the method and the mime-type and the best matching
        function will be called. For example, if BaseHttpDispatch is
        constructed with a mime type map that maps 'text/xml' to 'xml',
        then dispatch('POST', 'text/xml') will first look for 'POST_xml'
        and, if that fails, will try to call 'POST'.

        Each function so defined must return a tuple
        (headers, body)

        where 'headers' is a dictionary of headers for the response
        and 'body' is any object that simulates a file.
        """
        # Only plain upper-case method names are dispatched, so a request
        # method such as "__init__" or "nomatch" cannot reach other members.
        if not (method.isalpha() and method.isupper()):
            return self.nomatch(method, mime_type)
        mime_type_short_name = ""
        if mime_type and self.mime_types_supported:
            match = mimeparse.best_match(list(self.mime_types_supported.keys()),
                                         mime_type)
            mime_type_short_name = self.mime_types_supported.get(match, "")
        # Most specific first (e.g. POST_xml), then the fallback (POST).
        for fun_name in (method + "_" + mime_type_short_name, method):
            if callable(getattr(self, fun_name, None)):
                return self._call_fn(fun_name, method, mime_type)
        return self.nomatch(method, mime_type)
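dispatch() hands back a (headers, body) pair, and something still has to turn that pair into CGI output. Here is a minimal sketch, assuming a hypothetical write_cgi_response() helper that writes each header, the blank line CGI requires, and then the body.

```python
import io
import sys


def write_cgi_response(response, out):
    """Write a (headers, body) tuple to `out` in CGI format.

    `headers` is a dict mapping header names to values; a "Status" entry
    becomes the CGI Status header. `body` is any object with a read() method.
    """
    headers, body = response
    for name, value in headers.items():
        out.write("%s: %s\n" % (name, value))
    out.write("\n")  # a blank line separates the headers from the body
    out.write(body.read())


if __name__ == "__main__":
    # Example: render a 404 tuple like the one nomatch() returns.
    write_cgi_response(({"Status": "404 Not Found", "Content-type": "text/plain"},
                        io.StringIO("Not found.")), sys.stdout)
```

In bookmark.cgi you would pass the result of dispatch() and sys.stdout.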

Example

Let's stub out our bookmark service using dispatch.py. Here are the target URIs we want to handle:

*URIs in the Bookmark Service*
URI	Type of Resource	Description
[user]/bookmark/[id]/	Bookmark	A single bookmark for "user."
[user]/bookmarks/	Bookmark Collection	The 20 most recent bookmarks for "user."
[user]/bookmarks/all/	Bookmark Collection	All the bookmarks for "user."
[user]/bookmarks/tags/[tag]	Bookmark Collection	The 20 most recent bookmarks for "user" that were filed in the category "tag."
[user]/bookmarks/date/[Y]/[M]/	Bookmark Collection	All the bookmarks for "user" that were created in a certain year [Y] or month [M].
[user]/config/	Keyword List	A list of all the "tags" a user has ever used.

We'll assume that there is a single CGI application, bookmark.cgi, that handles all of these URIs. So, for example, our first URI is

  http://example.org/bookmark.cgi/[user]/bookmark/[id]/

We'll subclass BaseHttpDispatch for each of the types of URIs. Here is the class that will handle the Bookmark URI:

class Bookmark(BaseHttpDispatch):
    def __init__(self):
        BaseHttpDispatch.__init__(self, {'application/xbel+xml':'xbel'})
    def GET_xbel(self):
        pass
    def PUT_xbel(self):
        pass
    def DELETE(self):
        pass

This is just a stub, and when we come back to fill out this class we'll replace the stubbed code with code that actually, you know, does something. Here is the class that handles the [user]/bookmarks/ resource:

class Bookmarks(BaseHttpDispatch):
    def __init__(self):
        BaseHttpDispatch.__init__(self, {'application/xbel+xml':'xbel'})
    def GET_xbel(self):
        pass
    def POST_xbel(self):
        pass

You get the idea. The last missing piece is mapping URIs onto instances of our classes. We can do this by picking the class to instantiate based on the path: the path components that follow the CGI application's name in the URI are passed to it in the PATH_INFO environment variable. Just look at PATH_INFO, figure out which class to instantiate, and then call dispatch() on it. We'll leave that bit of code as an exercise.
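One way to approach that exercise is a list of regular expressions, one per row of the URIs table, with named groups for the variable parts. This is a sketch under assumptions: only Bookmark and Bookmarks come from the text above, while BookmarksAll, BookmarksByTag, BookmarksByDate, Config, and route() are hypothetical names. The placeholder classes are empty here so that the block runs on its own.

```python
import re


# Placeholders for the BaseHttpDispatch subclasses, one per resource type.
class Bookmark: pass
class Bookmarks: pass
class BookmarksAll: pass
class BookmarksByTag: pass
class BookmarksByDate: pass
class Config: pass


# Each pattern is anchored at both ends, so at most one can match a path.
ROUTES = [(re.compile(pattern), cls) for pattern, cls in [
    (r"^/(?P<user>[^/]+)/bookmark/(?P<id>[^/]+)/$", Bookmark),
    (r"^/(?P<user>[^/]+)/bookmarks/$", Bookmarks),
    (r"^/(?P<user>[^/]+)/bookmarks/all/$", BookmarksAll),
    (r"^/(?P<user>[^/]+)/bookmarks/tags/(?P<tag>[^/]+)$", BookmarksByTag),
    # The month is optional: /date/2005/ or /date/2005/08/.
    (r"^/(?P<user>[^/]+)/bookmarks/date/(?P<year>\d{4})/(?:(?P<month>\d{1,2})/)?$",
     BookmarksByDate),
    (r"^/(?P<user>[^/]+)/config/$", Config),
]]


def route(path_info):
    """Return (handler class, captured parameters) for PATH_INFO, or (None, {})."""
    for pattern, cls in ROUTES:
        m = pattern.match(path_info)
        if m:
            return cls, m.groupdict()
    return None, {}
```

With the real classes, bookmark.cgi would call route() on PATH_INFO and then call dispatch() on an instance of the returned class, answering None with a 404. How the captured user, id, or tag reach the handler (constructor arguments, attributes) is a design choice this sketch leaves open.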

Given our simple dispatching class we actually have lots of different ways that we can break up our service. For example, consider this URI from our bookmark service:

   http://example.org/bookmark/[user]/bookmarks/date/[Y]/[M]/

We can implement this in any of the following ways:

   http://example.org/bookmark.cgi/[user]/bookmarks/date/[Y]/[M]/ 
   http://example.org/bookmark/[user]/bookmarks.cgi/date/[Y]/[M]/ 
   http://example.org/bookmark/[user]/bookmarks/date.cgi/[Y]/[M]/
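The position of the .cgi file determines how much of the path the script must parse itself, because everything after the script's name arrives in PATH_INFO. The sketch below uses a hypothetical split_script_path() helper to show that split for the three layouts, with "joe" standing in for [user]. It relies on a simplification: the first path segment ending in .cgi is treated as the script. A real server decides by looking at the filesystem.

```python
import re

# Simplification: the first path segment ending in ".cgi" is the script.
_SCRIPT_SPLIT = re.compile(r"^(.*?\.cgi)(/.*)?$")


def split_script_path(request_path):
    """Return (SCRIPT_NAME, PATH_INFO) for a request path, or None if no script."""
    m = _SCRIPT_SPLIT.match(request_path)
    if m is None:
        return None
    return m.group(1), m.group(2) or ""


for path in ["/bookmark.cgi/joe/bookmarks/date/2005/08/",
             "/bookmark/joe/bookmarks.cgi/date/2005/08/",
             "/bookmark/joe/bookmarks/date.cgi/2005/08/"]:
    print(split_script_path(path))
```

The further down the path the script sits, the shorter its PATH_INFO: date.cgi only ever sees the year and month.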

And don't get too hung up on the fact that [user] comes so early in the URI; mod_rewrite can move the user to the end of the path, or even pass it as a query parameter to the CGI application that ultimately gets called.

Scaling Up

Our initial implementation was simple: it put all the functionality into a single CGI application. Here are some ways we can modify our bookmark service to handle increased load.

Make some content static. Apache is fantastic at serving up static content, so one way to optimize the system would be to keep static versions of frequently requested resources and to route GET requests to those static versions.

   -> (POST,PUT,DELETE) -> /bookmark.cgi
      (GET)             -> /some-static-document-uri
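One way to keep the static copies current is to regenerate them whenever a POST, PUT, or DELETE changes a resource. This is a sketch under assumptions. publish_static() is a hypothetical helper that the CGI handlers would call after a successful write, and static_path is whatever file the GET route points at. The helper writes to a temporary file in the same directory and renames it into place, so a concurrent GET sees either the old document or the new one, never a partial one.

```python
import os
import tempfile


def publish_static(static_path, content):
    """Atomically replace the static copy at `static_path` with `content`."""
    directory = os.path.dirname(static_path) or "."
    # Create the temporary file in the same directory so the rename below
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
        # mkstemp makes the file readable only by its owner; open it up so
        # the web server can read it even if it runs as a different user.
        os.chmod(tmp_path, 0o644)
        # os.replace() swaps the file in place; on POSIX the rename is atomic.
        os.replace(tmp_path, static_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

A DELETE handler would remove the static file instead, for example with os.remove().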

Mod_proxy. Remember I mentioned mod_proxy earlier? That can be used to distribute the requests over a group of machines.

   -> [reverse proxy]  -> server1/bookmark.cgi
                       -> server2/bookmark.cgi
                       -> server3/bookmark.cgi

That works only if any server can update any bookmark collection. If not, distribute the GETs across several servers and send all the PUTs, POSTs, and DELETEs to a single server.

   -> [reverse proxy]  -> POSTPUTDELserver/bookmark.cgi
                       -> GETserver1/bookmark.cgi
                       -> GETserver2/bookmark.cgi
                       -> GETserver3/bookmark.cgi
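The routing rule in this diagram is small enough to state in code. The sketch below only illustrates the decision. In a real deployment the reverse proxy's configuration makes it, not application code. The hostnames and pick_backend() are hypothetical, and treating GET and HEAD as the only read methods is an assumption.

```python
import itertools

# Hypothetical backends matching the diagram.
WRITE_SERVER = "http://postputdelserver/bookmark.cgi"
READ_SERVERS = itertools.cycle([
    "http://getserver1/bookmark.cgi",
    "http://getserver2/bookmark.cgi",
    "http://getserver3/bookmark.cgi",
])

# Methods that never change a resource, so any read server can answer them.
SAFE_METHODS = {"GET", "HEAD"}


def pick_backend(method):
    """Reads rotate round-robin over the GET servers; writes go to one server."""
    if method in SAFE_METHODS:
        return next(READ_SERVERS)
    return WRITE_SERVER
```

Because every write lands on one machine, the read servers only need a copy of its data, which avoids the multi-server update problem described above.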

These aren't the only ways to handle an increased load. Part III: Advanced Setup and Performance of the mod_perl User Guide is a good starting point for learning the options, and the pros and cons, of each strategy.

Source

The source for dispatch.py is, of course, freely available.