Amazon's Simple Queue Service
January 5, 2005
Amazon has recently introduced a new Simple Queue Service. Well, they haven't introduced it yet; it is still in Beta. So take this article with a grain of salt, as the final service provided by Amazon may differ from what I describe here. The reason I am so interested in the Amazon service is that it's billed as having both SOAP and REST interfaces. Unfortunately, it's not living up to its billing. Let's learn how to use the service and, in addition, figure out how far it is from being RESTful.
What Does the Amazon Simple Queue Service Do?
To quote from the Amazon site directly: “The Amazon Simple Queue Service offers a reliable, highly scalable hosted queue for buffering messages between distributed application components.” The service offers seven operations that you can perform:
- CreateQueue: Create queues for your own use or to share with others.
- ConfigureQueue: Modify the properties of an existing queue.
- ListMyQueues: List your existing queues.
- DeleteQueue: Delete one of your queues.
- Enqueue: Add any data entries up to 4 KB in size to a specified queue.
- Read: Return data from a specified queue. No data-key is required, and data is returned in roughly the same order it was added to the queue.
- Dequeue: Remove a specified piece of data from a specified queue.
One of the things I frequently criticize SOAP and XML-RPC for is doing everything through POST. Of the many problems with doing everything through POST, the biggest is that there are things that could be better done through GET. Well, the Amazon Queue Service does everything through GET, and no, that's not a good thing. Let's take a look.
All of the seven operations are GET requests of the form:
http://webservices.amazon.com/onca/xml?Service=AWSSimpleQueueService
    &SubscriptionId=[Your Subscription ID Here]
    &Operation=[Operation]
    &[Param1]=[Param1 Value]
    &[Param2]=[Param2 Value]
    ...
    &[ParamN]=[ParamN Value]
where [Operation] is one of the seven operation names listed above.
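As a rough sketch of how a client might assemble such a request URI in Python: the base URI and the Service, SubscriptionId, and Operation parameters come from the form above, while the function name and the subscription ID value are made up for illustration. Nothing here contacts Amazon; it only builds the string.

```python
from urllib.parse import urlencode

# Base URI and fixed parameter names are taken from the request form above.
BASE_URI = "http://webservices.amazon.com/onca/xml"


def build_request_uri(subscription_id, operation, **params):
    """Build the GET URI for one operation (hypothetical helper)."""
    query = [
        ("Service", "AWSSimpleQueueService"),
        ("SubscriptionId", subscription_id),
        ("Operation", operation),
    ]
    # Any operation-specific parameters are appended in sorted order.
    query.extend(sorted(params.items()))
    return BASE_URI + "?" + urlencode(query)


# "ABC123" is a placeholder, not a real Subscription ID.
print(build_request_uri("ABC123", "ListMyQueues"))
```

Note that the operation name is just another query parameter, which is what makes every operation look like a GET.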
There are several fundamental problems with this approach. The first is the inclusion of the Subscription ID in the URI. This appears to be a known weakness to Amazon as the documentation contains this note regarding the Subscription ID:
During the Beta period, there are limited security restrictions on access to individual queues. Access is granted to another user or application if they can provide your AWS Subscription ID and the corresponding identifier of your queue.
So we won't dwell on that problem, and we'll move on to the other flaw, the use of GET for all the operations. There are some operations that should use GET. ListMyQueues and Read are both perfectly good GET candidates as they only return information.
The problem with the other five operations is that GET is supposed to be both “safe” and “idempotent.” The terms “safe” and “idempotent” have particular meanings for HTTP. “Safe” means that GET does not have the significance of taking an action other than retrieval. “Idempotence” means that the side-effects of N identical requests are the same as for a single request. Deleting a queue is a little more significant than mere retrieval, and so is configuring a queue. As a matter of fact, all of the operations besides ListMyQueues and Read are more significant than mere retrieval. The violation of idempotence is even worse: N identical Enqueue requests would add N entries to the queue.
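The difference is easy to see with a toy model. The sketch below is an in-memory stand-in for a single queue, not Amazon's implementation; the class and function names are invented for illustration. It applies the same operation N times and compares the resulting state.

```python
class ToyQueue:
    """In-memory stand-in for one queue (illustrative only)."""

    def __init__(self):
        self.entries = []

    def handle(self, operation, data=None):
        if operation == "Read":
            # Retrieval only: returns a copy and changes nothing.
            return list(self.entries)
        if operation == "Enqueue":
            # Side effect: every call appends another entry.
            self.entries.append(data)
            return None
        raise ValueError("unsupported operation: " + operation)


def state_after(operation, repeats, data=None):
    """Return the queue contents after `repeats` identical requests."""
    queue = ToyQueue()
    for _ in range(repeats):
        queue.handle(operation, data)
    return queue.entries


print(state_after("Read", 1) == state_after("Read", 3))  # True
print(state_after("Enqueue", 1, "job"))                  # ['job']
print(state_after("Enqueue", 3, "job"))                  # ['job', 'job', 'job']
```

Repeating Read leaves the state untouched, so it fits GET; repeating Enqueue does not, so a retry by a client or an intermediary silently duplicates data.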
The universal use of GET is not the only problem with the Amazon Simple Queue Service. The service ignores HTTP status codes and always returns a 200. Instead, the real status is tunneled in the response body itself.
A further problem is with media types. All of the responses from the service return XML. The good news is that that XML is in a namespace. The bad news is that Amazon hasn't registered a new media type, or types, for all the responses and instead is using the 'text/xml' media type. Mark Pilgrim has highlighted some of the perils of using the 'text/xml' media type and RFC 3023.
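One of those perils concerns character encoding. Under RFC 3023, a 'text/xml' response with no charset parameter is to be treated as us-ascii regardless of what the XML declaration says, whereas 'application/xml' falls back to the encoding the document itself declares. The sketch below illustrates that rule under simplifying assumptions: the function name is made up, byte order marks are ignored, and only those two media types are handled.

```python
from email.message import Message


def effective_charset(content_type_header, xml_declared_encoding=None):
    """Pick the encoding a strict RFC 3023 consumer would use (simplified)."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    media_type = msg.get_content_type()
    charset = msg.get_param("charset")

    if charset:
        # An explicit charset parameter is authoritative for both types.
        return charset.lower()
    if media_type == "text/xml":
        # The XML declaration is ignored; the default is us-ascii.
        return "us-ascii"
    if media_type == "application/xml":
        # Fall back to the document's own declaration, else UTF-8.
        return (xml_declared_encoding or "utf-8").lower()
    raise ValueError("not handled in this sketch: " + media_type)


print(effective_charset("text/xml", "utf-8"))         # us-ascii
print(effective_charset("application/xml", "utf-8"))  # utf-8
```

A UTF-8 document served as plain 'text/xml' is therefore, strictly speaking, mislabeled as soon as it contains a non-ASCII character.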
The real core of the problem with the Simple Queue Service is that it is still just a simple RPC (Remote Procedure Call) interface. The method used is GET instead of POST, and the procedure name is now passed as part of the query parameters and not in an XML request body, but those are only cosmetic differences. This is an RPC protocol trying to run over HTTP, and that misalignment is where the problems arise.
A More RESTful Reformulation
In order to fully utilize HTTP, you need to switch the thinking from an RPC approach to one of 'resources' and 'representations'. For the sake of exposition, let's redesign Amazon's Queue Service to be more RESTful. The first thing is to drop the authentication from the URIs and use either Basic or Digest authentication. Since the Amazon Simple Queue Service supports HTTPS, using just Basic over HTTPS would be fine.
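For illustration, here is roughly what moving the credentials into a Basic Authorization header looks like. The function name is hypothetical, and the credentials are the well-known example pair from the HTTP authentication RFC, not anything Amazon issues. Basic credentials are only base64-encoded, not encrypted, which is why the HTTPS requirement matters.

```python
import base64


def basic_auth_header(user_id, password):
    """Return the header a client would send for HTTP Basic authentication."""
    raw = (user_id + ":" + password).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": "Basic " + token}


print(basic_auth_header("Aladdin", "open sesame"))
# {'Authorization': 'Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=='}
```

With the credentials in a header, the URI identifies only the resource and can be logged, bookmarked, or shared without leaking access.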
The first step in reformulating the Amazon service is to consider all the resources in the system. Here's a graphic representation of most of the resources in the Amazon Queue Service:
Block diagram of some of the resources in the Queue Service.
From that graphic, we can see the following resources:
- Queue Collection: A collection of queues.
- Queue: A somewhat ordered collection of queue entries.
- Queue Entry: An entry in a queue. (The boxes labelled 1, 2, 3, etc.)
- Queue Subset: A subset representing the first N entries in a queue at a certain point in time.
- Queue Configuration: The configuration for each queue. (This one isn't represented in the graphic.)
Now, with our resources in hand and recalling the four basic methods of HTTP (GET, PUT, POST, and DELETE), let's see how we can recreate the seven operations of the Amazon Queue Service. From just the above resources, we can map out how to handle the seven operations:
- ConfigureQueue: The Queue Configuration resource holds the configuration of a queue. To update such a resource, we can PUT the configuration information to the Queue Configuration resource.
- CreateQueue: The Queue Collection resource represents a collection of queues. Since POST has "append" as one of its meanings, we can POST the name of the queue to be created to the Queue Collection to create a new queue.
- DeleteQueue: Each queue is a resource, and so removing a queue is as simple as calling DELETE on the queue resource.
- ListMyQueues: GET on the Queue Collection resource should return a list of queues.
- Read: The Read operation doesn't follow a standard queue pop operation since it allows the client to retrieve several entries from a queue with a single request. This could be constructed as a query interface which reads the first N entries on the queue. The representation returned contains not only the entries themselves but also the URI of the resource that represents those first N entries on the queue, a Queue Subset resource. The returned Queue Subset resource can be used in subsequent Dequeue and Enqueue operations.
- Dequeue: DELETE the appropriate Queue Subset resource.
- Enqueue: Again, a queue is just a collection, and POST has as one of its meanings "append," so we can POST the entries to the queue resource to append them to the queue.
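The mapping above can be written down as a small table. In this sketch the HTTP methods follow the list, but every URI template is a made-up placeholder, since the reformulation deliberately leaves the actual URI layout open.

```python
# Operation -> (HTTP method, hypothetical URI template).
REST_MAPPING = {
    "ConfigureQueue": ("PUT", "/queues/{queue}/config"),
    "CreateQueue": ("POST", "/queues"),
    "DeleteQueue": ("DELETE", "/queues/{queue}"),
    "ListMyQueues": ("GET", "/queues"),
    "Read": ("GET", "/queues/{queue}?first={n}"),
    "Dequeue": ("DELETE", "/queues/{queue}/subsets/{subset}"),
    "Enqueue": ("POST", "/queues/{queue}"),
}


def rest_request(operation, **parts):
    """Return the (method, URI) pair for one of the seven operations."""
    method, template = REST_MAPPING[operation]
    return method, template.format(**parts)


# Only the operations that merely retrieve information end up on GET.
safe_operations = sorted(op for op, (m, _) in REST_MAPPING.items() if m == "GET")

print(rest_request("Enqueue", queue="jobs"))  # ('POST', '/queues/jobs')
print(safe_operations)                        # ['ListMyQueues', 'Read']
```

Each operation is still a single request, but the method now tells clients and intermediaries whether the request is safe to repeat or cache.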
Note that this is just one way of approaching this problem set, and there are other possibilities. The reason I chose the above breakdown is that it kept the same number of HTTP requests as the original service, just so no one would complain that REST implies more round trips than RPC. There are other ways to refactor the service; for example, each entry in a queue can be given its own URI, and to Dequeue an entry, you would call DELETE on the Queue Entry resource. Another modification would be to drop the query interface for Read and to allow only popping a single item from the head of the queue. Yet another modification would be to have the queue and Queue Configuration resource be the same but to differentiate them based on content type. I am not a big fan of content negotiation in web services and mention that last option only for completeness and not because I think it is a good idea.
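To make the first of those alternatives concrete, here is a toy in-memory model in which each entry gets its own URI. The class name, URI scheme, and choice of status codes are assumptions made for illustration only.

```python
import itertools


class ToyQueueResource:
    """Toy queue where every entry is its own resource (illustrative)."""

    def __init__(self, base_uri):
        self.base_uri = base_uri
        self._ids = itertools.count(1)
        self.entries = {}  # entry URI -> body, kept in insertion order

    def post(self, body):
        # Enqueue: append a new entry and report the URI it was given.
        entry_uri = "{}/entries/{}".format(self.base_uri, next(self._ids))
        self.entries[entry_uri] = body
        return 201, entry_uri

    def delete(self, entry_uri):
        # Dequeue: remove exactly the entry this URI names.
        if entry_uri in self.entries:
            del self.entries[entry_uri]
            return 204
        return 404


queue = ToyQueueResource("/queues/jobs")
status, uri = queue.post("resize image 42")
print(status, uri)        # 201 /queues/jobs/entries/1
print(queue.delete(uri))  # 204
print(queue.delete(uri))  # 404
```

The second DELETE reports that the entry is already gone, but the queue ends up in the same state as after the first, which is the idempotence that DELETE promises.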
Now we've skimped on some of the details, for example, the formats of the requests, the formats of the responses, and the possible HTTP status codes, but a high-level view gives us enough information to compare and contrast the two approaches. The advantages of our RESTful reformulation of the Amazon Simple Queue Service are:
- It has the same number of HTTP calls as the Amazon service for all the operations listed.
- There are no worries about idempotence or safety.
- The GETs that are really GETs can now be optimized; that is, ETags, gzip, and the caching capabilities of HTTP can now be exploited to improve performance.
- We now have a basic model of the Queue Service with obvious points of extensibility. For example, the Amazon Simple Queue Service has no way to 'read' the Queue Configuration, only a way to set it. For Amazon to extend their service, they would have to add a new operation. From our model, you can easily see that this would simply mean supporting a GET on the Queue Configuration resource.
- Security is enhanced by moving authentication out of the URI.
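As one illustration of the optimization point, here is a minimal sketch of how a server could answer a conditional GET on a resource such as the Queue Collection. The function name and the choice of a SHA-1 hash as the ETag are assumptions; any stable validator would do.

```python
import hashlib


def conditional_get(representation, if_none_match=None):
    """Return (status, headers, body) for a GET with optional If-None-Match."""
    etag = '"' + hashlib.sha1(representation).hexdigest() + '"'
    if if_none_match == etag:
        # The client's cached copy is still current: send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, representation


body = b"<queues><queue>jobs</queue></queues>"
status, headers, _ = conditional_get(body)
print(status)                                      # 200
print(conditional_get(body, headers["ETag"])[0])   # 304
```

None of this is available to the original design in any dependable way, because a cache cannot tell which of its GETs are real retrievals.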
When putting your service together, remember to avoid getting locked into an RPC mindset. Do that by starting with the resources and their representations, and you'll unlock all the advantages of HTTP.