XML.com: XML From the Inside Out

Catching Up with the Atom Publishing Protocol
by Joe Gregorio

Custom Format

Actually, many different custom XML formats have been proposed. Here is a sample of the format defined in the current draft:

<?xml version="1.0" encoding='utf-8'?>
<service xmlns="http://purl.org/atom/app#">
  <workspace title="Main Site" > 
    <collection title="My Blog Entries" 
      href="http://example.org/reilly/main" >
      <member-type>entry</member-type>
      ...
    </collection>
    <collection title="Pictures" 
      href="http://example.org/reilly/pic" >
      <member-type>media</member-type>
      ...
    </collection>
  </workspace>
  <workspace title="Side Bar Blog">
    ... 
  </workspace>
</service>
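As a rough sketch of how a client might consume such a document, the Python below uses the standard library's ElementTree to list each workspace's collections. It assumes the namespace shown in the draft sample, and the embedded document is the sample above with the "..." placeholders removed so that it parses. The function name `list_collections` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the draft sample; the "app" prefix is arbitrary.
NS = {"app": "http://purl.org/atom/app#"}

# The sample service document with the "..." placeholders removed.
SERVICE_DOC = """<?xml version="1.0" encoding="utf-8"?>
<service xmlns="http://purl.org/atom/app#">
  <workspace title="Main Site">
    <collection title="My Blog Entries" href="http://example.org/reilly/main">
      <member-type>entry</member-type>
    </collection>
    <collection title="Pictures" href="http://example.org/reilly/pic">
      <member-type>media</member-type>
    </collection>
  </workspace>
  <workspace title="Side Bar Blog"/>
</service>"""


def list_collections(service_xml):
    """Map each workspace title to a list of (title, href, member-type) tuples."""
    # Parse bytes so the encoding in the XML declaration is honoured.
    root = ET.fromstring(service_xml.encode("utf-8"))
    workspaces = {}
    for ws in root.findall("app:workspace", NS):
        workspaces[ws.get("title")] = [
            (col.get("title"), col.get("href"),
             col.findtext("app:member-type", namespaces=NS))
            for col in ws.findall("app:collection", NS)
        ]
    return workspaces
```

For the sample, `list_collections(SERVICE_DOC)["Main Site"][1]` is `("Pictures", "http://example.org/reilly/pic", "media")`, and "Side Bar Blog" maps to an empty list.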
XOXO

One of the older microformats, XOXO has been proposed as a format for listing the collections and workspaces. Here is a fragment from the XOXO microformat draft specification:

<ol class='xoxo'>
  <li>item 1
    <dl>
      <dt>description</dt>
        <dd>This item represents the main point we're trying to make.</dd>
    </dl>
    <ol>
      <li>subpoint a</li>
      <li>subpoint b</li>
    </ol>
  </li>
</ol>

You can see how it would be easy to describe groupings of collections and workspaces using such a format.
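To make that concrete, here is a minimal sketch that renders a mapping of workspace titles to collections as nested XOXO lists: one `li` per workspace, and a link per collection. The function name `to_xoxo` and its input shape are assumptions for illustration; XOXO itself does not prescribe how APP workspaces should be encoded.

```python
from html import escape


def to_xoxo(workspaces):
    """Render {workspace title: [(collection title, href), ...]} as XOXO markup."""
    lines = ["<ol class='xoxo'>"]
    for ws_title, collections in workspaces.items():
        lines.append(f"  <li>{escape(ws_title)}")
        lines.append("    <ol>")
        for title, href in collections:
            # escape() also escapes quotes, so the single-quoted attribute stays valid.
            lines.append(f"      <li><a href='{escape(href)}'>{escape(title)}</a></li>")
        lines.append("    </ol>")
        lines.append("  </li>")
    lines.append("</ol>")
    return "\n".join(lines)
```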

Atom Syndication Format

Atom feeds have been proposed in several forms, including one that uses one entry per collection, and others that use link elements to point to collections, workspaces, and other Atom feeds.
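For the one-entry-per-collection variant, a client might pull collection titles and URIs out of the feed as in the sketch below. The feed is hypothetical and trimmed (it omits required Atom elements such as atom:id and atom:updated), and since the proposals differed on which link relations to use, this simply reads the first link in each entry.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical, trimmed feed: one entry per collection.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Collections</title>
  <entry>
    <title>My Blog Entries</title>
    <link href="http://example.org/reilly/main"/>
  </entry>
  <entry>
    <title>Pictures</title>
    <link href="http://example.org/reilly/pic"/>
  </entry>
</feed>"""


def collections_from_feed(feed_xml):
    """Return (title, href) for each entry; href is None if the entry has no link."""
    root = ET.fromstring(feed_xml)
    result = []
    for entry in root.findall("atom:entry", NS):
        link = entry.find("atom:link", NS)
        href = link.get("href") if link is not None else None
        result.append((entry.findtext("atom:title", namespaces=NS), href))
    return result
```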

Enumerating

In the table above, I said that doing a GET on a collection returns an Atom feed that may contain only a subset of the entries in the collection. So how do I go about listing the rest of the entries? Good question, and apparently a painful one, too. The proposals have spanned almost every possible combination and permutation.

link/@rel="next". Since we are using an Atom feed to enumerate the entries in a collection, we can use a link element with a rel attribute value of "next" to point to the next n entries. Do that in each feed and you end up with a chain of Atom feeds, all linked together, going back in time. This has several advantages: the server can always return a reasonable amount of data, and the client can use the ETag of the first feed in the chain to see whether there have been any updates. Of all the solutions, this is the only one that can be served statically. Finally, this is how enumerating entries worked in gregorio-09. It currently appears that this method of enumerating members will end up in the core.

The disadvantages are that it gives the client very little control and the client may have to do many GETs following the chain of feeds if it wants to enumerate every entry in a collection.
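A client that wants every entry has to walk the chain itself. The sketch below follows rel="next" links until the chain ends, using an in-memory dictionary in place of HTTP GETs so it runs offline. The page URIs, the `fetch` callable, and the `max_pages` guard are assumptions for illustration, and the feeds are trimmed to their link and id elements.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical collection of five entries split across three linked feeds,
# keyed by URI so that a dict lookup can stand in for an HTTP GET.
PAGES = {
    "http://example.org/reilly/main": """<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="next" href="http://example.org/reilly/main?page=2"/>
  <entry><id>urn:e:5</id></entry><entry><id>urn:e:4</id></entry></feed>""",
    "http://example.org/reilly/main?page=2": """<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="next" href="http://example.org/reilly/main?page=3"/>
  <entry><id>urn:e:3</id></entry><entry><id>urn:e:2</id></entry></feed>""",
    "http://example.org/reilly/main?page=3": """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>urn:e:1</id></entry></feed>""",
}


def walk_collection(start_uri, fetch, max_pages=100):
    """Yield entry ids from each feed in a rel="next" chain, first page first."""
    seen = set()
    uri = start_uri
    # Stop at the end of the chain, on a loop, or after max_pages GETs.
    while uri is not None and uri not in seen and len(seen) < max_pages:
        seen.add(uri)
        feed = ET.fromstring(fetch(uri))
        for entry in feed.findall(ATOM + "entry"):
            yield entry.findtext(ATOM + "id")
        next_link = next(
            (link for link in feed.findall(ATOM + "link") if link.get("rel") == "next"),
            None)
        # Resolve a relative href against the URI of the feed that contained it.
        uri = urljoin(uri, next_link.get("href")) if next_link is not None else None


ids = list(walk_collection("http://example.org/reilly/main", PAGES.__getitem__))
# ['urn:e:5', 'urn:e:4', 'urn:e:3', 'urn:e:2', 'urn:e:1']
```

Each page costs one GET, which is exactly the disadvantage noted above. A client could keep the ETag from the first page and skip the walk entirely when a conditional GET reports that nothing has changed.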

The remaining proposals used indexes, dates, or some combination of the two. The index approaches, where the collection is treated like a giant array and you pass in values to select a subset, have some drawbacks. The first is that you really need the total size of the collection, and that number may change between the time you read it and the time you actually use it. The second is that since the number of entries returned is determined by the client, it is possible for the client to request a huge number of entries, accidentally or maliciously. That is not really a problem for a weblog, whose APP service would be protected by authentication, but consider an APP-enabled wiki, which may not be.
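The first drawback is easy to see in a toy model. In the sketch below, the collection is a newest-first Python list and `get_range` is a hypothetical inclusive index query. Two entries are posted between the client's two requests, so the second page repeats entries the client already has and the two oldest entries are never returned.

```python
def get_range(collection, start, end):
    """Hypothetical index query: return entries start..end inclusive."""
    return collection[start:end + 1]


# Newest first: entry-20 at index 0 down to entry-1 at index 19.
entries = [f"entry-{n}" for n in range(20, 0, -1)]
original = list(entries)

total = len(entries)               # client reads the size: 20
first = get_range(entries, 0, 9)   # entry-20 .. entry-11

# Two new entries are posted before the client asks for the rest.
entries[0:0] = ["entry-22", "entry-21"]
second = get_range(entries, 10, total - 1)  # now entry-12 .. entry-3

duplicates = sorted(set(first) & set(second))
skipped = [e for e in original if e not in first and e not in second]
print(duplicates)  # ['entry-11', 'entry-12']
print(skipped)     # ['entry-2', 'entry-1']
```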

URI Templating. URI templating is part of the current draft, but I believe it will be removed soon. It is a system in which a string has brace-delimited keywords embedded in it; replacing those braces and keywords with values constructs a URI. For example, if the keyword was index and it accepted a range of index values separated by a dash, then this URI template:

http://example.org/find/{index}

could be filled in like so:

http://example.org/find/0-14

to request the first 15 entries in the collection. This is a mechanism I described and provided code for in a previous column, Constructing or Traversing URIs.
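The substitution step itself is small. The following is a minimal sketch, not the code from that column: it replaces each `{keyword}` with a percent-encoded value and raises `KeyError` for a keyword that has no value. The `index_range` helper and its dash-separated, inclusive format follow the example above but are otherwise assumptions.

```python
import re
from urllib.parse import quote


def expand(template, values):
    """Replace each {keyword} in template with its percent-encoded value."""
    def substitute(match):
        # A KeyError here means the template names a keyword with no value.
        return quote(str(values[match.group(1)]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)


def index_range(start, count):
    """Format `count` entries beginning at `start` as an inclusive "first-last" range."""
    return f"{start}-{start + count - 1}"


print(expand("http://example.org/find/{index}", {"index": index_range(0, 15)}))
# http://example.org/find/0-14
```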

A variety of proposals used different sets of keywords for index- or date-based queries. All of these have the disadvantages I stated previously for index- and date-based queries. In addition, this approach has the disadvantage that the client has to get the URI template from somewhere before it starts working with a collection.

SEARCH. Yes, someone actually proposed extending HTTP with a new method for searching. It has all the disadvantages of a date- or index-based approach, with the additional obstacles presented by trying to create a new HTTP method.

GET with Range. HTTP already supports the idea of a partial GET, where you use a Range header, but ranges are only specified in terms of bytes. There was a proposal to do this with a new unit for Range based on the entries' updated dates. This was actually in the specification for a while, back in draft-ietf-atompub-protocol-03.

This has all the disadvantages of a date-based query, with the added problem of opacity. To find out whether a collection supports the new Range unit, you have to do a GET on the collection and look in the response headers for an Accept-Ranges header with the right value.
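That discovery step might look like the following sketch, which checks whether an Accept-Ranges value lists a given unit. The headers are passed as a plain dictionary and normalized to lower-case names, since HTTP header names are case-insensitive. The unit name `updated` in the usage is an illustrative assumption rather than a name fixed by any draft.

```python
def supports_range_unit(headers, unit):
    """Return True if the Accept-Ranges header lists `unit` (case-insensitive)."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    listed = [u.strip().lower() for u in lowered.get("accept-ranges", "").split(",")]
    return unit.lower() in listed


# Response headers from a prior GET on the collection (hypothetical values).
print(supports_range_unit({"Accept-Ranges": "bytes, updated"}, "updated"))  # True
print(supports_range_unit({"Accept-Ranges": "none"}, "updated"))            # False
```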

"Just" Use Query Parameters. There was even a proposal to "just" use query parameters. While simple to implement and specify, this has a major flaw: it prevents the server from using query parameters for anything else. For example, if you had multiple collections that were all accessed through the same CGI application, and you selected the collection you wanted with a query parameter, that would be impossible with this approach. The more complex URI templates were invented in part to solve this problem, but their complexity was partly responsible for their rejection.

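The conflict is easy to reproduce with the standard library's urllib.parse. In the sketch below, the CGI URI, its `collection` parameter, and the paging parameter names `start` and `count` are all hypothetical. A client that owns the whole query string drops the server's own parameter, and one that appends to it can collide with a parameter the server already uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def naive_page_uri(collection_uri, start, count):
    """Client owns the query string: replace it with the paging parameters."""
    parts = urlsplit(collection_uri)
    return urlunsplit(parts._replace(query=urlencode({"start": start, "count": count})))


def appended_page_uri(collection_uri, start, count):
    """Keep the server's parameters and append the paging ones after them."""
    parts = urlsplit(collection_uri)
    params = parse_qsl(parts.query) + [("start", start), ("count", count)]
    return urlunsplit(parts._replace(query=urlencode(params)))


# The collection is selected by a query parameter on a shared CGI script.
print(naive_page_uri("http://example.org/app.cgi?collection=pics", 0, 15))
# http://example.org/app.cgi?start=0&count=15  (collection=pics is lost)
print(appended_page_uri("http://example.org/app.cgi?collection=pics&start=2006", 0, 15))
# http://example.org/app.cgi?collection=pics&start=2006&start=0&count=15
```

Appending avoids losing `collection`, but the server can no longer safely use `start` or `count` for its own purposes, which is the restriction described above.
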
Summary

While the WG has certainly worked through a great many proposals and alternatives, I believe the lessons are pretty clear: when in doubt, go for the simplest thing that could possibly work and trust in the utility of hypertext. A simple format sprinkled with links contains a surprising amount of power.