XML.com 
 Published on XML.com http://www.xml.com/pub/a/2003/10/15/dive.html

The Atom API
By Mark Pilgrim
October 15, 2003

Atom is an up-and-coming format for editing, syndicating, and archiving weblogs and other episodic web sites. The final details are still being hashed out, but that's never stopped me before, having written several articles about XHTML 2. To understand the problems that Atom is designed to solve, we should look briefly at what came before it.

The LiveJournal API

In the beginning, there was LiveJournal. LiveJournal had an API, the LiveJournal Client/Server API. It worked over HTTP, and it looked like this:

POST /interface/flat HTTP/1.1
Host: www.livejournal.com
Content-type: application/x-www-form-urlencoded

mode=login&user=test&password=test

HTTP/1.1 200 OK
Content-Type: text/plain

name
Mr. Test Account
success
OK
message
Hello Test Account!

A function call is an HTTP POST to a specific URL, which is the same for all function calls. The function name is given as a parameter in a list of form-encoded key-value pairs. The result is also a list of key-value pairs which are separated by carriage returns instead of ampersands.
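To make the shape of that exchange concrete, here is a minimal Python sketch of both halves: form-encoding the call and reading back the alternating key/value lines. The helper names encode_request and parse_response are mine, not part of the LiveJournal API, and nothing here touches the network.

```python
from urllib.parse import urlencode


def encode_request(mode, **params):
    """Form-encode a call; the function name travels as the 'mode' parameter."""
    return urlencode({"mode": mode, **params})


def parse_response(body):
    """Parse the flat response format: keys and values alternate, one per line."""
    lines = body.splitlines()
    # Pair line 0 with line 1, line 2 with line 3, and so on.
    return dict(zip(lines[0::2], lines[1::2]))


# The login call and reply from the listing above.
request_body = encode_request("login", user="test", password="test")
response = parse_response(
    "name\nMr. Test Account\nsuccess\nOK\nmessage\nHello Test Account!\n"
)
```

After this runs, response["success"] holds "OK" and request_body is the same string that appears in the POST body above.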

Things to notice right off the bat:

The Blogger API

Next in our abbreviated history is the Blogger API. The Blogger API was created by Evan Williams of Pyra Labs and was quickly adopted by virtually everyone. It defined a series of functions, such as blogger.newPost, which took as arguments application_key (application-specific, each developer signed up to receive one), blog_id (defined within the Blogger system), username, password, entry_text, and a boolean flag publish which controlled whether to publish the new post immediately or leave it in draft mode.

The Blogger API was based on XML-RPC, so a call to newPost(APP_KEY, BLOG_ID, USERNAME, PASSWORD, ENTRY_TEXT, PUBLISH) would send this over the wire:

POST /api/RPC2 HTTP/1.1
Host: plant.blogger.com
Content-Type: text/xml

<?xml version='1.0'?>
<methodCall>
  <methodName>blogger.newPost</methodName>
  <params>
    <param>
      <value>
        <string>APP_KEY</string>
      </value>
    </param>
    <param>
      <value>
        <string>BLOG_ID</string>
      </value>
    </param>
    <param>
      <value>
        <string>USERNAME</string>
      </value>
    </param>
    <param>
      <value>
        <string>PASSWORD</string>
      </value>
    </param>
    <param>
      <value>
        <string>ENTRY_TEXT</string>
      </value>
    </param>
    <param>
      <value>
        <boolean>PUBLISH</boolean>
      </value>
    </param>
  </params>
</methodCall>
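If you'd rather not build that by hand, Python's standard xmlrpc.client module will serialize the same call for you. This is only a sketch: the six arguments are the placeholder strings from the listing above, not real credentials, and no server is contacted.

```python
import xmlrpc.client

# Placeholder values standing in for a real key, blog, login, and post text.
params = ("APP_KEY", "BLOG_ID", "USERNAME", "PASSWORD", "ENTRY_TEXT", True)

# dumps() builds the <methodCall> document without sending anything.
payload = xmlrpc.client.dumps(params, methodname="blogger.newPost")

# loads() reverses the process, returning the arguments and the method name.
decoded_params, method = xmlrpc.client.loads(payload)
```

Printing payload shows the same structure as the listing above, with less whitespace and the boolean serialized as 1.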

Things to note here:

The MetaWeblog API

In direct response to the perceived limitations of the Blogger API, especially its lack of extensibility (many people wanted titles, for a start), UserLand created the MetaWeblog API. It solved some of the problems but at the cost of added complexity. It was also based on XML-RPC, but it replaced the single entry_text string argument with a struct which could hold multiple pieces of information.

As the MetaWeblog API spec puts it,

The MetaWeblog API uses an XML-RPC struct to represent a weblog post. Rather than invent a new vocabulary for the metadata of a weblog post, we use the vocabulary for an item in RSS 2.0. So you can refer to a post's title, link and description; or its author, comments, enclosure, guid, etc. using the already-familiar names given to those elements in RSS 2.0.

In other words, if you want to post a new entry that would be represented like this in RSS,

<item>
  <title>My Weblog Entry</title>
  <description>This is my first post to my weblog.</description>
  <pubDate>Mon, 13 Oct 2003 13:29:54 GMT</pubDate>
  <author>Mark Pilgrim (f8dy@example.com)</author>
  <category>Unfiled</category>
</item>

you would create that entry with a struct, something like this:

>>> import xmlrpclib
>>> server = xmlrpclib.ServerProxy('http://www.example.com/RPC2')
>>> server.metaWeblog.newPost(BLOG_ID, USERNAME, PASSWORD,
  {'title': 'My Weblog Entry',
   'description': 'This is my first post to my weblog.',
   'dateCreated': '2003-10-13T13:29:54',
   'author': 'Mark Pilgrim (f8dy@example.com)',
   'category': 'Unfiled'},
   xmlrpclib.True)

What goes over the wire after this call is insanely complicated, far too much to include inline here. Looking through the wire format, and the higher-level source code, points out a number of problems with the MetaWeblog API:

But wait, there's more. RSS 2.0 is extensible through namespaces, so in theory the MetaWeblog API is extensible too. It says:

RSS 2.0 allows for the use of namespaces. If you wish to transmit an element that is part of a namespace, include a sub-struct in the struct passed to newPost and editPost whose name is the URL that specifies the namespace. The sub-element(s) of the struct are the value(s) from the namespace that you wish to transmit.

Thankfully, nobody actually does this. Movable Type extends the MetaWeblog API by simply defining a bunch of new elements in the struct called mt_allow_comments, mt_allow_pings, and so forth.

In other words, what we have here is an RPC-based API that starts with an XML-centric data model (RSS 2.0), shoves it into a struct, defines separate special cases for everything that isn't simply a name-value pair, ignores everything that isn't handled by the special cases, reinvents the concept of XML namespaces, and then serializes it all in a verbose XML format that looks like this. So we've reinvented XML, over RPC, over XML. Badly.

And passwords are still sent in the clear.
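To see the verbosity for yourself, you can serialize the earlier newPost call locally. This sketch uses Python 3's xmlrpc.client (the successor to the xmlrpclib module used in the session above); BLOG_ID, USERNAME, and PASSWORD are placeholder strings, and nothing is sent to a server.

```python
import xmlrpc.client

# The same struct as in the interactive session earlier in this article.
post = {
    "title": "My Weblog Entry",
    "description": "This is my first post to my weblog.",
    "dateCreated": "2003-10-13T13:29:54",
    "author": "Mark Pilgrim (f8dy@example.com)",
    "category": "Unfiled",
}

# Serialize the call exactly as a client would before POSTing it.
payload = xmlrpc.client.dumps(
    ("BLOG_ID", "USERNAME", "PASSWORD", post, True),
    methodname="metaWeblog.newPost",
)

# Each key in the struct becomes its own <member> wrapper on the wire.
member_count = payload.count("<member>")
print(payload)
```

Every name-value pair in the struct is wrapped in a member element with separate name and value children, which is where most of the bulk comes from.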

The Atom API

The Atom API was designed because the MetaWeblog API proved that RPC-based APIs were simply the wrong solution for this problem. The Blogger API was about as complicated as you could reasonably get before things went completely off the rails. "Shove everything into a struct" was an idea that sounded like it might solve some problems; but, as you can see, it caused more problems than it solved.

In direct response to the mess that is the MetaWeblog API, the Atom API was designed with several guiding principles in mind:

API Discovery

Previous weblog services had no concept of API discovery. They left it up to the end user to provide the exact API URL (http://example.com/mt/mt-xmlrpc.cgi). Some servers implemented undocumented functions like deletePost, and even knowing the type of software running on the other end was not enough because different versions of the same software supported extra functionality over time. Client software had to guess what functionality was provided and what extensions were supported.

The Atom API assumes only that the end user knows her home page. It relies on a link tag in the head element of the home page that points to an Atom introspection file. The introspection file, in turn, lists the supported functions and extensions, as well as the URI associated with each function.

Here is an example of a home page with Atom API auto-discovery:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>My Weblog</title>
<link rel="service.edit" type="application/x.atom+xml" 
  href="/myblog/atom.cgi/introspection" title="Atom API" />
</head>
<body>
...
</body>
</html>

If this resource is http://www.example.com/, it says that the Atom introspection file is the http://www.example.com/myblog/atom.cgi/introspection resource. Note that this is actually routing through a CGI script, as are all the other examples I'll list here. Nothing in Atom requires this, but when I wrote a server prototype of the Atom API, I made a point to route everything through a single CGI script because there was some debate about whether this was even possible. It could easily be a set of CGI scripts, or JSPs, ASP, PHP, or any other language.
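On the client side, auto-discovery comes down to finding that link element and resolving its href against the home page address. Here is a rough sketch using only the Python standard library; the class and function names are my own, and a real client would fetch the page over HTTP rather than use a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ServiceLinkFinder(HTMLParser):
    """Remember the href of the first <link rel="service.edit"> element."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        is_service_link = tag == "link" and attributes.get("rel") == "service.edit"
        if is_service_link and self.href is None:
            self.href = attributes.get("href")


def find_introspection_uri(home_page_url, html):
    """Return the absolute introspection URI, or None if no link is present."""
    finder = ServiceLinkFinder()
    finder.feed(html)
    if finder.href is None:
        return None
    # The href may be relative, so resolve it against the page's own address.
    return urljoin(home_page_url, finder.href)


# A trimmed copy of the home page shown above.
HOME_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My Weblog</title>
<link rel="service.edit" type="application/x.atom+xml"
  href="/myblog/atom.cgi/introspection" title="Atom API" />
</head>
<body></body>
</html>"""

introspection_uri = find_introspection_uri("http://www.example.com/", HOME_PAGE)
```

For the sample page, introspection_uri is the absolute address given in the paragraph above.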

The introspection file then lists the supported functions and extensions in a simple, well-defined XML format. There are a number of functions defined in the core Atom API, and vendors can extend the introspection file with XML namespaces to point to their own extension methods. A core Atom API introspection file (like http://www.example.com/myblog/atom.cgi/introspection) might look like this:

GET /myblog/atom.cgi/introspection HTTP/1.1
Host: example.com

HTTP/1.1 200 OK
Content-Type: application/x.atom+xml

<?xml version="1.0" encoding="utf-8"?>
<introspection xmlns="http://purl.org/atom/ns#" > 
  <search-entries>http://example.com/myblog/atom.cgi/search</search-entries>
  <create-entry>http://example.com/myblog/atom.cgi/edit</create-entry>
  <edit-template>http://example.com/atom.cgi/templates</edit-template>
  <user-prefs>http://example.com/myblog/atom.cgi/prefs</user-prefs>
  <categories>http://example.com/atom.cgi/categories</categories>
</introspection>
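A client would typically turn that file into a lookup table from function name to URI. The following is one way to do it with xml.etree.ElementTree, assuming the draft format shown above; parse_introspection is a name I made up, and leaving vendor extensions under their fully qualified names is my own choice, not something the spec draft dictates.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"


def parse_introspection(xml_text):
    """Map each function listed in an introspection file to its URI."""
    services = {}
    for child in ET.fromstring(xml_text):
        prefix = "{" + ATOM_NS + "}"
        # ElementTree spells tags as "{namespace}name". Core Atom functions
        # are stored under the bare name; anything else keeps its full tag.
        if child.tag.startswith(prefix):
            name = child.tag[len(prefix):]
        else:
            name = child.tag
        services[name] = (child.text or "").strip()
    return services


# Two of the functions from the sample introspection file above.
INTROSPECTION = """<introspection xmlns="http://purl.org/atom/ns#">
  <search-entries>http://example.com/myblog/atom.cgi/search</search-entries>
  <create-entry>http://example.com/myblog/atom.cgi/edit</create-entry>
</introspection>"""

services = parse_introspection(INTROSPECTION)
```

From here on, the client looks up services["create-entry"] or services["search-entries"] instead of hard-coding any address.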

Retrieving Entries

If you're writing client software to manage a weblog, the first thing you'll probably want to do after getting the introspection file is get a list of existing entries. The introspection file lists the address for searching entries in <search-entries>. The client can add query string parameters such as atom-last to find recent entries. More complex examples are defined in the Atom API spec draft.

Here's how you would get a list of recent entries:

GET /myblog/atom.cgi/search?atom-last=20 HTTP/1.1
Host: example.com

HTTP/1.1 200 OK
Content-Type: application/x.atom+xml

<search-results xmlns="http://purl.org/atom/ns#" > 
  <entry>
    <title>My Second Post</title>
    <id>http://example.com/myblog/atom.cgi/edit/2</id>
  </entry>
  <entry>
    <title>My First Post</title>
    <id>http://example.com/myblog/atom.cgi/edit/1</id>
  </entry>
</search-results>
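In client code, this step is a query string plus a little XML parsing. The sketch below assumes the draft search-results format shown above; the function names are hypothetical, and the sample data is modeled on that response rather than fetched from a server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ATOM = "{http://purl.org/atom/ns#}"


def recent_entries_url(search_uri, count):
    """Add the atom-last parameter to the search address."""
    return search_uri + "?" + urlencode({"atom-last": count})


def parse_search_results(xml_text):
    """Return (title, edit URI) pairs in the order the server listed them."""
    root = ET.fromstring(xml_text)
    return [
        (entry.findtext(ATOM + "title"), entry.findtext(ATOM + "id"))
        for entry in root.findall(ATOM + "entry")
    ]


SEARCH_RESULTS = """<search-results xmlns="http://purl.org/atom/ns#">
  <entry>
    <title>My Second Post</title>
    <id>http://example.com/myblog/atom.cgi/edit/2</id>
  </entry>
  <entry>
    <title>My First Post</title>
    <id>http://example.com/myblog/atom.cgi/edit/1</id>
  </entry>
</search-results>"""

url = recent_entries_url("http://example.com/myblog/atom.cgi/search", 20)
entries = parse_search_results(SEARCH_RESULTS)
```

Each pair in entries gives the client a title to display and a URI to use for retrieving, editing, or deleting that entry.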

The remainder of the Atom API follows the principles of REST. New entries are created using HTTP POST to post an Atom entry to the create-entry address specified in the introspection file. Retrieving an entry is accomplished by doing an HTTP GET on the entry's URI, which is returned after creating a new entry or in search results. Editing an entry is accomplished by doing an HTTP PUT on the entry's URI; deleting an entry is an HTTP DELETE on the entry's URI.

When I say "the entry's URI", remember that that's implementation-specific. These examples route everything through a single script (atom.cgi), just to prove that you can do that. Of course if you're implementing the Atom API on your own server, you don't have to do that; you could use JSP, or PHP, or Perl, or anything that can handle the four basic HTTP operations (GET, POST, PUT, DELETE). The introspection file rules all; it's the client's guide to the structure of the server's Atom web services.
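The mapping from operation to HTTP verb is small enough to fit in a dictionary. Here is a sketch that builds (but does not send) the corresponding requests with urllib.request; the action names and the build_request helper are my own shorthand, not part of the Atom API.

```python
import urllib.request

ATOM_TYPE = "application/x.atom+xml"

# One HTTP verb per operation, as described above.
METHODS = {"create": "POST", "retrieve": "GET", "edit": "PUT", "delete": "DELETE"}


def build_request(action, uri, entry_xml=None):
    """Build an unsent Request; only create and edit carry an entry body."""
    data = entry_xml.encode("utf-8") if entry_xml is not None else None
    headers = {"Content-Type": ATOM_TYPE} if data is not None else {}
    return urllib.request.Request(
        uri, data=data, headers=headers, method=METHODS[action]
    )


edit_request = build_request(
    "edit", "http://example.com/myblog/atom.cgi/edit/1", "<entry/>"
)
delete_request = build_request(
    "delete", "http://example.com/myblog/atom.cgi/edit/3"
)
```

Passing one of these to urllib.request.urlopen would perform the call; the URI always comes from the introspection file, a Location header, or search results, never from guesswork.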

As I said, retrieving an existing entry is as simple as an HTTP GET of the entry's URI. The search results told us that the first post had a URI of http://example.com/myblog/atom.cgi/edit/1, so let's get that:

GET /myblog/atom.cgi/edit/1 HTTP/1.1
Host: example.com

HTTP/1.1 200 OK
Content-Type: application/x.atom+xml

<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://purl.org/atom/ns#">
  <title>My First Entry</title> 
  <summary>My First Entry Excerpt (generally plaintext)</summary> 
  <author> 
    <name>Bob B. Bobbington</name> 
    <email>bob@example.com</email>
    <url>http://homepage.example.com/</url>
  </author> 
  <issued>2003-10-15T02:29:29</issued> 
  <created>2003-10-15T04:10:58Z</created> 
  <modified>2003-10-15T04:22:03Z</modified> 
  <link>http://example.com/myblog/archives/2003/11/19/My_First_Entry.html</link>
  <id>urn:example-com:myblog:1</id>
  <content type="application/xhtml+xml" xml:lang="en"> 
    <div xmlns="http://www.w3.org/1999/xhtml">
      <p>Hello, <em>weblog</em> world!</p>
      <p>This is my first post <strong>ever</strong>!</p>
    </div>
  </content>
</entry>

Lots of information here: the entry has a title, an excerpt or summary, and an author who has a name, email address, and URL of his own. The entry has a created date and a modified date (usually server-generated), and an "issued" date (which is a date that the author would like to give to this entry, separate from when he actually posted it). The entry is viewable at a specific link, has an internal ID (a URN), and finally has some XHTML content.

The Atom content model is probably worth a whole article by itself, but for the moment let me just handwave and say that it can handle more than just XHTML. Any MIME type can be expressed (specify it in the @type attribute), and non-XML content (such as HTML or plain text) is simply escaped or put in a CDATA block, with a mode="escaped" attribute on the content element. It can even handle binary content (such as an image) by specifying @mode="base64" and including a base64-encoded representation of the data.
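As a rough illustration of how a client might unpack those modes, here is a sketch built on xml.etree.ElementTree. The "escaped" and "base64" values come from the description above; treating a missing mode attribute as inline XML is my assumption, and decode_content is a made-up helper name.

```python
import base64
import xml.etree.ElementTree as ET


def decode_content(content):
    """Return the payload of a content element according to its mode."""
    # Assumption: no mode attribute means the content is inline XML.
    mode = content.get("mode", "xml")
    if mode == "escaped":
        # The XML parser has already undone the entity escaping (or CDATA).
        return content.text or ""
    if mode == "base64":
        return base64.b64decode(content.text or "")
    # Inline XML: re-serialize the child elements as markup.
    return "".join(ET.tostring(child, encoding="unicode") for child in content)


escaped = ET.fromstring(
    '<content type="text/html" mode="escaped">&lt;p&gt;Hi&lt;/p&gt;</content>'
)
binary = ET.fromstring(
    '<content type="image/png" mode="base64">aGVsbG8=</content>'
)
inline = ET.fromstring(
    '<content type="application/xhtml+xml"><p>Hi</p></content>'
)

html_text = decode_content(escaped)
raw_bytes = decode_content(binary)
xml_text = decode_content(inline)
```

The escaped and inline cases both come back as markup text, while the base64 case comes back as bytes for the client to interpret according to the type attribute.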

Creating, Editing, and Deleting entries

Posting a new entry is virtually symmetrical. To create a new entry, do an HTTP POST to the create-entry address specified in the introspection file. The body of the HTTP POST should be an entry, in the same Atom format as you got back from the server on retrieve:

POST /myblog/atom.cgi/edit HTTP/1.1
Host: example.com
Content-Type: application/x.atom+xml

<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://purl.org/atom/ns#">
  <title>My Entry Title</title> 
  <created>2003-11-17T12:29:29Z</created> 
  <content type="application/xhtml+xml" xml:lang="en"> 
    <div xmlns="http://www.w3.org/1999/xhtml">
      <p>Hello, <em>weblog</em> world!</p>
      <p>This is my third post <strong>ever</strong>!</p>
    </div>
  </content>  
</entry>

The server responds with an HTTP status code 201 "Created" and gives the entry's edit URI in the HTTP Location: header.

HTTP/1.1 201 Created
Location: http://example.com/myblog/atom.cgi/edit/3
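Here is a sketch of the client's side of that exchange: assembling the entry document and pulling the edit URI out of the response. Both helper names are hypothetical. The xhtml argument is inserted as-is, so it must already be well-formed markup; the function parses the result once to check.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

ENTRY_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://purl.org/atom/ns#">
  <title>{title}</title>
  <created>{created}</created>
  <content type="application/xhtml+xml" xml:lang="en">
    <div xmlns="http://www.w3.org/1999/xhtml">{xhtml}</div>
  </content>
</entry>
"""


def build_entry(title, created, xhtml):
    """Return an Atom entry document ready to use as a POST body."""
    document = ENTRY_TEMPLATE.format(
        title=escape(title), created=created, xhtml=xhtml
    )
    # Parsing raises ET.ParseError if the XHTML fragment is not well-formed.
    ET.fromstring(document.encode("utf-8"))
    return document


def edit_uri_from_response(status, headers):
    """Return the new entry's edit URI from a 201 Created response."""
    if status != 201:
        raise ValueError("expected 201 Created, got %d" % status)
    return headers["Location"]


entry = build_entry(
    "My Entry Title",
    "2003-11-17T12:29:29Z",
    "<p>Hello, <em>weblog</em> world!</p>",
)
edit_uri = edit_uri_from_response(
    201, {"Location": "http://example.com/myblog/atom.cgi/edit/3"}
)
```

The client should hold on to edit_uri, since that is the address it will later GET, PUT, or DELETE.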

Note that since we're using straight XML (rather than a serialization of XML over RPC over XML), extensibility is handled by XML namespaces. For example, Movable Type allows individual entries to allow comments or not. This functionality is not built into the Atom API, but Six Apart could easily extend the API like this:

POST /myblog/atom.cgi/edit HTTP/1.1
Host: example.com
Content-Type: application/x.atom+xml

<?xml version="1.0" encoding="utf-8"?>
<entry
  xmlns="http://purl.org/atom/ns#"
  xmlns:mt="http://www.movabletype.org/atom/ns#">

  <title>My Entry Title</title> 
  <created>2003-11-17T12:29:29Z</created>
  <mt:allowComments>1</mt:allowComments>
  <content type="application/xhtml+xml" xml:lang="en"> 
    <div xmlns="http://www.w3.org/1999/xhtml">
      <p>Hello, <em>weblog</em> world!</p>
      <p>This is my first post <strong>ever</strong>!</p>
    </div>
  </content>  
</entry>
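Because extensions live in their own namespace, code on either end can separate them from the core elements without any special cases. The sketch below shows one way to do that; split_extensions is a hypothetical helper, and the mt:allowComments element is the illustrative extension from the example above, not a documented Movable Type feature.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"
MT_NS = "http://www.movabletype.org/atom/ns#"


def split_extensions(entry_xml):
    """Separate core Atom elements from namespaced extension elements."""
    core, extensions = {}, {}
    for child in ET.fromstring(entry_xml):
        # ElementTree spells qualified names as "{namespace}local-name".
        if child.tag.startswith("{"):
            namespace, name = child.tag[1:].split("}", 1)
        else:
            namespace, name = "", child.tag
        value = (child.text or "").strip()
        if namespace == ATOM_NS:
            core[name] = value
        else:
            extensions[(namespace, name)] = value
    return core, extensions


ENTRY = """<entry
  xmlns="http://purl.org/atom/ns#"
  xmlns:mt="http://www.movabletype.org/atom/ns#">
  <title>My Entry Title</title>
  <created>2003-11-17T12:29:29Z</created>
  <mt:allowComments>1</mt:allowComments>
</entry>"""

core, extensions = split_extensions(ENTRY)
allow_comments = extensions.get((MT_NS, "allowComments")) == "1"
```

Software that doesn't recognize a namespace can simply skip whatever lands in extensions, which is the whole point of doing extensibility this way.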

Modifying an existing entry is almost the same as creating one. You do an HTTP PUT on the entry's URI (as returned in the Location: header after creating it or in the id element in the search results), with the entry in the body of the HTTP message, in the same Atom XML format we've seen in other method calls:

PUT /myblog/atom.cgi/edit/1 HTTP/1.1
Host: example.com
Content-Type: application/x.atom+xml

<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://purl.org/atom/ns#">
  <title>My First Entry</title> 
  <summary>My First Entry Excerpt (generally plaintext)</summary> 
  <author> 
    <name>Bob B. Bobbington</name> 
    <email>bob@example.com</email>
    <url>http://homepage.example.com/</url>
  </author> 
  <issued>2003-10-15T02:29:29</issued> 
  <created>2003-10-15T04:10:58Z</created> 
  <modified>2003-10-15T04:22:03Z</modified> 
  <link>http://example.com/myblog/archives/2003/11/19/My_First_Entry.html</link>
  <id>urn:example-com:myblog:1</id>
  <content type="application/xhtml+xml" xml:lang="en"> 
    <div xmlns="http://www.w3.org/1999/xhtml">
      <p>Hello, <em>weblog</em> world!</p>
      <p>This is my first post <strong>ever</strong>!</p>
    </div>
  </content>
</entry>

On success the server responds with an HTTP status code 205 "Reset Content".

HTTP/1.1 205 Reset Content

Deleting an entry is even simpler:

DELETE /myblog/atom.cgi/edit/3 HTTP/1.1
Host: example.com

HTTP/1.1 200 OK

Further reading

We have, in some sense, come full circle. The original LiveJournal API was REST-based, although it was limited to the simple name-value pairs for input and output. After that, weblogging APIs went down a path of RPC-style services, until that became completely unmanageable. And now we're back to a document-centric, REST-inspired service again.

The Atom API has several other methods beyond add, edit, delete, retrieve, and search. It can be used for posting comments on entries, managing users and user preferences, managing categories, and managing site templates; eventually it will be usable for everything you can do manually with your weblog through your server's browser-based interface. You can read the latest draft for yourself or download sample source code that implements the API in Python, Perl, PHP, or Java.

"But, but, but," I hear you cry, "what about passwords sent in the clear?" Ah, yes. Atom authentication deserves its own article, and I promise to tackle it, if not next month then the month after. I can promise that it does not involve sending plain text passwords in the clear.

XML.com Copyright © 1998-2006 O'Reilly Media, Inc.