The Atom API
Atom is an up-and-coming format for editing, syndicating, and archiving weblogs and other episodic web sites. The final details are still being hashed out, but that's never stopped me before, having written several articles about XHTML 2. To understand the problems that Atom is designed to solve, we should look briefly at what came before it.
In the beginning, there was LiveJournal. LiveJournal had an API, the LiveJournal Client/Server API. It worked over HTTP, and it looked like this:
POST /interface/flat HTTP/1.1
Host: www.livejournal.com
Content-type: application/x-www-form-urlencoded
mode=login&user=test&password=test
HTTP/1.1 200 OK
Content-Type: text/plain
name
Mr. Test Account
success
OK
message
Hello Test Account!
A function call is an HTTP POST to a specific URL, which is the same for all function calls. The function name is given as a parameter in a list of form-encoded key-value pairs. The result is also a list of key-value pairs which are separated by carriage returns instead of ampersands.
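To make the flat protocol concrete, here is a minimal Python sketch of both directions: form-encoding a call and pairing up the alternating key and value lines of a response. The helper names encode_call and parse_flat_response are mine, not part of the LiveJournal API, and the response body is just the example above pasted into a string.

```python
from urllib.parse import urlencode


def encode_call(mode, **params):
    """Form-encode a call; the function name travels as the 'mode' parameter."""
    return urlencode({"mode": mode, **params})


def parse_flat_response(body):
    """Pair up alternating key and value lines into a dict."""
    lines = body.splitlines()
    # Even-indexed lines are keys, odd-indexed lines are the matching values.
    return dict(zip(lines[0::2], lines[1::2]))


request_body = encode_call("login", user="test", password="test")
response = parse_flat_response(
    "name\nMr. Test Account\nsuccess\nOK\nmessage\nHello Test Account!\n"
)
```

Everything comes back as a string, so anything richer than a name-value pair is the client's problem.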
Things to notice right off the bat:
Next in our abbreviated history is the Blogger API. The
Blogger API was created by Evan Williams of Pyra Labs and was quickly
adopted by virtually everyone. It defined a series of functions, such as
blogger.newPost, which took as arguments
application_key (application-specific, each developer signed up
to receive one), blog_id (defined within the Blogger system),
username, password, entry_text, and a
boolean flag publish which controlled whether to publish the
new post immediately or leave it in draft mode.
The Blogger API was based on XML-RPC, so a call to
newPost(APP_KEY, BLOG_ID, USERNAME, PASSWORD, ENTRY_TEXT,
PUBLISH) would send this over the wire:
POST /api/RPC2 HTTP/1.1
Host: plant.blogger.com
Content-Type: text/xml
<?xml version='1.0'?>
<methodCall>
<methodName>blogger.newPost</methodName>
<params>
<param>
<value>
<string>APP_KEY</string>
</value>
</param>
<param>
<value>
<string>BLOG_ID</string>
</value>
</param>
<param>
<value>
<string>USERNAME</string>
</value>
</param>
<param>
<value>
<string>PASSWORD</string>
</value>
</param>
<param>
<value>
<string>ENTRY_TEXT</string>
</value>
</param>
<param>
<value>
<boolean>PUBLISH</boolean>
</value>
</param>
</params>
</methodCall>
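For comparison, here is a rough sketch of how a client library produces a request body like the one above. It assumes Python 3's standard xmlrpc.client module; the argument values are the same placeholders used in the request, not real credentials, and nothing is actually sent to a server.

```python
import xmlrpc.client

# The six positional arguments of blogger.newPost, in order.
params = ("APP_KEY", "BLOG_ID", "USERNAME", "PASSWORD", "ENTRY_TEXT", True)

# Serialize the call into an XML-RPC <methodCall> document.
payload = xmlrpc.client.dumps(params, methodname="blogger.newPost")

# Decoding the payload recovers the positional arguments and the method name.
decoded_params, method = xmlrpc.client.loads(payload)
```

Note that the arguments are purely positional: the wire format carries no names, so client and server have to agree on the order in advance.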
Things to note here:
<string> is optional and is omitted by some XML-RPC servers? And so forth.
In direct response to the perceived limitations of the Blogger API, especially its lack of extensibility (many people wanted titles, for instance), UserLand created the MetaWeblog API. It solved some of the problems, but at the cost of added complexity. It was also based on XML-RPC, but it replaced the single entry_text string argument with a struct that could hold multiple pieces of information.
As the MetaWeblog API spec puts it,
The MetaWeblog API uses an XML-RPC struct to represent a weblog post. Rather than invent a new vocabulary for the metadata of a weblog post, we use the vocabulary for an item in RSS 2.0. So you can refer to a post's title, link and description; or its author, comments, enclosure, guid, etc. using the already-familiar names given to those elements in RSS 2.0.
In other words, if you want to post a new entry that would be represented like this in RSS,
<item>
<title>My Weblog Entry</title>
<description>This is my first post to my weblog.</description>
<pubDate>Mon, 13 Oct 2003 13:29:54 GMT</pubDate>
<author>Mark Pilgrim (f8dy@example.com)</author>
<category>Unfiled</category>
</item>
you would create that entry with a struct, something like this:
>>> import xmlrpclib
>>> server = xmlrpclib.ServerProxy('http://www.example.com/RPC2')
>>> server.metaWeblog.newPost(BLOG_ID, USERNAME, PASSWORD,
...     {'title': 'My Weblog Entry',
...      'description': 'This is my first post to my weblog.',
...      'dateCreated': '2003-10-13T13:29:54',
...      'author': 'Mark Pilgrim (f8dy@example.com)',
...      'category': 'Unfiled'},
...     xmlrpclib.True)
What goes over the wire after this call is insanely complicated, far too much to include inline here. Looking through the wire format, and the higher-level source code, points out a number of problems with the MetaWeblog API:
<pubDate>, but in the API the creation date goes in
<dateCreated>.
source, enclosure, and
category) can have attributes. These are also special-cased.
For enclosure, the MetaWeblog API tells us to "pass a struct
with sub-elements whose names match the names of the attributes according
to the RSS 2.0 spec, url, length and type." For source,
"pass a struct with sub-elements, url and name."
categories element within
the struct which is an array of strings. Other cases -- a post with
multiple authors, for example -- are simply impossible.
domain attribute that
specifies the domain in which the category name resides. To serialize
this, the MetaWeblog API tells us: "If an element has both attributes and
a value, make the element a struct, include the attributes as
sub-elements, and create a sub-element for the value with the name
_value. Note that this means that no element can be passed through the API
that has an attribute whose name is _value."
categories element does not handle serializing attributes for
each category; it is always simply a list of strings.
But wait, there's more. RSS 2.0 is extensible through namespaces, so in theory the MetaWeblog API is extensible too. It says:
RSS 2.0 allows for the use of namespaces. If you wish to transmit an element that is part of a namespace, include a sub-struct in the struct passed to newPost and editPost whose name is the URL that specifies the namespace. The sub-element(s) of the struct are the value(s) from the namespace that you wish to transmit.
Thankfully, nobody actually does this. Movable Type extends the
MetaWeblog API by simply defining a bunch of new elements in the struct
called mt_allow_comments, mt_allow_pings, and so
forth.
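To see how these special cases pile up, here is a small Python sketch of the struct a client would have to assemble. It is only an illustration of the rules quoted above: the helper element_to_member, the namespace URL in EXAMPLE_NS, its lat element, and the category domain URL are all made up for the example, and only mt_allow_comments comes from Movable Type.

```python
def element_to_member(value, attributes=None):
    """Apply the quoted rule: an element with attributes and a value becomes a struct."""
    if not attributes:
        return value
    if "_value" in attributes:
        # The spec's own caveat: the reserved name cannot double as an attribute.
        raise ValueError("an attribute named _value cannot be represented")
    return {**attributes, "_value": value}


# Hypothetical namespace URL, used only for illustration.
EXAMPLE_NS = "http://example.com/ns/geo"

post = {
    "title": "My Weblog Entry",
    # An element with both an attribute (domain) and a value.
    "category": element_to_member("Unfiled", {"domain": "http://example.com/topics"}),
    # Namespaced elements go in a sub-struct named after the namespace URL.
    EXAMPLE_NS: {"lat": "35.9"},
    # Movable Type's approach: a flat, prefixed key instead of a sub-struct.
    "mt_allow_comments": 1,
}
```

Three different conventions in one dictionary, and none of them is XML.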
In other words, what we have here is an RPC-based API that starts with an XML-centric data model (RSS 2.0), shoves it into a struct, defines separate special cases for everything that isn't simply a name-value pair, ignores everything that isn't handled by the special cases, reinvents the concept of XML namespaces, and then serializes it all in a verbose XML format. So we've reinvented XML, over RPC, over XML. Badly.
And passwords are still sent in the clear.
The Atom API was designed because the MetaWeblog API proved that RPC-based APIs were simply the wrong solution for this problem. The Blogger API was about as complicated as you could reasonably get before things went completely off the rails. "Shove everything into a struct" was an idea that sounded like it might solve some problems; but, as you can see, it caused more problems than it solved.
In direct response to the mess that is the MetaWeblog API, the Atom API was designed with several guiding principles in mind:
Previous weblog services had no concept of API discovery. They left it
up to the end user to provide the exact API URL
(http://example.com/mt/mt-xmlrpc.cgi). Some servers
implemented undocumented functions like deletePost, and even
knowing the type of software running on the other end was not enough
because different versions of the same software supported extra
functionality over time. Client software had to guess what functionality
was provided and what extensions were supported.
The Atom API assumes only that the end user knows her home page. It
relies on a link tag in the head element of the
home page that points to an Atom introspection file. The introspection
file, in turn, lists the supported functions and extensions, as well as
the URI associated with each function.
Here is an example of a home page with Atom API auto-discovery:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>My Weblog</title>
<link rel="service.edit" type="application/x.atom+xml"
href="/myblog/atom.cgi/introspection" title="Atom API" />
</head>
<body>
...
</body>
</html>
If this resource is http://www.example.com/, it says that
the Atom introspection file is the
http://www.example.com/myblog/atom.cgi/introspection
resource. Note that this is actually routing through a CGI script, as are
all the other examples I'll list here. Nothing in Atom requires this, but
when I wrote a server prototype of the Atom API, I made a point to route
everything through a single CGI script because there was some debate about
whether this was even possible. It could easily be a set of CGI scripts,
or JSPs, ASP, PHP, or any other language.
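On the client side, auto-discovery might look something like the following Python sketch. It assumes the home page has already been fetched into a string; the class and function names are my own, and the only things taken from the example above are the rel="service.edit" value and the URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ServiceEditFinder(HTMLParser):
    """Remember the href of the first <link rel="service.edit"> element."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").split()
        if tag == "link" and "service.edit" in rel_values and self.href is None:
            self.href = attributes.get("href")


def find_introspection_uri(page_url, html):
    """Return the absolute introspection URI, or None if the page has no link."""
    finder = ServiceEditFinder()
    finder.feed(html)
    # The href may be relative, so resolve it against the page's own URL.
    return urljoin(page_url, finder.href) if finder.href else None


HOME_PAGE = """<html><head><title>My Weblog</title>
<link rel="service.edit" type="application/x.atom+xml"
      href="/myblog/atom.cgi/introspection" title="Atom API" />
</head><body></body></html>"""

introspection_uri = find_introspection_uri("http://www.example.com/", HOME_PAGE)
```

The end user types in one URL, her home page, and the client works out the rest.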
The introspection file then lists the supported functions and extensions
in a simple, well-defined XML format. There are a number of functions
defined in the core Atom API, and vendors can extend the introspection
file with XML namespaces to point to their own extension methods. A core
Atom API introspection file (like
http://www.example.com/myblog/atom.cgi/introspection) might
look like this:
GET /myblog/atom.cgi/introspection HTTP/1.1
Host: example.com
HTTP/1.1 200 OK
Content-Type: application/x.atom+xml
<?xml version="1.0" encoding="utf-8"?>
<introspection xmlns="http://purl.org/atom/ns#" >
<search-entries>http://example.com/myblog/atom.cgi/search</search-entries>
<create-entry>http://example.com/myblog/atom.cgi/edit</create-entry>
<edit-template>http://example.com/atom.cgi/templates</edit-template>
<user-prefs>http://example.com/myblog/atom.cgi/prefs</user-prefs>
<categories>http://example.com/atom.cgi/categories</categories>
</introspection>
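A client could turn that file into a simple lookup table. The sketch below uses Python's standard xml.etree.ElementTree; the function name parse_introspection is an assumption of mine, and the sample document is a trimmed copy of the response above. Elements from other namespaces (vendor extensions) are skipped here rather than interpreted.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"

INTROSPECTION = """<?xml version="1.0" encoding="utf-8"?>
<introspection xmlns="http://purl.org/atom/ns#">
  <search-entries>http://example.com/myblog/atom.cgi/search</search-entries>
  <create-entry>http://example.com/myblog/atom.cgi/edit</create-entry>
</introspection>"""


def parse_introspection(xml_text):
    """Map each core Atom function name to the URI that implements it."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    services = {}
    for child in root:
        # ElementTree spells namespaced tags as "{namespace}local-name".
        if child.tag.startswith("{" + ATOM_NS + "}"):
            name = child.tag.split("}", 1)[1]
            services[name] = (child.text or "").strip()
    return services


services = parse_introspection(INTROSPECTION)
```

From here on, the client never has to guess a URL; it looks up services["create-entry"] or services["search-entries"] and uses whatever the server published.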
If you're writing client software to manage a weblog, the first thing
you'll probably want to do after getting the introspection file is get a
list of existing entries. The introspection file lists the address for
searching entries in <search-entries>. The client can
add query string parameters such as atom-last to find recent
entries. More complex examples are defined in the Atom API spec
draft.
Here's how you would get a list of recent entries:
GET /myblog/atom.cgi/search?atom-last=20 HTTP/1.1
Host: example.com
HTTP/1.1 200 OK
Content-Type: application/x.atom+xml
<search-results xmlns="http://purl.org/atom/ns#" >
<entry>
<title>My Second Post</title>
<id>http://example.com/atom.cgi/edit/2</id>
</entry>
<entry>
<title>My First Post</title>
<id>http://example.com/atom.cgi/edit/1</id>
</entry>
</search-results>
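In client code, that exchange boils down to building a query string and walking the result document. Here is a hedged Python sketch; the helper names are mine, the sample document is copied from the response above, and the URL builder assumes the search address has no query string of its own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ATOM = "{http://purl.org/atom/ns#}"

RESULTS = """<search-results xmlns="http://purl.org/atom/ns#">
  <entry>
    <title>My Second Post</title>
    <id>http://example.com/atom.cgi/edit/2</id>
  </entry>
  <entry>
    <title>My First Post</title>
    <id>http://example.com/atom.cgi/edit/1</id>
  </entry>
</search-results>"""


def recent_entries_url(search_uri, count):
    """Append the atom-last parameter to the search-entries address."""
    return search_uri + "?" + urlencode({"atom-last": count})


def parse_search_results(xml_text):
    """Return a (title, id) pair for every entry in the results."""
    root = ET.fromstring(xml_text)
    return [
        (entry.findtext(ATOM + "title"), entry.findtext(ATOM + "id"))
        for entry in root.findall(ATOM + "entry")
    ]


url = recent_entries_url("http://example.com/myblog/atom.cgi/search", 20)
entries = parse_search_results(RESULTS)
```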
The remainder of the Atom API follows the principles of REST. New
entries are created using HTTP POST to post an Atom entry to the
create-entry address specified in the introspection file.
Retrieving an entry is accomplished by doing an HTTP GET on the entry's
URI, which is returned after creating a new entry or in search
results. Editing an entry is accomplished by doing an HTTP PUT on the
entry's URI; deleting an entry is an HTTP DELETE on the entry's URI.
When I say "the entry's URI", remember that that's
implementation-specific. These examples route everything through a single
script (atom.cgi), just to prove that you can do that. Of
course if you're implementing the Atom API on your own server, you don't
have to do that; you could use JSP, or PHP, or Perl, or anything that can
handle the four basic HTTP operations (GET, POST, PUT, DELETE). The
introspection file rules all; it's the client's guide to the structure of
the server's Atom web services.
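The mapping from editing actions to HTTP methods is small enough to write down. This Python sketch builds requests with the standard urllib.request module but does not send them; the build_request helper and its action names are my own shorthand, not terms from the Atom API draft.

```python
import urllib.request

ATOM_TYPE = "application/x.atom+xml"

# One HTTP method per editing action, as described above.
METHODS = {"create": "POST", "retrieve": "GET", "edit": "PUT", "delete": "DELETE"}


def build_request(action, uri, entry_xml=None):
    """Build (but do not send) the HTTP request for an editing action."""
    data = entry_xml.encode("utf-8") if entry_xml is not None else None
    request = urllib.request.Request(uri, data=data, method=METHODS[action])
    if data is not None:
        # Only requests that carry an entry need a content type.
        request.add_header("Content-Type", ATOM_TYPE)
    return request


edit_request = build_request(
    "edit", "http://example.com/myblog/atom.cgi/edit/1", "<entry/>"
)
delete_request = build_request("delete", "http://example.com/myblog/atom.cgi/edit/3")
```

Passing either request to urllib.request.urlopen would actually perform the operation; the point is that the function name lives in the HTTP method, not in the payload.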
As I said, retrieving an existing entry is as simple as an HTTP GET of
the entry's URI. The search results told us that the first post had a URI
of http://example.com/atom.cgi/edit/1, so let's get that:
GET /myblog/atom.cgi/edit/1 HTTP/1.1
Host: example.com
HTTP/1.1 200 OK
Content-Type: application/x.atom+xml
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://purl.org/atom/ns#">
<title>My First Entry</title>
<summary>My First Entry Excerpt (generally plaintext)</summary>
<author>
<name>Bob B. Bobbington</name>
<email>bob@example.com</email>
<url>http://homepage.example.com/</url>
</author>
<issued>2003-10-15T02:29:29</issued>
<created>2003-10-15T04:10:58Z</created>
<modified>2003-10-15T04:22:03Z</modified>
<link>http://example.com/myblog/archives/2003/11/19/My_First_Entry.html</link>
<id>urn:example-com:myblog:1</id>
<content type="application/xhtml+xml" xml:lang="en">
<div xmlns="http://www.w3.org/1999/xhtml">
<p>Hello, <em>weblog</em> world!</p>
<p>This is my first post <strong>ever</strong>!</p>
</div>
</content>
</entry>
Lots of information here: the entry has a title, an excerpt or summary, and an author who has a name, email address, and URL of his own. The entry has a created date and a modified date (usually server-generated), and an "issued" date (which is a date that the author would like to give to this entry, separate from when he actually posted it). The entry is viewable at a specific link, has an internal ID (a URN), and finally has some XHTML content.
The Atom content model is probably worth a whole article by itself, but
for the moment let me just handwave and say that it can handle more than
just XHTML. Any MIME type can be expressed (specify it in the
@type attribute), and non-XML content (such as HTML or plain
text) is simply escaped or put in a CDATA block, with a
mode="escaped" attribute on the content element.
It can even handle binary content (such as an image) by specifying
@mode="base64" and including a base64-encoded representation
of the data.
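As a rough illustration of the escaped and base64 modes, here is a Python sketch that builds content elements with the standard library. The make_content helper is hypothetical, the Atom namespace is left off the element for brevity, and the four bytes standing in for an image are obviously not a real PNG.

```python
import base64
import xml.etree.ElementTree as ET


def make_content(mime_type, payload, mode=None):
    """Build a <content> element for text (str) or binary (bytes) payloads."""
    content = ET.Element("content", {"type": mime_type})
    if mode is not None:
        content.set("mode", mode)
    if mode == "base64":
        # Binary data is carried as base64 text.
        content.text = base64.b64encode(payload).decode("ascii")
    else:
        # ElementTree escapes &, < and > in text when it serializes.
        content.text = payload
    return content


escaped = ET.tostring(
    make_content("text/html", "<p>Hello</p>", mode="escaped"), encoding="unicode"
)
binary = ET.tostring(
    make_content("image/png", b"\x89PNG", mode="base64"), encoding="unicode"
)
```

A client reading these back would check the mode attribute first, then either use the text as-is (the XML parser has already undone the escaping) or run it through base64.b64decode.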
Posting a new entry is virtually symmetrical. To create a new entry,
do an HTTP POST to the create-entry address specified
in the introspection file. The body of the HTTP POST should be an entry,
in the same Atom format as you got back from the server on retrieve:
POST /myblog/atom.cgi/edit HTTP/1.1
Host: example.com
Content-Type: application/x.atom+xml
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://purl.org/atom/ns#">
<title>My Entry Title</title>
<created>2003-11-17T12:29:29Z</created>
<content type="application/xhtml+xml" xml:lang="en">
<div xmlns="http://www.w3.org/1999/xhtml">
<p>Hello, <em>weblog</em> world!</p>
<p>This is my third post <strong>ever</strong>!</p>
</div>
</content>
</entry>
The server responds with an HTTP status code 201 "Created" and gives
the entry's edit URI in the HTTP Location: header.
HTTP/1.1 201 Created
Location: http://example.com/myblog/atom.cgi/edit/3
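A client needs to hang on to that Location value, since it is the address for every later GET, PUT, or DELETE on the entry. A minimal Python sketch of the check, assuming the status code and headers have already been pulled out of the response by whatever HTTP library you use (the function name is mine):

```python
def created_entry_uri(status, headers):
    """Return the edit URI from a create-entry response, or raise."""
    if status != 201:
        raise RuntimeError(f"expected 201 Created, got {status}")
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    if "location" not in normalized:
        raise RuntimeError("201 response carried no Location header")
    return normalized["location"]


edit_uri = created_entry_uri(
    201, {"Location": "http://example.com/myblog/atom.cgi/edit/3"}
)
```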
Note that since we're using straight XML (rather than a serialization of XML over RPC over XML), extensibility is handled by XML namespaces. For example, Movable Type lets you allow or disallow comments on individual entries. This functionality is not built into the Atom API, but Six Apart could easily extend the API like this:
POST /myblog/atom.cgi/edit HTTP/1.1
Host: example.com
Content-Type: application/x.atom+xml
<?xml version="1.0" encoding="utf-8"?>
<entry
xmlns="http://purl.org/atom/ns#"
xmlns:mt="http://www.movabletype.org/atom/ns#">
<title>My Entry Title</title>
<created>2003-11-17T12:29:29Z</created>
<mt:allowComments>1</mt:allowComments>
<content type="application/xhtml+xml" xml:lang="en">
<div xmlns="http://www.w3.org/1999/xhtml">
<p>Hello, <em>weblog</em> world!</p>
<p>This is my first post <strong>ever</strong>!</p>
</div>
</content>
</entry>
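Because the extension is ordinary namespaced XML, ordinary XML tools handle it. The Python sketch below writes and reads the mt:allowComments element with xml.etree.ElementTree. The namespace URI and element name come from the example above; the helper names, and the use of "0" to mean that comments are off, are my assumptions.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"
MT_NS = "http://www.movabletype.org/atom/ns#"

# Ask the serializer for readable prefixes instead of ns0, ns1, and so on.
ET.register_namespace("", ATOM_NS)
ET.register_namespace("mt", MT_NS)


def build_entry(title, created, allow_comments):
    """Serialize a minimal entry that carries the vendor extension element."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}created").text = created
    # "0" for disabled comments is an assumption; the example only shows 1.
    flag = ET.SubElement(entry, f"{{{MT_NS}}}allowComments")
    flag.text = "1" if allow_comments else "0"
    return ET.tostring(entry, encoding="unicode")


def allows_comments(entry_xml):
    """Read the extension element; a server that ignores it never looks."""
    entry = ET.fromstring(entry_xml)
    return entry.findtext(f"{{{MT_NS}}}allowComments") == "1"


entry_xml = build_entry("My Entry Title", "2003-11-17T12:29:29Z", True)
```

A server that has never heard of the mt namespace can skip the element and still process the rest of the entry, which is exactly what the struct-based approach could not promise.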
Modifying an existing entry is almost the same as creating one. You do
an HTTP PUT on the entry's URI (as returned in the Location:
header after creating it or in the id element in the search
results), with the entry in the body of the HTTP message, in the same Atom
XML format we've seen in other method calls:
PUT /myblog/atom.cgi/edit/1 HTTP/1.1
Host: example.com
Content-Type: application/x.atom+xml
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://purl.org/atom/ns#">
<title>My First Entry</title>
<summary>My First Entry Excerpt (generally plaintext)</summary>
<author>
<name>Bob B. Bobbington</name>
<email>bob@example.com</email>
<url>http://homepage.example.com/</url>
</author>
<issued>2003-10-15T02:29:29</issued>
<created>2003-10-15T04:10:58Z</created>
<modified>2003-10-15T04:22:03Z</modified>
<link>http://example.com/myblog/archives/2003/11/19/My_First_Entry.html</link>
<id>urn:example-com:myblog:1</id>
<content type="application/xhtml+xml" xml:lang="en">
<div xmlns="http://www.w3.org/1999/xhtml">
<p>Hello, <em>weblog</em> world!</p>
<p>This is my first post <strong>ever</strong>!</p>
</div>
</content>
</entry>
On success the server responds with an HTTP status code 205 "Reset Content".
HTTP/1.1 205 Reset Content
Deleting an entry is even simpler:
DELETE /myblog/atom.cgi/edit/3 HTTP/1.1
Host: example.com
HTTP/1.1 200 OK
We have, in some sense, come full circle. The original LiveJournal API was REST-based, although it was limited to simple name-value pairs for input and output. After that, weblogging APIs went down a path of RPC-style services, until that became completely unmanageable. And now we're back to a document-centric, REST-inspired service again.
The Atom API has several other methods beyond add, edit, delete, retrieve, and search. It can be used for posting comments on entries, managing users and user preferences, managing categories, and managing site templates; eventually it will be usable for everything you can do manually with your weblog through your server's browser-based interface. You can read the latest draft for yourself or download sample source code that implements the API in Python, Perl, PHP, or Java.
"But, but, but," I hear you cry, "what about passwords sent in the clear?" Ah, yes. Atom authentication deserves its own article, and I promise to tackle it, if not next month then the month after. I can promise that it does not involve sending plain text passwords in the clear.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.