I wish I didn't need to write this article. My life would be much simpler if Atom could just use existing HTTP authentication, as-is. But it can't; I'm going to tell you why and then I'm going to tell you what we're doing instead.
Let's back up. Atom, in case you missed it, is a new standard that uses XML over HTTP to publish and syndicate web-based content. It is initially targeted at weblogs, and most of the early adopters so far have been weblog vendors and users. It consists of the Atom API, which I discussed last month, and the Atom syndication format, which I will discuss next month. This month I want to talk about authentication.
As with all design decisions, it helps to list the problems you are trying to solve, and the audience you are trying to solve them for, in the form of a character sketch.
Let's talk about Bob. Bob has a weblog. Bob hosts his weblog on a low-end web hosting service, one which hosts several hundred weblogs on a single machine (a single IP address). Bob can FTP files to his web directory, and he can run CGI scripts in Perl and Python. But Bob has no remote shell access on his server (that costs extra), no .htaccess rights to change his Apache configuration (his hosting provider set AllowOverride None), no PHP scripts, and no database. He can serve static HTML files and run CGI scripts, and that's it.
This is not a contrived scenario. Many web hosting providers offer exactly this environment, and two popular weblog publishing systems, Movable Type and Blosxom, run as CGI scripts in this environment. There are thousands of real people like Bob.
Bob also travels a lot, going to various O'Reilly conferences. He would like to be able to post to his Atom-enabled weblog from an Atom-enabled client, without anyone else at the conference (who might be listening in on the wireless network) being able to steal Bob's password or take over his weblog.
As we saw last month, all previous weblog publishing APIs send passwords over the wire in clear text. Clearly none of these APIs will work for Bob. A number of solutions were proposed during the development of Atom, but none of them help Bob.
HTTP digest authentication does not help, because Bob has no .htaccess rights to configure his passwords; and, because of the way Apache works, CGIs can't implement digest authentication on their own. (Scripts handled by an Apache module, such as mod_php or mod_perl, can implement HTTP digest authentication, but external CGI processes can't, because Apache does not pass the necessary headers along to the CGI script. That still doesn't help Bob, because his hosting provider doesn't offer PHP; and, even if they did, his weblog software doesn't run on PHP anyway.)
It looks like Bob is screwed.
A little-known fact about RFC 2617 is that HTTP authentication is extensible. The RFC defines and Apache has modules for Basic and Digest authentication, but developers are free to define different algorithms for use within the HTTP authentication framework, and servers are free to insist that clients support those algorithms if they want access to the server's resources.
After much haggling, the algorithm we chose was WSSE Username Token (PDF). WSSE is a family of open security specifications for web services, specifically SOAP web services. However, the Username Token algorithm is not SOAP-specific; it can be easily adapted to work within the HTTP authentication framework, and it solves all of Bob's problems.
The algorithm itself works like this:
Create a password digest:
PasswordDigest = Base64(SHA1(Nonce + CreationTimestamp + Password))
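The digest formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `password_digest` is made up for this example, and encoding the concatenated string as UTF-8 before hashing is an assumption, since the formula does not say how characters become bytes.

```python
import base64
import hashlib


def password_digest(nonce: str, created: str, password: str) -> str:
    """Return Base64(SHA1(nonce + created + password)) as text."""
    # Concatenate the three strings in the order the formula gives,
    # then hash the UTF-8 bytes (the encoding is an assumption).
    material = (nonce + created + password).encode("utf-8")
    sha1_bytes = hashlib.sha1(material).digest()  # 20 raw bytes
    # Base64 turns the 20 raw bytes into a 28-character ASCII string.
    return base64.b64encode(sha1_bytes).decode("ascii")


if __name__ == "__main__":
    # Hypothetical inputs, chosen only for illustration.
    print(password_digest("example-nonce", "2003-12-15T14:43:07Z", "example-password"))
```

Because SHA-1 always produces 20 bytes, the result is always 28 characters long and ends in a single "=" padding character.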
An example will help make this clearer.
Bob's username is "bob", and his password is "taadtaadpstcsm".
Bob's client generates a nonce, "d36e316282959a9ed4c89851497a717f".
Bob's client created this nonce at "2003-12-15T14:43:07Z", so that's the creation timestamp.
The password digest is Base64(SHA1("d36e316282959a9ed4c89851497a717f" + "2003-12-15T14:43:07Z" + "taadtaadpstcsm")), which is "quR/EWLAV4xLf9Zqyw4pDmfV90Y=".
Most languages have built-in libraries to create SHA-1 hashes and to encode strings in Base64 format.
Now let's see how this algorithm fits into the HTTP authentication framework.
Bob's weblog is at http://bob.example.com/, and his Atom endpoint is at http://bob.example.com/atom.cgi. Bob's Atom-enabled client tries to post to his weblog by sending an HTTP POST request with his Atom entry:
POST /atom.cgi HTTP/1.1
Host: bob.example.com
Content-Type: application/atom+xml

<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://purl.org/atom/ns#">
  <title>My Entry Title</title>
  <created>2003-12-15T14:43:07Z</created>
  <content type="application/xhtml+xml" xml:lang="en">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <p>Hello, <em>weblog</em> world!</p>
      <p>This is my third post <strong>ever</strong>!</p>
    </div>
  </content>
</entry>
But this request didn't include any authentication information. The server responds with an HTTP 401 Unauthorized error:
HTTP/1.1 401 Unauthorized
WWW-Authenticate: WSSE realm="foo", profile="UsernameToken"
The profile is constant and should always be "UsernameToken". The realm is determined by the server and can be anything.
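On the server side, a CGI script could produce this challenge itself. The sketch below assumes the usual CGI convention of setting the status through a "Status:" header line; the function name `challenge_response` and the plain-text body are illustrative choices, and the realm is assumed to contain no double quotes.

```python
def challenge_response(realm: str, message: str = "Authentication required.\n") -> str:
    """Build the text a CGI script would print to ask for WSSE credentials."""
    # CGI scripts set the HTTP status through a "Status:" header line;
    # the web server turns it into the real status line.
    headers = [
        "Status: 401 Unauthorized",
        # The profile is constant; the realm is whatever the server picks.
        f'WWW-Authenticate: WSSE realm="{realm}", profile="UsernameToken"',
        "Content-Type: text/plain; charset=utf-8",
    ]
    # A blank line separates the headers from the optional body.
    return "\r\n".join(headers) + "\r\n\r\n" + message


if __name__ == "__main__":
    print(challenge_response("foo"), end="")
```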
Bob repeats his request, this time with his authentication credentials: username, password digest, nonce, and creation date.
POST /atom.cgi HTTP/1.1
Host: bob.example.com
Content-Type: application/atom+xml
Authorization: WSSE profile="UsernameToken"
X-WSSE: UsernameToken Username="bob", PasswordDigest="quR/EWLAV4xLf9Zqyw4pDmfV9OY=", Nonce="d36e316282959a9ed4c89851497a717f", Created="2003-12-15T14:43:07Z"

<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://purl.org/atom/ns#">
  <title>My Entry Title</title>
  <created>2003-12-15T14:43:07Z</created>
  <content type="application/xhtml+xml" xml:lang="en">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <p>Hello, <em>weblog</em> world!</p>
      <p>This is my third post <strong>ever</strong>!</p>
    </div>
  </content>
</entry>
Bob's Atom-enabled weblogging software looks in the X-WSSE: header for the actual authentication credentials, and recreates the steps Bob took in order to verify that Bob knows his password.
If Bob got his password wrong, the server simply responds with an HTTP 401 Unauthorized with the WWW-Authenticate: header, same as before; or, optionally, with some explanatory text in the body of the message to tell the client what's going on. If Bob got his password right, the server accepts the request, posts the new entry, responds with an HTTP 201 Created, and gives the location of the newly created entry.
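The server-side check can be sketched as follows. This is only an illustration under stated assumptions: `parse_x_wsse`, `credentials_ok`, and the `lookup_password` callback are hypothetical names, the header is assumed to use the Name="value" layout shown in the request above, and the server is assumed to be able to look up each user's password in order to recompute the digest.

```python
import base64
import hashlib
import hmac
import re

# Matches Name="value" pairs in the X-WSSE header value.
_PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_x_wsse(header_value: str) -> dict:
    """Split an X-WSSE header value into its named fields."""
    if not header_value.startswith("UsernameToken "):
        raise ValueError("not a UsernameToken header")
    return dict(_PAIR.findall(header_value))


def credentials_ok(header_value: str, lookup_password) -> bool:
    """Recompute the digest from the stored password and compare."""
    try:
        fields = parse_x_wsse(header_value)
        username = fields["Username"]
        nonce, created = fields["Nonce"], fields["Created"]
        claimed = fields["PasswordDigest"]
    except (ValueError, KeyError):
        return False  # malformed header: treat as a failed login
    password = lookup_password(username)  # hypothetical callback
    if password is None:
        return False  # unknown user
    material = (nonce + created + password).encode("utf-8")
    expected = base64.b64encode(hashlib.sha1(material).digest())
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, claimed.encode("utf-8"))
```

A caller would answer with the 401 challenge when `credentials_ok` returns False, and go on to create the entry and return 201 Created when it returns True.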
I want to briefly mention three design decisions and the problems they solve. First, remember when I said that CGI scripts couldn't implement HTTP authentication because Apache didn't pass along the appropriate headers? The header I was talking about is the Authorization: header. In this case, Apache will look at the Authorization: header and notice that the authentication algorithm is "WSSE". Apache doesn't have a module to handle this, so it will strip the Authorization: header and pass the rest of the headers (including X-WSSE:) on to the CGI script.
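From inside a Python CGI script, picking up that header might look like the sketch below. It relies on the CGI convention that a request header such as X-WSSE is exposed as the environment variable HTTP_X_WSSE; the function name `wsse_header_from_cgi` is invented for this example.

```python
import os


def wsse_header_from_cgi(environ=None):
    """Return the X-WSSE header value as the CGI environment exposes it."""
    env = os.environ if environ is None else environ
    # CGI maps the request header "X-WSSE" to the variable HTTP_X_WSSE.
    # Returns None when the client sent no credentials at all.
    return env.get("HTTP_X_WSSE")


if __name__ == "__main__":
    value = wsse_header_from_cgi()
    print("credentials present" if value else "no X-WSSE header")
```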
Second, if Bob has previously successfully authenticated with this same nonce, the server may recognize that and reject the request (with a 401). The server may keep track of nonces for a limited amount of time (usually a matter of minutes) and reject duplicates. If a client tries to reuse a nonce after that time, the server can simply reject it based on the creation timestamp (since the password hash is made from both the nonce and the timestamp). In turn, Bob's Atom-enabled client software generates a new nonce and creation timestamp with each request. This will protect against replay attacks.
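One way a server might track nonces is sketched below. The class name `NonceCache`, the five-minute window, and the in-memory dictionary are all assumptions made for illustration; a real CGI script starts a fresh process per request, so it would need to keep this table in a file or similar storage instead.

```python
import time
from datetime import datetime, timezone


class NonceCache:
    """Remember recently seen nonces and reject replays or stale requests."""

    def __init__(self, max_age_seconds=300):
        self.max_age = max_age_seconds  # five minutes is an assumed default
        self._expiry = {}  # nonce -> moment its timestamp becomes too old

    def accept(self, nonce, created, now=None):
        now = time.time() if now is None else now
        # Forget nonces whose timestamps would be rejected as stale anyway.
        self._expiry = {n: e for n, e in self._expiry.items() if e >= now}
        try:
            stamp = datetime.strptime(created, "%Y-%m-%dT%H:%M:%SZ")
        except ValueError:
            return False  # unparseable creation timestamp
        created_ts = stamp.replace(tzinfo=timezone.utc).timestamp()
        # Reject timestamps outside the window (too old or too far ahead).
        if abs(now - created_ts) > self.max_age:
            return False
        if nonce in self._expiry:
            return False  # same nonce seen recently: treat as a replay
        self._expiry[nonce] = created_ts + self.max_age
        return True
```

Each nonce is kept until its own timestamp falls out of the window, so a replayed request is caught either by the table or by the timestamp check.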
Third, this leaves the door open for future versions of Atom supporting other profiles of WSSE, such as Kerberos. But let's jump off that bridge when we come to it.
OK, so that's how it looks when it works, and that's how it looks when it doesn't work. But as you can see, this is an inefficient process. Bob sent his entire entry to the server, then the server rejected it, then Bob sent his entire entry to the server again -- this time with his WSSE-formatted credentials. But there's nothing about HTTP authentication or the WSSE Username Token algorithm that requires this extra round trip. If Bob's Atom-enabled client software knows ahead of time that Bob's Atom-enabled server is going to ask for WSSE Username Token authentication, it can simply calculate the credentials and send them with the initial request. In the best case, the server sees the credentials, verifies them, processes the request, and returns the appropriate success code -- all without ever generating a 401. In the worst case, the server doesn't actually support WSSE Username Token, so it simply responds with a 401 and a WWW-Authenticate: header that details what algorithms it does support.
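A client that sends its credentials up front could be sketched like this. The helper names `wsse_headers` and `build_post` are hypothetical, the nonce is assumed to be a random 32-character hex string like the one in the example above, and the request is only prepared here, not sent.

```python
import base64
import hashlib
import secrets
import urllib.request
from datetime import datetime, timezone


def wsse_headers(username: str, password: str) -> dict:
    """Build fresh Authorization and X-WSSE headers for one request."""
    nonce = secrets.token_hex(16)  # new random nonce every time
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    material = (nonce + created + password).encode("utf-8")
    digest = base64.b64encode(hashlib.sha1(material).digest()).decode("ascii")
    return {
        "Authorization": 'WSSE profile="UsernameToken"',
        "X-WSSE": (
            f'UsernameToken Username="{username}", PasswordDigest="{digest}", '
            f'Nonce="{nonce}", Created="{created}"'
        ),
    }


def build_post(url: str, entry_xml: bytes, username: str, password: str):
    """Prepare (but do not send) a POST that carries credentials up front."""
    headers = wsse_headers(username, password)
    headers["Content-Type"] = "application/atom+xml"
    return urllib.request.Request(url, data=entry_xml, headers=headers, method="POST")
```

Passing the prepared request to `urllib.request.urlopen` would send it; a 401 reply surfaces as an `urllib.error.HTTPError`, whose headers carry the server's WWW-Authenticate: value for the client to inspect.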
Let's make sure we've found a solution that works for Bob. Extending HTTP authentication with WSSE Username Token:
Does not require .htaccess rights to set up passwords. (The CGI application can manage passwords itself.)
Works with CGI scripts: they can read the X-WSSE: header to get the credentials, and they can generate a 401 error and a WWW-Authenticate: header if authentication fails.
So that's what Atom authentication looks like and that's why. Because it's the simplest thing that works for Bob.
Disclaimer: the Atom API has not been finalized. While this authentication scheme has been deployed by several vendors, it may still change slightly before the Atom API goes 1.0. Feel free to implement it now, but be prepared to reimplement it later.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.