
A Weblog API For the Grassroots
by Rich Salz, August 05, 2003
Last month I looked at the Necho message format and compared it to RSS, its predecessor. In this column, I want to look at its API. Joe Gregorio is the main author of the API, which is written in the IETF RFC format. Joe is using Marshall Rose's xml2rfc package, so various formats are available. Make sure to pick up the latest version; as of this writing, draft 6 was the most current one. The API drafts use the name "Atom", which was the old favorite, but it had trademark conflicts.
As you read the first part of this column, I'll be talking about the Atom API, which is used to manipulate what I previously called Necho data. But both of those might end up being called Feedster pretty soon, judging by an entry in the Wiki, whose URL still reflects its original name, pie. Whew! At least we know what it isn't: it's not RSS 2.0, which is now owned by the Harvard Law School.
It's Not WebDAV, Either
What does the API do? According to the draft, "AtomAPI is an application level protocol for publishing and editing web resources..." Compare this to RFC 2518, HTTP Extensions for Distributed Authoring -- WEBDAV. According to the WebDAV FAQ:
The stated goal of the WebDAV working group is (from the charter) to "define the HTTP extensions necessary to enable distributed web authoring tools to be broadly interoperable, while supporting user needs", and in this respect DAV is completing the original vision of the Web as a writable, collaborative medium.
On the surface there seems to be a lot of overlap, but on closer inspection this isn't quite true. WebDAV spends a lot of time on locking, which is required for distributed authoring of the same document, but probably less germane to the single author/publisher model of a weblog. Its model of a collection maps nicely onto a weblog entry and its comments, but it enforces a hierarchical syntax on the URLs, which may not always be possible or even desirable in weblog software. Finally, it defines a suite of HTTP extensions -- new verbs, new headers -- which also make the burden of implementation too great. Many weblogs are maintained using CGI scripts on an ISP; requiring such users to install a new web server (or at least a new Apache module) would immediately disenfranchise them from the new Feedster community.
This last point -- the high barrier to involvement required by WebDAV -- runs so completely counter to Feedster (and its RSS history) that it alone is reason enough to discard it as an API. Still, WebDAV has some interesting ideas, and I hope that there are no gratuitous incompatibilities introduced, so that a future merge -- WebDAVLog? -- could be possible.
Scope
So far, the most detailed part of the API draft has to do with the
manipulation of entries. Doing an HTTP POST to the
appropriate URL creates a new entry, and the server returns an HTTP
Location header with the URL of the new entry. It's also
responsible for "filling out" the entry, adding its own values for the
link, id, and timestamp elements. (For a
description of these elements, see last
month's column.)
Doing an HTTP GET on the URL obviously retrieves the
entry, a PUT replaces the entry with new contents, and
DELETE removes it. There's also a search operation, which
I'll discuss below. Of course everything (probably other than
GET) needs authentication. Given the wide variety of
security mechanisms that are available, it's quite appropriate for the API
document to defer this to other standards.
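To make the verb mapping concrete, here is a minimal in-memory sketch in Python. None of it comes from the draft: the EntryStore class, its handle method, the /entries/ URL layout, and the status codes are all assumptions chosen for illustration, and authentication is left out entirely.

```python
import itertools


class EntryStore:
    """Toy in-memory model of the entry operations (hypothetical, not from the draft)."""

    def __init__(self, base_url="http://rsalz.example.com/entries/"):
        self.base_url = base_url      # assumed URL layout
        self.entries = {}             # entry URL -> entry body
        self._ids = itertools.count(1)

    def handle(self, method, url, body=None):
        """Dispatch on the HTTP verb; returns (status, headers, body)."""
        if method == "POST":
            # POST creates an entry; the server picks the new URL and
            # reports it back in the Location header.
            new_url = f"{self.base_url}{next(self._ids)}"
            self.entries[new_url] = body
            return 201, {"Location": new_url}, None
        if url not in self.entries:
            return 404, {}, None
        if method == "GET":
            return 200, {}, self.entries[url]
        if method == "PUT":
            # PUT replaces the entry with new contents.
            self.entries[url] = body
            return 200, {}, None
        if method == "DELETE":
            del self.entries[url]
            return 200, {}, None
        return 405, {}, None


store = EntryStore()
status, headers, _ = store.handle("POST", store.base_url, "<entry>first post</entry>")
entry_url = headers["Location"]
store.handle("PUT", entry_url, "<entry>edited post</entry>")
fetched = store.handle("GET", entry_url)[2]
store.handle("DELETE", entry_url)
gone = store.handle("GET", entry_url)[0]
```

A real server would also fill in the link, id, and timestamp elements on creation; the sketch stores the body untouched to keep the verb dispatch visible.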
Upon reading the API, my first reaction is that I don't understand the use model. (The term "use model" is popular among system designers and standards committees, if only because it sounds so much better than "what are people supposed to do with this?") Let's assume that the first adopters of the API are weblog developers and users. They currently work by using private systems to post articles to their weblog. Some systems automatically generate a syndication feed from the articles, some require the user to do it manually, and some require assistance, such as forcing the author to enter a summary of their article. Either way, the end result is that one or more syndication files (RSS, Necho, Atom, Feedster, etc.) are generated and maintained by the software.
On the other hand, in the AtomAPI draft, the content and the metadata are tied together -- in the web services sphere we'd use the term tightly coupled -- which doesn't match the current weblog use models that I know about. If syndication data is really metadata about web content, shouldn't the two pieces be identified and manipulated separately? For example, how can I use this API to create a Necho entry for an article that already exists, such as this one? What should the server return as the value of the Location header -- the content URL, a metadata URL, or nothing?
The API also includes a search operation. This is specified as a query
string with a GET to a specific URL, and the client provides
an HTTP Accept header that specifies the Necho content-type.
(That's a nice way for the client to be defensive, since if the wrong URL
is specified, the server shouldn't send random HTML or other data.)
Unfortunately, the URL query-string syntax severely limits the power of this operation. Since the entries can be considered a single XML document (although probably a synthetic one for a large weblog), XPath becomes the obvious query language:
/entry[position() > last() - 20]
/entry[contains(author/name, 'Gates')]
This also emphasizes that Necho's timestamp elements should have a numeric attribute that counts seconds since some epoch:
/entry/issued[@epoch > 1060017388]
If the epoch is classic Unix time, then this expression finds all entries posted since I wrote this column.
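As a rough illustration of the epoch query, the sketch below applies the same filter in Python. Several things here are assumptions: the epoch attribute is only this column's suggestion, the feed wrapper element and the sample entries are invented, and because xml.etree.ElementTree implements only a subset of XPath (it has no numeric comparisons in predicates), the comparison is done in Python rather than in the path expression.

```python
import xml.etree.ElementTree as ET

# Synthetic document: element names follow the column, the data is made up.
FEED = """
<feed>
  <entry><title>Old post</title><issued epoch="1060000000"/></entry>
  <entry><title>New post</title><issued epoch="1060100000"/></entry>
</feed>
"""

THRESHOLD = 1060017388  # the value used in the expression above

root = ET.fromstring(FEED)

# Equivalent in spirit to /entry/issued[@epoch > 1060017388]
recent = [
    entry.findtext("title")
    for entry in root.findall("entry")
    if int(entry.find("issued").get("epoch")) > THRESHOLD
]

# Equivalent in spirit to /entry[position() > last() - 20]
last_twenty = root.findall("entry")[-20:]

print(recent)
```

An XPath-capable library could evaluate the expressions directly; the point here is only that an integer attribute turns the "posted since" test into a plain numeric comparison.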
Tell'em Where to Go
The draft also provides, although it's not as complete, specifications for editing user preferences, content templates, adding comments, and so on. All of these are done by having the client software reference specific URLs. Of course, those URLs will vary among implementations and even within multiple users of the same implementation hosted within a single service. For example, the following could all be valid URLs to create an entry:
http://service.example.com/cgi-bin/post.py?u=rsalz&a=new
http://rsalz.example.com/blog/post
http://rsalz.example.com/~rsalz/blog/post/new
And, of course, the problem multiplies when you add the template and other operations, to say nothing of those operations that operate on individual entries (such as posting a comment).
In order to address this, the API defines an "introspection" file that maps API operations into a URL:
<introspection xmlns="...">
<create-entry>http://rsalz.example.com/blog/post</create-entry>
...
To find the API introspection file, the draft recommends parsing an RSD
file. RSD stands for Really Simple Discovery and was defined by Daniel
Berlinger last fall. It defines an XML document that maps API names to
a URL; in this case, it would be the URL of the introspection file. How
does a client find the RSD file? The weblog author must put an HTML
link element in their home page, like this:
<link rel="APIlist" type="application/rsd+xml"
title="RSD"
href="http://rsalz.example.com/blog/rsd.xml" />
The rsd.xml file has some metadata about the weblog
software and the following entry to point to the introspection file, which
should be pretty clear:
<api name="AtomAPI"
apiLink="http://rsalz.example.com/glob/introspection.xml"
preferred="true" blogID=""/>
(The blogID attribute is used to identify multiple weblogs
within a server.)
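The three hops a client has to make (home page, then RSD file, then introspection file) can be sketched in Python as follows. This is an illustration under stated assumptions, not anything from the draft: the helper names are hypothetical, the sample RSD document is trimmed down to the one element shown above, the introspection namespace is a placeholder because the real one is elided in the example, and the documents are inline strings rather than fetched over HTTP.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class RsdLinkFinder(HTMLParser):
    """Records the href of the first <link rel="APIlist"> element."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "APIlist" and self.href is None:
            self.href = attrs.get("href")


def local_name(tag):
    """Strip any '{namespace}' prefix so lookups work with or without one."""
    return tag.rsplit("}", 1)[-1]


def find_api_link(rsd_xml, api_name="AtomAPI"):
    """Return the apiLink of the named <api> element in an RSD document."""
    for elem in ET.fromstring(rsd_xml).iter():
        if local_name(elem.tag) == "api" and elem.get("name") == api_name:
            return elem.get("apiLink")
    return None


def find_operation_url(introspection_xml, operation="create-entry"):
    """Return the URL that the introspection file maps to an operation."""
    for elem in ET.fromstring(introspection_xml).iter():
        if local_name(elem.tag) == operation:
            return elem.text.strip()
    return None


HOME_PAGE = """<html><head>
<link rel="APIlist" type="application/rsd+xml" title="RSD"
      href="http://rsalz.example.com/blog/rsd.xml" />
</head><body></body></html>"""

# Trimmed-down stand-in for the RSD file (structure assumed).
RSD = """<rsd><service><apis>
<api name="AtomAPI" apiLink="http://rsalz.example.com/glob/introspection.xml"
     preferred="true" blogID=""/>
</apis></service></rsd>"""

# The namespace below is a placeholder, not the draft's.
INTROSPECTION = """<introspection xmlns="http://example.com/ns/introspection">
<create-entry>http://rsalz.example.com/blog/post</create-entry>
</introspection>"""

finder = RsdLinkFinder()
finder.feed(HOME_PAGE)
rsd_url = finder.href                            # hop 1: home page -> RSD file
introspection_url = find_api_link(RSD)           # hop 2: RSD -> introspection file
create_url = find_operation_url(INTROSPECTION)   # hop 3: introspection -> operation URL
```

In a real client each hop is a separate HTTP fetch, so three requests are needed before the first entry can be posted.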
I think the API draft is wrong here and should avoid a few extra
fetches and indirections. The draft should define its own
link entry on the home page, with a default of
"introspection.xml" as a URL relative to the home page URL.
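The default I'm suggesting, "introspection.xml" resolved relative to the home page URL, amounts to ordinary relative-URL resolution. A small Python sketch, where the function name is hypothetical and the home page URLs are just the examples used earlier:

```python
from urllib.parse import urljoin

DEFAULT_INTROSPECTION = "introspection.xml"  # the default name proposed in this column


def default_introspection_url(home_page_url):
    """Resolve the default introspection file relative to the home page URL."""
    return urljoin(home_page_url, DEFAULT_INTROSPECTION)


from_directory = default_introspection_url("http://rsalz.example.com/blog/")
from_page = default_introspection_url("http://rsalz.example.com/blog/index.html")
no_slash = default_introspection_url("http://rsalz.example.com/blog")
```

Note the third case: without a trailing slash, "blog" is treated as the last path segment and replaced, so the result is http://rsalz.example.com/introspection.xml. A client would need to resolve against the URL it actually ended up at after any redirects.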
A Better Approach
All of these indirections point out where REST starts to fail a bit,
and the word epicycles
comes to mind: independent implementations of similar services. In other
words, once you have varying URIs for conceptually similar resources, you
have to keep adding levels of indirection until you can end up with a
single document that ultimately refers you to where you really want to go.
In circumstances like these, the WSDL approach -- fewer URLs, but the
message says what to do -- seems more sensible. In addition, the desire
to use GET and the query-string syntax limits it enforces
cripple the search operation.
We can avoid all this by leveraging SOAP to post data or perform
operations other than GET. We'll reserve the SOAP Body for
the application content and instead define a new header that describes
what's being done and where to do it. For example,
<n:necho soap:mustUnderstand="1" xmlns:n="...">
<n:operation
n:resource='http://www.example.com/entries/23'
n:action='http://atom.example.com/actions/postComment'/>
</n:necho>
It should be fairly clear what's going on. The header, which must be
understood by the receiving application, contains a list of operations to
perform; in this case, only one. The operation, specified in the
action attribute, is a URI, which allows flexibility. The
resource attribute specifies the URL on the server where the
operation should be performed. The content of the comment is taken from
the SOAP Body. If multiple operations have content, they can be put in as
children of the appropriate operation elements.
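For readers who want to see the whole message, here is one way the envelope might be assembled in Python. Treat it as a sketch under assumptions: the SOAP 1.1 envelope namespace is real, but the namespace bound to the n prefix is a placeholder (the example above elides it), and the build_request helper and the comment payload element are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
NECHO_NS = "http://example.com/ns/necho-api"           # placeholder, not a real namespace

ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("n", NECHO_NS)


def build_request(resource, action, payload):
    """Wrap a payload element in a SOAP envelope carrying the proposed header."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    # The header block must be understood by the receiver.
    necho = ET.SubElement(header, f"{{{NECHO_NS}}}necho",
                          {f"{{{SOAP_NS}}}mustUnderstand": "1"})
    # One operation: what to do (action) and where to do it (resource).
    ET.SubElement(necho, f"{{{NECHO_NS}}}operation", {
        f"{{{NECHO_NS}}}resource": resource,
        f"{{{NECHO_NS}}}action": action,
    })
    # The application content travels in the Body.
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return envelope


comment = ET.Element("comment")  # hypothetical payload element
comment.text = "Nice column!"
envelope = build_request(
    "http://www.example.com/entries/23",
    "http://atom.example.com/actions/postComment",
    comment,
)
message = ET.tostring(envelope, encoding="unicode")
print(message)
```

The resulting string is what would be POSTed to the single well-known URL; the server reads the header to decide which resource and action the Body applies to.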
Security can be defined by using the WS-Security framework defined by the OASIS Web Services Security working group. Leveraging other WS specifications is also possible.
Finally, note that in most cases this won't be needed. A simple SOAP
message POST'd to a single well-known URL, with the target
specified in the header, avoids needless indirection and keeps target
specification from cluttering application content. Seems like a winner to
me.
Do you agree with Rich Salz's criticisms of the Necho API? Share your opinion in our forum.
- Why is a numeric timestamp needed?
2003-08-14 06:57:15 Asbjørn Ulsberg
I read: "This also emphasizes that the Necho's timestamp elements should have a numeric attribute that counts seconds since some epoch".
Why is this? I can't see any reason to have a Unix timestamp in the feed. As W3CDTF is a date-format that sorts perfectly as a string, the ">" would work as well with it as it will with a numerical timestamp.
Do you have any other reasons for requiring a timestamp in the feed?
- Why is a numeric timestamp needed?
2003-08-14 18:44:32 Rich Salz
Relative operations are a lot easier with numerics (e.g., is this item less than 24 hours old) than with xsd:dateTime.
- So much confusion
2003-08-05 23:24:25 Mark Pilgrim
Wow, so much confusion in this article, it's hard to know where to start. Let's see...
1. The whole thing about WebDAV is confusing. Why would we want to try to be compatible with a protocol nobody uses?
2. The API will not be deferring authentication mechanisms to arbitrary other standards. The supported authentication mechanism(s) are currently under discussion. WS-Security is one of many proposals.
3. The whole confusion with the API being "tightly coupled" with the syndication format. WHO CARES if the site has a syndicated feed or not? The API defines a way for authors to add/edit/delete/search content on their site. Your questions about "well what if I don't want a syndicated feed" just don't make any sense at all. The conceptual data model is similar, and so the serialization is similar. That's all.
4. "Since the entries can be considered a single XML document" is just laughable on its face, and is another sign of category error. Just because we use XML in the API doesn't mean that we're mandating it for a storage format. XPath makes no sense at all for this application. All the personal CMSes I'm aware of either store content in a series of flat files, or in a low-end database. Neither supports XPath. Mandating an XPath-based query interface would put this waaay out of the realm of implementability for 99% of the target audience.
5. As Joe pointed out, RSD is being dropped in the next draft.
6. I think you are also confused in thinking that a REST approach mandates *separate* URLs for each action. This is a common misconception. The URLs for new post, edit post, delete post, and get post can all be the same URL of a single script (such as a CGI), with no different parameters or anything. The CGI script dispatches based on the HTTP verb used. See http://bitworking.org/news/Carrot_Versus_Orange for an example.
7. Your entire argument in favor of SOAP seems to be based on the level of indirection introduced by RSD (soon to be nixed) and several misconceptions, such as that you need separate URLs for each action (not true).
In short, this entire article is poorly argued. Where it is not poorly argued, it is dangerously misleading, and where it is not misleading, it is simply wrong.
- Nobody uses WebDAV?
2004-02-01 01:36:29 Bayle Shanks
I'm not an expert on web standards, but my impression is that WebDAV is an up-and-coming technology, not a dud. I've been puzzling over the relationship between WebDAV and Atom myself. I'll continue this discussion on the Atom wiki.
- Many people use WebDAV
2003-08-19 16:01:26 Jim Whitehead
There are currently many applications and servers that support the WebDAV protocol. These include:
Microsoft Office (Word, Excel, PowerPoint)
Microsoft Web Folders
Adobe Photoshop, Illustrator, FrameMaker, InDesign, GoLive, Acrobat
Macromedia Dreamweaver
XML Spy
Apple iDisk, iCal
TeamDrive
Servers:
Apache mod_dav (native and Catacomb)
Microsoft IIS
Oracle 9i
Xythos Web File Server
WebStar V
Filenet Panagon
Tamino
SAP
There are many people and organizations that use WebDAV daily, and depend on it to ensure they can share and collaborate on their files.
SecuritySpace records that there are over 450,000 Apache servers on the Internet that report installation of mod_dav -- doubtless there are many more behind firewalls.
You can find out more about WebDAV at http://www.webdav.org/
- So much confusion
2003-08-06 04:08:00 Rich Salz
Thanks for the detailed analysis. A couple of corrections: since wiki was all about "rss replacement" my claimed confusion in #3 is sensible; if the API is really about remote content management, then WebDav requires even more consideration than I thought. :) As for #6, no, I don't have that confusion.
Dangerous? Wow.
- no one uses webDAV?!!
2003-08-06 01:44:26 bryan rasmussen
c'mon Exchange uses webDAV, windows 2000+ uses webDAV (to go on with Microsoft stuff most of their big products support it, Sharepoint etc.), Apache has a webDAV module (not full implementation if I remember correctly), most native xml databases that I'm aware of support some sort of webDAV interface, a lot of programming languages have webDAV libraries... I could go on with a list of major players supporting webDAV that would probably be as long as the original article, I counted on www.webdav.org/projects over 20 open source webDAV projects, and over 20 commercial products. As I am also aware of numerous products not listed on this list I am forced to conclude that, whoa, this webDAV thing is all over the place!
- Thanks for the feedback
2003-08-05 22:37:00 Joe Gregorio
Rich,
Thanks for the feedback on the draft RFC, and also for stopping by and leaving the feedback on the wiki. The consensus, between the mailing list and the wiki, appears to be that the RSD file does introduce too many levels of indirection and so it will be removed in the 07 draft.
- Lost properties
2003-08-05 18:51:23 Mark Baker
Using an RPC approach as you advocate carries with it all the poor architectural properties that RPC has always had, and will continue to have; poor visibility, poor scalability, poor self-description, poor reliability. While using REST is harder, because it is more constraining, you really need to consider the cost of those lost properties before you can suggest an alternative architectural style. Yes, it's easy to pick nits and find cases where RPC has a small advantage, but that doesn't in any way suggest that RPC is the best style for the system as a whole. On the contrary, I think RPC has demonstrated quite well over the past 20+ years that it's entirely unsuitable for the Internet, whereas REST has demonstrated the exact opposite.
P.S. I'm not a stalker, really. 8-)
- Lost properties
2003-08-05 19:09:03 Rich Salz
Given the constraints of the column format, I think the "better approach" section makes a fair and accurate assessment of why REST isn't great. After all, the introspection file is just run-time WSDL.
- Lost properties
2003-08-05 20:05:36 Mark Baker
No, the introspection file declares resource types, while WSDL declares interfaces.
- Lost properties
2003-08-05 20:17:41 Rich Salz
Introspection says "post this kind of message here", where the message type is identified by the element name in the file, but defined out of band. "Post a resource" vs "post a schema-defined document" -- to me, the difference is theoretical; the cost described in the column. Time for me, I think, to give this debate a rest.
- Lost properties
2003-08-06 08:48:56 Mark Baker
If you can't see the difference, or don't see it as important, then you're either not looking hard enough, or you don't understand the importance of constraints to software architecture.

