An Atom-Powered Wiki
In my last article I covered the changes from version 7 to version 8 of the draft AtomAPI. The latest draft, version 9, adds support for SOAP; that change, and its impact on API implementers, will be covered in a future article. In this article I'm going to build a simple implementation of the AtomAPI.
The first task at hand is to pick a viable candidate. I had a list of criteria: a small code base, written in Python, and a slightly unconventional application of the AtomAPI. I wanted a small code base in Python because it's a language I'm familiar with, and small is good for the sake of exposition. I picked an unconventional application of the AtomAPI because I've found that to be a good technique for stretching a protocol, looking for strengths and weaknesses.
The application I've picked is PikiPiki, which is a wiki, a cooperative authoring system for the Web. It's written in Python, is GPL'd, has a small code base, and the code is easy to navigate. It also has a good lineage given that MoinMoin is based on PikiPiki. The source for both the client and the modified server described in this article can be downloaded from the EditableWebWiki.
To create an implementation of the AtomAPI there are a few operations we need to support. Each entry, which in the case of a wiki will be the content for a WikiWord, needs to have a unique URI called the EditURI that supports GET, PUT and DELETE. In addition, a single PostURI that accepts POST to create new entries needs to be added. Lastly, we'll add a FeedURI that supports GET to return a list of the entries. Supporting the listed operations on these URIs is all that's needed to have a fully functioning Atom server. (This of course ignores SOAP, which I'll cover later.)
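The URI/method matrix is small enough to summarize in code. This sketch is purely illustrative (the dict and helper are mine, not part of PikiPiki); the names EditURI, PostURI and FeedURI come from the AtomAPI draft:

```python
# The three kinds of URIs an Atom server exposes, and the HTTP
# methods each one must support, per the description above.
OPERATIONS = {
    'EditURI': ('GET', 'PUT', 'DELETE'),  # one EditURI per entry (WikiWord)
    'PostURI': ('POST',),                 # POST here to create new entries
    'FeedURI': ('GET',),                  # GET a list of the entries
}

def allowed(uri_kind, method):
    """True if the AtomAPI allows `method` on that kind of URI."""
    return method in OPERATIONS.get(uri_kind, ())
```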
Character encoding is often overlooked, but it's an important part of working with any XML format, and Atom is no exception. Before making any additions to PikiPiki we'll need to make a few small changes to ensure that all of our data is encoded correctly. For a good introduction to character encoding, consult the excellent introduction by Jukka Korpela.
To make things easier we can encode all of PikiPiki's data as UTF-8. There are many encodings to choose from, each with different advantages and disadvantages, but UTF-8 has some special properties: it allows us to use any Unicode character, for the most part lets us treat the data like regular "C" strings, and we are guaranteed support by any conforming XML parser. Also, support for UTF-8 is one of the few things that most browsers do right.
Since this is a wiki, and for now all the data coming into it comes through a form, we need to ensure that all incoming data is encoded as UTF-8. The easiest way to do this is to specify that the encoding for the form page is UTF-8; lacking any other indication, a browser will submit the data from a form using the same character encoding that the page was served in. While HTML forms can specify, via the accept-charset attribute, alternate character sets that the server will accept when data is submitted, support for this is spotty (meaning it worked perfectly in Mozilla, and I failed to get it working in Microsoft's Internet Explorer). So our first change to PikiPiki is to add a meta tag to the generated HTML.
def send_title(text, link=None, msg=None, wikiword=None):
    print "<head><title>%s</title>" % text
    print '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
Now all of our web pages should submit UTF-8 encoded data, and since every page the wiki produces is a combination of the ASCII markup embedded in the Python program and the UTF-8 in the stored wiki entries, we can be sure our output is UTF-8 as well.
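With the pages served as UTF-8, the bytes the browser submits should decode cleanly on the server side. A minimal sketch of that decoding step (the helper is illustrative, not part of PikiPiki; falling back to replacement characters is one defensive choice among several):

```python
def decode_form_value(raw_bytes):
    """Decode form data submitted by a page served as UTF-8.

    Because the browser echoes back the page's own charset, the
    bytes should decode cleanly; 'replace' substitutes U+FFFD for
    any malformed sequence from a misbehaving client rather than
    raising an exception.
    """
    return raw_bytes.decode('utf-8', 'replace')
```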
A Wiki revolves around WikiWords, mixed-case words that are the title for and unique identifiers of every page on the wiki. In the case of PikiPiki, the WikiWord is also the filename that the text of the page is stored in.
The next change is to move the configuration of PikiPiki into a separate file. We'll be creating two new CGI programs to handle the AtomAPI, and they both need access to some configuration information. The configuration section is just a set of global variables that we'll move into a module of its own, piki_conf:
from os import path
import cgi

data_dir = '/home/myuserpath/piki.bitworking.org/'
text_dir = path.join(data_dir, 'text')
editlog_name = path.join(data_dir, 'editlog')
cgi.logfile = path.join(data_dir, 'cgi_log')
logo_string = '<img src="/piki/pikipiki-logo.png" border=0 alt="pikipiki">'
changed_time_fmt = ' . . . . [%I:%M %p]'
date_fmt = '%a %d %b %Y'
datetime_fmt = '%a %d %b %Y %I:%M %p'
show_hosts = 0
css_url = '/piki/piki.css'
nonexist_qm = 0
The next task at hand is to handle the functions of the EditURI. In the AtomAPI each entry has an associated EditURI, a URI you can dereference to retrieve the representation of the entry. You can also PUT an Atom entry to the EditURI to update the entry. In this case, each definition of a WikiWord in PikiPiki will act as a single entry. To handle the EditURI functions we'll create a Python CGI script, atom.cgi.
First let's map out the GET. We need to package up the UTF-8 encoded contents of a WikiWord and send it back. We also need to decide on the form of the URI we are going to use. In this case we are going to be calling a CGI program and need to pass in the WikiWord as a parameter. We could pass it in either as a query parameter or as a sort of path. For example, in the first case, if the WikiWord was "FrontPage", the EditURI could be atom.cgi?wikiword=FrontPage. In the second case, the EditURI might be atom.cgi/FrontPage. We'll choose the latter; the WikiWord will be passed in via the "PATH_INFO" environment variable.
def main(body):
    method = os.environ.get('REQUEST_METHOD', '')
    wikiword = os.environ.get('PATH_INFO', '/')
    wikiword = wikiword.split("/", 1)[-1]
    wikiword = wikiword.strip()
    word_anchored_re = re.compile(WIKIWORD_RE)
    if method == 'POST':
        ret = create_atom_entry(body)
    elif word_anchored_re.match(wikiword):
        if method in ['GET', 'HEAD']:
            ret = get_atom_entry(wikiword)
        elif method == 'PUT':
            ret = put_atom_entry(wikiword, body)
        elif method == 'DELETE':
            ret = delete_atom_entry(wikiword)
        else:
            ret = report_status(405, "Method not allowed", "")
    else:
        ret = report_status(400, "Not a valid WikiWord",
                            "The WikiWord you referred to is invalid.")
    return ret
Our CGI pulls the HTTP method from the "REQUEST_METHOD" environment variable and the WikiWord from "PATH_INFO". Based on those two pieces of information we dispatch to the correct function. When we process GET we are careful to respond to HEAD requests too. This is an important point: the Apache web server will do the right thing with the HEAD response, that is, generate the right headers and send only the headers, discarding the body.
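The dispatcher leans on a report_status helper that isn't shown in the article. This is only a guess at its shape, reverse-engineered from how its return value is used alongside get_atom_entry's, i.e. a numeric status plus a complete, ready-to-print CGI response:

```python
def report_status(code, reason, message):
    """Build a (status, body) pair for a CGI response.

    A hypothetical sketch of the article's undefined helper: the
    body carries the CGI headers, a Status: line, a blank line,
    and then the human-readable message.
    """
    body = "Content-type: text/plain; charset=utf-8\n"
    body += "Status: %d %s\n\n%s" % (code, reason, message)
    return (code, body)
```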
def get_atom_entry(wikiword):
    filename = getpath(wikiword)
    base_uri = piki_conf.base_uri
    if path.exists(filename):
        issued = last_modified_iso(filename)
        content = file(filename, 'r').read()
    else:
        issued = currentISOTime()
        content = "Create this page."
    return (200, ENTRY_FORM % vars())
ENTRY_FORM is defined as:
"""Content-type: application/atom+xml; charset=utf-8 Status: 200 Ok <?xml version="1.0" encoding='utf-8'?> <entry xmlns="http://purl.org/atom/ns#"> <title>%(wikiword)s</title> <link rel="alternate" type="text/html" href="%(base_uri)s/%(wikiword)s" /> <id>tag:dev.bitworking.org,2004:%(wikiword)s</id> <issued>%(issued)s</issued> <content type="text/plain">%(content)s</content> </entry>"""
There are two important points to note about this code. The first
is what we do if the desired WikiWord does not exist. If we were
writing this for a typical CMS, a GET for an entry that didn't exist
would normally return a status code of 404. Wikis, in
contrast, when dealing with the HTML content, present what appears
to be an infinite URI space. That is, you can request any URI at
a wiki and, as long as you specify a validly formed WikiWord, you
won't get a 404. Instead you will get a web page that prompts you
to enter the content for that WikiWord. Go ahead and try it on the PikiPiki wiki that is set up for testing this implementation of the AtomAPI; this WikiWord currently doesn't have a definition: http://piki.bitworking.org/piki.cgi/SomeWikiWordThatDoesntExist
To keep parity with the HTML interface, the AtomAPI interface
works the same way.
The second point is character encoding. Note that we state the character encoding in two places in the response: in the HTTP Content-type header and in the XML declaration.
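One wrinkle the template glosses over: the wiki text is interpolated into ENTRY_FORM raw, so a page containing '<' or '&' would produce ill-formed XML. A minimal sketch of the escaping step that should happen first (the helper name is mine, not PikiPiki's):

```python
from xml.sax.saxutils import escape

def xml_safe(fields):
    """Escape &, < and > in every value before it is interpolated
    into an XML template such as ENTRY_FORM."""
    return dict((k, escape(v)) for k, v in fields.items())
```

With this in place, get_atom_entry could substitute `ENTRY_FORM % xml_safe(vars())` for the raw interpolation.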
There are two more HTTP methods to handle for the EditURI, DELETE and PUT. PUT is used to update the content for a WikiWord, replacing the existing content with that delivered by the PUT. DELETE is used to remove an entry; it's easy to implement: just delete the associated file.
def delete_atom_entry(wikiword):
    ret = report_status(200, "OK", "Delete successful.")
    if wikiwordExists(wikiword):
        try:
            os.unlink(getpath(wikiword))
        except:
            ret = report_status(500, "Internal Server Error",
                                "Can't remove the file associated with that word.")
    return ret
Note that unless something really bad happens, we return with a status code of 200 OK. That is, if the entry doesn't exist then we still return 200. You might be scratching your head if you remember we just talked about our implementation always returning an entry for every valid WikiWord, whether or not it actually had filled in content. That is, if you come right back and do a GET on the URI we just DELETE'd, it will not give you a 404, but instead will return the default filled in entry, "Create this page". Is this a problem? No. It may seem a bit odd, but it's not a problem at all. DELETE and GET are two different, orthogonal requests. There is no guarantee that some other agent, or some process on the server itself, didn't come along and recreate that URI between the DELETE and the GET.
Supporting PUT allows us to change the content of a WikiWord. To make the handling of XML easier I've used the Python wrapper for libxml2, an excellent tool for handling XML, in particular because it lets you use XPath expressions to query XML documents. In this case we're using them to pull out the content of the entry.
def put_atom_entry(wikiword, content):
    ret = report_status(200, "OK", "Entry successfully updated.")
    doc = libxml2.parseDoc(content)
    ctxt = doc.xpathNewContext()
    ctxt.xpathRegisterNs('atom', 'http://purl.org/atom/ns#')
    text_plain_content_nodes = ctxt.xpathEval(
        '/atom:entry/atom:content[@type="text/plain" or not(@type)]'
    )
    all_content_nodes = ctxt.xpathEval('/atom:entry/atom:content')
    content = ""
    if len(text_plain_content_nodes) > 0:
        content = text_plain_content_nodes[0].content
    if len(text_plain_content_nodes) > 0 or len(all_content_nodes) == 0:
        writeWordDef(wikiword, content)
        append_editlog(wikiword, os.environ.get('REMOTE_ADDR', ''))
    else:
        # There are 'content' elements, but only of types we can't handle.
        ret = report_status(415, "Unsupported Media Type",
                            "This wiki only supports plain text")
    return ret
The detail to notice in the implementation is the XPath expression used to pick out the content element. Content elements may have a 'type'
attribute, but if it is not present then it defaults to 'text/plain'.
Since 'text/plain' is the only type of content we can support in a wiki,
it's the only type of content we'll look for.
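The same selection rule can be re-stated in pure Python. This sketch uses the standard library's ElementTree (whose limited XPath subset can't express `or not(@type)` directly) only to show which elements the expression matches; it is not the article's libxml2 code:

```python
import xml.etree.ElementTree as ET

ATOM_NS = 'http://purl.org/atom/ns#'

def plain_text_content(entry_xml):
    """Return the content elements the article's XPath selects:
    those whose type attribute is "text/plain" or absent entirely
    (absent defaults to text/plain)."""
    root = ET.fromstring(entry_xml)
    nodes = root.findall('{%s}content' % ATOM_NS)
    return [n for n in nodes if n.get('type') in (None, 'text/plain')]
```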
That takes care of the EditURI; we just have the PostURI and FeedURI to go.
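From the client side, updating a page is then just a PUT of an Atom entry to the EditURI. Here is a sketch of what a client would assemble; the helper names are made up, and only the path layout and the entry format come from the server code above:

```python
from xml.sax.saxutils import escape

def edit_uri(base, wikiword):
    """EditURI layout from the article: the WikiWord rides in PATH_INFO."""
    return "%s/atom.cgi/%s" % (base.rstrip('/'), wikiword)

def entry_body(wikiword, text):
    """A minimal Atom entry to PUT; the namespace and the text/plain
    content type match what put_atom_entry looks for."""
    return ('<entry xmlns="http://purl.org/atom/ns#">'
            '<title>%s</title>'
            '<content type="text/plain">%s</content>'
            '</entry>') % (escape(wikiword), escape(text))
```

A client would then PUT `entry_body(...)` to `edit_uri(...)` with a Content-Type of application/atom+xml.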