
An Atom-Powered Wiki

April 14, 2004

Joe Gregorio

In my last article I covered the changes from version 7 to version 8 of the draft AtomAPI. The latest version of the AtomAPI is now version 9, which adds support for SOAP. That change, and its impact on API implementers, will be covered in a future article. In this article I'm going to build a simple implementation of the AtomAPI.

The first task at hand is to pick a viable candidate. My criteria included a small code base, Python, and a target that is a slightly unconventional application of the AtomAPI. I wanted a small code base in Python because it's a language I'm familiar with, and small is good for the sake of exposition. I picked an unconventional application of the AtomAPI because I've found that to be a good technique for stretching a protocol and exposing its strengths and weaknesses.

The application I've picked is PikiPiki, which is a wiki, a cooperative authoring system for the Web. It's written in Python, is GPL'd, has a small code base, and the code is easy to navigate. It also has a good lineage given that MoinMoin is based on PikiPiki. The source for both the client and the modified server described in this article can be downloaded from the EditableWebWiki.

To create an implementation of the AtomAPI there are a few operations we need to support. Each entry, which in the case of a wiki is the content for a WikiWord, needs a unique URI called the EditURI that supports GET, PUT, and DELETE. In addition, we need a single PostURI that accepts POST to create new entries. Lastly, we'll add a FeedURI that supports GET to return a list of the entries. Supporting these operations on these URIs is all that's needed to have a fully functioning Atom server. (This of course ignores SOAP, which I'll cover later.)

Character Encoding

Character encoding is often overlooked, despite being an important part of working with any XML format, and Atom is no exception. Before making any additions to PikiPiki we'll need a few small changes to ensure that all of our data is encoded correctly. For a good introduction to character encoding, consult the excellent tutorial by Jukka Korpela.

To make things easier we can encode all of PikiPiki's data as UTF-8. There are many encodings to choose from, all with different advantages and disadvantages, but UTF-8 has some special properties: it lets us use any Unicode character, it lets us treat the data, for the most part, like regular "C" strings, and it is guaranteed to be supported by any conforming XML parser. Support for UTF-8 is also one of the few things that most browsers get right.

Since this is a wiki, and for now all the data coming into it arrives through a form, we need to ensure that all incoming data is encoded as UTF-8. The easiest way to do this is to specify that the encoding of the form page is UTF-8; lacking any other indication, a browser will submit form data using the same character encoding the page was served in. HTML forms can specify alternate character sets that the server will accept, via the accept-charset attribute, but support for it is spotty (meaning it worked perfectly in Mozilla, and I failed to get it working in Microsoft's Internet Explorer). So our first change to PikiPiki is to add a meta tag to the generated HTML.

def send_title(text, link=None, msg=None, wikiword=None):
  print "<head><title>%s</title>" % text
  print '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'

Now all of our web pages should submit UTF-8 encoded data, and since every page the wiki produces is a combination of the ASCII markup embedded in the Python program and the UTF-8 text in the stored wiki entries, we can be sure our output is UTF-8.
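The meta tag should be enough, but if you want to be defensive about what browsers actually send, a small normalization step when reading the form data is cheap insurance. This sketch is mine rather than part of PikiPiki, and the 'savetext' field name is just an example:

import cgi

form = cgi.FieldStorage()
raw = form.getvalue('savetext', '')
# Decode as UTF-8, replacing any invalid byte sequences, then re-encode so
# that everything written to disk is known-good UTF-8.
text = raw.decode('utf-8', 'replace').encode('utf-8')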

A Wiki revolves around WikiWords, mixed-case words that are the title for and unique identifiers of every page on the wiki. In the case of PikiPiki, the WikiWord is also the filename that the text of the page is stored in.

The next change is to move PikiPiki's configuration into a separate file. We'll be creating two new CGI programs to handle the AtomAPI, and they both need access to some configuration information. The configuration section is just a set of global variables that we'll move into piki_conf.py:

from os import path
import cgi

data_dir = '/home/myuserpath/piki.bitworking.org/'
text_dir = path.join(data_dir, 'text')
editlog_name = path.join(data_dir, 'editlog')
cgi.logfile = path.join(data_dir, 'cgi_log')
logo_string = '<img src="/piki/pikipiki-logo.png" border=0 alt="pikipiki">'
changed_time_fmt = ' . . . . [%I:%M %p]'
date_fmt = '%a %d %b %Y'
datetime_fmt = '%a %d %b %Y %I:%M %p'
show_hosts = 0
css_url = '/piki/piki.css'
nonexist_qm = 0
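One note: the Atom scripts described below also expect a couple of URI settings in piki_conf.py; they are referenced later as piki_conf.base_uri and, in create_atom_entry, as base_uri and atom_base_uri. The article doesn't show their values, so the ones below are placeholders for this wiki's install:

base_uri = 'http://piki.bitworking.org/piki.cgi'       # where the HTML pages are served
atom_base_uri = 'http://piki.bitworking.org/atom.cgi'  # the AtomAPI endpoint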

EditURI

The next task at hand is to handle the functions of the EditURI. In the AtomAPI each entry has an associated EditURI, a URI you can dereference in order to retrieve the representation of the entry. You can also PUT an Atom entry to the EditURI to update the entry. In this case, each definition of a WikiWord in PikiPiki will act as a single entry. To handle the EditURI functions we'll create a Python script atom.cgi.

First let's map out the GET. We need to package up the UTF-8 encoded contents of a WikiWord and send it back, and we need to decide on the form of the URI we are going to use. In this case we are calling a CGI program and need to pass in the WikiWord as a parameter, either as a query parameter or as a sort of path. For example, if the WikiWord were "FrontPage", the EditURI could be atom.cgi?wikiword=FrontPage in the first case, or atom.cgi/FrontPage in the second. We'll choose the latter; the WikiWord will be passed in via the "PATH_INFO" environment variable.

def main(body):
  method = os.environ.get('REQUEST_METHOD', '')
  wikiword = os.environ.get('PATH_INFO', '/')
  wikiword = wikiword.split("/", 1)[1]
  wikiword = wikiword.strip()

  word_anchored_re = re.compile(WIKIWORD_RE)

  if method == 'POST':
    ret = create_atom_entry(body)
  elif word_anchored_re.match(wikiword):
    if method in ['GET', 'HEAD']:
      ret = get_atom_entry(wikiword)
    elif method == 'PUT':
      ret = put_atom_entry(wikiword, body)
    elif method == 'DELETE':
      ret = delete_atom_entry(wikiword)
    else:
      ret = report_status(405,
        "Method not allowed", "")
  else:
    ret = report_status(400, "Not a valid WikiWord",
      "The WikiWord you referred to is invalid.")
  # Each handler returns a (status, response text) tuple; the response
  # text, CGI headers included, is what gets sent back to the client.
  return ret[1]

Our CGI pulls the HTTP method from the environment variable "REQUEST_METHOD" and the WikiWord from the "PATH_INFO" environment variable. Based on those two pieces of information we dispatch to the correct function. When we process GET we are careful to respond to HEAD requests too. This is an important point: the Apache web server will do the right thing with the HEAD response, that is, generate the right headers and send only the headers, discarding the body.
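The article doesn't show atom.cgi's entry point, so the following is an assumption about how main() gets called: read any request body from standard input, hand it to main(), and print the complete CGI response, headers included, that main() returns.

import os
import sys

if __name__ == '__main__':
  # CONTENT_LENGTH is only set when there is a request body (POST or PUT).
  try:
    length = int(os.environ.get('CONTENT_LENGTH', '0'))
  except ValueError:
    length = 0
  if length:
    body = sys.stdin.read(length)
  else:
    body = ''
  # main() returns the full response text, CGI headers and all.
  sys.stdout.write(main(body))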

def get_atom_entry(wikiword):
  filename = getpath(wikiword)
  base_uri = piki_conf.base_uri
  if path.exists(filename):
    issued = last_modified_iso(filename)
    content = file(filename, 'r').read()
  else:
    # Like the HTML interface, any valid WikiWord gets a default entry
    # rather than a 404 (see below).
    issued = currentISOTime()
    content = "Create this page."
  return (200, ENTRY_FORM % vars())

Where ENTRY_FORM is defined as:

"""Content-type: application/atom+xml; charset=utf-8

Status: 200 Ok



<?xml version="1.0" encoding='utf-8'?>

<entry xmlns="http://purl.org/atom/ns#">

    <title>%(wikiword)s</title>

    <link rel="alternate" type="text/html" 

        href="%(base_uri)s/%(wikiword)s" />

    <id>tag:dev.bitworking.org,2004:%(wikiword)s</id>

    <issued>%(issued)s</issued>

    <content type="text/plain">%(content)s</content>

</entry>"""

There are two important points to note about this code. The first is what we do if the desired WikiWord does not exist. If we were writing this for a typical CMS, a GET for an entry that didn't exist would normally return a status code of 404. Wikis, in contrast, when dealing with HTML content, present what appears to be an infinite URI space. That is, you can request any URI at a wiki and, as long as you specify a validly formed WikiWord, you won't get a 404. Instead you will get a web page that prompts you to enter the content for that WikiWord. Go ahead and try it on the PikiPiki wiki that is set up for testing this implementation of the AtomAPI; this WikiWord currently doesn't have a definition: http://piki.bitworking.org/piki.cgi/SomeWikiWordThatDoesntExist. To keep parity with the HTML interface, the AtomAPI interface works the same way.

The second point is character encoding. Note that we state character encoding in two places in the response, both in the HTTP header Content-type: and in the XML Declaration.

There are two more HTTP methods to handle for the EditURI: PUT and DELETE. PUT is used to update the content for a WikiWord, replacing the existing content with that delivered by the PUT. DELETE is used to remove an entry; it's easy to implement: just delete the associated file.

def delete_atom_entry(wikiword):
  ret = report_status(200, "OK", "Delete successful.")
  if wikiwordExists(wikiword):
    try:
      os.unlink(getpath(wikiword))
    except OSError:
      # os.unlink raises OSError if the file can't be removed.
      ret = report_status(500, "Internal Server Error",
        "Can't remove the file associated with that word.")
  return ret

Note that unless something really bad happens, we return a status code of 200 OK. That is, if the entry doesn't exist we still return 200. You might be scratching your head if you remember that our implementation always returns an entry for every valid WikiWord, whether or not it actually has filled-in content. That is, if you come right back and do a GET on the URI we just DELETE'd, it will not give you a 404, but will instead return the default entry, "Create this page." Is this a problem? No. It may seem a bit odd, but it's not a problem at all. DELETE and GET are two different, orthogonal requests. There is no guarantee that some other agent, or some process on the server itself, didn't come along and recreate that URI between the DELETE and the GET.

Supporting PUT allows us to change the content of a WikiWord. To make the handling of XML easier I've used the Python wrapper for libxml2, an excellent tool for handling XML, in particular because it lets you use XPath expressions to query XML documents. In this case we're using XPath to pull out the content element.

def put_atom_entry(wikiword, content):
  ret = report_status(200, "OK",
    "Entry successfully updated.")
  doc = libxml2.parseDoc(content)
  ctxt = doc.xpathNewContext()
  ctxt.xpathRegisterNs('atom', 'http://purl.org/atom/ns#')
  text_plain_content_nodes = ctxt.xpathEval(
    '/atom:entry/atom:content[@type="text/plain" or not(@type)]'
  )
  all_content_nodes = ctxt.xpathEval('/atom:entry/atom:content')

  content = ""
  if len(text_plain_content_nodes) > 0:
    content = text_plain_content_nodes[0].content

  # Accept the entry if its content is text/plain (or has no type, which
  # defaults to text/plain); otherwise reject it with a 415.
  if len(text_plain_content_nodes) > 0 or len(all_content_nodes) == 0:
    writeWordDef(wikiword, content)
    append_editlog(wikiword, os.environ.get('REMOTE_ADDR', ''))
  else:
    # There are 'content' elements, but all of some unsupported type.
    ret = report_status(415, "Unsupported Media Type",
      "This wiki only supports plain text.")

  return ret

The detail to notice in the implementation is the XPath used to pick out the content element. Content elements may have a 'type' attribute, but if it is not present then it defaults to 'text/plain'. Since 'text/plain' is the only type of content we can support in a wiki, it's the only type of content we'll look for.
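You can check that behavior with a quick standalone test, not part of atom.cgi: the XPath expression above should match a content element even when the type attribute is omitted.

import libxml2

sample = """<entry xmlns="http://purl.org/atom/ns#">
  <content>Hello, wiki.</content>
</entry>"""

doc = libxml2.parseDoc(sample)
ctxt = doc.xpathNewContext()
ctxt.xpathRegisterNs('atom', 'http://purl.org/atom/ns#')
nodes = ctxt.xpathEval(
  '/atom:entry/atom:content[@type="text/plain" or not(@type)]')
print len(nodes)   # prints 1; a missing type defaults to text/plain
ctxt.xpathFreeContext()
doc.freeDoc()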

That takes care of the EditURI; we just have the PostURI and FeedURI to go.

PostURI

The PostURI is used for creating new WikiWord entries.

def create_atom_entry(body):
  wikiword = extractWikiWord(body)
  if wikiword:
    if wikiwordExists(wikiword):
      ret = report_status(409, "Conflict",
        "An entry with that name already exists.")
    else:
      ret = put_atom_entry(wikiword, body)
    # A successful put (200) means the new entry was created,
    # so report that as 201 Created.
    if (ret[0] == 200):
      ret = (201, CREATED_RESP %
        {'base_uri': base_uri,
         'atom_base_uri': atom_base_uri,
         'wikiword': wikiword
        })
  else:
    ret = report_status(409, "Conflict",
      "Not enough information to form a wiki word.")
  return ret

The function 'extractWikiWord' pulls out the contents of the title element and converts it into a WikiWord. If we have a good WikiWord and it doesn't already exist, then we use 'put_atom_entry' to create it. Otherwise we respond with an HTTP status code of 409 to indicate that we won't let a POST overwrite an already existing WikiWord.
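The source for extractWikiWord isn't shown here; a plausible sketch, based on that description, might look like the following. The details are my assumptions, not the downloadable code:

import re
import libxml2

def extractWikiWord(body):
  """Pull the title out of the posted entry and squash it into a
  CamelCase WikiWord, or return None if nothing usable is found."""
  doc = libxml2.parseDoc(body)
  ctxt = doc.xpathNewContext()
  ctxt.xpathRegisterNs('atom', 'http://purl.org/atom/ns#')
  titles = ctxt.xpathEval('/atom:entry/atom:title')
  title = titles and titles[0].content or ''
  ctxt.xpathFreeContext()
  doc.freeDoc()
  # Keep only word characters, capitalizing each word, e.g. "just testing"
  # becomes "JustTesting". Non-ASCII titles aren't handled by this
  # simple version.
  wikiword = ''.join([w.capitalize() for w in re.findall(r'\w+', title)])
  return wikiword or None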

FeedURI

The FeedURI is the last piece we need to implement. Clients use the FeedURI to locate the PostURI for creating new entries and the EditURIs for editing existing ones. The format of the document served from the FeedURI is exactly that of an Atom feed. This is different from the Atom we use with the PostURI and the EditURI, which is just the 'entry' element from Atom. Since the format is the same as that of a regular feed, you might be tempted to serve the same feed for both aggregation and editing. That might work in the case of a wiki, but not for a general site: you may have entries in draft or unpublished form which must appear at the FeedURI so you can edit them, but must not appear in your aggregation feed. Given that this is a publicly editable wiki, we don't have that constraint, so we can use one feed for both purposes.

The FeedURI is implemented as a separate script, atomfeed.cgi, that builds a feed. The code, which is a bit too long to include here, builds an Atom feed by sorting all the files that contain WikiWord definitions in reverse chronological order, then takes each WikiWord and its associated content and formats it as an Atom entry. The entries are concatenated together and placed in an Atom feed (a rough sketch of this step appears after the discussion of paging below). The only special additions are the link elements that contain the PostURI and the EditURIs, which are denoted with the attributes rel="service.post" and rel="service.edit" respectively. Here is a snippet from the Atom feed produced by atomfeed.cgi.

<?xml version="1.0" encoding="utf-8"?>

<feed version="0.3" xmlns="http://purl.org/atom/ns#">

  <title>PikiPiki</title>

  <link rel="alternate" type="text/html" 

     href="http:/.bitworking.org.cgi"/>

  <link rel="service.post" type="application/atom+xml" 

      href="http:/.bitworking.org/atom.cgi"/>

  <link rel="next" type="application/atom+xml" 

      href="http:/.bitworking.org/atomfeed.cgi/10"/>

  

  <modified>2004-03-09T21:32:58-05:00</modified>

  <author>

      <name>Joe Gregorio</name>

      <url>http://bitworking.org/</url>

  </author> 

  <entry>

    <title>JustTesting</title> 

    <link rel="service.edit" type="application/atom+xml" 

        href="http:/.bitworking.org/atom.cgi/JustTesting" />

    <link rel="alternate" type="text/html" 

        href="http:/.bitworking.org.cgi/JustTesting" />

    <id>tag:piki.bitworking.org,2004:JustTesting</id>

    <issued>2004-03-09T21:32:58-05:00</issued> 

    <modified>2004-03-09T21:32:58-05:00</modified> 

    <content type="text/plain">

      This is content posted from an AtomAPI client.

    </content>  

  </entry>

  <entry>

    <title>PikiSandBox</title> 

    <link rel="service.edit" type="application/atom+xml" 

      href="http:/.bitworking.org/atom.cgi/PikiSandBox" />

    <link rel="alternate" type="text/html" 

      href="http:/.bitworking.org.cgi/PikiSandBox" />

    <id>tag:piki.bitworking.org,2004:PikiSandBox</id>

    <issued>2004-03-04T21:49:03-05:00</issued> 

    <modified>2004-03-04T21:49:03-05:00</modified> 

    <content type="text/plain">

      '''I dare you''': press the Edit button and add 

      something to this page. 



    -- MartinPool

    </content>  

    </entry>

 

This feed also contains one more link element of a type we haven't talked about yet. The second link, the one with rel="next", points to the next set of entries. That is, when producing the feed we don't want to put all the entries into a single document; that could end up being hundreds if not thousands of entries, which would be impractical to handle. Instead we put in a fixed number, like 20, and the 'next' link points to another feed with the next 20 entries. If a feed is in the middle of such a chain, it also contains a link with rel="prev" that points to the set of entries previous to the current one. In this way clients can navigate the list of entries in manageably sized sets. Note that the client code that comes with this implementation does not traverse the 'next' and 'prev' links in a feed.
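The full atomfeed.cgi isn't reproduced here, but a rough sketch of the gathering, sorting, and paging step described above might look like the following. The function name and details are mine, not the code from the download:

import os
from os import path
import piki_conf

PAGE_SIZE = 20   # entries per feed page; an arbitrary choice, as discussed above

def recent_wikiwords(offset=0):
  """Return (wikiword, mtime) pairs for one page of entries, newest first.
  Assumes every file in text_dir is a WikiWord definition."""
  words = os.listdir(piki_conf.text_dir)
  pairs = [(w, path.getmtime(path.join(piki_conf.text_dir, w))) for w in words]
  pairs.sort(lambda a, b: cmp(b[1], a[1]))   # reverse chronological order
  return pairs[offset:offset + PAGE_SIZE]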

The Client

An AtomAPI-enabled wiki wouldn't be worth much if there weren't a client available, so I've included a wxPython client that allows you to create new entries on the wiki and to edit old entries.

Remember how careful we were when specifying and using the character encoding? There isn't much code involved in supporting and processing everything in UTF-8, but careful planning ahead pays dividends. Here is a screenshot of the client editing one of the pages on a wiki with some Unicode characters in it:

[Screenshot: the wxAtomClient application editing an entry full of chess characters]

All of the source for both the client and the server can be downloaded from the EditableWebWiki, which is running the code described above. Note that the client is a GUI application written in Python; you must use the version of wxPython that is compiled with Unicode support. Lastly, you'll have to ensure that your platform has fonts available to display the Unicode characters you're going to use.
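The wxPython client is too large to list here, but to make the wire protocol concrete, here is a minimal command-line sketch of an update using Python's httplib. The host and paths are the ones used throughout this article, the entry body is illustrative, and note that this particular server only looks at the content element of what you PUT:

import httplib

entry = """<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://purl.org/atom/ns#">
    <title>PikiSandBox</title>
    <content type="text/plain">Edited over the AtomAPI.</content>
</entry>"""

conn = httplib.HTTPConnection('piki.bitworking.org')
conn.request('PUT', '/atom.cgi/PikiSandBox', entry,
             {'Content-Type': 'application/atom+xml; charset=utf-8'})
response = conn.getresponse()
print response.status, response.reason
conn.close()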

Rough Spots

One of the reasons we started using the AtomAPI on a wiki was to stretch the API and see where things broke down. Nothing really awful showed up, though we did find some rough spots. The first rough spot cropped up when doing a GET on the EditURI, where we encounter a slight mismatch between the formulation of the AtomAPI and this wiki implementation. The problem is that, according to version 9 of the draft AtomAPI, the issued element is required when doing a GET on an EditURI. Since PikiPiki only stores the raw contents in a file, and doesn't store any other data, we are limited to using the file system's last-modified date for each file, which isn't the same thing as the issued date.

The second rough spot is in the area of content. The only type of content we accept is 'text/plain', but that isn't the only type of content a client could post. In fact, most clients are probably able to produce 'text/html', and some may even be able to produce 'application/xhtml+xml'. We could add code to this implementation to convert HTML into WikiML, but the broader question still stands: how does a client know what kinds of content, i.e. which mime-types, an AtomAPI server will accept? As of today this is an open question.

Summary

Using Python and the XPath facilities of libxml2, it was straightforward to build an AtomAPI implementation for a wiki. There isn't even very much code: atom.cgi is just 146 lines of code, while atomfeed.cgi is just 122 lines.

This is just a basic client and server, doing the minimum to support the AtomAPI. In a future article the server's handling of HTTP can be enhanced to provide significant performance boosts by using more of HTTP's capabilities. In addition, SOAP-enabling the server will require some changes. After that we can add the ability to edit the wiki's templates.