An Atom-Powered Wiki
In my last article I covered the changes from version 7 to version 8 of the draft AtomAPI. The latest version, version 9, adds support for SOAP; that change, and its impact on API implementers, will be covered in a future article. In this article I'm going to build a simple implementation of the AtomAPI.
The first task at hand is to pick a viable candidate. My criteria: a small code base, written in Python, and a slightly unconventional application of the AtomAPI. Python is a language I'm familiar with, and small is good for the sake of exposition. I picked an unconventional application because I've found that to be a good technique for stretching a protocol, looking for strengths and weaknesses.
The application I've picked is PikiPiki, which is a wiki, a cooperative authoring system for the Web. It's written in Python, is GPL'd, has a small code base, and the code is easy to navigate. It also has a good lineage given that MoinMoin is based on PikiPiki. The source for both the client and the modified server described in this article can be downloaded from the EditableWebWiki.
To create an implementation of the AtomAPI there are a few operations we need to support. Each entry, which in the case of a wiki will be the content for a WikiWord, needs to have a unique URI called the EditURI that supports GET, PUT and DELETE. In addition, a single PostURI that accepts POST to create new entries needs to be added. Lastly, we'll add a FeedURI that supports GET to return a list of the entries. Supporting the listed operations on these URIs is all that's needed to have a fully functioning Atom server. (This of course ignores SOAP, which I'll cover later.)
Character encoding is often overlooked, yet it's an important part of working with any XML format, and Atom is no exception. Before making any additions to PikiPiki we'll need to make a few small changes to ensure that all of our data is encoded correctly. For a good introduction to character encoding consult the excellent introduction by Jukka Korpela.
To make things easier we can encode all of PikiPiki's data as UTF-8. There are many encodings to choose from, all with different advantages and disadvantages, but UTF-8 has some special properties: it allows us to use any Unicode character, it lets us for the most part treat the data like regular "C" strings, and we are guaranteed support by any conforming XML parser. Also, support for UTF-8 is one of the few things that most browsers get right.
Since this is a wiki, and for now all the data coming into it comes
through a form, we need to ensure that all incoming data is
encoded as UTF-8. The easiest way to do this is by specifying that
the encoding for the form page is UTF-8; lacking any other
indications, a browser will submit the data from a form using the
same character encoding that the page is served in. While HTML
forms can specify alternate character sets that the server will
accept when data is submitted, via the accept-charset
attribute, support for this is spotty (meaning it worked perfectly
in Mozilla, and I failed to get it working in Microsoft's Internet
Explorer). So our first change to PikiPiki is to add
a meta tag to the generated HTML.
def send_title(text, link=None, msg=None, wikiword=None):
    print "<head><title>%s</title>" % text
    print '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
Now all of our web pages should submit UTF-8 encoded data, and since all of the web pages produced from the wiki are combinations of ASCII markup embedded in the Python program and the UTF-8 in the stored wiki entries, we can be sure our output is UTF-8.
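The reasoning above rests on two properties of UTF-8. A quick sketch, with asserts, of the round trip the wiki relies on:

```python
# Browsers submit form data in the page's encoding, so serving pages
# as UTF-8 means incoming bytes decode cleanly back to the original
# Unicode text.
text = u"R\u00e9sum\u00e9 caf\u00e9"        # Unicode with non-ASCII characters
raw = text.encode("utf-8")                  # bytes as a browser would submit them
assert raw.decode("utf-8") == text          # lossless round trip

# ASCII markup mixes safely with UTF-8 content, because ASCII bytes
# are themselves valid UTF-8, unchanged.
page = b"<p>" + raw + b"</p>"
assert page.decode("utf-8") == u"<p>" + text + u"</p>"
```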
A Wiki revolves around WikiWords, mixed-case words that are the title for and unique identifiers of every page on the wiki. In the case of PikiPiki, the WikiWord is also the filename that the text of the page is stored in.
The next change is to move the configuration of PikiPiki into a
separate file. We'll be creating two new CGI programs to handle the
AtomAPI, and they both need access to some configuration
information. The configuration section is just a set of global
variables that we'll move into piki_conf.py:
from os import path
import cgi
data_dir = '/home/myuserpath/piki.bitworking.org/'
text_dir = path.join(data_dir, 'text')
editlog_name = path.join(data_dir, 'editlog')
cgi.logfile = path.join(data_dir, 'cgi_log')
logo_string = '<img src="/piki/pikipiki-logo.png" border=0 alt="pikipiki">'
changed_time_fmt = ' . . . . [%I:%M %p]'
date_fmt = '%a %d %b %Y'
datetime_fmt = '%a %d %b %Y %I:%M %p'
show_hosts = 0
css_url = '/piki/piki.css'
nonexist_qm = 0
The next task at hand is to handle the functions of the EditURI.
In the AtomAPI each entry has an associated EditURI, a URI you can
dereference in order to retrieve the representation of the
entry. You can also PUT an Atom entry to the EditURI to update the
entry. In this case, each definition of a WikiWord in PikiPiki
will act as a single entry. To handle the EditURI functions we'll
create a Python script atom.cgi.
First let's map out the GET. We need to package up the UTF-8
encoded contents of a WikiWord and send it back. We need to decide
on the form of the URI we are going to use. In this case we are
going to be calling a CGI program and need to pass in the WikiWord
as a parameter. We could pass it in either as a query parameter or
we could pass it in as a sort of path. For example, in the first
case, if the WikiWord was "FrontPage", the EditURI could be
atom.cgi?wikiword=FrontPage. In the second case,
the EditURI might be atom.cgi/FrontPage. We'll choose
the latter; the WikiWord will be passed in via the "PATH_INFO"
environment variable.
def main(body):
    method = os.environ.get('REQUEST_METHOD', '')
    wikiword = os.environ.get('PATH_INFO', '/')
    wikiword = wikiword.split("/", 1)[1]
    wikiword = wikiword.strip()
    word_anchored_re = re.compile(WIKIWORD_RE)
    if method == 'POST':
        ret = create_atom_entry(body)
    elif word_anchored_re.match(wikiword):
        if method in ['GET', 'HEAD']:
            ret = get_atom_entry(wikiword)
        elif method == 'PUT':
            ret = put_atom_entry(wikiword, body)
        elif method == 'DELETE':
            ret = delete_atom_entry(wikiword)
        else:
            ret = report_status(405, "Method not allowed", "")
    else:
        ret = report_status(400, "Not a valid WikiWord",
            "The WikiWord you referred to is invalid.")
    return ret[1]
Our CGI pulls the HTTP method from the "REQUEST_METHOD" environment variable and the WikiWord from the "PATH_INFO" environment variable. Based on those two pieces of information we dispatch to the correct function. When we process GET we are also careful to respond to HEAD requests. This is an important point, as the Apache web server will do the right thing with the HEAD response, that is, generate the right headers and send only the headers, discarding the body.
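The dispatcher above leans on a report_status() helper that is not shown in the article. A minimal sketch, assuming it returns the same (status_code, response_text) tuple the other handlers use, with the CGI "Status:" header folded into the response text:

```python
# Hypothetical reconstruction of report_status(); the real helper may
# differ, but each handler clearly expects a (code, response) tuple
# where response is a complete CGI response: headers, a blank line,
# then the body.
def report_status(code, phrase, message):
    response = ("Status: %d %s\n"
                "Content-type: text/plain; charset=utf-8\n"
                "\n"
                "%s") % (code, phrase, message)
    return (code, response)

# An unknown HTTP method, for example, yields a 405 tuple:
status, body = report_status(405, "Method not allowed", "")
assert status == 405
assert body.startswith("Status: 405 Method not allowed")
```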
def get_atom_entry(wikiword):
    filename = getpath(wikiword)
    base_uri = piki_conf.base_uri
    if path.exists(filename):
        issued = last_modified_iso(filename)
        content = file(filename, 'r').read()
    else:
        issued = currentISOTime()
        content = "Create this page."
    return (200, ENTRY_FORM % vars())
Where ENTRY_FORM is defined as:
"""Content-type: application/atom+xml; charset=utf-8
Status: 200 Ok
<?xml version="1.0" encoding='utf-8'?>
<entry xmlns="http://purl.org/atom/ns#">
<title>%(wikiword)s</title>
<link rel="alternate" type="text/html"
href="%(base_uri)s/%(wikiword)s" />
<id>tag:dev.bitworking.org,2004:%(wikiword)s</id>
<issued>%(issued)s</issued>
<content type="text/plain">%(content)s</content>
</entry>"""
There are two important points to note about this code. The first
is what we do if the desired WikiWord does not exist. If we were
writing this for a typical CMS, a GET for an entry that didn't
exist would normally return a status code of 404. Wikis, in
contrast, when dealing with the HTML content, present what appears
to be an infinite URI space. That is, you can request any URI at
a wiki and, as long as you specify a validly formed WikiWord, you
won't get a 404. Instead you will get a web page that prompts you
to enter the content for that WikiWord. Go ahead and try it on
the PikiPiki wiki that is set up for testing this implementation of
the AtomAPI; this WikiWord currently doesn't have a definition: http://piki.bitworking.org/piki.cgi/SomeWikiWordThatDoesntExist.
To keep parity with the HTML interface, the AtomAPI interface
works the same way.
The second point is character encoding. Note that we state
character encoding in two places in the response, both in the HTTP
header Content-type: and in the XML Declaration.
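The get_atom_entry() code above also calls two timestamp helpers, last_modified_iso() and currentISOTime(), that the article does not show. A minimal sketch, assuming both return the ISO 8601 timestamps that Atom's issued element expects (rendered in UTC here for simplicity; the live feed shown later uses a local -05:00 offset):

```python
import os
from datetime import datetime, timezone

def currentISOTime():
    # ISO 8601 timestamp for "now", e.g. 2004-03-09T21:32:58Z.
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def last_modified_iso(filename):
    # Use the file system's last-modified time, since PikiPiki stores
    # nothing but the raw page contents.
    mtime = os.path.getmtime(filename)
    return datetime.fromtimestamp(mtime, timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%SZ")
```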
There are two more HTTP methods to handle for the EditURI, DELETE and PUT. PUT is used to update the content for a WikiWord, replacing the existing content with that delivered by the PUT. DELETE is used to remove an entry; it's easy to implement: just delete the associated file.
def delete_atom_entry(wikiword):
    ret = report_status(200, "OK", "Delete successful.")
    if wikiwordExists(wikiword):
        try:
            os.unlink(getpath(wikiword))
        except OSError:
            ret = report_status(500, "Internal Server Error",
                "Can't remove the file associated with that word.")
    return ret
Note that unless something really bad happens, we return with a status code of 200 OK. That is, if the entry doesn't exist then we still return 200. You might be scratching your head if you remember we just talked about our implementation always returning an entry for every valid WikiWord, whether or not it actually had filled in content. That is, if you come right back and do a GET on the URI we just DELETE'd, it will not give you a 404, but instead will return the default filled in entry, "Create this page". Is this a problem? No. It may seem a bit odd, but it's not a problem at all. DELETE and GET are two different, orthogonal requests. There is no guarantee that some other agent, or some process on the server itself, didn't come along and recreate that URI between the DELETE and the GET.
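delete_atom_entry() uses two small helpers, getpath() and wikiwordExists(), that the article does not show. A minimal sketch, assuming PikiPiki's convention of storing each page's text in a file named after its WikiWord inside the text directory:

```python
from os import path

# Hypothetical reconstruction; in the real code text_dir comes from
# piki_conf.py rather than being defined inline.
text_dir = "/home/myuserpath/piki.bitworking.org/text"

def getpath(wikiword):
    # Each WikiWord maps directly to a file name under text_dir.
    return path.join(text_dir, wikiword)

def wikiwordExists(wikiword):
    # A WikiWord "exists" only if its backing file does.
    return path.exists(getpath(wikiword))
```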
Supporting PUT allows us to change the content of a WikiWord. To
make the handling of XML easier I've used the Python wrapper
for libxml2, an excellent
tool for handling XML, in particular because it lets you use XPath expressions to query
XML documents. In this case we're using them to pull out
the content element.
def put_atom_entry(wikiword, content):
    ret = report_status(200, "OK",
        "Entry successfully updated.")
    doc = libxml2.parseDoc(content)
    ctxt = doc.xpathNewContext()
    ctxt.xpathRegisterNs('atom', 'http://purl.org/atom/ns#')
    text_plain_content_nodes = ctxt.xpathEval(
        '/atom:entry/atom:content[@type="text/plain" or not(@type)]'
    )
    all_content_nodes = ctxt.xpathEval('/atom:entry/atom:content')
    content = ""
    if len(text_plain_content_nodes) > 0:
        content = text_plain_content_nodes[0].content
    if len(text_plain_content_nodes) > 0 or len(all_content_nodes) == 0:
        writeWordDef(wikiword, content)
        append_editlog(wikiword, os.environ.get('REMOTE_ADDR', ''))
    else:
        # If there are 'content' elements but of some unknown type
        ret = report_status(415, "Unsupported Media Type",
            "This wiki only supports plain text")
    return ret
The detail to notice in the implementation is the XPath used to pick
out the content element. Content elements may have a 'type'
attribute, but if it is not present then it defaults to 'text/plain'.
Since 'text/plain' is the only type of content we can support in a wiki,
it's the only type of content we'll look for.
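The same content-selection logic can be illustrated with the standard library's ElementTree instead of libxml2 (a sketch for illustration only; the server itself uses the libxml2 XPath shown above): accept a content element whose type is "text/plain", or one with no type attribute at all, since the type defaults to "text/plain".

```python
import xml.etree.ElementTree as ET

ATOM = "http://purl.org/atom/ns#"
entry = ET.fromstring(
    '<entry xmlns="http://purl.org/atom/ns#">'
    '<title>SandBox</title>'
    '<content>Implicitly text/plain.</content>'
    '</entry>')

# Keep content elements that are text/plain, explicitly or by default.
nodes = [c for c in entry.findall("{%s}content" % ATOM)
         if c.get("type", "text/plain") == "text/plain"]
content = nodes[0].text if nodes else ""
assert content == "Implicitly text/plain."
```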
That takes care of the EditURI; we just have the PostURI and FeedURI to go.
The PostURI is used for creating new WikiWord entries.
def create_atom_entry(body):
    wikiword = extractWikiWord(body)
    if wikiword:
        if wikiwordExists(wikiword):
            ret = report_status(409, "Conflict",
                "An entry with that name already exists.")
        else:
            ret = put_atom_entry(wikiword, body)
            if (ret[0] == 200):
                ret = (201, CREATED_RESP %
                    {'base_uri': base_uri,
                     'atom_base_uri': atom_base_uri,
                     'wikiword': wikiword
                    })
    else:
        ret = report_status(409, "Conflict",
            "Not enough information to form a wiki word.")
    return ret
The function 'extractWikiWord' pulls out the contents of the title
element and converts it into a WikiWord. If we have a good WikiWord
and it doesn't already exist, then we use 'put_atom_entry' to create
it. Otherwise we respond with an HTTP status code of 409 to indicate
that we won't let a POST overwrite an already existing WikiWord.
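extractWikiWord() itself is not shown in the article. A minimal sketch, assuming the title text has already been pulled out of the entry XML: squeeze the title into CamelCase by capitalizing each word and dropping anything that is not a letter or digit, returning None when the result can't form a WikiWord.

```python
import re

# Hypothetical reconstruction; the real helper also handles parsing
# the title element out of the posted entry.
def extractWikiWord(title):
    words = re.findall(r"[A-Za-z0-9]+", title)
    wikiword = "".join(w[:1].upper() + w[1:] for w in words)
    # A WikiWord needs at least two capitalized "humps", e.g. JustTesting.
    if re.match(r"^(?:[A-Z][a-z0-9]+){2,}$", wikiword):
        return wikiword
    return None
```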
The FeedURI is the last piece we need to implement. The FeedURI is used by clients to locate the PostURI for creating new entries and the EditURIs for editing each entry. The format of the FeedURI is exactly that of an Atom feed. This is different from the Atom we use with the PostURI and the EditURI, which is just the 'entry' element from Atom. Since the format of the FeedURI is the same as that for a regular feed, you might be tempted to have the same feed for both aggregation and editing. This might work in the case of a wiki but not for a general site. The reason is that you may have entries in draft or unpublished form which must appear at the FeedURI so you can edit them, but must not appear in your aggregation feed. Given that this is for a publicly editable wiki, we don't have such a constraint, so we can use this feed for both purposes.
The FeedURI is implemented as a separate
script, atomfeed.cgi, that builds a feed. The code,
which is a bit too long to include here, builds an Atom feed by
sorting all the files that contain WikiWord definitions in reverse
chronological order, then takes the WikiWord and associated
content, and formats it in an Atom entry. The entries are
concatenated together and placed in an Atom feed. The only special
additions are the link elements that contain the
PostURI and the EditURIs, which are denoted with attributes
rel="service.post" and rel="service.edit" respectively. Here is a
snippet from the Atom feed produced
by atomfeed.cgi.
<?xml version="1.0" encoding="utf-8"?>
<feed version="0.3" xmlns="http://purl.org/atom/ns#">
  <title>PikiPiki</title>
  <link rel="alternate" type="text/html"
        href="http://piki.bitworking.org/piki.cgi"/>
  <link rel="service.post" type="application/atom+xml"
        href="http://piki.bitworking.org/atom.cgi"/>
  <link rel="next" type="application/atom+xml"
        href="http://piki.bitworking.org/atomfeed.cgi/10"/>
  <modified>2004-03-09T21:32:58-05:00</modified>
  <author>
    <name>Joe Gregorio</name>
    <url>http://bitworking.org/</url>
  </author>
  <entry>
    <title>JustTesting</title>
    <link rel="service.edit" type="application/atom+xml"
          href="http://piki.bitworking.org/atom.cgi/JustTesting" />
    <link rel="alternate" type="text/html"
          href="http://piki.bitworking.org/piki.cgi/JustTesting" />
    <id>tag:piki.bitworking.org,2004:JustTesting</id>
    <issued>2004-03-09T21:32:58-05:00</issued>
    <modified>2004-03-09T21:32:58-05:00</modified>
    <content type="text/plain">
      This is content posted from an AtomAPI client.
    </content>
  </entry>
  <entry>
    <title>PikiSandBox</title>
    <link rel="service.edit" type="application/atom+xml"
          href="http://piki.bitworking.org/atom.cgi/PikiSandBox" />
    <link rel="alternate" type="text/html"
          href="http://piki.bitworking.org/piki.cgi/PikiSandBox" />
    <id>tag:piki.bitworking.org,2004:PikiSandBox</id>
    <issued>2004-03-04T21:49:03-05:00</issued>
    <modified>2004-03-04T21:49:03-05:00</modified>
    <content type="text/plain">
      '''I dare you''': press the Edit button and add
      something to this page.
      -- MartinPool
    </content>
  </entry>
This feed also contains one more link element of a type we haven't
talked about yet. The second link, the one
with rel="next", points to the next set of
entries. That is, when producing a feed we don't want to put all
the entries into a single document; that could mean hundreds if
not thousands of entries, which would be impractical to
handle. Instead we put in a fixed number,
like 20, and then the 'next' link points to another feed, with the
next 20 entries. If a feed is in the middle of such a chain then
it also contains a link with rel="prev" which points
to the set of entries previous to the current one. In this way
clients can navigate around the list of entries in manageable
sized sets. It should be noted here that the client code that
comes with this implementation does not implement traversing
'next' and 'prev' links in a feed.
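The paging scheme just described can be sketched in a few lines. This is a sketch under stated assumptions, not the actual atomfeed.cgi code: a hypothetical feed_window() helper takes a list of entries sorted in reverse chronological order and a starting offset (which the live feed appears to pass via PATH_INFO, as in atomfeed.cgi/10), then returns one fixed-size page plus the next/prev links the feed should carry.

```python
PAGE_SIZE = 20  # hypothetical page size; the article suggests "like 20"

def feed_window(entries, offset=0):
    # Slice out one page of entries.
    page = entries[offset:offset + PAGE_SIZE]
    links = {}
    # Only emit 'next' when more entries follow this window.
    if offset + PAGE_SIZE < len(entries):
        links["next"] = "atomfeed.cgi/%d" % (offset + PAGE_SIZE)
    # Only emit 'prev' when this window isn't the first.
    if offset > 0:
        links["prev"] = "atomfeed.cgi/%d" % max(offset - PAGE_SIZE, 0)
    return page, links

# Fifty entries split into pages of twenty; the middle page links
# both ways.
page, links = feed_window(list(range(50)), offset=20)
assert page == list(range(20, 40))
assert links == {"next": "atomfeed.cgi/40", "prev": "atomfeed.cgi/0"}
```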
An AtomAPI enabled wiki wouldn't be worth much if there wasn't a client available, so I've included a wxPython client that allows you to create new entries on the wiki and to edit old entries.
Remember how careful we were when specifying and using the character encoding? There isn't much code involved in supporting and processing everything in UTF-8, but careful planning ahead pays dividends. Here is a screenshot of the client editing one of the pages on a wiki with some Unicode characters in it:
[Screenshot: the wxPython client editing a wiki page containing Unicode characters.]
All of the source for both the client and the server can be downloaded from the EditableWebWiki, which is running the code described above. Note that the client is a GUI application written in Python. You must use the version of wxPython that is compiled with Unicode support. Lastly, for your platform you'll have to ensure that you have fonts available to display the Unicode characters you are going to be using.
One of the reasons we started using the AtomAPI on a wiki was to
stretch the API and see where things broke down. Nothing really
awful showed up, though we did find some rough spots. The first
rough spot cropped up when doing a GET on the EditURI, where we
encounter a slight mismatch between the formulation of the AtomAPI
and this wiki implementation. The problem is that
according to version 9 of the draft AtomAPI, when doing a GET on
an EditURI, the issued element
is required. Since PikiPiki only stores the raw contents
in a file, and doesn't store any other data, we are limited to
using the last modified date stored in the file system for each
file, which isn't the same as the issued element.
The second rough spot is in the area of content. The only type of content we accept is 'text/plain', but that isn't the only type of content a client could post. In fact, most clients may be able to produce 'text/html' and some may even be able to produce 'application/xhtml+xml'. Now we could add code to this implementation to convert HTML into WikiML, but the broader question still stands: how does a client know what kinds of content, i.e. which MIME types, an AtomAPI server will accept? This is an open question as of today.
Using Python and the XPath facilities of libxml2, it was
straightforward to build an AtomAPI implementation for a
wiki. There isn't even very much code: atom.cgi is
just 146 lines of code, while atomfeed.cgi is just
122 lines.
This is just a basic client that does the minimum to support the AtomAPI. In a future article I'll enhance the way the server handles HTTP to provide significant performance boosts by using the full capabilities of the protocol. In addition, SOAP-enabling the server will require some changes. After that we can add the ability to edit the wiki's templates.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.