Constructing or Traversing URIs?

April 6, 2005

Here is a quick review of where we ended in the last column. We answered all four questions about our resources and their representations. All of that work is summarized in the table below.

*Resources in our Bookmark Service*
Resource	Method	Representation	Description
Bookmark
	`GET`	Extended XBEL Document for a single bookmark	Get a bookmark.
	`PUT`	Extended XBEL Document for a single bookmark	Update a bookmark.
	`DELETE`	n/a	Delete a bookmark.
Bookmark Collection
	`GET`	Extended XBEL Document for a bookmark collection	Get a collection of bookmarks.
	`POST`	Extended XBEL Document for a single bookmark	Add a bookmark to a collection.
Keyword Lists
	`GET`	Keyword List Document	Get a list of keywords.

The URIs that we picked for our service are outlined in the next table.

*URIs in the Bookmark Service*
URI	Type of Resource	Description
[user]/bookmark/[id]/	Bookmark	A single bookmark for 'user'.
[user]/bookmarks/	Bookmark Collection	The 20 most recent bookmarks for 'user'.
[user]/bookmarks/all/	Bookmark Collection	All the bookmarks for 'user'.
[user]/bookmarks/tags/[tag]	Bookmark Collection	The 20 most recent bookmarks for 'user' that were filed in the category 'tag'.
[user]/bookmarks/date/[Y]/[M]/	Bookmark Collection	All the bookmarks for 'user' that were created in a certain year [Y] or month [M].
[user]/config/	Keyword List	A list of all the 'tags' a user has ever used.

But we aren't quite done. Having designed the URI space for our resources, we now need to consider how to make those URIs discoverable. One of the easiest ways would be to publish the above table. That is, I could publish a description of the interface as a recipe that included directions on how to form URIs for all the parts of the service. But that has several drawbacks. First, it limits our protocol to just working on one site. If someone else wanted to adopt our protocol they would have to cut and paste our protocol description and then fiddle with the URIs to match their system. That's not terribly efficient.

We have all these resources in our system, yet how do we enable the URIs of those resources to be discovered? Part of our specification, and of our running system, is being able to navigate around those resources. There are two types of solutions available to us: URI Construction and Hypertext Navigation. Let's look at both of them carefully to learn about their advantages and disadvantages. I'll initially set this up as a dichotomy, an us-versus-them decision, but in the wrap-up we'll see that they are really just extremes on a spectrum of solutions.

In This Corner: Hypertext Navigation

To do this we'll start by looking at a rather widely-deployed system that supports both types of navigation: namely, the Web. Yes, plain old HTML over HTTP supports both kinds of navigation.

The first navigation type we'll cover is Hypertext Navigation. That's really just a fancy way of saying "following a trail of links from document to document". You do this all the time when surfing the Web, and the Web in turn serves as an excellent example of such a system of navigation. The core of that navigation is the anchor <a/> element of HTML. The anchor element provides a link that, when activated by the user in their client (usually a browser), will retrieve a representation of the resource. For example, here is a fragment of an HTML document:

For more information about W3C, please consult the <A href="http://www.w3.org/">W3C Web site</A>.

Note that the client just dereferences the URI given in the "href" attribute as is, without modification.
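
As a rough sketch of what a client does with such a document, the following Python pulls the "href" values out of the anchor elements using the standard library's html.parser module. The LinkCollector class name is my own invention for illustration; a real client would go on to dereference each URI exactly as found.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every anchor element, unmodified."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


fragment = ('For more information about W3C, please consult the '
            '<A href="http://www.w3.org/">W3C Web site</A>.')

collector = LinkCollector()
collector.feed(fragment)
print(collector.links)
```

Nothing in the client knows how the URI was put together; it only knows where in the representation to look for it.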

In The Opposing Corner: URI Construction

URI Construction is also easy to understand. The basic idea is that I give you a recipe for creating a URI to identify the resources you desire. This is the way the Del.icio.us API works. For example, to get a list of the most recent posts, you construct a URI using the following recipe:

function: http://del.icio.us/api/posts/recent?
   &tag= filter by this tag - optional
   &count= number of items to retrieve - optional (defaults to 15, maximum 100)
returns a list of most recent posts, possibly filtered by tag, maxes out at 100.
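
To make the recipe concrete, here is a minimal sketch of a client following it. The base URI and the "tag" and "count" parameters come straight from the recipe above; the function name recent_posts_uri and the choice to clamp "count" to the documented maximum are my own assumptions, not part of the Del.icio.us API.

```python
from urllib.parse import urlencode

# The fixed starting point given by the recipe.
BASE = "http://del.icio.us/api/posts/recent?"


def recent_posts_uri(tag=None, count=None):
    """Build the 'recent posts' URI by following the published recipe."""
    params = []
    if tag is not None:
        params.append(("tag", tag))
    if count is not None:
        # Assumption: clamp to the documented maximum rather than fail.
        params.append(("count", str(min(count, 100))))
    return BASE + urlencode(params)


print(recent_posts_uri(tag="rest", count=20))
```

Note that every piece of the URI is baked into the client: the host, the path, and the parameter names.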

Come Out Swinging

Now let's look at the advantages and disadvantages of both URI Construction and Hypertext Navigation.

While URI Construction can be simple to explain, it does lock you into a fixed URI space. That may cause problems in the future. For example, Del.icio.us could get bought out tomorrow by Yahoo!, and if it did, the new owners would be faced with a tough choice: either keep the Del.icio.us domain name up and running forever, or update the Del.icio.us API to start at http://delicious.yahoo.com/api, thus forcing all the producers of clients that use the Del.icio.us API to update their clients to use the new address. Neither of these options is particularly appealing.

Hypertext Navigation is simple to implement. Just pull the URIs out of your representation and dereference them. It is also easy to expand. Got more services? Just add more hyperlinks. The downside is that just adding more links is not always practical. As an example of how a URI space that is too large becomes impractical to navigate using hypertext, consider what the interface to Google would look like if it didn't do searches by query, but instead by clicking links on a Web page, one link for each letter of the alphabet:

Choose the first letter of the word you are searching for:

A B ... Z.

You get the idea. That would be tedious and impractical.

So, both methods have advantages and disadvantages.

Two Great Tastes That Taste Great Together

When we started down this road, I said I was going to paint this as a dichotomy to aid in exposition, and now we'll tear that down and show that we're really talking about a spectrum of solutions. To do that, let's look at another example, one that combines both URI Construction and Hypertext Navigation.

Forms in HTML, at least those forms that use GET, use hypertext and also construct URIs. Consider the following form.

<FORM ACTION="http://example.com/some_script.cgi" METHOD="GET">
  <P>
    <INPUT TYPE=TEXT NAME="search_value" SIZE=40>
    <INPUT TYPE=SUBMIT VALUE=" Search ">
  </P>
</FORM>

The above code creates a form with a single text field and a "Search" button.

If we were to load the above form in a Web page, type "This is a test" into the text field, and click on the "Search" button, the browser would construct the following URI, which it would then try to dereference:

http://example.com/some_script.cgi?search_value=This+is+a+test

The browser constructs the URI using information in the form and a set of rules from the HTML specification for combining that information into a URI. This is URI construction in the sense that the query parameters for the URI are built on the fly from the data in the form. It is also hypertext navigation since the URI to which the query parameters are added is found in the hypertext representation, in this case in the form of the ACTION attribute. In building the target URI, information is used from two distinct places: the hypertext document and the rules inherent to the hypertext format. Inherent in this case means I'm just gonna give you a link to the specification and you can read about them there.
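
A small sketch of what the browser is doing, written with Python's standard urllib.parse module. The variable names are mine; the ACTION value and the field name are taken from the form above, and urlencode applies the same name=value, spaces-as-plus encoding that the HTML rules call for on GET forms.

```python
from urllib.parse import urlencode

# Found in the hypertext: the form's ACTION and the name of its text field.
action = "http://example.com/some_script.cgi"
fields = {"search_value": "This is a test"}

# Supplied by the recipe: join with '?', write each field as name=value,
# and encode spaces as '+'.
uri = action + "?" + urlencode(fields)
print(uri)
```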

So we have two sources of information.

The Hypertext
The Recipe

Let's go back and look at our three examples and see how they break down in terms of pulling information from these two sources.

Simple HTML Linking: This is just plain and simple Hypertext. You get a link in a document and dereference it. The only source of information is the Hypertext.
The del.icio.us API: The API is a recipe for how to construct URIs, pure and simple. There is no hypertext involved in this system. You construct your URIs directly from the Recipe.
HTML GET based forms: This example uses information from both hypertext and a recipe. The Recipe determines how to combine the information in the hypertext to produce a URI.

Also, you must realize that the HTML forms do more than just allow URI construction. They can also create POST requests, so don't get hung up thinking "navigation" means a series of GETs only; dereference doesn't mean just GET. In addition, XForms, from the W3C, add the method PUT as an available method that forms can use. The work of the WHATWG on Web Forms 2.0, which is at a third call for comments as of this writing, opens up the field to any method.

"Hypermedia as the engine of application state."

We didn't get here by mistake.

"REST is defined by four interface constraints: identification of resources; manipulation of resources through representations; self-descriptive messages; and, hypermedia as the engine of application state." [Roy Fielding]

As you can see, when exposing our service to the world we have a range of methods we can use, from URI Construction to Hypertext Navigation, with "hypermedia" like HTML forms filling in the range between those two extremes. Note that while HTML only does URI construction on the query parameters, we don't need to constrain ourselves to just that part of the URI. We could come up with our own URI construction, one that included constructing not just the query parameters but also the path portion of the URI. Let's do that.

Exposing Our Service

Our site will publish a single file that lists all the top-level resources in the protocol we support, along with a description of how to construct the URI for each resource.

To reiterate, we'll have two sources of information, the Hypertext and the Recipe.

First let's look at the hypertext:

<bookmark-service xmlns="http://example.com/documentation/service">
  <recent>http://example.com/{user}/bookmarks/</recent>
  <all>http://example.com/{user}/bookmarks/all/</all>
  <by-tag>http://example.com/{user}/bookmarks/tags/{tag}/</by-tag>
  <by-date>http://example.com/{user}/bookmarks/date/{Y}/{M}/</by-date>
  <tags>http://example.com/{user}/config/</tags>
</bookmark-service>

We have a "bookmark-service" element with one child for each part of the protocol we want to expose. Each child element has a value that is a template for the URI of the corresponding resource.

And now the Recipe:

The value of each child of "bookmark-service" is a template of the URI for the resource.
The variables in the template are enclosed in unescaped braces '{' and '}'.
The character encoding of the template URI, and of the URI after the template is filled in, MUST be UTF-8.
With the exception of the variable names and their enclosing braces, the template URI will be properly escaped.
Replace each brace-delimited variable with the desired value of that variable, ensuring that the value is also properly percent-encoded UTF-8.

So, following the Recipe and using the above bookmark service document, if we wanted the URI of the Keyword List resource for the user "fred", we get the template URI from the "tags" element and substitute {user} with "fred" to get:

  http://example.com/fred/config/

Note that this method of exposing our service has some very useful properties:

If a template has no variables then it is a valid URI.
A URI that contained '{' or '}' anywhere else in the URI besides the variables would be percent encoded and would not match a variable in a template.

Here is some sample Python code to implement the above recipe.

  import re
  from urllib.parse import quote


  def expand_uri_template(uri, params):
    # Add braces around each of the keys and %-encode all the values.
    # quote() encodes str values as UTF-8 before percent-encoding them.
    quoted_params = {"{%s}" % key: quote(value)
        for key, value in params.items()}

    # Define a function that we will later use to do the
    # regular expression substitution. Replace matches
    # only if they have values in quoted_params.
    def replace(match):
        return quoted_params.get(match.group(0), match.group(0))

    return re.sub(r"\{.*?\}", replace, uri)

Let's take a look at how to use the 'expand_uri_template' function.

  uri_parameters = {
    'user': 'jcgregorio',
    'tag': 'Iñtërnâtiônàlizætiøn'
  }

  print(expand_uri_template("http://example.com/{user}/{tag}",
       uri_parameters))

This has the following output:


http://example.com/jcgregorio/I%C3%B1t%C3%ABrn%C3%A2ti%C3%B4n%C3%A0liz%C3%A6ti%C3%B8n

What have we accomplished, besides making the users of our protocol do one more GET? We have decoupled our URI structure from the protocol. This has several advantages:

Changeable URIs. We can change our URI structure willy-nilly. Yes, Cool URIs Don't Change, but if we get bought out we won't have to keep our domain up and running forever just to support old protocol clients.
Reuse. This allows other sites to reuse our protocol without having to change the documentation or reproduce our URI structure identically.
Extensibility. This gives us a nice point for announcing the availability of new services. For example, if we add a new resource that is the top 20 most popular tags, we can document that resource in the protocol documentation and add a new element to the Bookmark Service Document. The absence of that element indicates that a service doesn't support that part of the protocol.
Linkage. We can also use the link element of HTML to point from a Web page to our introspection file. A fine example of Hypertext Navigation there, and it's good because a Web service really should be linked into, you know, the Web.
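
The Extensibility point can be sketched as a lookup that treats a missing element as "not supported". This is only an illustration under stated assumptions: the document is held in a string rather than fetched with a GET, the function name facet_uri is my own, and the "popular" element is a hypothetical name for a facet that this particular service does not offer.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import quote

NS = "http://example.com/documentation/service"

# A trimmed copy of the bookmark service document, inlined for the sketch.
INTROSPECTION = """\
<bookmark-service xmlns="http://example.com/documentation/service">
  <recent>http://example.com/{user}/bookmarks/</recent>
  <by-tag>http://example.com/{user}/bookmarks/tags/{tag}/</by-tag>
  <tags>http://example.com/{user}/config/</tags>
</bookmark-service>
"""


def facet_uri(document, facet, **values):
    """Return the expanded URI for a facet, or None if it is not offered."""
    root = ET.fromstring(document)
    element = root.find("{%s}%s" % (NS, facet))
    if element is None:
        return None  # no such element: the service doesn't support it

    def fill(match):
        # Percent-encode known values; leave unknown variables untouched.
        name = match.group(1)
        if name in values:
            return quote(values[name])
        return match.group(0)

    return re.sub(r"\{(.*?)\}", fill, element.text.strip())


print(facet_uri(INTROSPECTION, "tags", user="fred"))
print(facet_uri(INTROSPECTION, "popular", user="fred"))
```

A client written this way can be pointed at any site that publishes such a document, and it degrades gracefully when a facet is missing.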

Note that we need to update the documentation for our Web service; we've added a new resource, the bookmark service resource, and the documentation for each of the resources in our protocol now needs to be associated with its element in the bookmark service document. One of the inspirations for a single resource that enumerates all the facets of a Web service came from the XML-RPC Introspection Service. I'll keep that nomenclature and refer to this as the Introspection resource.

Here is the updated list of resource types in our service, after adding in the Introspection Resource.

*Updated list of Resources in our service*
Resource	Method	Representation	Description
Bookmark
	`GET`	Extended XBEL Document for a single bookmark	Get a bookmark.
	`PUT`	Extended XBEL Document for a single bookmark	Update a bookmark.
	`DELETE`	n/a	Delete a bookmark.
Bookmark Collection
	`GET`	Extended XBEL Document for a bookmark collection	Get a collection of bookmarks.
	`POST`	Extended XBEL Document for a single bookmark	Add a bookmark to a collection.
Keyword Lists
	`GET`	Keyword List Document	Get a list of keywords.
Introspection
	`GET`	Introspection Document	List of all the important resources in our service.

And here is a description of all the parts of our service. Previously we listed each of these by their URI, but now they are listed by the element in the Introspection Document. Each one represents a high-level resource in the Web service. They don't have to represent all of the resources available in our Web service; for example, there isn't an element in the Introspection Document to get us the URI of a single Bookmark resource. That's because those URIs will be listed inside Bookmark Collections, which will be listed in the Introspection Document. I'll use the term "facets" for those high-level resources that get listed in the Introspection Document.

*Facets of the Bookmark Service*
Introspection Document Element	Template Parameters	Type of Resource	Description
recent	user	Bookmark Collection	The 20 most recent bookmarks for "user".
all	user	Bookmark Collection	All the bookmarks for "user".
by-tag	user tag	Bookmark Collection	The 20 most recent bookmarks for "user" that were filed in the category "tag".
by-date	user Y M	Bookmark Collection	All the bookmarks for "user" that were created in a certain year [Y] or month [M].
tags	user	Keyword List	A list of all the "tags" a user has ever used.
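
The "Template Parameters" column does not have to be maintained by hand; it can be read back out of the templates themselves. A minimal sketch, assuming the templates from the Introspection Document have already been loaded into a dictionary (the TEMPLATES and template_parameters names are mine):

```python
import re

# Templates keyed by Introspection Document element name.
TEMPLATES = {
    "recent": "http://example.com/{user}/bookmarks/",
    "all": "http://example.com/{user}/bookmarks/all/",
    "by-tag": "http://example.com/{user}/bookmarks/tags/{tag}/",
    "by-date": "http://example.com/{user}/bookmarks/date/{Y}/{M}/",
    "tags": "http://example.com/{user}/config/",
}


def template_parameters(template):
    """Return the variable names in a template, in order of appearance."""
    return re.findall(r"\{(.*?)\}", template)


for facet, template in TEMPLATES.items():
    print(facet, template_parameters(template))
```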

Summary

We now have a good first pass at a description of our bookmark Web service. In addition, the Introspection Document gives us a flexible platform upon which we can add new facets in a consistent manner. Next time we'll dig into implementing our service.