Interacting with Resources: Web Architecture Review

January 21, 2004

I had hoped to examine the W3C TAG's Architectural Principles of the World Wide Web (AWWW) within the space of three installments of the XML-Deviant column. That plan turned out to be overly ambitious, since there are some outstanding issues and nontrivial puzzles remaining; or perhaps I simply don't write as concisely as I hoped?

Thus far I've discussed the three chief principles of the Web's architecture: identification, interaction and representation. I then spent a few columns discussing identification, which is the source of everyone's favorite permathreads about URIs and resources. It's a cluster of interesting issues, but certainly not to everyone's taste or inclination.

In this and the next column I will discuss the second fundamental web principle, namely interaction. Given that we have a means of identifying and conceptualizing web resources as nodes within a distributed, hypermedia information system, we need some means of interacting with them. The nodes of an information system do us no good if we or agents acting on our behalf cannot interact with them.

Act, Interact, Interaction

While HTTP GET is the best known and most commonly used means of interacting with web resources, it is by no means the only way. One may use PUT to mutate the state of some resource, POST to create a new resource (or, more specifically, to add a new resource to an existing collection of web resources), DELETE to destroy an extant resource, and so on. In each of these cases what is happening, among other things, is that a user agent is "dereferencing the URI" -- this is a basic term of art in this area -- which identifies the resource in question.
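To make that vocabulary concrete, here is a minimal sketch that builds, but never sends, one request per method using Python's standard urllib.request.Request. The example.org URIs and the XML payloads are hypothetical, chosen only for illustration; nothing in the AWWW prescribes them.

```python
from urllib.request import Request

# Hypothetical URIs, used only for illustration.
COLLECTION = "http://example.org/libraries/"
MEMBER = COLLECTION + "xfmllib"


def build_requests():
    """Return one unsent request per HTTP method discussed above."""
    return [
        # Retrieve a representation of the resource's state.
        Request(MEMBER, method="GET"),
        # Replace the state of an existing resource.
        Request(MEMBER, data=b"<lib version='2'/>", method="PUT"),
        # Add a new resource to an existing collection.
        Request(COLLECTION, data=b"<lib name='new'/>", method="POST"),
        # Destroy an extant resource.
        Request(MEMBER, method="DELETE"),
    ]


for req in build_requests():
    # Nothing touches the network; we only inspect the request objects.
    print(req.get_method(), req.full_url)
```

In every case the request names a URI, which is the sense in which each of these interactions involves dereferencing.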

In the XML and web development communities we often talk casually about dereferencing URIs. We talk as if dereferencing a URI is an atomic operation, and for most purposes that's precisely what it is. But, as the AWWW points out, what a user agent must do to dereference a URI is both specific to a URI scheme and often rather complex. It is a testament to the engineering practices of several, historically extended communities that dereferencing a URI may "involve a succession of steps as described in independent specifications", and that for the most part it just works. I think this simple fact, which hides worlds of social complexity, is worth bearing in mind.

I won't walk through the AWWW's eight-step examination of a URI dereference, nor will I list the six or so relevant specifications that touch on this process. The important point is that it works, and that it is both less simple and more robust than it often seems.

As the REST architectural style suggests, user agents on the Web don't really interact directly with resources. Rather they interact directly with representations of the states of resources, which are identified by URIs. Given the utility of the opacity of web identifiers, it stands to reason that the protocol by means of which interactions between user agents and servers are structured includes provisions for transmitting metadata about the format of the retrieved representation. User agents send messages to servers, requesting, say, the retrieval of a representation. In response, servers pass messages back to the user agent in which the retrieved representation, together with metadata, may also be included. The protocols determining the Web's primary interactions are highly structured and rather complex.

The AWWW conceptually decomposes a resource representation into two elementary parts: representation data and representation metadata. The representation data is the re-presentation of the state of some resource which the user agent requested. It is ideally structured according to some format, preferably one which is a standard. HTML and XML and RDF are three such standard representational formats.

Given the opacity of URIs, how is a user agent to know what format the retrieved representation of the state of the requested resource is in? The role of representation metadata is to inform user agents, primarily by means of Internet Media Types (what we used to call MIME types, remember?), what kind of representation they are receiving. Well, that's a bit too quick, isn't it? In fact, an Internet Media Type provides a way for a user agent to determine which actual data format it has retrieved -- and, thus, which is the authoritative way to handle that format and hence the representation. "The IANA registry," the AWWW points out, "maps media types to data formats."
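A minimal sketch of that lookup, assuming the metadata arrives as an HTTP Content-Type header: the standard library's email.message.Message does the header parsing, and the FORMATS table is a small, hand-picked stand-in for the IANA registry, not a copy of it.

```python
from email.message import Message

# Illustrative subset only; the real mapping lives in the IANA registry.
FORMATS = {
    "text/html": "HTML",
    "application/xml": "XML",
    "application/rdf+xml": "RDF/XML",
    "image/svg+xml": "SVG",
}


def parse_content_type(header_value):
    """Split a Content-Type header into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # Both accessors normalize their results to lowercase.
    return msg.get_content_type(), msg.get_content_charset()


def data_format(header_value):
    """Map the media type to a data format name, or 'unknown'."""
    media_type, _charset = parse_content_type(header_value)
    return FORMATS.get(media_type, "unknown")


print(parse_content_type("text/html; charset=UTF-8"))
print(data_format("image/svg+xml"))
```

Note that the user agent never inspects the URI to make this decision; the media type alone selects the format.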

Got My Mojo Working

This brings us back to the puzzling issue from the previous column: an earlier draft of the AWWW said that, essentially, fragment identifiers are identifiers of parts of a resource-state representation. In other words, a URI identifies a resource and the fragment identifier part of that URI identifies part of the representation of the resource. So, for example, http://monkeyfist.com/kendall/xfmllib#download has (under one description) two parts: the fragment identifier and the rest of the URI. The URI identifies a resource which we might call "Kendall's page for the Python XFML library, xfmllib", and if you dereference that URI without the fragment identifier, you get an HTML representation of that resource. If you dereference the full URI, fragment identifier included, in a typical browser, you get that same HTML representation, but your browser also tries to find a named part of the representation called "#download". That URI identifies a resource and, then, a part of the representation of that resource.

That is a rather clear, perhaps even elegant way of understanding fragment identifier semantics. It basically says that the interpretation of the fragment identifier is specific to the Internet Media Type of the retrieved representation which one gets when one dereferences the URI. Cool.
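The earlier reading can be sketched in a few lines of Python. The resolve_fragment function and the AnchorFinder class are hypothetical names, the sample markup is invented (it is not the actual content of the page), and only the HTML rule (an element id or an a element's name) is implemented; other media types would need their own rules.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class AnchorFinder(HTMLParser):
    """Remember the first element whose id (or <a name>) matches."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.found_tag = None

    def handle_starttag(self, tag, attrs):
        if self.found_tag is not None:
            return
        attrs = dict(attrs)
        if attrs.get("id") == self.fragment or (
            tag == "a" and attrs.get("name") == self.fragment
        ):
            self.found_tag = tag


def resolve_fragment(uri, media_type, body):
    """Return (URI without fragment, tag of the identified part or None)."""
    base, fragment = urldefrag(uri)
    # Interpretation depends on the media type of the representation.
    if media_type == "text/html" and fragment:
        finder = AnchorFinder(fragment)
        finder.feed(body)
        return base, finder.found_tag
    # No fragment rule implemented for this media type in this sketch.
    return base, None


# Invented markup, standing in for the retrieved representation.
SAMPLE = '<html><body><h2 id="download">Download</h2></body></html>'
print(resolve_fragment(
    "http://monkeyfist.com/kendall/xfmllib#download", "text/html", SAMPLE))
```

The fragment is split off before anything is retrieved and interpreted only afterwards, against whatever representation came back.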

But now in the CR draft of the AWWW this elegant clear language about resources, representations, and fragment identifiers is nowhere to be found. Instead we get language about primary and secondary resources, which I find neither elegant nor particularly clear. "Given a URI 'U#F', and a representation retrieved by dereferencing URI 'U'", the AWWW says, "the (secondary) [which links to this text in the AWWW: 'The fragment identifier of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional information' -- ed.] resource identified by 'U#F' is determined by interpreting 'F' according to the specification associated with the Internet Media Type of the representation."

First, I don't know what it means to "determine" a resource: secondary, primary or otherwise. I know what it means to dereference a URI, which is how I take it that one interacts with resources on the Web. I also know how to conceptualize the idea of a resource identifier pointing to or identifying some part or element of its retrieved representation, in a representation-relative way. Second, I find it deeply broken that one interacts with some kinds of web resources by dereferencing the URIs which identify them, while one interacts with other kinds of resources by doing something representation-specific in the context of the representation of the state of some other resource (assuming that "determine" is just a sloppy way of talking about interaction, though I'm not sure that it is).

As I said last week, under some description ("does the Web work?") this just doesn't matter. But under the more relevant description in the context of the AWWW, namely, "do we have clear, crisp, conceptually rigorous explanations of how and why the Web works?", this talk of primary and secondary resources, and resources as parts of the representations of other resources, is otiose. It implies, among other things, that the conceptual and practical distinction between a resource and a representation is not very distinct at all. A resource is independent of the representations of its state which we interact with via the Web. That's a fundamental architectural principle. It makes, for example, content negotiation possible. I can ask for the resource identified by a URI and my user agent and the server can negotiate, according to our shared expectations and preferences, about which representational formats my user agent will receive. Tricky to implement? Yes. But very cool stuff.
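Here is a deliberately simplified sketch of server-side negotiation driven by an Accept header. The function names are hypothetical, and the matching is cruder than real HTTP: it ranks media ranges by q-value only, ignoring the rule that more specific ranges take precedence, so a q=0 exclusion can be overridden by a wildcard.

```python
def parse_accept(header):
    """Return [(media_range, q)] sorted by descending q-value."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((parts[0].lower(), q))
    # sorted() is stable, so ties keep their order from the header.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def matches(media_range, media_type):
    """True if media_type falls within media_range."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def negotiate(accept_header, available):
    """Pick the first available media type the client prefers, or None."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None


print(negotiate("application/rdf+xml;q=0.9, text/html;q=0.5",
                ["text/html", "application/rdf+xml"]))
```

The URI is the same in every case; only the representation that comes back varies, which is exactly why the resource has to be independent of any one representation.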

Obscuring the distinction between resources and representations simply muddies waters once clear. As I understand both REST and the rest of the AWWW, one interacts with resources by dereferencing URIs and doing things with the retrieved representations. Now we're asked to countenance the idea that sometimes one interacts with a resource by dereferencing the URI of another resource and then doing some representation-specific thing to the representation retrieved by that first dereference. Eek.

We have to be careful about one distinction that shouldn't be obscured. The Web is a web because we find pointers or links to resources within the representations of many web resources. That's a perfectly sane, reasonable idea. Yet it's one very different from saying that there are gremlinesque resources which hide in the representations of other resources, with which one can interact by doing format-specific things to the representations of some other resource.

But it Just Don't Work on You

One of the explicit consequences of this primary-secondary distinction is the idea, which the AWWW claims to have come about "by design", that given U#F, the so-called "secondary resource" identified by #F "is expected to be the same across all representations. Thus, if a fragment has defined semantics in any one representation, the fragment is identified for all of them, even though a particular data format may not be able to represent it." What fresh, warty hell is this?

Let's consider some cases. Imagine that, given robust content negotiation, one could retrieve representations of this resource, http://monkeyfist.com/kendall/xfmllib#download, in any of the following formats: XML, XHTML (1.1), HTML (4.0), RDF, and SVG. Now according to the AWWW, if the fragment "#download" has "defined semantics in any one representation", say, HTML, then for all the other representations, that fragment means what it means in HTML. What on earth does that mean? I'm not sure that's even a coherent claim.

It gets worse. What if, as is the case here, many or all of the representational formats offer "defined semantics" for fragment identifiers? (Here I'm assuming that "defined semantics" means "the standard with which the Internet Media Type is associated establishes a representation-specific way of interpreting fragment identifiers".)

The AWWW's example is a URI identifying a resource available in three different data formats: SVG, JPG, and PNG. Since only SVG defines a meaning for fragment identifiers, the issue is simply resolved: the fragment identifier only means anything at all in the context of the retrieved SVG representation. This is the sane way to put the issue, rather than saying that there is only a secondary resource at all when the representation of the primary resource is SVG.

Let's assume, however, that these are mutually incompatible representational semantics. The question arises then, which of the several, mutually incompatible secondary resources is identified by a primary resource with a fragment identifier? In other words, the response to the claim that "the secondary resource identified by a URI with a fragment identifier is expected to be the same across all representations" is yes, but which one? An RDF #download means something very different from an SVG #download. Even assuming that it makes sense to say that one resource is the real one that all the others really identify, one has to ask which one that is. Do we have any principled, algorithmic or even heuristic way of identifying it?

The AWWW, in perhaps one of the most underloaded sentences in W3C specification history, says: "On the other hand, it is considered an error if the semantics of the fragment identifiers used in two representations of a secondary resource are inconsistent". What kind of error is this? Does this mean that for the same resource I can't provide human-readable and machine-readable representations, with, I assume, "inconsistent" semantics, and use fragment identifiers for both?

An even more crucial question is what kind of semantic inconsistency is being touted here? Are the semantics of, say, HTML and RDF really inconsistent? HTML does not have a semantic interpretation in the same sense as that of RDF. There can be no direct inconsistency in the sense of logical contradiction where there is no common ground.

I find this entire discussion of multiple fragment identifiers inconclusive and misleading, especially without some further fleshing out of what it means for representational formats to have inconsistent semantics. What kinds of web systems should I avoid deploying in order to be consistent with the good practice, "Fragment Identifier Consistency"? The AWWW states "A resource owner who creates a URI with a fragment identifier and who uses content negotiation to serve multiple representations of the identified resource SHOULD NOT serve representations with inconsistent fragment identifier semantics". So SVG, JPG, and PNG don't have inconsistent fragment identifier semantics, since only the former has a defined meaning for fragment identifiers, but RDF and HTML, since they both have fragment identifier semantics, are inconsistent?
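To see how blunt that reading is, here is a minimal sketch of it. The table of which media types define fragment identifier semantics is an assumption made for illustration, not something taken from the AWWW, and the rule encoded is only the reading questioned above: a mixture is flagged as soon as more than one served format defines fragment semantics at all.

```python
# Assumed for illustration: which media types define fragment semantics.
DEFINES_FRAGMENTS = {
    "image/svg+xml": True,
    "text/html": True,
    "application/rdf+xml": True,
    "image/jpeg": False,
    "image/png": False,
}


def fragment_defining(served):
    """List the served media types that define fragment semantics."""
    return [m for m in served if DEFINES_FRAGMENTS.get(m, False)]


def flagged_as_inconsistent(served):
    """One possible reading: flag any mixture with two or more
    fragment-defining formats, whatever their semantics actually are."""
    return len(fragment_defining(served)) > 1


# The AWWW's own example versus the HTML-plus-RDF case.
print(flagged_as_inconsistent(["image/svg+xml", "image/jpeg", "image/png"]))
print(flagged_as_inconsistent(["text/html", "application/rdf+xml"]))
```

A check this crude never looks at what the fragments mean, so it cannot distinguish semantics that are merely different from semantics that actually conflict.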

If that's the idea, I don't think the AWWW really wants to talk about consistency and inconsistency, which strongly implies that two fragment identifier semantics could be different but either consistent or inconsistent. At the very least, I could use a robust set of examples or even a matrix of representational formats that it's not safe to mix. My pharmacist warns me not to mix Drug A with Drug B, or Drug C with alcohol or the operation of heavy machinery. I'd like more specific guidance from the AWWW as to unsafe representational mixtures. Accidental overdoses are a real buzz kill, after all.

That's enough for this week; next week I'll finish up the discussion of interaction by considering "Authoritative Representation Metadata", "Safe Interactions", and "Representation Management".