TAG and the Web's Architecture

September 4, 2002

Kendall Grant Clark


Members of the W3C's Technical Architecture Group (TAG) -- and, indirectly, the community of developers which clusters around it -- must frequently find themselves in conceptually sticky situations. Part of their task is to articulate the architectural principles of the Web, which is already the most successful distributed information system ever built. But things aren't always as simple as they appear. On the one hand, given that the object of their inquiry is clearly successful, how hard can it be to discover and articulate the principles responsible for that success? Isn't it just a matter of looking in the right places at the right times and then writing down what one sees? On the other hand, the Web, however successful, is not immune to human error, is full of dark corner cases, and is perpetually in transit.

The TAG's job seems, then, by turns easy and impossible. This situation reflects, I take it, the theoretical maxim that the descriptive and the normative are conceptually implicated with one another. At the level of practice things tend to be less complicated if there are clear lines of demarcation between describing and norming. But that's precious little help when part of your job is to discover and articulate or, as often, to draw for the first time these lines of demarcation. The lines demarcating what one may or may not do in building, say, a Web application and what one must or must not do just are the Web's architectural principles. The TAG's task presupposes that a constituent of a system works best when its principles of operation are consonant with or identical to the principles of the system itself.

Thus far in the TAG's life, public attention has focused primarily on its ad hoc pronouncements -- "ad hoc" because the pronouncements are often the result of some query about a specific issue. Along the way TAG members have been drafting a document, "Architectural Principles of the World Wide Web" (APW), which will serve as a definitive statement of what they've discovered and defined about what makes the Web work. The first public Working Draft of this document was released recently; it offers, even while still in its preliminary stages, an important angle of sight into how well the TAG has negotiated its sticky situations.

Sections 1 and 2 of the APW are the most fully developed so far. In what remains of this column I review the APW's Section 1; in next week's column I will consider Section 2. In future columns, I will consider Sections 3 and 4, after subsequent APW drafts have filled them out more completely.

The Web's Architecture

Let us dispense with inevitabilities first: because this is a first public draft, there are sentences and words missing, infelicities of expression, and various rough spots where a good deal of sentence- and phrase-level work remains to be done.

Setting those problems aside, as is only fair, the shape of the document itself is the first thing to note. The APW contains four substantive sections: an introduction, a section on identifiers and resources, a section on formats, and a section on protocols. The structure of the document reflects the structure of the Web's architecture, which the APW says consists of identifiers, formats, and protocols.

But how does the APW understand the Web itself? As it says, the Web is "a networked information system consisting of agents (programs acting on behalf of another person, entity, or process) that exchange information". The three constitutive parts of the Web's architecture are already implied by this definition.

First, in a networked system, there must be a means of addressing and naming the nodes or objects comprising the network. By "identifiers" the APW means the Web's single means of addressing the objects which constitute it, the URI, as established in RFC 2396.
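
To make the idea of an identifier a little more concrete, here is a minimal sketch of how a URI breaks down into its component parts. It assumes Python's standard urllib.parse module and a made-up example.org address; neither comes from the APW, and the sketch is an illustration rather than anything the TAG prescribes.

```python
from urllib.parse import urlsplit

# Hypothetical URI for illustration only; example.org is reserved for documentation.
uri = "http://www.example.org/reports/2002/tag?view=summary#section-1"

# Split the URI into the generic components described by the URI syntax.
parts = urlsplit(uri)

print(parts.scheme)    # http
print(parts.netloc)    # www.example.org
print(parts.path)      # /reports/2002/tag
print(parts.query)     # view=summary
print(parts.fragment)  # section-1
```

The scheme names the addressing convention in use, the authority and path together locate the resource, and the fragment points inside whatever representation comes back.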

Second, in a system in which information is exchanged between agents, that information must take a concrete form, that is, the nodes of the network must share a means of encoding and re-presenting the information which they exchange. By "formats", then, the APW means the diverse group of information exchange standards, including HTML, CSS, RDF, and, presumably, lower level standards like Unicode. But successful information exchange systems need the flexibility to adapt to change, both internally and externally. The ideal, then, is for the means of information representation to be extensible in an orderly, predictable fashion. The APW includes, therefore, XML and namespaces as part of the "formats" of the Web.

Third, in a networked system of information exchange, the methods and means of exchange must be well-defined, widely shared by the relevant agents, must be consistent with the system's scheme for addressing and naming objects, and must be capable of carrying the encoded information. By "protocols" the APW means, then, a set of formal standards for information exchange, "including HTTP, SMTP, and others", as the APW puts it.

A Preliminary Statement of Principles

So, to review: the Web is a networked system of information-exchanging agents, composed of three parts: identifiers, formats, and protocols. What architectural principles -- which "critical design choices", as the APW puts it -- follow from this understanding of the Web's structure?

The TAG has derived so far nine principles and two "good practices" from its understanding of the Web's nature and constitutive parts. I have time and space here to quote the principles and practices only; further discussion of them will occupy next week's XML-Deviant column.

  1. Use absolute URI references
  2. Absolute URI references are unambiguous
  3. Describe resources
  4. Representation retrieval is safe
  5. Be aware of context-sensitivity in absolute URI references
  6. Use consistent representations
  7. Support persistence
  8. Avoid unnecessary new URI schemes
  9. Do not use unregistered URI schemes
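
As a rough illustration of what the first principle asks of a developer, the sketch below resolves a relative reference against a base URI to obtain an absolute one. The helper names (to_absolute, is_absolute) and the example.org addresses are my own assumptions, and treating "has a scheme" as the test for absoluteness is a simplification, not the APW's definition.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical base document; example.org is reserved for documentation.
base = "http://www.example.org/docs/architecture/index.html"


def to_absolute(base_uri: str, reference: str) -> str:
    """Resolve a URI reference against a base URI (hypothetical helper)."""
    return urljoin(base_uri, reference)


def is_absolute(reference: str) -> bool:
    """Simplifying assumption: a reference is absolute if it carries a scheme."""
    return bool(urlsplit(reference).scheme)


relative = "../principles.html#identifiers"
absolute = to_absolute(base, relative)

print(is_absolute(relative))  # False
print(absolute)               # http://www.example.org/docs/principles.html#identifiers
print(is_absolute(absolute))  # True
```

The relative form means something only in the context of the document that contains it; the absolute form can be copied into an email or another page without losing its meaning.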

To my, perhaps not overly architectural, way of thinking, the majority of these are better off advertised as practices rather than principles, including numbers 1, 3, 5, 6, 7, 8, 9, all of which specify what one should or should not, must or must not do, presumably when one builds Web applications or in one's attempts to extend some part of the Web itself. Only numbers 2 and 4 seem to be statements of principle unambiguously. Number 9 is something of an in-between case; if one reads it to mean that unregistered URI schemes are simply not part of the Web, by definition, that strikes me as a statement of principle rather than a practice. And, for what it's worth, it seems to me that however unworkable and ill-fitting unregistered URI schemes are "on the public Internet", as the APW puts it, the APW's business should be the "public Web". But presumably this is simply an infelicity of expression.

As for the explicitly stated good practices, there are two:

  1. Do not rely on URI case insensitivity
  2. Be aware of content negotiation and fragment semantics
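
The first of these good practices is easy to illustrate. In the sketch below, a hypothetical comparison function (uris_equivalent, my own name) folds case only in the scheme and host, where URIs are generally case-insensitive, and compares the path, query, and fragment exactly. It is deliberately simplified: it does not, for instance, treat an explicit default port as equal to an omitted one.

```python
from urllib.parse import urlsplit


def uris_equivalent(a: str, b: str) -> bool:
    """Compare two URIs, ignoring case only in the scheme and host (hypothetical helper)."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (
        pa.scheme.lower() == pb.scheme.lower()
        # SplitResult.hostname is already lowercased by urllib.parse.
        and (pa.hostname or "") == (pb.hostname or "")
        and pa.port == pb.port
        # Path, query, and fragment are compared exactly: case may matter here.
        and pa.path == pb.path
        and pa.query == pb.query
        and pa.fragment == pb.fragment
    )


# Differs only in scheme and host case.
print(uris_equivalent("HTTP://WWW.Example.ORG/Docs/TAG",
                      "http://www.example.org/Docs/TAG"))   # True

# Differs in path case, so the two are not assumed to be the same.
print(uris_equivalent("http://www.example.org/Docs/TAG",
                      "http://www.example.org/docs/tag"))   # False
```

A particular server may well serve the same resource for both paths, but that is a property of the server, not of URIs, which is the reason not to rely on it.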

One might be tempted to think that the difference between principles and good practices, insofar as the APW is concerned, is just the difference between things which one must do or not do and things which one should do or not do, but they simply don't break down in that way at all. How the TAG distinguishes the suggested good practice of being aware of "content negotiation and fragment semantics" from the Web architectural principle of, say, being aware of "context-sensitivity in absolute URI references" will, one can only expect, be addressed and clarified in future drafts.

Thus, how we are to understand the difference, if there is any, between good and "best" practices, and what distinguishes a principle from a good practice, are questions the APW hasn't yet seen fit to answer. It's noteworthy that the nine principles and two good practices are all related to Section 2, which discusses Identifiers. In fact, Section 2 is basically an in-depth discussion of these eleven principles and practices. This suggests that as the TAG releases further drafts of the APW, we can expect to see more principles and practices, which are derived from the TAG's further, ongoing work on the formats and protocols of the Web. By a bit of simple-minded extrapolation, then, we might expect as many as 30 principles in the APW, at which point it may no longer make much sense to think of them in terms of first principles. But that's speculation the idleness of which far exceeds its utility.


As will be made clear in next week's column, I think the substance of the principles and practices so far outlined in Architectural Principles of the World Wide Web is largely spot-on. Suffice it to say for now that the APW is a very significant document, both in the history of the W3C and in the future evolution of the Web itself. In matters of substance, especially at such an early point in its development, it gets more right than it gets wrong. While I look forward to some clarification of the differences between principle and practice, this point ought not to distract Web developers from a serious consideration of the APW, both in its present and future forms.