Composition

July 20, 2005

"If you use inheritance where composition will work, your designs will become needlessly complicated." —Bruce Eckel, Thinking in Java

This week's column takes a look at two new specifications that are winding through the W3C Recommendation Track, xml:id as a Proposed Recommendation and XLink 1.1 as a Last Call Working Draft. These two specifications share an important common trait: neither is a standalone vocabulary, but rather they are intended to be combined into larger vocabularies. The formal name for "combining stuff" this way is "composition," this week's topic. First, a thirty-second review of some basic terms for any nonprogrammers.

Code written by someone who has just learned object-oriented techniques is usually pretty easy to spot. One of the main tells is enthusiastic overuse of derivation, that is, writing new code (called a "class") explicitly based on existing code, modulo specific changes. Derivation is an important tool for expressing clear is-a relationships. It would make sense to derive a new class, say Circle or Polygon, from a more fundamental Shape class, because a polygon or circle really is a kind of shape. More to the point, a sign that derivation makes sense is when circles and polygons can be treated more generally as shapes; for example, the code might say, "Attention all shapes: render yourselves to SVG." Other cases, however, are not so clear-cut. For example, is an XMLDom class really a derivation of String? Probably not. In designing software, many such situations arise; experts such as Bruce Eckel recommend sticking with composition in those cases.
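For readers who would rather see this than read about it, here is a minimal Java sketch of the two relationships. The class and method names (Shape, Circle, Polygon, XmlDom, Drawing, toSvg, renderAll, serialize) are invented for illustration and do not come from any real library.

```java
import java.util.Arrays;
import java.util.List;

// Derivation: Circle and Polygon each "is a" Shape.
abstract class Shape {
    abstract String toSvg();
}

final class Circle extends Shape {
    private final int cx, cy, r;

    Circle(int cx, int cy, int r) {
        this.cx = cx;
        this.cy = cy;
        this.r = r;
    }

    @Override
    String toSvg() {
        return "<circle cx=\"" + cx + "\" cy=\"" + cy + "\" r=\"" + r + "\"/>";
    }
}

final class Polygon extends Shape {
    private final String points;

    Polygon(String points) {
        this.points = points;
    }

    @Override
    String toSvg() {
        return "<polygon points=\"" + points + "\"/>";
    }
}

// Composition: XmlDom "has a" String holding its source text.
// It is not a kind of String, so it does not try to derive from one.
final class XmlDom {
    private final String source;

    XmlDom(String source) {
        this.source = source;
    }

    String serialize() {
        return source;
    }
}

final class Drawing {
    // "Attention all shapes: render yourselves to SVG."
    static String renderAll(List<Shape> shapes) {
        StringBuilder svg = new StringBuilder("<svg xmlns=\"http://www.w3.org/2000/svg\">");
        for (Shape shape : shapes) {
            svg.append(shape.toSvg());
        }
        return svg.append("</svg>").toString();
    }

    public static void main(String[] args) {
        List<Shape> shapes = Arrays.asList(new Circle(5, 5, 2), new Polygon("0,0 10,0 5,8"));
        System.out.println(renderAll(shapes));
    }
}
```

In Java the question about XMLDom and String is settled by the language, since String is final and cannot be derived from at all. The design point holds either way: renderAll only cares that each item is a Shape, while XmlDom merely uses a String.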

In the realm of standards, both derivation and composition take place. Derivation might occur when one specification forms an extended subset of another—almost always a sign that something has gone horribly wrong. At a basic level, though, composition happens all the time in the normal course of producing specifications. A look at the XML 1.1 specification shows eight "normative" references, including the Unicode specification, which forms the basis for much of XML's interoperability.

Designing an XML vocabulary is a special case of producing a specification. Although XML Namespaces was finalized more than five years ago, architects are still uncovering trouble in applying composition in XML, more commonly referred to under the banner "compound documents." The first specification to look at this week is a revision to XLink.

XLink

XLink 1.0 became a W3C Recommendation in June 2001, amid a fair amount of controversy. Support for the standard has been slow to come; in a March 2002 article titled "XLink: Who Cares?" Bob DuCharme noted that only seven partial implementations were available. In fact, the death of XLink has become one of several permathreads on the xml-dev mailing list.

The actual changes in 1.1 are modest, and previously documented. On xml-dev, Len Bullard pondered:

After the thread a while back about the death of XLink, I keep finding examples of it being cited, particularly in the OGC [Open GIS Consortium] literature...I wonder if there is a term for this phenomenon: polite inclusion of other works that are not successful in terms of ground support

One of the most prominent examples of politely including XLink has been inside SVG (scalable vector graphics), but even there, DTD (document type definition) tricks were used to remove the need for some of the explicit markup. XLink 1.1 legitimizes the use of sole xlink:href attributes without need of any further tricks, which is, of course, a pleasant development.
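To make the "sole xlink:href" case concrete, here is a rough sketch of how a consumer might pick such links out of an SVG-like document using the standard Java DOM API. The XLinkHrefs class, its methods, and the sample markup are assumptions made up for this illustration. The effectiveType fallback to "simple" reflects my reading of the 1.1 behavior described above and is not a substitute for the specification text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XLinkHrefs {
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Returns every xlink:href value in document order.
    static List<String> collect(String xml) {
        List<String> hrefs = new ArrayList<>();
        walk(parse(xml).getDocumentElement(), hrefs);
        return hrefs;
    }

    // Assumption: a bare xlink:href with no xlink:type is treated as a simple link.
    static String effectiveType(Element e) {
        if (e.hasAttributeNS(XLINK_NS, "type")) {
            return e.getAttributeNS(XLINK_NS, "type");
        }
        return e.hasAttributeNS(XLINK_NS, "href") ? "simple" : "";
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getAttributeNS to work
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
    }

    private static void walk(Element e, List<String> hrefs) {
        if (e.hasAttributeNS(XLINK_NS, "href")) {
            hrefs.add(e.getAttributeNS(XLINK_NS, "href"));
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                walk((Element) c, hrefs);
            }
        }
    }

    public static void main(String[] args) {
        String svg = "<svg xmlns='http://www.w3.org/2000/svg'"
                + " xmlns:xlink='http://www.w3.org/1999/xlink'>"
                + "<a xlink:href='http://example.org/'><text>go</text></a>"
                + "<use xlink:href='#shape'/></svg>";
        System.out.println(collect(svg));
    }
}
```

Note that neither linking element in the sample carries xlink:type, which is exactly the markup that previously needed DTD defaulting to be strictly conformant.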

Even so, the changes in version 1.1 don't address the more fundamental complaints with XLink. For example, XLink 1.1 is still incompatible with any version of XHTML, but especially XHTML 2.0, which uses two distinct attribute names for different kinds of links.

However, the remainder of the thread on xml-dev consisted largely of folks pointing out increasing usage of XLink: Geoffrey Shuetrim pointed out use in XBRL, Alexander Johannesen pointed out Topic Maps XTM, and Rick Jelliffe highlighted his company's Schema Management Tool, which uses XLink extensively under the hood.

Does this indicate a resurgence in XLink's fortunes? Bullard concludes by saying that "the usefulness of architectural forms is apparent," a sensible position.

Prediction: XLink 1.1 isn't going to do much beyond encouraging uses where it has historically been politely included.

xml:id

Proposals for something like xml:id go back a long way. As far back as November 2001, Leigh Dodds, in this very column, outlined several proposals, including one that is substantially what we see today as a Proposed Recommendation. Tim Bray (famously) wrote:

This is in danger of tripping over what is maybe the #1 gaping architectural hole as regards XML & the Web. The problem is that at the moment, given some arbitrary XML, there is no good way to determine what's an ID without recourse to some external resource like a DTD or schema, and that, to use a technical term, sucks.

Who's to say that long-standing problems with XML never get solved? But the tricky thing about the way specifications interact under composition is that solving one problem often uncovers another. In this case, the reserved xml prefix, pressed into service to plug this architectural hole, learned a new trick, or at least unlearned an old one.

The attributes xml:lang and xml:space, both defined as part of XML itself, as well as the later addition xml:base, function over a scope that includes child elements. For example, defining any of these attributes once on the root element has an effect on every element in the document. In contrast, xml:id breaks this pattern: it applies only to the single element that carries it. Specifications that made assumptions about the scoping of xml-namespaced attributes will run into problems if documents with xml:id become common. Canonical XML (including its influence on XML Signature) is the most often cited example here.
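The scoping difference is easy to see in code. The sketch below, again using the standard Java DOM API, resolves xml:lang by climbing toward the root but reads xml:id only from the element itself. The XmlAttrScope class and its method names are hypothetical. The ID index is also deliberately naive: it trims whitespace but skips the name and uniqueness checks that a real xml:id processor would perform.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XmlAttrScope {
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
    }

    // xml:lang is inherited: look at the element, then each ancestor in turn.
    static String effectiveLang(Element e) {
        for (Node n = e; n instanceof Element; n = n.getParentNode()) {
            Element current = (Element) n;
            if (current.hasAttributeNS(XML_NS, "lang")) {
                return current.getAttributeNS(XML_NS, "lang");
            }
        }
        return ""; // no language declared anywhere in scope
    }

    // xml:id is not inherited: only the element that carries it gets the ID.
    static Map<String, Element> indexIds(Document doc) {
        Map<String, Element> ids = new HashMap<>();
        collect(doc.getDocumentElement(), ids);
        return ids;
    }

    private static void collect(Element e, Map<String, Element> ids) {
        if (e.hasAttributeNS(XML_NS, "id")) {
            ids.put(e.getAttributeNS(XML_NS, "id").trim(), e);
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                collect((Element) c, ids);
            }
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<doc xml:lang='en' xml:id='top'><para xml:id='p1'>text</para></doc>");
        Element para = (Element) doc.getDocumentElement().getFirstChild();
        System.out.println(effectiveLang(para));      // inherited from <doc>
        System.out.println(indexIds(doc).keySet());   // one entry per element with xml:id
    }
}
```

A tool that treated every xml-prefixed attribute like effectiveLang does, copying it down to descendants, would hand the para element the ID "top" as well as its own, which is the kind of trouble the scoping assumption invites.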

Prediction: xml:id will rapidly find its way into vocabularies that need well-known IDs, especially without DTD processing. Other specifications that assumed too much about xml-attribute scoping will quickly come into line.

Conclusion

In Java programming, composition is straightforward and almost foolproof. With specifications, however, it's still possible for all kinds of unexpected interactions to take place. Further, the number of technical specifications seems to be steadily increasing with time, so the chances of conflict continue to rise. The xml:id example showed that sometimes even a nearly unrelated spec can come along and throw chaos into your world by dismantling fundamental assumptions.

Prediction: more problems are yet to be uncovered in the various combinations of existing XML specifications, to say nothing of the new ones. Somehow, we'll keep muddling through.

Births, Deaths, and Marriages

New versions of nxslt

Versions 1.6 and 2.0Beta1 of nxslt, a free .NET XSLT command-line utility from Oleg Tkachenko.

Arbortext Bought

Atom 1.0 Baked

Documents and Data

An identifier by any other name...

Rick Jelliffe on untangling the Schema spaghetti

If you're into the podcast thing, this one looks interesting.