Using the W3C XSLT Specification
The W3C's XSLT Recommendation (available at http://www.w3.org/TR/xslt) is a specification describing the XSLT language and the responsibilities of XSLT processors. If you're new to XSLT, the Recommendation can be difficult to read, especially if you're not familiar with W3C specifications in general and the XML, XPath, and Namespaces specs in particular. This month, I'd like to summarize some of the concepts and terms that are most likely to confuse an XSLT novice reading the XSLT Recommendation.
But, first, what do we mean by "Recommendation"? The W3C can't force anyone to do anything, so they call a specification that has been through their whole process of various drafts, reviews, discussions, revisions, and final approval a "Recommendation."
The first step on this path is Working Draft status. A Working Draft comes either from a submission by a W3C member company or from a Working Group that the W3C has formed to work on the specification. A Working Group then carries the work forward, making the draft public at various stages for comments from inside and outside the W3C; the beginning of each spec describes where to send comments and where on the Web to read the comments sent so far. (See http://www.w3.org/TR for links to all the specs at every stage of W3C consideration.)
When the Working Group feels that the Working Draft is ready, it submits the draft to the W3C Director for possible promotion to Candidate Recommendation status. Proposed Recommendation used to be the last stop before becoming an official Recommendation, but the W3C has recently added Candidate Recommendation as a new stage ahead of it. During this stage, application developers are encouraged to implement the spec; if all goes well, the Candidate Recommendation becomes a Proposed Recommendation. If everyone (in the W3C, that is) is happy with it at that point, it becomes a Recommendation. XSLT 1.0 reached this point on November 16th, 1999.
W3C specs are generally specifications for software behavior aimed at programmers. They rarely include tutorials and can be tough to read whether you jump in somewhere in the middle or start from the very beginning. Like most W3C specs, the XSLT Recommendation has various confusing terms that get used often. Even more confusing are the pairs of terms that are just similar enough and just different enough to make them easy to mix up.
Pairs of Confusing Related Terms
document element and document root: For an XML document to be well-formed, one single element must enclose all the other elements; in other words, the document's very last tag must be the end-tag corresponding to the start-tag that begins the whole thing. We call that element the document element. If there is a DOCTYPE declaration, it names the element type of the document element. If you picture a tree of the document's elements, the document element would be the root of that tree. But remember, there are other kinds of trees that can represent a document, such as a DOM tree or the source and result trees used in XSLT. These trees have their own root node (the XPath and XSLT specs simply call it the "root node"), and the node representing the document element is a child of that root. This way, if the document has comments or processing instructions outside of the document element, they can still be represented as part of the document tree, as sibling nodes of the document element node.
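To see the distinction in running code, here is a small sketch using Python's standard-library xml.dom.minidom module. A DOM Document node is not literally the XPath root node, but it plays the same role here: the comment and the processing instruction that sit outside the document element show up as its siblings, and all three are children of the Document node. The sample document and the my-app processing instruction target are made up for illustration.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# A made-up document with a comment before the document element
# and a processing instruction after it.
SOURCE = """<?xml version="1.0"?>
<!-- inventory as of this month -->
<cellar>
  <wine year="1999">Merlot</wine>
</cellar>
<?my-app reprocess?>"""

doc = parseString(SOURCE)

# The document element: the single element that encloses all the others.
document_element = doc.documentElement

# The Document node stands in for the root node; its children include
# the comment and the processing instruction as well as the document element.
child_kinds = [node.nodeType for node in doc.childNodes]

# The comment and the processing instruction are siblings of the document element.
before = document_element.previousSibling
after = document_element.nextSibling

for node in doc.childNodes:
    print(node.nodeType, node.nodeName)
```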
expression and pattern: in XSLT, an XPath expression uses the XPath language to describe a set of nodes (an expression can also evaluate to a string, a number, or a boolean). A pattern specifies a set of conditions that a node must meet, and patterns use a subset of XPath expression syntax that limits you to the child and attribute axes. Because expressions are discussed more often, it can be confusing to see something like "wine[@year='1999']" referred to as a pattern when it looks like an XPath expression. It is an XPath expression in addition to being a pattern, but when it's the value of the match attribute of an xsl:template or xsl:key element, or of the count or from attribute of an xsl:number element, it's acting as a pattern.
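The two uses of the same string can be sketched with Python's standard-library xml.etree.ElementTree, which understands a limited subset of XPath that includes simple predicates like [@year='1999']. Used as an expression, the string selects a set of nodes from a context node. Used as a pattern, it answers a yes-or-no question about one node. The hypothetical matches() helper below follows the XSLT 1.0 definition, under which a node matches a pattern if evaluating the pattern as an expression from some possible context (the node itself or one of its ancestors) would select that node.

```python
import xml.etree.ElementTree as ET

cellar = ET.fromstring(
    "<cellar>"
    "<wine year='1999'>Merlot</wine>"
    "<wine year='1998'>Zinfandel</wine>"
    "</cellar>"
)

PATH = "wine[@year='1999']"

# As an expression: evaluate PATH from a context node to select a set of nodes.
selected = cellar.findall(PATH)

def matches(node, root, pattern):
    """Hypothetical helper: does `node` match `pattern`?

    Tries every element under `root` as a context node. findall() only
    looks downward, and this pattern uses only the child axis, so only
    the node's parent can ever select it, which mirrors the XSLT 1.0 rule.
    """
    return any(node in context.findall(pattern) for context in root.iter())

# As a pattern: ask whether each wine element meets the conditions.
for wine in cellar:
    print(wine.text, matches(wine, cellar, PATH))
```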
node and element, tree and document: computer programs use tree-like structures to represent a lot of things. In fact, there are several different ways that trees can represent the same XML document: to show its entity structure, to show its element structure, or as a Document Object Model (DOM) tree. In the trees used to store an XML document for processing with an XSLT stylesheet, each tree has a single root node, and the other nodes, or components, can be element nodes, attribute nodes, text nodes, processing instruction nodes, comment nodes, or namespace nodes. In most documents, most of the nodes are element nodes, so "node" and "element" can seem almost synonymous at times. The distinction still matters: to find a sibling element (an element with the same parent as the context node), you want a sibling node (a node with the same parent as the one in question) that happens to be an element node.
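Here is a small sketch of that last point, again using Python's xml.dom.minidom as a stand-in for the XSLT tree (DOM node types don't line up exactly with XSLT's; the DOM, for example, has no namespace nodes). The node immediately after the first item is a whitespace text node, not an element, so the hypothetical next_sibling_element() helper keeps walking through sibling nodes until it reaches one whose type is element.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

doc = parseString(
    "<list><item>a</item>\n"
    "  <!-- a comment between the items -->\n"
    "  <item>b</item></list>"
)
first_item = doc.documentElement.firstChild

def next_sibling_element(node):
    """Hypothetical helper: return the next sibling node that is an element, or None."""
    sibling = node.nextSibling
    # Skip text, comment, and processing instruction nodes.
    while sibling is not None and sibling.nodeType != Node.ELEMENT_NODE:
        sibling = sibling.nextSibling
    return sibling

# The very next sibling *node* is the whitespace text before the comment...
next_node = first_item.nextSibling
# ...while the next sibling *element* is the second item.
next_element = next_sibling_element(first_item)
print(repr(next_node.data), next_element.firstChild.data)
```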
template and template rule: a template rule consists of two parts: a pattern matched against the source tree nodes, and a template that is instantiated to add content to the result tree for each node that matches the pattern. In an XSLT stylesheet, xsl:template elements represent template rules: the values of these elements' match attributes are the patterns to match against the source tree nodes, and each element's content (the part between the xsl:template element's start- and end-tags) is the template. The fact that an xsl:template element doesn't represent an XSLT template, but instead represents a template rule, adds to the confusion.
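One way to make the two-part structure concrete is a toy Python model in which each hypothetical TemplateRule pairs a pattern (here just a predicate function standing in for the match attribute) with a template (a function that builds a piece of the result tree). This is only a sketch of the terminology, not of how an XSLT processor works: it walks the source tree flatly and takes the first matching rule, instead of using xsl:apply-templates recursion, built-in template rules, and XSLT's conflict resolution.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable

@dataclass
class TemplateRule:
    # Stand-in for the match attribute: which source nodes the rule applies to.
    pattern: Callable[[ET.Element], bool]
    # Stand-in for the xsl:template element's content: what to build for a match.
    template: Callable[[ET.Element], ET.Element]

def vintage_template(node):
    """Build a result-tree element for one matching wine element."""
    p = ET.Element("p")
    p.text = f"{node.get('year')} vintage: {node.text}"
    return p

rules = [
    TemplateRule(
        pattern=lambda n: n.tag == "wine" and n.get("year") == "1999",
        template=vintage_template,
    ),
]

def transform(source_root, rules):
    """Apply the first matching rule to each source node (a flat simplification)."""
    result = ET.Element("result")
    for node in source_root.iter():
        for rule in rules:
            if rule.pattern(node):
                result.append(rule.template(node))
                break
    return result

source = ET.fromstring(
    "<cellar><wine year='1999'>Merlot</wine><wine year='1998'>Zinfandel</wine></cellar>"
)
result = transform(source, rules)
print(ET.tostring(result, encoding="unicode"))
```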
XSLT elements and instructions: Learning XSLT is largely a matter of learning how to use the various elements from the XSLT namespace, such as xsl:template, xsl:apply-templates, and xsl:output. XSLT elements such as xsl:apply-templates, xsl:text, and xsl:element, which tell the XSLT processor to add something to the result tree, are called instructions. Because so many XSLT elements are instructions, the two terms can sometimes seem to mean the same thing, but some XSLT elements aren't instructions. For example, "top-level" elements such as xsl:output and xsl:strip-space give the XSLT processor more general directions about how to perform the transformation.
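To see which elements in a small stylesheet fall into which group, here is a sketch in Python using xml.etree.ElementTree. The INSTRUCTIONS set copies the element names that the non-normative DTD fragment in the XSLT 1.0 Recommendation lists as instructions; the hypothetical classify() function treats XSLT-namespace children of xsl:stylesheet as top-level elements and collects the instructions found inside them. Note that xsl:variable belongs to both groups: it is a top-level element directly under xsl:stylesheet and an instruction inside a template.

```python
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"

# Instruction element names as listed in the XSLT 1.0 DTD fragment.
INSTRUCTIONS = {
    "apply-templates", "call-template", "apply-imports", "for-each",
    "value-of", "copy-of", "number", "choose", "if", "text", "copy",
    "variable", "message", "fallback", "processing-instruction",
    "comment", "element", "attribute",
}

STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="cellar">
    <list><xsl:apply-templates/></list>
  </xsl:template>
  <xsl:template match="wine">
    <item><xsl:value-of select="@year"/></item>
  </xsl:template>
</xsl:stylesheet>"""

def local_name(element):
    """Strip the XSLT namespace from an element's tag."""
    return element.tag[len(XSL):]

def classify(stylesheet_root):
    """Hypothetical helper: return (top-level element names, instruction names)."""
    top_level, instructions = [], []
    for child in stylesheet_root:
        if not child.tag.startswith(XSL):
            continue  # ignore non-XSLT top-level elements
        top_level.append(local_name(child))
        for descendant in child.iter():
            if (descendant is not child
                    and descendant.tag.startswith(XSL)
                    and local_name(descendant) in INSTRUCTIONS):
                instructions.append(local_name(descendant))
    return top_level, instructions

print(classify(ET.fromstring(STYLESHEET)))
```

Literal result elements such as list and item are in no namespace here, so the sketch skips them.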
URL and URI: "URI" stands for Uniform Resource Identifier; URIs make up the system for naming resources on the Web. Web addresses such as http://www.snee.com are URLs, the most common form of URI. For now, URIs that aren't URLs are so rare that the terms "URI" and "URL" are practically synonymous.