XML.com: XML From the Inside Out

Axis Powers: Part One

December 20, 2000

XPath expressions play an important role in XSLT because they let stylesheet instructions flexibly identify the parts of the input document -- or, in XSLT terms, the nodes of the source tree -- to act on. Let's review the basics. An XPath location path consists of one or more location steps separated by slashes. Each step consists of an axis specifier, a node test, and an optional predicate. For example, the one-step XPath expression following-sibling::para[3] uses an axis of following-sibling, a node test of para, and a predicate of [3] to identify the third para element among the nodes on the context node's following-sibling axis. (The context node is the source tree node that the XSLT processor is currently dealing with -- usually because it's one of the nodes matched by the xsl:template element's match pattern or selected by an xsl:for-each instruction's select attribute.) The two-step XPath expression child::wine/child::prices selects the prices children of any wine children of the context node. In this two-part column, we'll examine the various axes you can use and their potential role in XSLT transformations.

A location step's axis describes the selected nodes' relationship to the context node in terms of where they are on the tree. For example, in the location step child::wine, the child axis part tells an XSLT processor to look at the child nodes of the context node, and the wine node test part tells it the name of the child nodes it's interested in. Besides child, other available tree relationships are descendant, parent, ancestor, following-sibling, preceding-sibling, following, preceding, attribute, namespace, self, descendant-or-self, and ancestor-or-self.

Despite the singular form of axis specifier names like "ancestor" and "preceding-sibling", only "self" and "parent" never refer to more than one node. The others might be more aptly named "children", "ancestors", "preceding-siblings", and so forth, so that's how you should think of them: as ways of accessing those particular sets of nodes. The node test and predicate parts of a location step let you select a subset of the group of nodes that a particular axis specifier points to.
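As a minimal sketch of how the node test and predicate narrow what an axis points to, the stylesheet below applies three progressively tighter location paths to a wine element. It assumes an input shaped like the wine document used in the next section; the output labels are illustrative and are not part of the sample zip file.

```xml
<?xml version="1.0"?>
<!-- Sketch only. Assumed input: a wine element with winery, year and prices
     children, where prices contains list, discounted and case elements. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="wine">
    <!-- Axis plus wildcard node test: all element children of wine. -->
    <xsl:text>element children of wine: </xsl:text>
    <xsl:value-of select="count(child::*)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- A name in the node test narrows the same axis to one element name. -->
    <xsl:text>prices children of wine: </xsl:text>
    <xsl:value-of select="count(child::prices)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- A predicate narrows further: the second element child of prices. -->
    <xsl:text>second child of prices: </xsl:text>
    <xsl:value-of select="name(child::prices/child::*[2])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For a wine element with winery, year, and prices children, the three lines should report 3, 1, and discounted.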

The child, parent, and attribute Axes

Let's say that when processing the prices element in the following, you want to look up the grape attribute value of the prices element's parent element.

<wine grape="Cabernet Sauvignon">
    <winery>Los Vascos</winery>
    <year>1998</year>
    <prices>
      <list>13.99</list>
      <discounted>11.99</discounted>
      <case>143.50</case>
    </prices>
</wine>

The following template tells the XSLT processor, "As you traverse the source tree, when you find a prices element node, use xsl:value-of to add a certain value to the result tree. To get that value, first go to the parent node named wine and then go to its attribute node named grape." (The comment in each sample stylesheet identifies filenames of the stylesheet, the sample input, and output in the zip file of sample code.)

<!-- u85-7-2.xsl: converts u85-7-3.xml into u85-7-3a.xml -->

<xsl:template match="prices">
  parent element's grape:
  <xsl:value-of select="parent::wine/attribute::grape"/>
</xsl:template>

Each of the two steps in the select attribute's location path has both an axis specifier (parent and attribute) and a node test (wine and grape).

This template rule creates the following result from the wine XML document above:

    Los Vascos
    1998
    
  parent element's grape:
  Cabernet Sauvignon

The parent and attribute axes seem handy. Why do you see them so rarely when you look at XSLT stylesheets? Because they're so handy that XPath offers abbreviations for them. The at-sign (@) abbreviates attribute::, and two periods (..) abbreviate parent::node(). (The node() node test matches a node of any type and any name, so .. selects the context node's parent whatever it happens to be called.) Knowing this, you could write the template rule above, with the same effect on this document, like this:

<xsl:template match="prices">
  parent element's grape:
  <xsl:value-of select="../@grape"/>
</xsl:template>
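One difference is worth a quick sketch: .. drops the wine name test that parent::wine carried. The stylesheet below puts the two forms side by side; the labels and the text() template that silences the built-in text copying are assumptions added for illustration. If you want the brevity of @ but still want to insist on the parent's name, the two notations can be mixed.

```xml
<?xml version="1.0"?>
<!-- Sketch only: compares the abbreviated and the name-checked parent step. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="prices">
    <!-- ".." is parent::node(): the parent is accepted whatever its name. -->
    <xsl:text>any parent: </xsl:text>
    <xsl:value-of select="../@grape"/>
    <xsl:text>&#10;</xsl:text>

    <!-- parent::wine selects nothing unless the parent is named wine. -->
    <xsl:text>wine parent only: </xsl:text>
    <xsl:value-of select="parent::wine/@grape"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress the built-in rule that would copy winery and year text. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

With the wine document above, both lines show Cabernet Sauvignon. If the same prices element sat inside an element with some other name that also had a grape attribute, only the first line would show a value.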

The most abbreviated abbreviation is the one for the child axis: if no axis at all is specified in an XPath location step, an XSLT processor assumes that child is the axis. For example, take a look at the following template, which (among other things) plugs in the value of the wine element's year child as the value of the vintage attribute in the result tree version. (The curly braces tell the XSLT processor to evaluate the string "child::year" as an expression -- that is, to figure out what it's supposed to represent instead of putting that actual string in as the value of the vintage attribute.)

<!-- u85-7-6.xsl -->

<xsl:template match="wine">
  <wine vintage="{child::year}">
    <xsl:apply-templates select="product"/>
    <category><xsl:value-of select="@grape"/></category>
    <xsl:apply-templates select="price"/>
  </wine>
</xsl:template>

Written like this (with the child axis specification removed) it will have the same effect and be more compact:

<!-- u85-7-5.xsl -->

<xsl:template match="wine">
  <wine vintage="{year}">
    <xsl:apply-templates select="product"/>
    <category><xsl:value-of select="@grape"/></category>
    <xsl:apply-templates select="price"/>
  </wine>
</xsl:template>

ancestor and ancestor-or-self

The ancestor axis is useful when you want to apply special treatment to elements that are somewhere inside of another element but you're not sure how many levels down. For example, let's say you want to format your para elements differently when they're in an appendix element. These para elements may be children of an appendix element, or children of sect elements inside of an appendix element, or children of subsect or warning elements inside of a sect element. The para template rule below uses the ancestor axis to add one set of markup to the result tree if the para source node element has an appendix element as an ancestor, and another if it has a chapter element as an ancestor.

<!-- u85-8-1.xsl -->

<xsl:template match="para">

  <xsl:if test="ancestor::appendix">
    <p><font face="arial"><xsl:apply-templates/></font></p>
  </xsl:if>

  <xsl:if test="ancestor::chapter">
    <p><font face="times"><xsl:apply-templates/></font></p>
  </xsl:if>

</xsl:template>
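Because the two xsl:if tests are independent of each other, a para with both an appendix and a chapter ancestor would be written out twice, and a para with neither would not appear in the result at all. A minimal sketch of one way to make the cases mutually exclusive uses xsl:choose. Letting appendix win over chapter and falling back to a plain p element are assumptions made here for illustration, not something the original sample does.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the same ancestor tests, arranged so exactly one branch runs. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="para">
    <xsl:choose>
      <!-- The first test that succeeds wins; later branches are skipped. -->
      <xsl:when test="ancestor::appendix">
        <p><font face="arial"><xsl:apply-templates/></font></p>
      </xsl:when>
      <xsl:when test="ancestor::chapter">
        <p><font face="times"><xsl:apply-templates/></font></p>
      </xsl:when>
      <!-- Assumed fallback for a para outside any appendix or chapter. -->
      <xsl:otherwise>
        <p><xsl:apply-templates/></p>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```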

Why would you want to use the ancestor-or-self axis? When you want to check the value of an attribute that may be in the current element or one of its ancestors. For example, the XML specification describes the xml:lang attribute, which indicates the spoken language of an element and all of its descendants that don't have their own xml:lang attribute. To check whether an element has a particular language specified for it, you could check that element, then check its parent, then check its parent's parent, and so on, or you could use the ancestor-or-self axis like this:

<!-- u85-8-2.xsl: converts u85-8-3.xml into u85-8-3a.xml -->

<xsl:template match="warning">

  <xsl:if test="ancestor-or-self::*/attribute::xml:lang='en'">
    <p><b>Warning! </b><xsl:apply-templates/></p>
  </xsl:if>

  <xsl:if test="ancestor-or-self::*/attribute::xml:lang='de'">
    <p><b>Achtung! </b><xsl:apply-templates/></p>
  </xsl:if>

</xsl:template>

Each of the two xsl:if elements has a two-step location path. If the first step said ancestor-or-self::warning, it would check the current node and its ancestors and select any of those nodes named warning. Instead it uses the asterisk to select the ancestor-or-self element nodes with any name, and then the location path's second step collects any xml:lang attributes those nodes have. Comparing that node set with a string is true if at least one attribute in it has that value. (Substituting @ for attribute:: in these location paths would have the same effect, because that's exactly what this abbreviation is for.) When the current node or any of its ancestors has an xml:lang value of "en", the first xsl:if instruction adds the English string "Warning!" to the result tree. If one of them has a value of "de", the second xsl:if instruction adds the German "Achtung!". As long as only one language is declared along that chain of ancestors, this is the same as checking the closest declaration; if an outer element says "en" and an inner one says "de", both tests succeed.

With the following document in the source tree,

<chapter>
  <sect xml:lang="de">
    <warning>Make a backup first.</warning>
  </sect>
</chapter>

the template above adds the string "Achtung!" at the start of the result tree's version of the warning element whether the xml:lang attribute was part of the warning, sect, or chapter start-tags:

    <p><b>Achtung! </b>Make a backup first.</p>
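When languages are nested, you may want only the nearest xml:lang declaration to count. The following is a sketch of one way to do that, not the approach taken in the sample files: it selects the closest ancestor-or-self element that has the attribute and compares only that value. The variable name nearest and the fallback branch are assumptions for illustration. XPath's built-in lang() function is another option; it also looks at the nearest declaration, but it ignores case and treats sublanguages such as "de-CH" as matching "de".

```xml
<?xml version="1.0"?>
<!-- Sketch only: consult just the nearest xml:lang declaration. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="warning">
    <!-- ancestor-or-self is a reverse axis, so [1] is the nearest element
         that carries an xml:lang attribute. -->
    <xsl:variable name="nearest"
                  select="ancestor-or-self::*[@xml:lang][1]/@xml:lang"/>
    <xsl:choose>
      <xsl:when test="$nearest = 'en'">
        <p><b>Warning! </b><xsl:apply-templates/></p>
      </xsl:when>
      <xsl:when test="$nearest = 'de'">
        <p><b>Achtung! </b><xsl:apply-templates/></p>
      </xsl:when>
      <!-- Assumed fallback when no language, or another one, is declared. -->
      <xsl:otherwise>
        <p><xsl:apply-templates/></p>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

For the chapter document above this produces the same "Achtung!" paragraph, and it would still do so if the chapter start-tag also declared xml:lang="en".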

preceding-sibling and following-sibling

The term "sibling" refers to another node with the same parent as the context node. The preceding-sibling axis refers to all the siblings before the context node, and following-sibling refers to all the siblings after it. For example, the following template copies chapter elements from the source tree to the result tree and uses these two axis specifiers to add a message about the preceding and following chapters at the beginning of each chapter.

<!-- u85-8-4.xsl: converts u85-8-5.xml into u85-8-6.xml -->

<xsl:template match="chapter">
  <chapter>
  Previous chapter: 
  (<xsl:value-of select="preceding-sibling::chapter[1]/title"/>)
  Next chapter: 
  (<xsl:value-of select="following-sibling::chapter/title"/>)
  <xsl:text>
  </xsl:text>
  <xsl:apply-templates/>
  </chapter>
</xsl:template>

Understanding how this works will be easier if we first see the effect it has. It turns this source document

<story>

  <chapter><title>Chapter 1</title>
    <par>A Dungeon horrible, on all sides round</par>
  </chapter>

  <chapter><title>Chapter 2</title>
    <par>More unexpert, I boast not: them let those</par>
    <par>Contrive who need, or when they need, not now.</par>
    <sect><title>Chapter 2, Section 1</title>
      <par>For while they sit contriving, shall the rest,</par>
      <par>Millions that stand in Arms, and longing wait</par>
    </sect>
  </chapter>

  <chapter><title>Chapter 3</title>
    <par>So thick a drop serene hath quenched their Orbs</par>
  </chapter>

</story>

into this:


  <chapter>
  Previous chapter: 
  ()
  Next chapter: 
  (Chapter 2)
  
  Chapter 1
    A Dungeon horrible, on all sides round
  </chapter>

  <chapter>
  Previous chapter: 
  (Chapter 1)
  Next chapter: 
  (Chapter 3)
  
  Chapter 2
    More unexpert, I boast not: them let those
    
    Contrive who need, or when they need, not now.
    Chapter 2, Section 1
      For while they sit contriving, shall the rest,
      
      Millions that stand in Arms, and longing wait
    
  </chapter>

  <chapter>
  Previous chapter: 
  (Chapter 2)
  Next chapter: 
  ()
  
  Chapter 3
    So thick a drop serene hath quenched their Orbs
  </chapter>

Each xsl:value-of element in the template rule has a two-step XPath expression as its select attribute. For the second, the two steps are following-sibling::chapter and title, telling the XSLT processor "go to the sibling nodes named chapter that come after the context node (which is also named chapter, as we see from the xsl:template element's match attribute) and grab the value of the title child element." Although following-sibling can refer to multiple nodes, the XSLT 1.0 xsl:value-of element only adds a text version of the first one, in document order, to the result tree. For the first chapter in the source document, the second chapter is that first node; for the second chapter, the third chapter is; and for the third chapter, there is no such node, so nothing appears between the parentheses after the final "Next chapter" in the result document.

The first xsl:value-of element in the example template resembles the second, except that we don't want xsl:value-of to get a text version of the first node, in document order, of the set that preceding-sibling::chapter points to. Chapter 3 has two preceding siblings, but the first of those in document order is not the preceding chapter. To tell the processor to grab the preceding sibling just before the context node, the location step includes the predicate [1], telling the XSLT processor "get the first one as you count along these nodes." This may seem confusing, because we're adding the number "1" to avoid getting the first node. The node we're avoiding, though, is the first one in document order; the one we want is the first one counting backwards from the context node. The XSLT processor counts backwards through a node set when you add a number predicate to a location step that uses the preceding-sibling, preceding, ancestor, or ancestor-or-self axes. Just as your parent is your first ancestor and your grandparent is your second ancestor, your first preceding sibling is the one just before you and your second preceding sibling is the one before that. The XSLT processor counts forward, in document order, for any other axis.

In the example, the two steps of the preceding-sibling::chapter[1]/title XPath expression say "go to the first preceding sibling named chapter, then get its title element's contents." The first chapter has no preceding chapter sibling, so nothing shows up in the parentheses after the first "Previous chapter" text in the result.
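To see the backwards counting at work, the sketch below prints two values for each chapter of the story document: the title found with [1] applied inside the preceding-sibling step, and the title found with [1] applied to the parenthesized node set. In XPath 1.0, a predicate on a parenthesized expression counts in document order regardless of the axis inside it. The labels "nearest" and "earliest" are illustrative names, not part of the original samples.

```xml
<?xml version="1.0"?>
<!-- Sketch only: position predicates on a reverse axis versus on a node set. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="story">
    <xsl:apply-templates select="chapter"/>
  </xsl:template>

  <xsl:template match="chapter">
    <xsl:value-of select="title"/>

    <!-- Predicate inside the step: counted backwards from the context node. -->
    <xsl:text>: nearest=</xsl:text>
    <xsl:value-of select="preceding-sibling::chapter[1]/title"/>

    <!-- Predicate on the parenthesized set: counted in document order. -->
    <xsl:text>, earliest=</xsl:text>
    <xsl:value-of select="(preceding-sibling::chapter)[1]/title"/>

    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For the third chapter of the story document, "nearest" is Chapter 2 and "earliest" is Chapter 1; for the first chapter, both are empty.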

In the next column, we'll look at the remaining axes: preceding, following, descendant, descendant-or-self, self, and namespace. Meanwhile, you can play with the examples shown in this column by downloading the zip file available here.