The Path of Control

May 4, 2005

Just as XPath 1.0 was part of XSLT 1.0, XPath 2.0 plays an important role in XSLT 2.0, and all the new features of XPath 2.0 will be available to XSLT 2.0 stylesheet developers. In earlier columns on XSLT 2.0, I reviewed its new data model and some of its new functions, and this month I'm going to cover XPath's new ability to do some things that every real programming language can do: conditional statements and iteration, or, as they're more colloquially known, "if" statements and "for" loops. We'll also look at a useful related technique for checking whether certain conditions do or don't exist in a set of nodes.

To be honest, I had some difficulty understanding exactly what XPath's if statements and for loops brought to XSLT. I could think of examples to demonstrate their use, but the examples were either toy examples with no real utility or implementations of tasks that I could accomplish just fine with XSLT 1.0 and XPath 1.0. I began to suspect that these were added to XPath more for the benefit of XQuery 1.0, a W3C specification that also depends heavily on XPath 2.0, than for XSLT 2.0. With some help from the members of the Saxon mailing list, particularly Dave Pawson and Michael Kay, I eventually saw that conditional expressions and for loops make a real contribution to XSLT 2.0. I also found an XSL-List posting with a position paper that Michael wrote for the W3C XSL Working Group to evaluate how badly XPath needed these constructs, and I recommend it to anyone interested in some theoretical background on the role that XPath conditional and iteration statements can play in XSLT. (By "theoretical," I don't mean "highly abstract"; I just mean that the paper provides a good "big picture" of the potential fit of all the pieces of the problem.) Several of my examples here are based on examples in that paper.

Conditional Expressions

Conditional expressions let you evaluate a condition and only take action if the condition is true. In most programming languages, a conditional expression also lets you specify another action to take if the condition is false; it's always been a bit annoying that XSLT 1.0's xsl:if instruction had no corresponding xsl:else instruction. You can get the same effect with an xsl:choose element that has one xsl:when child and an xsl:otherwise child, but this is pretty verbose, even for XSLT.
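
To make that verbosity concrete, here is a minimal sketch of the xsl:choose workaround in XSLT 1.0. It borrows the cost element and font output from the XPath 2.0 template rule that follows, so the two can be compared side by side; treat it as an illustration rather than the only way to write it.

<xsl:template match="cost">
  <font>
    <!-- XSLT 1.0: build the attribute with xsl:choose, because
         xsl:if has no else branch. -->
    <xsl:attribute name="color">
      <xsl:choose>
        <xsl:when test=". &gt;= 0">blue</xsl:when>
        <xsl:otherwise>red</xsl:otherwise>
      </xsl:choose>
    </xsl:attribute>
    <xsl:apply-templates/>
  </font>
</xsl:template>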

An XPath 2.0 if expression lets you pack a lot (even an else condition!) into a pretty small space. In the following template rule, a conditional expression in an attribute value template for the font element's color attribute sets its value to "blue" if the cost element's value is greater than or equal to zero and "red" otherwise:

<xsl:template match="cost">
  <font color="{if (. &gt;= 0) then 'blue' else 'red'}">
    <xsl:apply-templates/>
  </font>
</xsl:template>

The else part of an XPath if expression is not only available, it's required, because the entire expression must return something whether the condition is true or not. (As the opening of the XPath 2.0 spec tells us, "XPath 2.0 is an expression language.") The parentheses are also required.
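
When you really have nothing to return in the false case, one common idiom is to make the else branch return the empty sequence, written as (). The following sketch assumes the same cost element as above; the 'overdrawn' label is a hypothetical value chosen only for illustration.

<xsl:template match="cost">
  <!-- The else branch is mandatory, so it returns the empty
       sequence when there is nothing to output. -->
  <xsl:value-of select="if (. &lt; 0) then 'overdrawn' else ()"/>
</xsl:template>

For a cost value of zero or more, this xsl:value-of instruction adds nothing to the result tree.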

The spec's syntax for conditional expressions shows ExprSingle where the example above has the strings "blue" and "red". The spec's production for ExprSingle shows that it can be an if expression in and of itself, letting us build some very complex conditional expressions. In the following modification of the template rule from above, I've replaced the original "blue" and "red" strings with new if expressions:

<xsl:template match="cost">
  <font color="{if (. &gt;= 0) then
                   if (. &gt; 50) then 
                      'blue' 
                   else 
                      'lightblue' 
                else 
                   if (. &lt; -50) then 
                      'red' 
                   else 
                      'pink'}">
    <xsl:apply-templates/>
  </font>
</xsl:template>

Remember, there's nothing wrong with some extra white space in an XPath expression. A color attribute value of "{if (. &gt;= 0) then if (. &gt; 50) then 'blue' else 'lightblue' else if (. &lt; -50) then 'red' else 'pink'}" would work just as well as the one above, but its control flow is much more difficult to follow when you try to read it.

The values returned by XPath 2.0 if expressions can be node sets, which is where they really move beyond the capabilities of the XSLT 1.0 xsl:if instruction. For example, when the following template rule finds a checkBook element, it sums up all the values of its credit element children if the $depositReport variable is equal to a Boolean true, and it sums up the debit children otherwise:

<xsl:template match="checkBook">
  <xsl:value-of select="sum(if ($depositReport) then credit else debit)"/>
</xsl:template>
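
The template rule above never shows where $depositReport comes from. As a sketch, assuming it is a global stylesheet parameter (my assumption, not something the example requires), a complete stylesheet might look like this:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:xs="http://www.w3.org/2001/XMLSchema"
     exclude-result-prefixes="xs"
     version="2.0">

  <xsl:output method="text"/>

  <!-- Assumed declaration: a Boolean parameter that defaults to true. -->
  <xsl:param name="depositReport" as="xs:boolean" select="true()"/>

  <xsl:template match="checkBook">
    <xsl:value-of select="sum(if ($depositReport) then credit else debit)"/>
  </xsl:template>

</xsl:stylesheet>

Given the following hypothetical input, the stylesheet outputs 150 with the default parameter value, and 50 if the default is changed to false():

<checkBook>
  <credit>100</credit>
  <debit>30</debit>
  <credit>50</credit>
  <debit>20</debit>
</checkBook>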

For Expressions

Many programming languages offer a choice of constructs that execute instructions a specific number of times, but a for loop (named for the keyword that usually begins it) is the most common. XSLT 1.0's xsl:for-each instruction is different: it lets you iterate across a set of nodes, performing one or more actions on each. To execute something a specific number of times, XSLT 1.0 makes you use the recursion techniques of its ancestors Scheme and LISP. (For a comparison and demonstration of these two techniques, see my earlier column Getting Loopy.)

XPath 2.0's for expressions (and XSLT 2.0's xsl:for-each instructions) let you iterate across a set of nodes. They also let you iterate across a range of numbers, executing anything you like a specific number of times, and returning a sequence. A for expression always returns a sequence; the one in the following returns the numbers 2, 4, 6, 8, and 10:

  <xsl:value-of select="for $i in (1 to 5) return $i * 2"/>
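
In a stylesheet with version="2.0", xsl:value-of writes out every item in the sequence, separated by single spaces, so the instruction above produces "2 4 6 8 10". As a small sketch of how to control that, the XSLT 2.0 separator attribute lets you pick a different delimiter:

  <!-- Outputs: 2, 4, 6, 8, 10 -->
  <xsl:value-of select="for $i in (1 to 5) return $i * 2" separator=", "/>

If the stylesheet runs in XSLT 1.0 backwards-compatibility mode (version="1.0"), xsl:value-of outputs only the first item of the sequence.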

Making use of the individual members of the returned sequence would require a for-each loop to iterate through that sequence, so in many cases the XPath for loop doesn't necessarily mean tighter code in your stylesheet. However, when you pass such a sequence (especially a sequence created from a document's nodes, as opposed to a range of numbers as in the example above) to a function that can operate on it, the power of the XPath 2.0 for expression becomes more apparent. (Don't forget that XSLT 2.0 lets you write your own new functions in your stylesheet, and call them from XPath expressions, which amplifies this power even more.) For example, imagine that you have the following source document:

<items>
  <item price="14" quantity="4">fountain</item>
  <item price="9" quantity="5">bottle rack</item>
  <item price="8" quantity="2">bicycle wheel</item>
  <item price="10" quantity="3">50 cc of Paris air</item>
</items>

The following template rule, upon finding the items element, multiplies the price and quantity attribute values for each item element and sums up the results:

<xsl:template match="items">
  <xsl:value-of select="sum(for $i in item return $i/@price * 
     $i/@quantity)"/>
</xsl:template>
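
To see what sum() receives, it can help to output the intermediate sequence as well. The following variation is only a sketch for exploring the sample document; the " + " separator and the " = " text are presentation choices of mine.

<xsl:template match="items">
  <!-- The sequence that the for expression builds, one product per item. -->
  <xsl:value-of select="for $i in item return $i/@price * $i/@quantity"
                separator=" + "/>
  <xsl:text> = </xsl:text>
  <!-- The same sequence, passed to sum(). -->
  <xsl:value-of select="sum(for $i in item return $i/@price * $i/@quantity)"/>
</xsl:template>

With the sample document, this outputs "56 + 45 + 16 + 30 = 147".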

The identification of the nodes to iterate over need not be so simple. For example, we could add the predicate shown in the following if we only want to total up the cost of buying items with a price value below 10:

<xsl:template match="items">
  <xsl:value-of select="sum(for $i in item[@price &lt; 10] return $i/@price *
     $i/@quantity)"/>
</xsl:template>
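
With the sample document, only the bottle rack (9 x 5 = 45) and the bicycle wheel (8 x 2 = 16) pass the predicate, so the result is 61. To illustrate the earlier remark about writing your own functions, here is a sketch that moves the for expression into an xsl:function. The my:total name and the http://example.com/ns/my namespace are hypothetical, chosen only for this example.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:my="http://example.com/ns/my"
     exclude-result-prefixes="my"
     version="2.0">

  <!-- Hypothetical helper: total cost of whatever item elements it is given. -->
  <xsl:function name="my:total">
    <xsl:param name="lines" as="element(item)*"/>
    <xsl:sequence select="sum(for $i in $lines return
                              $i/@price * $i/@quantity)"/>
  </xsl:function>

  <xsl:template match="items">
    <!-- The caller decides which items to total. -->
    <xsl:value-of select="my:total(item[@price &lt; 10])"/>
  </xsl:template>

</xsl:stylesheet>

The function keeps the arithmetic in one place, and each caller supplies its own node selection.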

Quantified Expressions

Quantified expressions let you do existential quantification (checking whether some condition is true for at least one member of a set) and universal quantification (checking whether some condition is true for all members of a set). Because a quantified expression returns a Boolean value, we can use it as the test condition in an XPath 2.0 if expression, or even in a regular XSLT xsl:if or xsl:when test.

When the single template rule in the following stylesheet finds the items element in the sample document above, it checks that less than $100 is being spent on each item by multiplying the quantity times the price for each item and then comparing the result with the number 100. If, as the syntax says, every item satisfies that condition, the if expression returns the string "approved". Otherwise it returns the string "rejected".

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="2.0">

  <xsl:template match="items">

    <xsl:variable name="amountsOK"
                  select="every $c in item satisfies
                          ($c/@price * $c/@quantity &lt; 100)"/>

    <xsl:value-of select="if ($amountsOK) then 'approved'
                          else 'rejected'"/>
  </xsl:template>

</xsl:stylesheet>

To demonstrate the use of existential quantification, we can get the same effect by checking for the opposite. If the test described in the foundBadPurchase variable in the next stylesheet finds one item for which price times quantity is greater than or equal to 100, the some...in...satisfies expression returns a Boolean true, and the if condition rejects the purchase.

<xsl:template match="items">

  <xsl:variable name="foundBadPurchase"
                select="some $c in item satisfies
                        ($c/@price * $c/@quantity &gt;= 100)"/>

  <xsl:value-of select="if ($foundBadPurchase) then 'rejected'
                        else 'approved'"/>
</xsl:template>
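
One edge case is worth knowing about: over an empty sequence, every...satisfies returns true and some...satisfies returns false, so both versions above would approve an items element with no item children. The following sketch shows the two results for such an input, along with one possible guard using the exists() function; whether an empty order should be rejected is my assumption for the sake of illustration.

<xsl:template match="items">
  <!-- For an items element with no item children, this outputs "true false". -->
  <xsl:value-of select="every $c in item satisfies
                        ($c/@price * $c/@quantity &lt; 100)"/>
  <xsl:text> </xsl:text>
  <xsl:value-of select="some $c in item satisfies
                        ($c/@price * $c/@quantity &gt;= 100)"/>

  <!-- Possible guard: require at least one item before approving.
       The parentheses around the quantified expression are needed
       because it cannot be a direct operand of "and". -->
  <xsl:variable name="amountsOK"
                select="exists(item) and
                        (every $c in item satisfies
                         ($c/@price * $c/@quantity &lt; 100))"/>
  <xsl:text> </xsl:text>
  <xsl:value-of select="if ($amountsOK) then 'approved' else 'rejected'"/>
</xsl:template>

For an empty items element, this outputs "true false rejected".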

The syntax for for expressions and the syntax for quantified expressions both have our friend ExprSingle in several places. We already saw that the definition for ExprSingle lets us use an if expression as an ExprSingle; it also shows that a for expression and a quantified expression both qualify as an ExprSingle. This lets us go well beyond the incorporation of if expressions inside of if expressions: you can embed if expressions, quantified expressions, and for expressions inside of each other to whatever depth you wish. (Michael Kay's position paper mentioned above describes the desire of some, when this was all being designed, to restrict the use of conditional expressions to the top level--"whatever that means", as he so elegantly put it--and points out what a mess this could have caused.)
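
As a sketch of that kind of nesting, the following template rule places a quantified expression in the condition of an if expression and a for expression in its else branch. It assumes the same items document as the earlier examples.

<xsl:template match="items">
  <!-- Reject the order if any line costs 100 or more;
       otherwise output the order total. -->
  <xsl:value-of select="if (some $c in item satisfies
                            ($c/@price * $c/@quantity &gt;= 100))
                        then 'rejected'
                        else sum(for $i in item return
                                 $i/@price * $i/@quantity)"/>
</xsl:template>

With the sample document, no line reaches 100, so this outputs the total of 147.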

If you pursue ambitious levels of complexity with your combinations of these expressions, remember my earlier point about extra white space having no effect on the execution of an XPath expression. If you're going to get complex, make it readable. Another classic coding trick for enhancing readability is to store a moderately complex expression in a variable, as I did with $amountsOK and $foundBadPurchase in the last two examples, and then reference those variables from other expressions instead of putting all the complexity in one place. These principles certainly aren't exclusive to XSLT, but if you've ever heard people complain about the readability of XSLT stylesheets, then you'll know that this extra effort is worthwhile.