Getting Loopy

August 1, 2001

Bob DuCharme

Programming languages use loops to execute an action or series of actions multiple times. After performing the last action of such a series, the program "loops" back up to the first one. It may repeat these actions a specific number of times—for example, five times or thirty times. It may repeat the actions until a specified condition is true—for example, until there's no more input to read or until a prime number greater than a million has been calculated. XSLT offers two ways to repeat a series of actions:

  • The xsl:for-each instruction lets you perform a group of instructions on a given set of nodes. The specification of those nodes can take full advantage of the options offered by XPath's axis specifiers, node tests, and predicate conditions. In other words, for any set of a source tree's nodes that you can describe with an XPath expression, there's a way to say "perform this set of actions on these nodes". While this provides a way to execute a set of instructions repeatedly, you're repeating them over a set of nodes, not for an arbitrary number of iterations, which is what a "for" loop does in most programming languages.

  • By having a named template call itself recursively with parameters, you can execute a series of instructions for a fixed number of times or until a given condition is true. This technique comes from one of XSLT's ancestors, LISP, a programming language developed for artificial intelligence work in the 1960s. It may be unfamiliar to programmers accustomed to the "for" and "while" loops available in procedural languages like Java, C++, and Visual Basic, but it can perform the same tasks.

Iteration Across Nodes with xsl:for-each

When do you need xsl:for-each? Less often than you might think. If there's something you need to do with (or to) a particular set of nodes, an xsl:template template rule may be the best way to do it. Reviewing this approach will make it easier to understand what the xsl:for-each instruction can offer us.

In an xsl:template template rule, you specify a pattern in the match attribute that describes which nodes you want the rule to act on. For example, let's say we want to list the figure titles in the following document. (The sample document, stylesheets, and output are available in this zip file.)

<chapter>
<para>Then with expanded wings he steers his flight</para>
<figure><title>"Incumbent on the Dusky Air"</title>
<graphic fileref="pic1.jpg"/></figure>
<para>Aloft, incumbent on the dusky Air</para>
<sect1>
<para>That felt unusual weight, till on dry Land</para>
<figure><title>"He Lights"</title>
<graphic fileref="pic2.jpg"/></figure>
<para>He lights, if it were Land that ever burned</para>
<sect2>
<para>With solid, as the Lake with liquid fire</para>
<figure><title>"The Lake with Liquid Fire"</title>
<graphic fileref="pic3.jpg"/></figure>
</sect2>
</sect1>
</chapter>

The following stylesheet adds only these title elements to the result tree. It suppresses the para elements, which are the only other elements that have character data.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes" indent="no"/>

  <xsl:template match="figure/title">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="para"/>

</xsl:stylesheet>

This stylesheet creates the following result from the document above:

"Incumbent on the Dusky Air"

"He Lights"

"The Lake with Liquid Fire"

Simple template rules aren't always enough to perform a series of actions on a specified set of nodes. What if you want to list the figure titles at the top of the result document and then output the rest of the source document under that list? A stylesheet, like the one above, that goes through the document adding only the figure's title elements to the result tree won't do this because the para elements need to be added as well. Our new stylesheet needs to add all the figure titles to the result tree as soon as it reaches the beginning of the source tree's chapter element, and then it must continue through the rest of the source tree processing the remaining nodes.

The xsl:for-each instruction is great for this. The following template rule has four children:

  • A text node with the string "Pictures:".

  • An xsl:for-each instruction to add the figure elements' title subelements to the result tree.

  • A text node with the string "Chapter: ".

  • An xsl:apply-templates element to add the chapter element's contents to the result tree.

<xsl:template match="chapter">
  <!-- Odd indenting to make result line up better -->
Pictures:
<xsl:for-each select="descendant::figure">
<xsl:value-of select="title"/><xsl:text>
</xsl:text>
  </xsl:for-each>

Chapter:<xsl:apply-templates/>
</xsl:template>

The xsl:for-each element's select attribute indicates which nodes to iterate over. The XPath expression used for this attribute value has an axis specifier of "descendant" and a node test of "figure", so taken together, the expression means "all the descendants of the chapter node (the one named in the template's match pattern) with 'figure' as their node name". This demonstrates a key advantage of the descendant axis over the default child one: the descendant::figure XPath expression selects the figure elements that are the chapter element's children, grandchildren, and great-grandchildren, so the xsl:for-each can reach the title element in each of them.

The contents of the xsl:for-each element consist of two things to add to the result tree for each of the nodes that the xsl:for-each iterates over:

  • The xsl:value-of element adds the content of each figure element's title child.

  • The xsl:text element with a single carriage return as its content adds that carriage return after each title value.

The following is the result:

  
Pictures:
"Incumbent on the Dusky Air"
"He Lights"
"The Lake with Liquid Fire"

Chapter:
Then with expanded wings he steers his flight
"Incumbent on the Dusky Air"

Aloft, incumbent on the dusky Air

That felt unusual weight, till on dry Land
"He Lights"

He lights, if it were Land that ever burned

With solid, as the Lake with liquid fire
"The Lake with Liquid Fire"
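
To make the difference between the child and descendant axes concrete, here is a minimal sketch (not part of the original sample files) that counts the figure elements each axis finds when applied to the chapter document above. It assumes an XSLT 1.0 processor and uses text output so that the two counts each appear on their own line.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>

  <xsl:template match="chapter">
    <!-- Default child axis: only figure elements directly inside chapter. -->
    <xsl:text>child figures: </xsl:text>
    <xsl:value-of select="count(figure)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Descendant axis: figure elements at any depth below chapter. -->
    <xsl:text>descendant figures: </xsl:text>
    <xsl:value-of select="count(descendant::figure)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

With the sample document, this should report 1 child figure and 3 descendant figures, because two of the three figure elements are nested inside the sect1 and sect2 elements.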

Rearranging a document's structure as you copy it to the result tree is one of the most popular uses of XSLT. Thus the xsl:for-each element is particularly valuable because of its ability to grab a copy of a set of nodes that aren't located together in the source tree, perform changes on those nodes, and then put them together in the result tree wherever you like.

Another advantage of acting on a set of nodes with an xsl:for-each element instead of with an xsl:template element lies in a limitation of template rules that XSLT novices often don't notice. While it may appear that you can use XPath expressions in an xsl:template element's match attribute, you're actually limited to the subset of XPath expressions known as patterns. In the xsl:for-each element's select attribute, however, you have the full power of XPath expressions available.

For example, you can't use the ancestor axis specifier in match patterns, but you can in an xsl:for-each element's select attribute. The following template uses it to list the names of all of a title element's ancestors.

<xsl:template match="title">
  <xsl:text>title ancestors:</xsl:text> 
  <xsl:for-each select="ancestor::*">
   <xsl:value-of select="name()"/>
    <!-- Output a comma if it's not the last one in
         the node set that for-each is going through. -->
   <xsl:if test="position() != last()">
   <xsl:text>,</xsl:text>
   </xsl:if>
  </xsl:for-each>
</xsl:template>

<xsl:template match="para"/>

The second template rule suppresses para elements from the result tree. The first template's "title ancestors:" and "," text nodes are inside of xsl:text elements to prevent the adjacent carriage returns from being copied to the result. This way, each title element's ancestor list is on one line right after the title introducing it.

This stylesheet outputs the following when applied to the document we saw earlier:

title ancestors:chapter,figure

title ancestors:chapter,sect1,figure

title ancestors:chapter,sect1,sect2,figure
    

Also in Transforming XML

Automating Stylesheet Creation

Appreciating Libxslt

Push, Pull, Next!

Seeking Equality

The Path of Control

Like the xsl:value-of instruction, xsl:for-each is a great way to grab some set of nodes from the source tree while the XSLT processor is applying a template to any other node. The xsl:value-of element has one crucial limitation that highlights the value of xsl:for-each: if you tell xsl:value-of to get a set of nodes, it only returns a string version of the first one in that set. If you tell xsl:for-each to get a set of nodes, it gets the whole set. As you iterate across that set of nodes you can do anything you want with them. (The xsl:copy-of instruction can also grab nearly any set of nodes, but with xsl:for-each you can do something with them before copying them to the result tree, like the addition of the text nodes in the example above.)
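
The contrast between the two instructions can be sketched as follows. This stylesheet is an illustration written for this comparison, not one of the article's sample files, and it assumes XSLT 1.0 behavior (later versions of XSLT changed how xsl:value-of treats a sequence of nodes). Both instructions are given the same select expression against the chapter document shown earlier.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>

  <xsl:template match="chapter">
    <!-- xsl:value-of: string value of the first selected node only. -->
    <xsl:text>value-of: </xsl:text>
    <xsl:value-of select="descendant::figure/title"/>
    <xsl:text>&#10;</xsl:text>

    <!-- xsl:for-each: visits every selected node in document order. -->
    <xsl:text>for-each: </xsl:text>
    <xsl:for-each select="descendant::figure/title">
      <xsl:value-of select="."/>
      <!-- Separator between titles, but not after the last one. -->
      <xsl:if test="position() != last()">
        <xsl:text>; </xsl:text>
      </xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Under XSLT 1.0, the "value-of" line should show only the first title, "Incumbent on the Dusky Air", while the "for-each" line lists all three titles separated by semicolons.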

Note: The xsl:sort instruction lets you sort the node set that your xsl:for-each element is iterating through.
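
As a minimal sketch of that note, assuming the chapter document shown earlier and an XSLT 1.0 processor, the following stylesheet lists the figure titles in sorted order rather than document order. The xsl:sort element must come before any other children of xsl:for-each. The exact collation depends on the processor, but these three titles differ in their first letter, so the result should be stable.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>

  <xsl:template match="chapter">
    <xsl:for-each select="descendant::figure">
      <!-- Sort the figure nodes by the string value of their title child. -->
      <xsl:sort select="title"/>
      <xsl:value-of select="title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Here "He Lights" should come first, followed by "Incumbent on the Dusky Air" and "The Lake with Liquid Fire".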

Arbitrary Repetition with Named Template Recursion

XSLT offers no specific element for repeating a group of instructions an arbitrary number of times or until a given condition is true. To understand why requires a little historical background.

Just as XML got its start as a simplified version of SGML, XSLT and XSL began as simplified, XML-based versions of a stylesheet language developed for SGML called DSSSL (Document Style Semantics and Specification Language -- rhymes with "whistle"). Like SGML, DSSSL is an ISO standard, but its actual use in the real world has been limited.

Why didn't it catch on? One problem was its descent from Scheme, a programming language in the LISP family. "LISP" stands for "LISt Processing" language, but some programmers say that it stands for "Lots of Irritating Silly Parentheses". LISP, Scheme, and DSSSL use parentheses heavily for syntax, and the parenthesized expressions get nested at such deep levels that expressions ending with "))))" are common in all three languages. Both data structures and control structures are parenthesized expressions in these languages, and this can make code difficult to read, especially without using a LISP-aware editor like Emacs.

XSL and XSLT remedy this by applying many of DSSSL's principles using XML. Expressions can be deeply nested, but instead of being nested within dozens of parentheses and indentation conventions, they're nested inside of regular XML elements that have names right in their tags describing their purpose -- for example, xsl:if, xsl:number, and xsl:comment. This makes XSLT stylesheets more readable than DSSSL stylesheets for many people.

XSLT did inherit a related aspect of its ancestors that some view as a blessing and others as a curse: there's no concept of a series of instructions being executed sequentially. (The technical term is that XSLT is a "side effect free" language.) A stylesheet gets applied to a source tree to create a result tree, and while the structure of the result tree is important, the order in which it's created is irrelevant. Since you can't have a series of instructions, you certainly don't have a way to repeat a series of instructions.

This doesn't prevent you from doing something a specific number of times or until a given condition is true in XSLT. You just have to do so using recursion, a fundamental LISP/Scheme/DSSSL technique. Using recursive named templates, you can get the benefits of both "for" loops and "while" loops.

To demonstrate this, we'll use this simple input document:

<sample>the facile gates of hell too slightly barred</sample>

The following templates show how to repeat something a specific number of times. The first is a named template called "hyphens" that can call itself as many times as necessary to add the specified number of hyphens to the result tree. To demonstrate its use, the second template, which matches the sample element, calls the "hyphens" template four times, asking it to add a different number of hyphens to the result tree each time. First, the "sample" template calls it with no value overriding the howMany parameter so that we can see the template's default behavior, and then it calls the template three more times with the values 3, 20, and 0 to override the parameter's default value of 1.

<xsl:template name="hyphens">
  <xsl:param name="howMany">1</xsl:param>
  <xsl:if test="$howMany &gt; 0">

    <!-- Add 1 hyphen to result tree. -->
    <xsl:text>-</xsl:text>  

    <!-- Print remaining ($howMany - 1) hyphens. -->
    <xsl:call-template name="hyphens">
    <xsl:with-param name="howMany" select="$howMany - 1"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>
  
<xsl:template match="sample">

  Print 1 hyphen: 
  <xsl:call-template name="hyphens"/>

  Print 3 hyphens: 
  <xsl:call-template name="hyphens">
    <xsl:with-param name="howMany" select="3"/>
  </xsl:call-template>

  Print 20 hyphens: 
  <xsl:call-template name="hyphens">
    <xsl:with-param name="howMany" select="20"/>
  </xsl:call-template>

  Print 0 hyphens: 
  <xsl:call-template name="hyphens">
    <xsl:with-param name="howMany" select="0"/>
  </xsl:call-template>

</xsl:template>

This creates the following result:

  Print 1 hyphen: 
  -

  Print 3 hyphens: 
  ---

  Print 20 hyphens: 
  --------------------

  Print 0 hyphens: 
  

The "hyphens" named template first declares the howMany parameter with an xsl:param element that sets this parameter's default value to 1. The rest of the template is a single xsl:if element whose contents get added to the result tree if howMany is set to a value greater than zero. If howMany passes this test, a single hyphen is added to the result tree and an xsl:call-template instruction calls the "hyphens" named template with a value of howMany that is one less than its current setting. If howMany is set to 1, xsl:call-template calls it with a value of 0, so no more hyphens will be added to the result tree. If howMany is set to 3, the named template will be called with a howMany value of 2 after adding the first of the 3 requested hyphens, and the process will be repeated until the template is called with a value of 0.

Recursion: Multiple calls to the same named template

    


This is what we mean by "recursion". When a template calls itself, it's a recursive call. You don't want it to call itself forever, so the recursive template needs a terminating condition -- in the case above, an xsl:if element that won't perform the recursive call unless howMany is greater than 0.

You must also be sure that the terminating condition will eventually occur. If the terminating condition were "$howMany = 0" and the recursive call subtracted 2 from the current value of howMany before calling the "hyphens" template again, calling it with a value of 3 would then mean making subsequent calls with howMany values of 1, -1, -3, -5, and so forth without ever hitting 0. The recursive calls would never stop. (The actual outcome of such an endless loop depends on the XSLT processor being used.)

Note: The fancier the condition you use to control recursion, the more careful you must be to make absolutely sure that it will occur. Otherwise, the execution of your stylesheet could get stuck there.
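
One way to guard against the skipped-over-zero problem is to test with an inequality rather than an exact match. The following variation on the "hyphens" template is only a sketch: the step parameter is a hypothetical addition for this illustration and is not part of the original example. It is meant to be applied to the one-line sample document shown earlier.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>

  <xsl:template name="hyphens">
    <xsl:param name="howMany">1</xsl:param>
    <!-- "step" is a hypothetical parameter added for this sketch. -->
    <xsl:param name="step">1</xsl:param>
    <!-- The inequality still fails once a step jumps past 0. -->
    <xsl:if test="$howMany &gt; 0">
      <xsl:text>-</xsl:text>
      <xsl:call-template name="hyphens">
        <xsl:with-param name="howMany" select="$howMany - $step"/>
        <xsl:with-param name="step" select="$step"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <xsl:template match="sample">
    <!-- howMany takes the values 3, 1, and -1; the last one ends the recursion. -->
    <xsl:call-template name="hyphens">
      <xsl:with-param name="howMany" select="3"/>
      <xsl:with-param name="step" select="2"/>
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Called this way, the template should add two hyphens and then stop. The guard only helps when the step is positive; a step of 0 or less would still recurse without end.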

The example above simulates the "for" loops used by many other programming languages because its recursive template performs an action a specific number of times, with the exact number passed to it at runtime. A "while" loop typically repeats an action or actions as long as a certain condition is true, and guess what -- the example above is really a "while" loop. The condition is the "$howMany &gt; 0" test in the named template's xsl:if start-tag. You can put any condition you want there, and the template will make recursive calls to itself as long as the condition is true.
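
As an illustration of a condition that has nothing to do with counting, here is a sketch of a recursive template that keeps calling itself while its string parameter still contains a space. The template name "words" and the parameter name "str" are made up for this example. Applied to the sample document above, it writes each word on its own line. The recursion is certain to end because every call passes a strictly shorter string.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>

  <xsl:template name="words">
    <xsl:param name="str"/>
    <xsl:choose>
      <!-- "While" condition: another space means another word follows. -->
      <xsl:when test="contains($str, ' ')">
        <xsl:value-of select="substring-before($str, ' ')"/>
        <xsl:text>&#10;</xsl:text>
        <!-- Recurse on whatever follows the first space. -->
        <xsl:call-template name="words">
          <xsl:with-param name="str" select="substring-after($str, ' ')"/>
        </xsl:call-template>
      </xsl:when>
      <!-- No space left: output the final word and stop. -->
      <xsl:otherwise>
        <xsl:value-of select="$str"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="sample">
    <!-- normalize-space() trims the text and collapses runs of whitespace. -->
    <xsl:call-template name="words">
      <xsl:with-param name="str" select="normalize-space(.)"/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>
```

The xsl:choose here plays the same role as the xsl:if in the "hyphens" template, with the xsl:otherwise branch handling the last word once the condition is no longer true.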