XML.com: XML From the Inside Out

Understanding the node-set() Function

July 16, 2003

The XSLT language is capable of achieving many tasks, but some surprisingly trivial requirements, such as calculating the total amount of an invoice, cannot be expressed in a straightforward way. This article describes how you can get round this by using a very powerful extension function in your stylesheets: the node-set() function.

In XSLT you can assign a value of any XPath data type to a variable. For example, to store all books from a catalog in a variable for further processing, you can use the following instruction:

<xsl:variable name="books" select="//book"/>

The variable $books now contains a set of nodes. Thus you can use this variable in other XPath expressions without any limitations. For example, you can use the expression $books/title to get the titles of all books from the catalog.

So far, so good, but XSLT added a new data type called "result tree fragment" to XPath. You can imagine a result tree fragment (RTF) as a fragment or a chunk of XML code. You can assign a result tree fragment to a variable directly, or a result tree fragment can arise from applying templates or other XSLT instructions. The following code assigns a simple fragment of XML to the variable $author.

<xsl:variable name="author">
  <firstname>Jirka</firstname>
  <surname>Kosek</surname>
  <email>jirka@kosek.cz</email>
</xsl:variable>

Now let's say we want to extract the e-mail address from the $author variable. The most obvious way is to use an expression such as $author/email. But this will fail, as you can't apply XPath navigation to a variable of the type "result tree fragment."
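
What a result tree fragment does allow is limited: XSLT 1.0 permits only the operations that are valid on a string, plus copying the fragment to the output with xsl:copy-of. The following is a minimal sketch of those permitted uses; the wrapping template and the element names result, author-copy and author-text are illustrative assumptions, not part of the original example.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:variable name="author">
  <firstname>Jirka</firstname>
  <surname>Kosek</surname>
  <email>jirka@kosek.cz</email>
</xsl:variable>

<xsl:template match="/">
  <result>
    <!-- Allowed: copy the whole fragment into the output -->
    <author-copy><xsl:copy-of select="$author"/></author-copy>
    <!-- Allowed: use the fragment as a string
         (the concatenated text of all its elements) -->
    <author-text><xsl:value-of select="$author"/></author-text>
    <!-- Not allowed in standard XSLT 1.0: select="$author/email" -->
  </result>
</xsl:template>

</xsl:stylesheet>
```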

If we want to get around this limitation, we can use an extension function which is able to convert a result tree fragment back to a node-set. This function is not a part of the XSLT or XPath standards; thus, stylesheets which use it will not be as portable as ones which don't. However, the advantages of node-set() usually outweigh portability issues.

Extension functions always reside in a separate namespace. In order to use them we must declare this namespace in our stylesheet; listing its prefix in extension-element-prefixes also keeps the declaration out of the output. The namespace in which the node-set() function is implemented is different for each processor, but fortunately many processors also support EXSLT, so we can use the following declarations at the start of our stylesheet.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                extension-element-prefixes="exsl"
                version="1.0">
  ...
  <!-- Now we can convert result tree fragment back to node-set -->
  <xsl:value-of select="exsl:node-set($author)/email"/>
  ...
</xsl:stylesheet>

The expression exsl:node-set($author) converts the result tree fragment to a node-set, which we can take as the starting point for further XPath navigation. If our processor is not EXSLT-aware, we must replace the namespace http://exslt.org/common (and, for some processors, the function name) according to Table 1.

Table 1. Support for node-set() in XSLT processors

Processor                                                          Function name  Namespace
EXSLT-aware processors (Saxon, xsltproc, Xalan-J, jd.xslt, 4XSLT)  node-set()     http://exslt.org/common
MSXML                                                              node-set()     urn:schemas-microsoft-com:xslt
Xalan-C                                                            nodeset()      http://xml.apache.org/xalan
Sablotron                                                          Can operate on result tree fragments directly
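
If a stylesheet has to run on more than one of these processors, one possible approach is to test at run time which function is available, using the standard function-available() function. The sketch below assumes only the EXSLT and MSXML variants matter; the msxsl prefix, the email wrapper element and the error message are illustrative choices. XSLT 1.0 says a processor must not signal an error merely because an expression mentions an unavailable extension function, as long as that function is never actually called.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                xmlns:msxsl="urn:schemas-microsoft-com:xslt"
                extension-element-prefixes="exsl msxsl"
                version="1.0">

<xsl:variable name="author">
  <firstname>Jirka</firstname>
  <surname>Kosek</surname>
  <email>jirka@kosek.cz</email>
</xsl:variable>

<xsl:template match="/">
  <email>
    <xsl:choose>
      <!-- Prefer the EXSLT function when the processor offers it -->
      <xsl:when test="function-available('exsl:node-set')">
        <xsl:value-of select="exsl:node-set($author)/email"/>
      </xsl:when>
      <!-- Fall back to the MSXML function -->
      <xsl:when test="function-available('msxsl:node-set')">
        <xsl:value-of select="msxsl:node-set($author)/email"/>
      </xsl:when>
      <!-- Neither is available: stop with a message -->
      <xsl:otherwise>
        <xsl:message terminate="yes">No node-set() extension function is available.</xsl:message>
      </xsl:otherwise>
    </xsl:choose>
  </email>
</xsl:template>

</xsl:stylesheet>
```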

After this rather theoretical introduction, I will now show you how you can use node-set() for something more useful.

Sum of Products -- Invoice Processing

Let's suppose that we want to create a stylesheet that is able to render a simple XML invoice into nice HTML for further browsing and printing. For the sake of simplicity our invoice contains just items; each item has a description, an ordered quantity and a unit price.

<?xml version="1.0" encoding="utf-8"?>
<invoice>
  <item>
    <description>Pilsner Beer</description>
    <qty>6</qty>
    <unitPrice>1.69</unitPrice>
  </item>
  <item>
    <description>Sausage</description>
    <qty>3</qty>
    <unitPrice>0.59</unitPrice>
  </item>
  <item>
    <description>Portable Barbecue</description>
    <qty>1</qty>
    <unitPrice>23.99</unitPrice>
  </item>
  <item>
    <description>Charcoal</description>
    <qty>2</qty>
    <unitPrice>1.19</unitPrice>
  </item>
</invoice>

We don't want to be responsible for putting a damper on the party, so we will write a stylesheet for turning this XML into HTML. However, there is one complication: the rendered invoice should certainly contain the total amount. This might look like a simple task, but it quickly becomes apparent that XPath and XSLT 1.0 offer no direct way to do it. XPath provides the sum() function, but it can only sum the values of existing nodes. In our example we want the sum of the subtotals (qty * unitPrice), which are not present in the source XML and thus are not accessible to sum(). The only pure XSLT 1.0 solution is recursive processing, which leads to code that is neither clear nor easy to understand. (A pure XSLT solution is presented in the invoice-noext.xsl stylesheet in the ZIP archive with all examples.)
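
To give an idea of what the recursive approach looks like, here is a minimal sketch. It is not the invoice-noext.xsl file from the archive; the template name running-total and its parameters items and total are assumptions made for illustration. The template adds the first item's subtotal to an accumulator, calls itself with the remaining items, and prints the accumulated value once no items are left.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:output method="text"/>

<xsl:template match="/">
  <xsl:call-template name="running-total">
    <xsl:with-param name="items" select="invoice/item"/>
  </xsl:call-template>
</xsl:template>

<!-- Hypothetical helper: sums qty * unitPrice over $items -->
<xsl:template name="running-total">
  <xsl:param name="items"/>
  <xsl:param name="total" select="0"/>
  <xsl:choose>
    <!-- Items remain: add the first subtotal, recurse on the rest -->
    <xsl:when test="$items">
      <xsl:call-template name="running-total">
        <xsl:with-param name="items"
                        select="$items[position() &gt; 1]"/>
        <xsl:with-param name="total"
          select="$total + $items[1]/qty * $items[1]/unitPrice"/>
      </xsl:call-template>
    </xsl:when>
    <!-- No items left: emit the total with two decimal places -->
    <xsl:otherwise>
      <xsl:value-of select="format-number($total, '0.00')"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

</xsl:stylesheet>
```

Even in this stripped-down form the bookkeeping obscures the intent, which is the motivation for the approach that follows.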

The whole task will be much easier if we decide to utilize the node-set() function. In the first step we calculate subtotals for each item and store them as a fragment of XML.

<xsl:variable name="subTotals">
  <xsl:for-each select="invoice/item">
    <number><xsl:value-of select="qty * unitPrice"/></number>
  </xsl:for-each>
</xsl:variable>

The variable $subTotals now holds the subtotals, each marked up with a number element.

<number>10.14</number>
<number>1.77</number>
<number>23.99</number>
<number>2.38</number>

Now we can get the total invoice amount quite easily by summing up values stored in number nodes: sum(exsl:node-set($subTotals)/number).

Here is a complete working stylesheet.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                extension-element-prefixes="exsl"
                version="1.0">

<xsl:template match="/">
  <html>
    <head>
      <title>Invoice</title>
    </head>
    <body>
      <h1>Invoice</h1>
      
      <!-- Format invoice items as a table -->
      <table border="1" style="text-align: center">
        <tr>
          <th>Description</th>
          <th>Quantity</th>
          <th>Unit price</th>
          <th>Subtotal</th>
        </tr>
        <xsl:for-each select="invoice/item">
          <tr>
            <td><xsl:value-of select="description"/></td>
            <td><xsl:value-of select="qty"/></td>
            <td><xsl:value-of select="unitPrice"/></td>
            <td><xsl:value-of select="qty * unitPrice"/></td>
          </tr>
        </xsl:for-each>
        <tr>
          <th colspan="3">Total</th>
          <th>
            <!-- Gather subtotals into variable -->
            <xsl:variable name="subTotals">
              <xsl:for-each select="invoice/item">
                <number>
                  <xsl:value-of select="qty * unitPrice"/>
                </number>
              </xsl:for-each>
            </xsl:variable>

            <!-- Sum subtotals stored as a result tree fragment 
                 in the variable -->
            <xsl:value-of 
              select="sum(exsl:node-set($subTotals)/number)"/>
          </th>
        </tr>
      </table>
    </body>
  </html>
</xsl:template>
  
</xsl:stylesheet>
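
XPath 1.0 numbers are floating-point values, so depending on the processor a product or a sum may be printed with a long tail of digits instead of a tidy two-decimal amount. A common remedy is to wrap the expression in the standard format-number() function. The following reduced sketch shows only the total and writes it as plain text; the '0.00' format pattern and the text output method are choices made for this illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                extension-element-prefixes="exsl"
                version="1.0">

<xsl:output method="text"/>

<xsl:template match="/">
  <!-- Gather subtotals into a result tree fragment -->
  <xsl:variable name="subTotals">
    <xsl:for-each select="invoice/item">
      <number><xsl:value-of select="qty * unitPrice"/></number>
    </xsl:for-each>
  </xsl:variable>

  <!-- Sum the subtotals and round the result to two decimals -->
  <xsl:value-of
    select="format-number(sum(exsl:node-set($subTotals)/number), '0.00')"/>
</xsl:template>

</xsl:stylesheet>
```

For the sample invoice this prints 38.28. The same wrapper can be applied to the per-item subtotal in the table.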

Multipass Processing

Multipass processing is another situation in which the node-set() function is essential. In some situations it is hard to do the transformation in a single step; some post-processing on the result is needed. If we want to do this during a single transformation without the need for storing a temporary result, and without the need for repeated invocation of the XSLT processor, we must capture the result of the first transformation in a variable as a result tree fragment (RTF), convert the RTF to a node-set, and feed this node-set to templates which are responsible for post-processing.

We can demonstrate this technique on a very simple but real problem. Suppose that we must change an existing stylesheet to display a small image before each external link, in order to inform the user that an Internet connection is needed to traverse the link. The conventional approach to solving this task is to change the existing stylesheet to emit icons in appropriate places. But in the case of a very complex stylesheet this can be very time consuming work.

Our approach avoids modifying the existing stylesheet. Instead we will capture its output and modify the links in the captured output. In order to capture the output of another stylesheet we must import it, and in the template for the root node we must invoke the original templates using xsl:apply-imports inside a variable definition.

<xsl:variable name="content">
  <xsl:apply-imports/>
</xsl:variable>

The variable $content now holds the complete output from the original stylesheet. In this output we must change all occurrences of external links such as:

<a href="http...">text</a>

to

<a href="http..."><img src="external.gif" width="16" 
                       height="16" border="0">&nbsp;text</a>

All other text and markup should remain untouched. To copy the XML tree without modification we can use a very simple template that copies all element, attribute and text nodes.

<xsl:template match="@*|*|text()">
  <xsl:copy>
    <xsl:apply-templates select="@*|*|text()"/>
  </xsl:copy>
</xsl:template>

A second template is needed to process external links in a different way. Because its pattern names a specific element and carries a predicate, its default priority (0.5) is higher than that of the copy-only template (-0.5), so it overrides that template for matching links.

<xsl:template match="a[starts-with(@href,'http')]">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <img src="external.gif" width="16" height="16" border="0"/>
    <xsl:text>&#160;</xsl:text>
    <xsl:apply-templates select="*|text()"/>
  </xsl:copy>
</xsl:template>

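Note that we must copy the original attributes of the a element before inserting the image. XSLT allows attributes to be added to an element only before any of its children; adding an attribute later is an error, which the processor may either signal or recover from by ignoring the attribute.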

In the final stylesheet, the two templates above must be placed in their own mode to prevent conflicts with the original stylesheet. To show that there are real situations where you don't have enough time to get to grips with others' work, I'm using the DocBook XSL stylesheets as my original stylesheet. You can process any valid DocBook document with our final stylesheet.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                extension-element-prefixes="exsl"
                version="1.0">

<!-- Import original stylesheet -->
<xsl:import 
  href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"/>

<xsl:template match="/">
  <!-- Grab result of original stylesheet -->
  <xsl:variable name="content">
    <xsl:apply-imports/>
  </xsl:variable>

  <!-- Pass grabbed content to postprocessing templates -->
  <xsl:apply-templates select="exsl:node-set($content)"
                       mode="decoratelinks"/>
</xsl:template>

<!-- Default postprocessing is just copying of nodes -->
<xsl:template match="@*|*|text()" mode="decoratelinks">
  <xsl:copy>
    <xsl:apply-templates select="@*|*|text()" 
                         mode="decoratelinks"/>
  </xsl:copy>
</xsl:template>

<!-- Absolute links starting with "http" are external and we
     must add icon to them -->
<xsl:template match="a[starts-with(@href,'http')]" 
              mode="decoratelinks">
  <xsl:copy>
    <!-- Copy original <a> attributes -->
    <xsl:apply-templates select="@*" mode="decoratelinks"/>
    <!-- Insert image -->
    <img src="external.gif" width="16" height="16" border="0"/>
    <xsl:text>&#160;</xsl:text>
    <!-- Copy content (subelements and text nodes) of <a> -->
    <xsl:apply-templates select="*|text()" 
                         mode="decoratelinks"/>
  </xsl:copy>
</xsl:template>
  
</xsl:stylesheet>

The Future of the node-set() Function

XSLT 2.0 and XPath 2.0 are slowly progressing toward W3C Recommendation. You might be wondering whether the node-set() function will be part of these standards. The answer is no, but don't worry. The authors of XSLT 2.0 made an important decision: result tree fragments are gone. There will be no need to use the node-set() function in XSLT 2.0, as you can operate directly on XML fragments stored in a variable, just as on any other node-set. Regardless, you should keep the node-set() function in your bag of tools, as it will take several years before XSLT 2.0 is deployed as widely as XSLT 1.0 is today.
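
As a rough illustration of that difference, the invoice total from earlier could be written as follows for a processor that implements XSLT 2.0. This is a sketch rather than tested production code: it assumes the same invoice document as input, and the text output method and '0.00' format pattern are illustrative choices. Note that no extension namespace is declared and the variable is navigated directly.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output method="text"/>

<xsl:template match="/">
  <!-- In XSLT 2.0 this variable holds a temporary tree,
       not a result tree fragment -->
  <xsl:variable name="subTotals">
    <xsl:for-each select="invoice/item">
      <number><xsl:value-of select="qty * unitPrice"/></number>
    </xsl:for-each>
  </xsl:variable>

  <!-- Navigate into the variable without node-set() -->
  <xsl:value-of
    select="format-number(sum($subTotals/number), '0.00')"/>
</xsl:template>

</xsl:stylesheet>
```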


