Trees, Temporarily

December 3, 2003

Bob DuCharme

XSLT 1.0 has a special data type called the result tree fragment. For example, an xsl:variable element can store a single string, but it can also store an XML element with all the descendants and attributes you like. This structure is a result tree fragment. (I try to avoid using the acronym because of my many unpleasant memories of writing awk, Perl, and Omnimark scripts to read or write Rich Text Format files.) There's little you can do with result tree fragments in XSLT 1.0: you can treat them as strings and you can use xsl:copy-of to copy them to the result tree, and that's it. Many XSLT developers longed for a way to pass composite structures to named templates and then use the pieces of those structures individually inside the named template, instead of merely copying the structure to the result tree or pulling substrings out of it. To meet that need, several XSLT 1.0 processors offer extension functions, such as Xalan's nodeset() and Saxon's node-set(), that convert these fragments to node sets whose nodes can be addressed with XPath expressions.
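
To make the XSLT 1.0 workaround concrete, here is a minimal sketch that uses the EXSLT node-set() function, a portable counterpart to the processor-specific functions named above. The variable name meta and the info and filename elements are made up for illustration, and the sketch assumes a 1.0 processor that supports the EXSLT common module.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                version="1.0">

  <!-- In XSLT 1.0, this variable holds a result tree fragment.
       The names meta, info, and filename are hypothetical. -->
  <xsl:variable name="meta">
    <info>
      <filename>prepdata.xsl</filename>
    </info>
  </xsl:variable>

  <xsl:template match="/">
    <xsl:comment>
      <!-- Writing $meta/info/filename directly is an error in 1.0.
           exsl:node-set() converts the fragment to a node set first,
           and then ordinary location steps can follow it. -->
      <xsl:value-of select="exsl:node-set($meta)/info/filename"/>
    </xsl:comment>
  </xsl:template>

</xsl:stylesheet>
```

The extra function call on every reference is the price of admission in 1.0, and it ties the stylesheet to processors that implement the extension.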

XSLT 2.0 eliminates result tree fragments and replaces them with a more powerful feature: temporary trees. Once you create a temporary tree in an xsl:variable, xsl:param, or xsl:with-param element, you can do anything with it that you can do with a source tree.

Passing Temporary Trees Around

Our first example shows how, after passing a variable containing a temporary tree to a named template or function, you can do all sorts of things with it that you couldn't do with a result tree fragment passed to an XSLT 1.0 named template. (As usual, for now, the only XSLT processor that implements enough of XSLT 2.0 to try this is Saxon 7.) The stylesheet in the example just copies the source tree to the result tree after adding a header comment with metadata about the stylesheet that created the result. This could be done much more simply, and work just fine using XSLT 1.0, if the stylesheet stored the metadata in a named template and called that template to output the metadata, so don't model any production code on what you see below — it's only doing this with several XSLT 2.0 techniques in order to demonstrate how those techniques work. (In fact, don't model any production code on any XSLT 2.0 stylesheets you see before it becomes a Recommendation, which hasn't happened yet.)

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:my="http://www.snee.com/ns/whatever"
                version="2.0">

  <xsl:variable name="genData">
    <ssheetMetadata>
      <filename>prepdata.xsl</filename>
      <author>BD</author>
      <releaseHist>
        <version date="2003-10-12T14:52" fileSize="1543">1.2</version>
        <version date="2003-09-11T10:12" fileSize="1322">1.1</version>
        <version date="2003-07-24T08:03" fileSize="1134">1.0</version>
      </releaseHist>
    </ssheetMetadata>
  </xsl:variable>

  <xsl:template match="/">
    <xsl:call-template name="outputMetadata">
      <xsl:with-param name="revData" select="$genData"/>
    </xsl:call-template>
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template name="outputMetadata">
    <xsl:param name="revData"/>
    <xsl:comment>
      File name: <xsl:value-of select="$revData/ssheetMetadata/filename"/>
      Revision History:
      <xsl:for-each select="$revData/ssheetMetadata/releaseHist/version">
        release <xsl:value-of select="."/>
        <xsl:text> </xsl:text><xsl:value-of select="@date"/>
      </xsl:for-each>
        average file size: <xsl:value-of select="my:avgFileSize($revData)"/>
    </xsl:comment>
  </xsl:template>

  <xsl:function name="my:avgFileSize">
    <xsl:param name="fileData"/>
    <xsl:value-of 
       select="avg($fileData/ssheetMetadata/releaseHist/version/@fileSize)"/>
  </xsl:function>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

When the first template rule finds the root of the source tree, it calls the outputMetadata named template, passing it a reference to the genData variable for the named template's revData parameter. After doing so, the first template rule's xsl:apply-templates instruction tells the XSLT processor to apply any relevant template rules to the root's children. Because the only other template rule in the stylesheet with a match attribute is the last one, which just copies everything, the source tree will be copied to the result tree after the output of the outputMetadata named template.

The declaration for the genData variable at the start of the stylesheet holds an ssheetMetadata element, which has several child and grandchild elements. The outputMetadata named template, which receives genData from the first template rule, uses xsl:value-of instructions to pull various pieces of information out of genData and add them to the result tree inside a comment. With the stylesheet shown above, the comment comes out like this:

<!--
      File name: prepdata.xsl
      Revision History:
      
        release 1.2 2003-10-12T14:52
        release 1.1 2003-09-11T10:12
        release 1.0 2003-07-24T08:03
        average file size: 1333-->

The first thing it pulls out is the contents of the ssheetMetadata element's filename child. The XPath expression in the xsl:value-of element's select attribute has a reference to the named template's revData parameter as its first step. The mere existence of more XPath location steps after this one is great news — an XSLT 2.0 processor's ability to treat a passed parameter as a tree, and reach down and grab a specific node of the tree using an XPath expression, adds lots of new possibilities to what we can do in named templates.

The "Revision History" part of the example demonstrates how we can do more than just grab a single node from the temporary tree. An xsl:for-each loop cycles through the version elements in the passed subtree, again specifying a reference to the template's revData parameter as the first step in the XPath expression that identifies the node set to loop through. For each version element that it finds, it adds the word "release", the contents of the version element, a single space, and the contents of the version element's date attribute to the result tree, which creates the three "release" lines in the output shown above.

The last thing that outputMetadata adds to the comment sent to the result tree is the label "average file size:" and the value returned by the my:avgFileSize() function defined below it in the stylesheet. The ability to define and call functions right in the stylesheet is another significant new feature in XSLT 2.0 (see September's column for more on this); the ability to pass arbitrary trees of information as parameters to these functions is not only an improvement over XSLT 1.0, but also an improvement over most other programming languages that let you define and call your own functions.

The body of the my:avgFileSize() function definition illustrates how we can pass temporary trees to built-in functions as well. The new XPath (and XQuery) 2.0 avg() function computes the average of the numeric values in the node set passed to it, and the stylesheet's my:avgFileSize() function passes it the set of fileSize attributes in the version grandchildren of the ssheetMetadata root element of the temporary tree passed as a parameter.
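
As a variation on the function above, here is a sketch that declares types with the as attribute and returns its value with xsl:sequence instead of xsl:value-of. It assumes a processor that supports both of those features from the XSLT 2.0 drafts; the trimmed-down genData variable is only there to make the example stand alone.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:my="http://www.snee.com/ns/whatever"
                exclude-result-prefixes="xs my"
                version="2.0">

  <!-- Trimmed copy of the metadata, kept only so this runs alone. -->
  <xsl:variable name="genData">
    <ssheetMetadata>
      <releaseHist>
        <version fileSize="1543">1.2</version>
        <version fileSize="1322">1.1</version>
        <version fileSize="1134">1.0</version>
      </releaseHist>
    </ssheetMetadata>
  </xsl:variable>

  <!-- xsl:sequence returns the number itself; xsl:value-of would
       return a text node holding the number's string form. -->
  <xsl:function name="my:avgFileSize" as="xs:double?">
    <xsl:param name="fileData" as="node()"/>
    <xsl:sequence
       select="avg($fileData/ssheetMetadata/releaseHist/version/@fileSize)"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:comment>average file size: <xsl:value-of
       select="my:avgFileSize($genData)"/></xsl:comment>
  </xsl:template>

</xsl:stylesheet>
```

The return type is xs:double? rather than xs:double because avg() returns an empty sequence when no fileSize attributes are found.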

Using Temporary Trees for Two-Phase Processing

By restricting certain template rules to specific modes, you can convert source tree nodes into a temporary tree and then process the temporary tree again before sending anything to the result tree.

Picture this scenario for the following stylesheet: you already have lots of stylesheets to convert CALS-style tables to several different formats, and the architectural policy where you work is to convert incoming tabular data to CALS tables so that you can take advantage of the existing code to then turn this data into other formats. You must write a stylesheet to turn the following document about the chart positions of various bands' singles into HTML:

<chart>
  <header>
    <date>2003-11-24</date>
    <approvals>
      <analyst time="09:42">GF</analyst>
      <editor time="10:03">DC</editor>
    </approvals>
  </header>
  <songs>
    <song>
      <title>Mondegreen Daydream</title>
      <band>The New Wayouts</band>
      <chartPos>4</chartPos>
    </song>
    <song>
      <title>You (and Me)</title>
      <band>Dr. Bellows</band>
      <chartPos>1</chartPos>
    </song>
    <song>
      <title>Fly in My Soup</title>
      <band>King Timahoe</band>
      <chartPos>12</chartPos>
    </song>
  </songs>
</chart>

You want your new stylesheet to convert each song element in this document to a row in a CALS table. Next it must pass that CALS table to the template rules in the following cals2html.xsl stylesheet, which your company's been using to convert CALS tables to HTML tables. Note how the template rules of the cals2html.xsl file all explicitly identify themselves as having a mode value of "CALS2HTML":

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="2.0">

  <xsl:template match="row" mode="CALS2HTML">
    <tr><xsl:apply-templates mode="CALS2HTML"/></tr>
  </xsl:template>
             
  <xsl:template match="table" mode="CALS2HTML">
    <table><xsl:apply-templates mode="CALS2HTML"/></table>
  </xsl:template>
             
  <xsl:template match="entry" mode="CALS2HTML">
    <td><xsl:apply-templates/></td>
  </xsl:template>

</xsl:stylesheet>

(A more complete CALS2HTML stylesheet would be much longer.) The stylesheet below, which is based on the example in the temporary trees section of the "last call" XSLT Working Draft, converts the songs input to a CALS table temporary tree and then uses the cals2html.xsl stylesheet to convert that tree to HTML in the final result tree.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="2.0">

  <xsl:import href="cals2html.xsl"/>

  <xsl:variable name="intermediate">
    <xsl:apply-templates select="/" mode="phase1"/>
  </xsl:variable>

  <xsl:template match="/">  
    <xsl:apply-templates select="$intermediate" mode="CALS2HTML"/>
  </xsl:template>

  <xsl:template match="songs" mode="phase1">
    <table><xsl:apply-templates mode="phase1"/></table>
  </xsl:template>
             
  <xsl:template match="song" mode="phase1">
    <row><xsl:apply-templates mode="phase1"/></row>
  </xsl:template>
             
  <xsl:template match="/" mode="phase1">
    <xsl:apply-templates mode="phase1"/>
  </xsl:template>
             
  <xsl:template match="title|band|chartPos" mode="phase1">
    <entry><xsl:apply-templates/></entry>
  </xsl:template>

</xsl:stylesheet>

The stylesheet's first template rule gets processed when the XSLT engine sees the root of the source tree document. It specifies that relevant templates with a mode value of "CALS2HTML" (which happen to be the template rules in the cals2html.xsl stylesheet named in the xsl:import element) should be applied. Applied to what? Not to the root's child nodes, which would be the default; they should be applied to the value of the intermediate variable declared above the template rule.

    

By sending intermediate to the CALS2HTML template rules, this first template rule is clearly implementing the second step of our songs-to-CALS, CALS-to-HTML sequence. The intermediate declaration is where the first step happens: it applies the template rule for which mode equals "phase1" to the root of the source tree. That template rule, the second-to-last one in the stylesheet, calls the other "phase1" template rules in the stylesheet to convert the songs element to a CALS table. (A CALS table, like an HTML table, calls the main element "table" but then calls each row "row" and each entry "entry," unlike HTML's use of "tr" and "td" for rows and entries.)

When this conversion step finishes, the result is a temporary tree of a CALS table created by the intermediate variable declaration, ready for use by any template rule that uses it. We've already seen that the first template rule uses intermediate to pass to the template rules that convert CALS to HTML, and the result is an HTML version of what began as a group of song elements wrapped in a songs element.
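
One detail is worth checking when you try this. No "phase1" template rule matches the header element, so as far as I can tell the built-in template rules carry the text of its date, analyst, and editor descendants into the temporary tree, and from there into the result ahead of the table. If you don't want that, one option is an empty template rule for header in "phase1" mode. The sketch below adds one by importing the two-phase stylesheet; songs2html.xsl is a hypothetical name for that file, and the choice to drop the header is my assumption about what the output should look like.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="2.0">

  <!-- songs2html.xsl is a hypothetical file name for the two-phase
       stylesheet shown above. -->
  <xsl:import href="songs2html.xsl"/>

  <!-- Assumption: header content is not wanted in the table output.
       This empty rule keeps header and its descendants out of the
       temporary tree built for the intermediate variable. -->
  <xsl:template match="header" mode="phase1"/>

</xsl:stylesheet>
```

Adding the same empty rule directly to the two-phase stylesheet would work just as well; the import only keeps the sketch short.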

You're certainly not limited to two-phase processing here, but even doing it with only two stages requires a degree of comfort with using modes in XSLT. If they're not set up just right, two different template rules that use xsl:apply-templates on the source tree root can get you stuck in a loop. Saxon 7 broke out of the loop easily enough with a Ctrl-C, so go ahead and play. Temporary trees in XSLT are new enough that you could end up breaking new ground yourself; let me know what you come up with.
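
For anyone who wants to try more than two phases, here is a minimal sketch of a three-phase pipeline over the same songs document. Each phase has its own mode and its own variable, and no phase applies templates to the source tree root in the default mode, which is what keeps it out of a loop. The mode names, the variable names, and the choice of an HTML ordered list as output are all made up for illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="2.0">

  <!-- Phase 1: copy only the song elements from the source tree. -->
  <xsl:variable name="phase1Result">
    <xsl:apply-templates select="/" mode="phase1"/>
  </xsl:variable>

  <!-- Phase 2: reorder phase 1's temporary tree by chart position. -->
  <xsl:variable name="phase2Result">
    <xsl:apply-templates select="$phase1Result" mode="phase2"/>
  </xsl:variable>

  <!-- Phase 3: turn phase 2's temporary tree into an HTML list. -->
  <xsl:template match="/">
    <xsl:apply-templates select="$phase2Result" mode="phase3"/>
  </xsl:template>

  <xsl:template match="/" mode="phase1">
    <songs><xsl:copy-of select="chart/songs/song"/></songs>
  </xsl:template>

  <xsl:template match="songs" mode="phase2">
    <songs>
      <xsl:for-each select="song">
        <xsl:sort select="chartPos" data-type="number"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </songs>
  </xsl:template>

  <xsl:template match="songs" mode="phase3">
    <ol><xsl:apply-templates select="song" mode="phase3"/></ol>
  </xsl:template>

  <xsl:template match="song" mode="phase3">
    <li><xsl:value-of select="title"/></li>
  </xsl:template>

</xsl:stylesheet>
```

Because each variable refers only to the one before it, the processor can evaluate them in order, and the numeric sort in phase 2 should put the song with chartPos 1 first in the list.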