
Dirty XSLT Output

September 25, 2002

John E. Simpson

Why am I getting dirty output?

Q: I have observed "dirty output" when applying different template rules that look equivalent to me. The source document is:

   <world>
      <country>
         <appellative>
            <name>United States of America</name>
         </appellative>
         <government>
            <executive_branch>
               <chief_of_state>
                  <name>George W.</name>
                  <surname>Bush</surname>
               </chief_of_state>
            </executive_branch>
         </government>
      </country>
   </world>

I'm looking for this (partial) result:

   ...
   <table border="1">
      <tr>
         <td>George W. Bush</td>
      </tr>
   </table>
   ...

I can get this result using the following transformation (call it Transformation #1):

   <xsl:template match="world">
      <html>
         <head>
            <title>world</title>
         </head>
         <body>
            <table border="1">
               <xsl:apply-templates select="country"/>
            </table>
         </body>
      </html>
   </xsl:template>

   <xsl:template match="country">
      <tr>
         <td>
            <xsl:value-of select="government/executive_branch/chief_of_state"/>
         </td>
      </tr>
   </xsl:template>

I've also tried this transformation (call it Transformation #2; compare the match and select attributes of its second template rule with their counterparts above):

   <xsl:template match="world">
      <html>
         <head>
            <title>world</title>
         </head>
         <body>
            <table border="1">
               <xsl:apply-templates select="country"/>
            </table>
         </body>
      </html>
   </xsl:template>

   <xsl:template match="country/government">
      <tr>
         <td>
            <xsl:value-of select="executive_branch/chief_of_state"/>
         </td>
      </tr>
   </xsl:template>

What I actually get from the latter, though, is this:

   ...
   <table border="1">United States of America
      <tr>
         <td>George W. Bush</td>
      </tr>
   ...

Why, in this case, does "United States of America" come out? I think the problem is in the xsl:apply-templates element, but I do not understand why. Can you help me?

A: You're on the right track; the problem is indeed related to the xsl:apply-templates element. More exactly, it's attributable to the cumulative effect of that element together with the explicit or implicit template rules elsewhere in the stylesheet.

By "explicit" template rules, I mean the template rules -- xsl:template elements and their descendants -- which you've expressly coded in your stylesheet. In Transformation #1, notice that there's a direct correspondence between the value of the xsl:apply-templates element's select attribute (whose value is "country") and the match attribute of an xsl:template rule.

In your Transformation #2, though, there is no such direct correspondence. While the select attribute still has the value "country," there are no match attributes with this value; the only other template rule matches on "country/government" instead. What you've run afoul of here is not the template rules you've explicitly coded, but the built-in template rules established by the XSLT 1.0 Recommendation. These are the "implicit" template rules.

Built-in template rules are defined for the root node, elements, text, attributes, comments, and processing instructions (PIs). The two built-ins causing the mysterious-seeming discrepancy in Transformation #2 are those for element and text nodes:

   <!-- Element nodes with no explicit template rule -->
   <xsl:template match="*">
      <xsl:apply-templates />
   </xsl:template>

   <!-- Text nodes with no explicit template rule -->
   <xsl:template match="text()">
      <xsl:value-of select="." />
   </xsl:template>

The built-in template rule for element nodes says, in effect, "Apply templates to all children of the context element node, using explicit template rules where they exist and built-in ones otherwise." Note that the term "children" covers not only child elements, but also child text, comment, and PI nodes (though not attributes). For a text node, the built-in template rule simply copies the node's string-value straight to the result tree. In both cases, as with the other built-in template rules, the XSLT processor applies the built-ins automatically, just as if you'd coded them in your stylesheet, unless you override them or provide some other constraint that keeps them from kicking in. (Overriding is easy: the built-ins have lower import precedence than any rule you write, so an explicit template rule matching the same node always wins.)
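The two built-ins can be mimicked in a few lines of Python. This is only a sketch under stated assumptions, not an XSLT processor: it uses the standard library's xml.etree.ElementTree, where a text node shows up either as an element's text (before its first child) or as a child's tail (after that child), and the function name apply_builtins is made up for illustration. Indentation whitespace has been dropped from the sample document to keep the output readable; a real processor would copy whitespace-only text nodes as well.

```python
import xml.etree.ElementTree as ET

# Compact version of the question's source document. Indentation whitespace is
# omitted (an assumption of this sketch); one space is kept between name and
# surname so the president's name reads naturally.
SOURCE = (
    "<world><country>"
    "<appellative><name>United States of America</name></appellative>"
    "<government><executive_branch><chief_of_state>"
    "<name>George W.</name> <surname>Bush</surname>"
    "</chief_of_state></executive_branch></government>"
    "</country></world>"
)


def apply_builtins(elem, out):
    """Hypothetical model of the built-in rules for element and text nodes.

    Element rule (match="*"): apply templates to each child in document order.
    Text rule (match="text()"): copy the text node's value to the output.
    """
    if elem.text:                   # text node before the first child element
        out.append(elem.text)
    for child in elem:
        apply_builtins(child, out)  # no explicit rule anywhere, so recurse
        if child.tail:              # text node that follows this child
            out.append(child.tail)


world = ET.fromstring(SOURCE)

name_only = []
apply_builtins(world.find("country/appellative"), name_only)
print("".join(name_only))   # United States of America

everything = []
apply_builtins(world, everything)
print("".join(everything))  # United States of AmericaGeorge W. Bush
```

Applied to appellative, the model yields just the country's name; applied to the whole document, it yields every text node in document order, which (whitespace aside) is what a stylesheet with no template rules of its own would produce.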

So now let's trace the logic of your Transformation #2:

  1. First template rule:
    • Match on any world elements in the source tree.
    • If you get a match, instantiate in the result tree the basic structure of an HTML document: html, head, and so on, down through a table element.
    • Within the result tree's table element, apply templates to the country children of this world element.
  2. Second template rule:
    • Match on any government element which is a child of a country element. (Note that this does not correspond to the instruction provided by the first template rule's xsl:apply-templates element.)
    • If you get such a match, instantiate in the result tree a tr table-row element, within which will be a single td element containing the string-value of the context node's (that is, the government element's) executive_branch/chief_of_state "grandchild."

That looks fine so far. The problem is that your source tree includes a handful of elements (and, crucially, a text node) for which there are no explicit template rules. For all of these, including the country element that you expressly asked to have processed (via that xsl:apply-templates), the built-in template rules are applied automatically. So for every element except world (covered by Transformation #2's first template rule) and the government children of country elements (covered by its second template rule), every child is processed as if you'd explicitly coded the built-in template rule. The processor therefore descends from country to appellative to name, and finally reaches the name element's text-node child: the string "United States of America." The built-in rule for text nodes copies this string, unchanged, to the result tree at the point where the name element is processed, which falls inside the table element but before the tr row that the government rule produces. That's why "United States of America" shows up in Transformation #2's result tree where it does.
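The same kind of toy model can replay Transformation #2. In this sketch (Python with xml.etree.ElementTree; transform2 is a hypothetical name, and this is not a real XSLT processor), the world rule and the country/government rule are coded explicitly, everything else falls through to the built-in behavior, and the html, head, and body wrapper is left out for brevity. Because ElementTree has no parent pointers, the parent is passed along so the country/government pattern can be checked.

```python
import xml.etree.ElementTree as ET

# Compact version of the question's source document (indentation whitespace
# omitted as an assumption; one space kept between name and surname).
SOURCE = (
    "<world><country>"
    "<appellative><name>United States of America</name></appellative>"
    "<government><executive_branch><chief_of_state>"
    "<name>George W.</name> <surname>Bush</surname>"
    "</chief_of_state></executive_branch></government>"
    "</country></world>"
)


def transform2(elem, parent, out):
    """Toy model of Transformation #2 plus the built-in rules it falls back on.

    `parent` is passed explicitly because ElementTree has no parent pointers
    and the pattern "country/government" needs the parent's name.
    """
    if elem.tag == "world":
        # <xsl:template match="world">; html/head/body omitted for brevity
        out.append('<table border="1">')
        for country in elem.findall("country"):  # select="country"
            transform2(country, elem, out)
        out.append("</table>")
    elif elem.tag == "government" and parent is not None and parent.tag == "country":
        # <xsl:template match="country/government">
        chief = elem.find("executive_branch/chief_of_state")
        out.append("<tr><td>" + "".join(chief.itertext()) + "</td></tr>")
    else:
        # No explicit rule matches: built-in element rule and built-in text rule
        if elem.text:
            out.append(elem.text)
        for child in elem:
            transform2(child, elem, out)
            if child.tail:
                out.append(child.tail)


result = []
transform2(ET.fromstring(SOURCE), None, result)
print("".join(result))
# <table border="1">United States of America<tr><td>George W. Bush</td></tr></table>
```

The country element reaches the final else branch, so the model walks down through appellative and name and emits the stray text just before the row, matching the output in the question.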

But what about Transformation #1? Why isn't the name element (with the same text node child) processed there, too?

Let's trace its logic:

  1. First template rule: same as for Transformation #2.
  2. Second template rule:
    • Match on any country element. (Note that this does correspond, exactly, to the instruction provided by the first template rule's xsl:apply-templates element.)
    • If you get such a match, instantiate in the result tree a tr table-row element, within which will be a single td element containing the string-value of the context node's (that is, the country element's) government/executive_branch/chief_of_state "great-grandchild."

Why, then, isn't the country element's name descendant processed? Because Transformation #1's explicit rule for country elements overrides the built-in rule for elements in general, and that explicit rule never calls xsl:apply-templates. So no children, and hence no descendants, of country are processed at all by further template rules (built-in or otherwise); the xsl:value-of instruction merely reads the string-value of chief_of_state without applying any templates. If you'd like Transformation #1 to duplicate the results of Transformation #2, just modify its second template rule as follows:

   <xsl:template match="country">
      <xsl:apply-templates select="appellative"/>
      <tr>
         <td>
            <xsl:value-of select="government/executive_branch/chief_of_state"/>
         </td>
      </tr>
   </xsl:template>

Since there's no explicit template rule provided for processing appellative elements, the built-in rules take effect -- trickling all the way down to the text node, "United States of America," which is a child of the name element.
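Here is a comparable sketch of Transformation #1 and its modified version. As before, it is a hypothetical Python model built on xml.etree.ElementTree, not an XSLT processor, and it omits the html wrapper; with_appellative is an invented flag that switches the extra xsl:apply-templates on.

```python
import xml.etree.ElementTree as ET

# Compact version of the question's source document (indentation whitespace
# omitted as an assumption; one space kept between name and surname).
SOURCE = (
    "<world><country>"
    "<appellative><name>United States of America</name></appellative>"
    "<government><executive_branch><chief_of_state>"
    "<name>George W.</name> <surname>Bush</surname>"
    "</chief_of_state></executive_branch></government>"
    "</country></world>"
)


def builtin(elem, out):
    """Built-in element and text rules; no explicit rule matches below here."""
    if elem.text:
        out.append(elem.text)
    for child in elem:
        builtin(child, out)
        if child.tail:
            out.append(child.tail)


def transform1(world, with_appellative=False):
    """Toy model of Transformation #1 (html/head/body omitted).

    with_appellative=True models the modified country rule, which first
    applies templates to appellative. (The flag is invented for this sketch.)
    """
    out = ['<table border="1">']                 # match="world"
    for country in world.findall("country"):    # select="country"
        # match="country": this explicit rule replaces the built-in element
        # rule, so country's children are processed only when asked for.
        if with_appellative:
            for appellative in country.findall("appellative"):
                builtin(appellative, out)       # select="appellative"
        chief = country.find("government/executive_branch/chief_of_state")
        # xsl:value-of reads the string-value; it applies no templates.
        out.append("<tr><td>" + "".join(chief.itertext()) + "</td></tr>")
    out.append("</table>")
    return "".join(out)


world = ET.fromstring(SOURCE)
print(transform1(world))
# <table border="1"><tr><td>George W. Bush</td></tr></table>
print(transform1(world, with_appellative=True))
# <table border="1">United States of America<tr><td>George W. Bush</td></tr></table>
```

Only the explicit select="appellative" lets the built-in rules reach the name element's text; without it, the explicit country rule stops the descent.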

By the way, I often find these built-in template rules (especially the one for text nodes) mildly distracting, particularly when debugging a stylesheet. If they distract you, too, feel free to override them with empty explicit template rules of your own. For instance:

   <!-- Suppress all text nodes by default -->
   <xsl:template match="text()" />

Not everyone agrees with this approach, but it puts more explicit control in the hands of the stylesheet developer. And when I'm developing a stylesheet, that's where I prefer the control to be.
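To see what such an empty rule does to Transformation #2, here is a hypothetical Python model (xml.etree.ElementTree, not a real XSLT processor) with a suppress_text switch standing in for the empty text() template; the switch and function names are invented for this sketch. Note that xsl:value-of is unaffected, because it reads the string-value directly instead of applying templates.

```python
import xml.etree.ElementTree as ET

# Compact version of the question's source document (indentation whitespace
# omitted as an assumption; one space kept between name and surname).
SOURCE = (
    "<world><country>"
    "<appellative><name>United States of America</name></appellative>"
    "<government><executive_branch><chief_of_state>"
    "<name>George W.</name> <surname>Bush</surname>"
    "</chief_of_state></executive_branch></government>"
    "</country></world>"
)


def transform2(elem, parent, out, suppress_text=False):
    """Toy model of Transformation #2; suppress_text=True stands in for an
    empty <xsl:template match="text()"/> (a flag invented for this sketch)."""
    if elem.tag == "world":                      # match="world" (html omitted)
        out.append('<table border="1">')
        for country in elem.findall("country"):  # select="country"
            transform2(country, elem, out, suppress_text)
        out.append("</table>")
    elif elem.tag == "government" and parent is not None and parent.tag == "country":
        # xsl:value-of reads the string-value directly, without applying
        # templates, so an empty text() rule does not affect it.
        chief = elem.find("executive_branch/chief_of_state")
        out.append("<tr><td>" + "".join(chief.itertext()) + "</td></tr>")
    else:
        # Built-in element rule; the text rule is either built-in or empty.
        if elem.text and not suppress_text:
            out.append(elem.text)
        for child in elem:
            transform2(child, elem, out, suppress_text)
            if child.tail and not suppress_text:
                out.append(child.tail)


def run(suppress_text):
    out = []
    transform2(ET.fromstring(SOURCE), None, out, suppress_text)
    return "".join(out)


print(run(suppress_text=False))
# <table border="1">United States of America<tr><td>George W. Bush</td></tr></table>
print(run(suppress_text=True))
# <table border="1"><tr><td>George W. Bush</td></tr></table>
```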

Correction: grouping with XSLT

In the July XML Q&A column, I discussed using the generate-id() function, in concert with the key() and current() functions, to group source-tree data. Shortly after that column appeared, I received a nice message from Geert Josten. He had used the examples from that column to help test an "XSLT parser" he'd built in what he described as a "very unknown programming language called MetaMorphosis (www.ovidius.com)."

Josten said he'd fixed a couple of small problems involving use of the current() and generate-id() functions. But then he went on to point out some flaws in my explanation of how the generate-id() function works.

In particular, the column said, "whatever algorithm it uses [for generating IDs], an XSLT processor must generate the same key value for any given 'seed value' (such as a node's string-value) in a given processing instance." But this statement implies that a node's string-value alone is sufficient for generating a unique ID. If that were true, two attributes (for example) with the same value would get the same ID, which clearly isn't the case. What the XSLT 1.0 Recommendation actually requires is that, within a single transformation, generate-id() always return the same identifier for the same node and different identifiers for different nodes, whatever their string-values.
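The distinction is easy to demonstrate. This Python sketch (xml.etree.ElementTree; make_generate_id and naive_id are hypothetical helpers, not any processor's actual algorithm) keys IDs on node identity rather than string-value. Because ElementTree doesn't model attributes as nodes, it uses two elements with the same text in place of two attributes with the same value.

```python
import xml.etree.ElementTree as ET


def make_generate_id():
    """Return a hypothetical generate_id(node) for a single transformation.

    IDs are keyed on node identity, never on string-value: the same node always
    gets the same ID, and distinct nodes get distinct IDs even if their text
    is identical.
    """
    ids = {}

    def generate_id(node):
        if node not in ids:
            ids[node] = "id" + str(len(ids) + 1)
        return ids[node]

    return generate_id


def naive_id(node):
    """The flawed idea: derive the ID from the string-value alone."""
    return "id-" + "".join(node.itertext())


doc = ET.fromstring("<doc><item>same</item><item>same</item></doc>")
first, second = doc.findall("item")
generate_id = make_generate_id()

print(naive_id(first) == naive_id(second))       # True: a collision
print(generate_id(first), generate_id(second))   # id1 id2
print(generate_id(first) == generate_id(first))  # True: stable per node
```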

Thanks to Geert Josten for pointing out the error.