Dirty XSLT Output
Q: I have observed "dirty output" when applying different template rules that look equivalent to me. The source document is:
<world>
  <country>
    <appellative>
      <name>United States of America</name>
    </appellative>
    <government>
      <executive_branch>
        <chief_of_state>
          <name>George W.</name>
          <surname>Bush</surname>
        </chief_of_state>
      </executive_branch>
    </government>
  </country>
</world>
I'm looking for this (partial) result:
...
<table border="1">
  <tr>
    <td>George W. Bush</td>
  </tr>
</table>
...
I can get this result using the following transformation (call it Transformation #1):
<xsl:template match="world">
  <html>
    <head>
      <title>world</title>
    </head>
    <body>
      <table border="1">
        <xsl:apply-templates select="country"/>
      </table>
    </body>
  </html>
</xsl:template>
<xsl:template match="country">
  <tr>
    <td>
      <xsl:value-of select="government/executive_branch/chief_of_state"/>
    </td>
  </tr>
</xsl:template>
I've also tried this transformation (call it Transformation #2; compare the match and select attributes in its second template rule with their counterparts above):
<xsl:template match="world">
  <html>
    <head>
      <title>world</title>
    </head>
    <body>
      <table border="1">
        <xsl:apply-templates select="country"/>
      </table>
    </body>
  </html>
</xsl:template>
<xsl:template match="country/government">
  <tr>
    <td>
      <xsl:value-of select="executive_branch/chief_of_state"/>
    </td>
  </tr>
</xsl:template>
What I actually get from the latter, though, is this:
...
<table border="1">United States of America
  <tr>
    <td>George W. Bush</td>
  </tr>
...
Why, in this case, does "United States of America" come out? I think
the problem is in the xsl:apply-templates element, but I do
not understand why. Can you help me?
A: You're on the right track; the problem is indeed related to the
xsl:apply-templates element. More exactly, it's attributable
to the cumulative effect of that element together with the explicit
or implicit template rules elsewhere in the stylesheet.
By "explicit" template rules, I mean the template rules --
xsl:template elements and their descendants -- which you've
expressly coded in your stylesheet. In Transformation #1, notice that
there's a direct correspondence between the value of the
xsl:apply-templates element's select attribute
(whose value is "country") and the match attribute of an
xsl:template rule.
In your Transformation #2, though, there is no such direct
correspondence. While the select attribute still has the
value "country," there are no match attributes with this
value; the only other template rule matches on "country/government"
instead. What you've run afoul of here is not the template rules you've
explicitly coded, but the built-in template rules established by
the XSLT 1.0
Recommendation. These are the "implicit" template rules.
Built-in template rules are defined for the root node, elements, text, attributes, comments, and processing instructions (PIs). The two built-ins causing the mysterious-seeming discrepancy in Transformation #2 are those for element and text nodes:
<!-- Element nodes with no explicit template rule -->
<xsl:template match="*">
  <xsl:apply-templates />
</xsl:template>
<!-- Text nodes with no explicit template rule -->
<xsl:template match="text()">
  <xsl:value-of select="." />
</xsl:template>
The built-in template rule for element nodes says, "Process all template rules -- including built-in ones, if necessary -- for all children of the context element node." Note that the term "children" covers not only child elements, but also child text, comment, and PI nodes. For a given text node, the built-in template rule simply transfers the string-value of the text node straight to the result tree. In both cases, as with the other built-in template rules, the built-ins are automatically processed by the XSLT processor just as if you'd explicitly coded them in your stylesheet -- unless you explicitly override them, or provide some other constraint which keeps them from kicking in.
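For reference, here is how I would write out the complete set of built-ins, following section 5.8 of the XSLT 1.0 Recommendation. Treat it as a sketch for study, not something you need to paste into a stylesheet: the processor already behaves as if these rules were present. The grouping into three rules and the comments are mine.

```xml
<!-- Root node and element nodes: keep descending into the children -->
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>

<!-- Text and attribute nodes: copy the string-value to the result tree -->
<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>

<!-- Comments and processing instructions: produce nothing -->
<xsl:template match="processing-instruction()|comment()"/>
```

Note that the attribute built-in fires only when something selects attributes explicitly (for example, select="@*"). An xsl:apply-templates with no select attribute processes children, and attributes are not children of their element.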
So now let's trace the logic of your Transformation #2:
1. The first template rule matches world elements in the source tree.
2. For each one, it instantiates in the result tree html, head, and so on, down through a table element.
3. Within that table element, process all template rules for the country child elements of this world element.
4. The second template rule matches any government element which is a child of a country element. (Note that this does not correspond to the instruction provided by the first template rule's xsl:apply-templates element.)
5. For each one, it instantiates a tr table-row element, within which will be a single td element and the string-value of the context node's (that is, the government element's) executive_branch/chief_of_state "grandchild."
That looks fine so far. The problem is that your source tree includes a
handful of elements (and in one case, a text node) for which there are no
explicit template rules. For all these cases
-- including the country element which you expressly
said (via that xsl:apply-templates) you wanted to be
processed -- the built-in template rules are automatically
applied. For every one of the elements, therefore -- every one,
that is, except world (covered by Transformation
#2's first template rule) and the government child(ren) of
country elements (from Transformation #2's second template rule)
-- every child of that element is processed as if you'd
explicitly coded the built-in template rule. This includes the
text-node child of the name element --
that is, the string "United States of America." This text node
gets transferred to the result tree, unchanged, at the point
where the name element would normally be processed.
So that's why "United States of America" shows up in
Transformation #2's result tree where it does.
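If you want to keep Transformation #2's match pattern, one possible fix is to make the first rule's select attribute point directly at the nodes the second rule matches. The sketch below assumes the source document shown in the question; the xsl:stylesheet wrapper is my addition so that the fragment can be run as it stands.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="world">
    <html>
      <head>
        <title>world</title>
      </head>
      <body>
        <table border="1">
          <!-- Select only the government grandchildren, so the built-in
               rules never visit country, appellative, or name -->
          <xsl:apply-templates select="country/government"/>
        </table>
      </body>
    </html>
  </xsl:template>

  <!-- Unchanged from Transformation #2 -->
  <xsl:template match="country/government">
    <tr>
      <td>
        <xsl:value-of select="executive_branch/chief_of_state"/>
      </td>
    </tr>
  </xsl:template>

</xsl:stylesheet>
```

Because every node selected by xsl:apply-templates now has an explicit template rule, the built-in rules for elements and text have nothing left to process, and the stray country name should no longer appear in the table.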
But what about Transformation #1? Why isn't the name
element (with the same text node child) processed there, too?
Let's trace its logic:
The first template rule is identical to Transformation #2's, so the only difference lies in the second:
1. The second template rule matches any country element. (Note that this does correspond, exactly, to the instruction provided by the first template rule's xsl:apply-templates element.)
2. For each one, it instantiates a tr table-row element, within which will be a single td element and the string-value of the context node's (that is, the country element's) government/executive_branch/chief_of_state "great-grandchild."
Why, then, isn't the country element's name
descendant processed? Because Transformation #1 overrides, for all
country elements, the built-in template rule for elements in
general: no children, and hence no descendants, of country
will be processed at all by further template rules (built-in or
otherwise). If you'd like Transformation #1 to duplicate the results of
Transformation #2, just modify its second template rule as follows:
<xsl:template match="country">
  <xsl:apply-templates select="appellative"/>
  <tr>
    <td>
      <xsl:value-of select="government/executive_branch/chief_of_state"/>
    </td>
  </tr>
</xsl:template>
Since there's no explicit template rule provided for processing
appellative elements, the built-in rules take effect --
trickling all the way down to the text node, "United States of America,"
which is a child of the name element.
By the way, I often find these built-in template rules (especially the one for text nodes) mildly distracting, especially when debugging a stylesheet. If you're distracted by them, too, feel free to override them with empty explicit template rules of your own. For instance:
<!-- Suppress all text nodes by default -->
<xsl:template match="text()" />
Not everyone agrees with this approach, but it puts more explicit control in the hands of the stylesheet developer. And when I'm developing a stylesheet, that's where I prefer the control to be.
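To see the override in context, here is a sketch of Transformation #2 with the empty text() rule added. As before, it assumes the source document from the question, and the xsl:stylesheet wrapper is my own addition.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="world">
    <html>
      <head>
        <title>world</title>
      </head>
      <body>
        <table border="1">
          <xsl:apply-templates select="country"/>
        </table>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="country/government">
    <tr>
      <td>
        <!-- xsl:value-of reads the string-value directly; it does not
             apply templates, so the text() override does not affect it -->
        <xsl:value-of select="executive_branch/chief_of_state"/>
      </td>
    </tr>
  </xsl:template>

  <!-- Overrides the built-in rule for text nodes: text reached only
       through the built-in element rule is now dropped -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

The built-in element rule still walks down from country through appellative and name, but when it reaches the text node the empty rule emits nothing. The trade-off is that any text you do want copied must now be requested explicitly, for example with xsl:value-of.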
In the July XML
Q&A column, I discussed using the generate-id()
function, in concert with the key() and
current() functions, to group source-tree data. Shortly after
that column appeared, I received a nice message from Geert Josten. He used
the examples from that column to help test an "XSLT parser" he'd built
using the, as he put it, "very unknown programming language called
MetaMorphosis (www.ovidius.com)."
Josten said he'd fixed a couple of small problems involving use of the
current() and generate-id() functions. But then
he went on to point out some flaws in my explanation of how the
generate-id() function works.
In particular, the column said, "whatever algorithm it uses [for generating IDs], an XSLT processor must generate the same key value for any given 'seed value' (such as a node's string-value) in a given processing instance." But this statement implies that a node's string-value is alone sufficient for generating a unique ID; in fact, if this were true, two attributes (for example) with the same value would have the same ID. This clearly isn't the case.
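A small experiment makes the point. The stylesheet below is a sketch of my own, and the input document in its comment is hypothetical. It prints the value and the generated ID of two attributes that share the same string-value.

```xml
<!-- Hypothetical input document:
     <items>
       <item code="A1"/>
       <item code="A1"/>
     </items>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Visit each code attribute; both have the string-value "A1" -->
    <xsl:for-each select="//item/@code">
      <xsl:value-of select="."/>
      <xsl:text> : </xsl:text>
      <!-- With no argument, generate-id() identifies the context node -->
      <xsl:value-of select="generate-id()"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The exact ID strings depend on the processor, so I won't guess at them, but the two lines of output should show different IDs. The ID is tied to the node itself, not to its string-value.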
Thanks to Geert Josten for pointing out the error.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.