Of Grouping, Counting, and Context
John Simpson is the author of XPath and XPointer.
Q: My XML file looks like this:
<Match>
  <Date>21/3/2005</Date>
  <Stadium>Wembley</Stadium>
  <Team Name="Liverpool">
    <Goal Scorer="O'Reilly"/>
    <Goal Scorer="Smith"/>
    <Goal Scorer="O'Reilly"/>
  </Team>
  <Team Name="Real Madrid">
    <Goal Scorer="Charles"/>
    <Goal Scorer="Humble"/>
    <Goal Scorer="Humble"/>
    <Goal Scorer="Santana"/>
    <Goal Scorer="Humble"/>
  </Team>
</Match>
I want to get output like this (for the Liverpool team):
Player Goals
O'Reilly 2
Smith 1
Any ideas?
A: Great question. The answer requires knowledge of a couple of XSLT
techniques: grouping (with XSLT keys) and using the count()
function.
Here's the relevant portion of an XSLT stylesheet to solve the problem; notes on the code follow, particularly on the two portions of most interest: the select attribute of the xsl:for-each element and the call to the count() function.
<xsl:key name="player" match="@Scorer" use="."/>

<xsl:template match="Team">
  <table border="1">
    <tr><th colspan="3">Team: <xsl:value-of select="@Name"/></th></tr>
    <tr><th>Player</th><th>Goals</th><th>Gen'd ID</th></tr>
    <!-- Keep only the first Scorer attribute (in document order)
         for each distinct value -->
    <xsl:for-each
        select="Goal/@Scorer[generate-id()=generate-id(key('player', .))]">
      <tr>
        <td><xsl:value-of select="."/></td>
        <!-- Count this team's goals credited to the current scorer -->
        <td><xsl:value-of
            select="count(../../Goal[@Scorer=current()])"/></td>
        <td><xsl:value-of select="generate-id(.)"/></td>
      </tr>
    </xsl:for-each>
  </table>
  <br />
</xsl:template>
This code fragment first sets up an XSLT key -- something like an
index in DBMS terms. The name of the key is "player"; it matches on a pattern
identified by the XPath expression @Scorer. The value of the key,
given by the use attribute, is the string-value of each matched
attribute. Note that the xsl:key element is a top-level element, a
child of the root xsl:stylesheet element. Thus the match pattern
isn't "relative" to anything at all in the source tree -- there's nothing like
a context node to which it can be relative -- until the key is
actually invoked, using the key() function, by some lower-level
stylesheet template or instruction. Importantly, unlike a DBMS unique index or
an XML ID-type attribute, an XSLT key may point to more than one "thing" at a
time. Also unlike ID-type attributes, keys aren't restricted to identifying
elements only. In this case, we're going to be grouping on the basis of
attributes which share the same value: the Scorer
attributes in the source tree.
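To see the key on its own, here is a minimal sketch of a complete stylesheet, assuming it is run against the sample Match document above. The literal lookup value 'Humble' and the text output method are choices made for illustration only; they are not part of the original solution.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every Scorer attribute by its own string-value -->
  <xsl:key name="player" match="@Scorer" use="."/>

  <xsl:template match="/">
    <!-- key() returns ALL Scorer attributes whose value is 'Humble' -->
    <xsl:value-of select="count(key('player', 'Humble'))"/>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document this writes 3, because one key value points to three attribute nodes at once.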
There's one template rule in this stylesheet fragment. For every occurrence
of a Team element in the source tree, as located by the
xsl:template's match attribute, the template
constructs a three-column table. Two columns display the names of the scoring
players on each team and the number of goals scored; I've added the third
column just to demonstrate how the keying works.
The first thing of interest in the template rule is the
xsl:for-each element within it. Its select attribute
starts clearly enough -- "for each Scorer attribute of a
Goal child of the context Team element" -- but then
seems to trail off into gibberish when it moves into the predicate.
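That predicate applies a form of the so-called Muenchian method
for grouping data in ways not "built-in" to the structure of the source
tree. The name of this technique, by the way, comes from Steve Muench of
Oracle Corporation, who first popularized it on the XSL-List mailing list. It
takes advantage of a couple of features of the XSLT generate-id()
function:
- The ID generated for a given node is unique to that node: within a given
  processing instance, the same node always yields the same ID, and no two
  nodes share one.
- An XSLT processor is free to use any ID-generation algorithm it wants, as
  long as the first rule holds.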
Essentially, this predicate says to restrict the node-set located by the
select attribute to just those Scorer attributes
whose generated ID is identical to that of the first node returned by
key('player', .) -- that is, the first Scorer attribute, in document
order, with the same value. (When generate-id() is passed a node-set, it
returns the ID of the node that comes first in document order.) This is
where the grouping occurs: instead of getting a separate
table row for every single Scorer attribute (which would yield,
say, two rows for O'Reilly and three for Humble), you get one row for each
unique Scorer attribute value.
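The effect of the predicate can be checked in isolation. The following is a minimal sketch, again assuming the sample Match document above as input. The //Goal path and the text output are illustrative choices, not the column's solution: the stylesheet simply lists each distinct scorer once, ignoring teams.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:key name="player" match="@Scorer" use="."/>

  <xsl:template match="/">
    <!-- A Scorer attribute survives the predicate only if it is the
         first node (in document order) that the key returns for its value -->
    <xsl:for-each
        select="//Goal/@Scorer[generate-id() = generate-id(key('player', .))]">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample document, this writes five lines in document order: O'Reilly, Smith, Charles, Humble, Santana. Dropping the predicate would produce eight lines, one per Goal element.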
The other portion of interest in the template rule is where the goals are
counted for each unique Scorer attribute value. The
count() function takes one argument, an XPath expression -- a
relative location path, here. The location path tells the count()
function to count all Goal element children of the current node's
"grandparent" (that's the ../../ portion of the location path),
as long as those Goal elements' Scorer
attributes have the same value as the current Scorer
attribute.
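The same kind of count can be tried outside the loop. This is a minimal sketch under the assumption that the input is the sample Match document above; the absolute location path and the hard-coded team and player names are there for illustration only. Note the &quot; entities, which let the XPath string literal contain the apostrophe in O'Reilly.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- count() returns the number of nodes its argument selects:
         here, Liverpool's Goal elements credited to O'Reilly -->
    <xsl:value-of
        select="count(/Match/Team[@Name='Liverpool']/Goal[@Scorer=&quot;O'Reilly&quot;])"/>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document the result is 2, matching the O'Reilly row in the desired output.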
By the way, note the use of the current() function in the
predicate. Under most circumstances, the current node is the same as the
context node. Within a predicate, though,
the two will not necessarily be the same. At the point of the call to the
current() function in that xsl:value-of element's
select attribute, the context node has shifted to the
Goal element selected by the portion of the count()
function's argument which precedes the predicate. Had the predicate been
written as [@Scorer=.] instead, the . would refer to that Goal element. The
element's string-value (empty, in this document) is never equal to that of
its Scorer attribute, so
(a) the predicate would never be true, (b) no matching nodes would ever be
selected, and (c) the number of goals would therefore always be zero. The
current node, on the other hand, is unaffected by the location path's resetting
of the context node: the current node, in this case, is always the node
established by the current pass through the xsl:for-each loop --
that is, the current (keyed) Scorer attribute.
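The difference between current() and . is easy to demonstrate side by side. The sketch below assumes the sample Match document above as input. The second count, using . in the predicate, is deliberately wrong and is included only for comparison; the text output format is likewise an illustrative choice.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:key name="player" match="@Scorer" use="."/>

  <xsl:template match="/">
    <xsl:for-each select="Match/Team">
      <xsl:for-each
          select="Goal/@Scorer[generate-id() = generate-id(key('player', .))]">
        <xsl:value-of select="."/>
        <xsl:text>: with current() = </xsl:text>
        <!-- current() is the Scorer attribute chosen by xsl:for-each -->
        <xsl:value-of select="count(../../Goal[@Scorer = current()])"/>
        <xsl:text>, with . = </xsl:text>
        <!-- Here . is the Goal element being tested; its string-value is empty -->
        <xsl:value-of select="count(../../Goal[@Scorer = .])"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document, the first count gives 2, 1, 1, 3, and 1 for O'Reilly, Smith, Charles, Humble, and Santana respectively, while the second count is 0 on every line.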
The third column in the table simply shows how the
generate-id() function works for each unique
Scorer value. Again, remember the two rules for generated IDs: a
given ID must be unique to its node (the "seed value," so to speak), and an XSLT processor is
free to use any ID-generation algorithm it wants as long as it results in the
same generated ID for a given node in a given
processing instance. Here's how Microsoft's MSXML processor formats the table
in a typical instance:
If you use the Saxon processor to transform the source tree with the above XSLT code, you might get something like the following instead (results obtained from Saxon 6.2):

Note (in the third column) that the two processors employ quite different rules for generating the IDs, but still produce the same results in the important columns: counts of goals, grouped by scorer. Remove that third column and the two processors (as should all other compliant processors) produce identical results.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.