Grouping With XSLT 2.0
Relational databases have always offered a feature known as
grouping, that is, sorting a collection of records on a field or
combination of fields and then treating each subcollection that has the
same value in that sort key as a unit. For example, if the following XML
document were stored in a relational database table, grouping the records
by project value would let us print the records with a subhead
for each project name at the beginning of that project's group of records,
and it would let us find statistics such as the average or total size of
the files in each project.
<files>
<file name="swablr.eps" size="4313" project="mars"/>
<file name="batboy.wks" size="424" project="neptune"/>
<file name="potrzebie.dbf" size="1102" project="jupiter"/>
<file name="kwatz.xom" size="43" project="jupiter"/>
<file name="paisley.doc" size="988" project="neptune"/>
<file name="ummagumma.zip" size="2441" project="mars"/>
<file name="schtroumpf.txt" size="389" project="mars"/>
<file name="mondegreen.doc" size="1993" project="neptune"/>
<file name="gadabout.pas" size="685" project="jupiter"/>
</files>
While XSLT 1.0 lets you sort elements (see the July 2002
column for an introduction), it still forces you to jump through several
hoops to do anything extra with the groups that result from the
sort. Oracle's lead XML Technical Evangelist Steve Muench developed an
approach using the xsl:key element, and this became so popular
that it's known as the "Muenchian Method." Jeni Tennison has a fine
explanation of it on her site.
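For comparison, here is a minimal sketch of that approach in XSLT 1.0, applied to the files document above to list each distinct project value once. The key name files-by-project is my own choice for illustration, not something taken from Steve's or Jeni's write-ups.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="text"/>

  <!-- Index every file element by its project attribute value.
       (files-by-project is a hypothetical name.) -->
  <xsl:key name="files-by-project" match="file" use="@project"/>

  <xsl:template match="files">
    <!-- Keep only the file elements that are the first one the key
         returns for their project value: one per distinct project. -->
    <xsl:for-each select="file[generate-id() =
                          generate-id(key('files-by-project', @project)[1])]">
      <xsl:value-of select="@project"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The generate-id() comparison is the heart of the trick: it tests whether the current file element is the same node as the first one the key finds for that project value.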
XSLT 2.0 makes grouping even easier than Steve did. Its
xsl:for-each-group instruction iterates across a series of
groups, with the criteria for grouping specified by its attributes. The
required select attribute identifies the elements to group, and
one of the group-by, group-adjacent,
group-starting-with, or group-ending-with attributes
describes how to divide them into groups.
Let's look at a simple example. The single template rule in
the following XSLT 2.0 stylesheet tells the XSLT processor that when it
finds a files element it should select all the file
children of that element and sort them into groups based on the value of
each file element's project attribute value. (All
examples in this column are available in this
zip file. To run them, use Saxon 7, the only XSLT processor
currently offering support for 2.0.)
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output method="text"/>
<xsl:template match="files">
<xsl:for-each-group select="file" group-by="@project">
<xsl:value-of select="current-grouping-key()"/>
<xsl:text>
</xsl:text>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
Just as the XSLT 1.0 xsl:for-each instruction
iterates across a node set, with child elements of the
xsl:for-each element specifying what you want done to each node
in the set, the xsl:for-each-group instruction iterates across
the groups, with children of the xsl:for-each-group element
specifying what you want done to each group. The example above does two
simple things as it finds each group: it outputs the value returned by the
current-grouping-key() function, which returns the grouping
key value shared by the members of the group, and it adds a line break with an
xsl:text element.
Using the XML document shown earlier as a source document, the stylesheet creates this result:
mars
neptune
jupiter
It lists the grouping values. This ability to list all the different project values with no repeats in the list may seem simple, but it would have taken a lot more code in XSLT 1.0.
Let's replace the template rule with one that does a bit more:
<xsl:template match="files">
<xsl:for-each-group select="file" group-by="@project">
<xsl:for-each select="current-group()">
<xsl:value-of select="@name"/>, <xsl:value-of select="@size"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
<xsl:text>average size for </xsl:text>
<xsl:value-of select="current-grouping-key()"/>
<xsl:text> group: </xsl:text>
<xsl:value-of select="avg(current-group()/@size)"/>
<xsl:text>
</xsl:text>
</xsl:for-each-group>
</xsl:template>
The contents of this xsl:for-each-group element begin
with an XSLT 1.0 xsl:for-each element which, as I mentioned,
iterates across a set of nodes. By selecting the current-group()
node set, the xsl:for-each element iterates over the nodes of the
"mars" group in the first xsl:for-each-group pass, the nodes of
the "neptune" group in the second pass, and those of the "jupiter" group
in the final pass. Each iteration of the xsl:for-each instruction
outputs the value of the name attribute of the context node (the
node being processed by the loop), a comma, and the value of the context
node's size attribute, finishing with a line break added
with an xsl:text element.
After the xsl:for-each element iterates across the
group being processed by the xsl:for-each-group element, the
template outputs a message about the average size value within
each group. To do this, it uses the current-grouping-key()
function that we saw in our first stylesheet to name the group and the
avg() function to compute the average. The argument to the
avg() function is the node set consisting of the size
attribute values of all the nodes in the current group.
Applied to the same source document, this second stylesheet produces this result:
swablr.eps, 4313
ummagumma.zip, 2441
schtroumpf.txt, 389
average size for mars group: 2381
batboy.wks, 424
paisley.doc, 988
mondegreen.doc, 1993
average size for neptune group: 1135
potrzebie.dbf, 1102
kwatz.xom, 43
gadabout.pas, 685
average size for jupiter group: 610
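The introduction also mentioned totals. As a sketch of one variation, the following stylesheet sorts the groups alphabetically by key and reports the file count and total size for each. It assumes a processor that follows the XSLT 2.0 Recommendation, in which xsl:sort may appear as the first child of xsl:for-each-group; I have not checked this against the Working Draft that Saxon 7 implements.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="text"/>

  <xsl:template match="files">
    <xsl:for-each-group select="file" group-by="@project">
      <!-- Process the groups in key order rather than in order of
           first appearance in the source document. -->
      <xsl:sort select="current-grouping-key()"/>
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>: </xsl:text>
      <!-- Number of file elements in this group -->
      <xsl:value-of select="count(current-group())"/>
      <xsl:text> files, total size </xsl:text>
      <!-- Sum of the size attributes of this group's members -->
      <xsl:value-of select="sum(current-group()/@size)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

With the files document, this should list jupiter first (3 files, total size 1830), then mars (3 files, total size 7143), then neptune (3 files, total size 3405).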
If the xsl:for-each-group element uses a
group-adjacent attribute instead of a group-by
attribute, it doesn't gather elements with the same key value from across the
whole selection. It leaves the selected elements in their
original order and groups together only adjacent elements with the same key
value. For example, if we revise the previous stylesheet's template to
look like this (note also the removal of the instructions that compute
average file sizes),
<xsl:template match="files">
<xsl:for-each-group select="file" group-adjacent="@project">
<xsl:for-each select="current-group()">
<xsl:value-of select="@name"/>, <xsl:value-of select="@size"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
<xsl:text>
</xsl:text>
</xsl:for-each-group>
</xsl:template>
it groups together only the potrzebie.dbf/kwatz.xom pair
and the ummagumma.zip/schtroumpf.txt pair, since those were the only
contiguous file elements in our source document that had the
same project attribute value: "jupiter" for potrzebie.dbf
and kwatz.xom and "mars" for ummagumma.zip and schtroumpf.txt.
swablr.eps, 4313

batboy.wks, 424

potrzebie.dbf, 1102
kwatz.xom, 43

paisley.doc, 988

ummagumma.zip, 2441
schtroumpf.txt, 389

mondegreen.doc, 1993

gadabout.pas, 685
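To make the group boundaries easier to see, here is a small sketch of my own (not one of the column's downloadable examples) that prints each adjacent group's key and the number of files in it. Because group-adjacent starts a new group every time the key changes, the same project name can show up more than once.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="text"/>

  <xsl:template match="files">
    <!-- A new group begins whenever the project value differs from
         that of the preceding selected file element. -->
    <xsl:for-each-group select="file" group-adjacent="@project">
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="count(current-group())"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

For the files document this should report seven groups: mars (1), neptune (1), jupiter (2), neptune (1), mars (2), neptune (1), and jupiter (1).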
The group-starting-with attribute names a pattern; the
xsl:for-each-group element treats each node matching it as the beginning
of a new group. This can add depth to a flat list of elements by enclosing
groups of those elements in container elements. HTML documents, in which
h1, h2, h3, and the p elements after any
of these headers are usually siblings, can benefit a lot from this; their
flat structure makes it difficult for a stream-based parser to know which
section of a document is ending when, and container elements make this
much easier. To add some depth to the following HTML document, the
group-starting-with attribute can let us specify that each
h1 element starts a new chapter:
<html><body>
<h1>Loomings</h1>
<p>par 1</p>
<p>par 2</p>
<p>par 3</p>
<h1>The Whiteness of the Whale</h1>
<p>par 4</p>
<p>par 5</p>
<p>par 6</p>
</body>
</html>
The following template rule does this to elements within a
body element by specifying "h1" as the node starting each group
that the XSLT processor should enclose in a chapter element. Note
how the select attribute doesn't specify one kind of element to
group, but all (*) children of the body element:
<xsl:template match="body">
<body>
<xsl:for-each-group select="*" group-starting-with="h1">
<chapter>
<xsl:for-each select="current-group()">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:for-each>
</chapter>
</xsl:for-each-group>
</body>
</xsl:template>
Applying it to the HTML document shown above gives us this result:
<html>
<body>
<chapter>
<h1>Loomings</h1>
<p>par 1</p>
<p>par 2</p>
<p>par 3</p>
</chapter>
<chapter>
<h1>The Whiteness of the Whale</h1>
<p>par 4</p>
<p>par 5</p>
<p>par 6</p>
</chapter>
</body>
</html>
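The result above keeps the html wrapper element, which the body template rule would not produce on its own, so the full stylesheet presumably contains more than that one rule. The following is a minimal sketch of a complete stylesheet that should give this result. The identity template, the xsl:strip-space element, and the indent setting are my assumptions and are not taken from the column's zip file.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="xml" indent="yes"/>
  <!-- Assumed: drop whitespace-only text nodes from the source -->
  <xsl:strip-space elements="*"/>

  <!-- Assumed identity rule: copies html, text, and anything else
       that no more specific template rule matches. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The body rule from the column, unchanged -->
  <xsl:template match="body">
    <body>
      <xsl:for-each-group select="*" group-starting-with="h1">
        <chapter>
          <xsl:for-each select="current-group()">
            <xsl:copy>
              <xsl:apply-templates/>
            </xsl:copy>
          </xsl:for-each>
        </chapter>
      </xsl:for-each-group>
    </body>
  </xsl:template>
</xsl:stylesheet>
```

The body rule takes precedence over the identity rule for body elements because a pattern that names an element has a higher default priority than node().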
The fourth and last way to specify a grouping is the
group-ending-with attribute, which names a pattern that
identifies nodes that should end each group. The following template rule
specifies that a group ends when it finds an element with any name
(*) whose position, modulo 3, equals 0 -- in other words, any
element whose position within its parent is a multiple of 3. The template
rule also encloses the whole result in a book element.
<xsl:template match="files">
<book>
<xsl:for-each-group select="*"
group-ending-with="*[position() mod 3 = 0]">
<chapter>
<xsl:for-each select="current-group()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:for-each>
</chapter>
</xsl:for-each-group>
</book>
</xsl:template>
A stylesheet with this template rule creates this result
when using the files document we saw earlier:
<book>
<chapter>
<file name="swablr.eps" size="4313" project="mars"/>
<file name="batboy.wks" size="424" project="neptune"/>
<file name="potrzebie.dbf" size="1102" project="jupiter"/>
</chapter>
<chapter>
<file name="kwatz.xom" size="43" project="jupiter"/>
<file name="paisley.doc" size="988" project="neptune"/>
<file name="ummagumma.zip" size="2441" project="mars"/>
</chapter>
<chapter>
<file name="schtroumpf.txt" size="389" project="mars"/>
<file name="mondegreen.doc" size="1993" project="neptune"/>
<file name="gadabout.pas" size="685" project="jupiter"/>
</chapter>
</book>
The group-by, group-adjacent,
group-starting-with, and group-ending-with attributes
can all name an element as the criterion to determine grouping boundaries;
but, as this last example shows, you can be more creative than that, using
functions and XPath predicates to identify the source tree nodes that
should be treated as group boundaries. The Examples section of the
XSLT 2.0 Working Draft's section on grouping has additional good
demonstrations of what you can do with these attributes to customize the
xsl:for-each-group element's treatment of your documents.
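As one more variation in that spirit, the groups of three could also be built with group-adjacent and a computed key. This is a sketch of a common idiom rather than anything from the column's examples: the expression (position() - 1) idiv 3 yields 0 for the first three selected items, 1 for the next three, and so on, so each run of three gets its own key.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="files">
    <book>
      <!-- position() here is the item's position in the selected
           sequence, so the key changes after every third file. -->
      <xsl:for-each-group select="file"
                          group-adjacent="(position() - 1) idiv 3">
        <chapter>
          <!-- Copy the group's file elements, attributes included -->
          <xsl:copy-of select="current-group()"/>
        </chapter>
      </xsl:for-each-group>
    </book>
  </xsl:template>
</xsl:stylesheet>
```

One difference from the group-ending-with version is worth noting. There, position() in the pattern counts an element among its siblings, while here it counts within whatever the select attribute picked, which matters when select chooses only some of the children.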
Demonstrating XSLT 2.0's grouping capability is easiest with simple, flat data that would fit easily into a normalized relational table; remember, however, that you can take advantage of it with all kinds of data, as long as you can count on finding the fields and attributes you need where you need them. After all, sometimes the whole point of using XML is that you have data that won't fit easily into normalized tables; it's nice to see more and more of the tricks for manipulating those tables coming to the world of XML development.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.