Grouping With XSLT 2.0

by Bob DuCharme
November 05, 2003

Relational databases have always offered a feature known as grouping, that is, sorting a collection of records on a field or combination of fields and then treating each subcollection that has the same value in that sort key as a unit. For example, if the following XML document were stored in a relational database table, grouping the records by project value would let us print the records with a subhead for each project name at the beginning of that project's group of records, and it would let us find statistics such as the average or total size of the files in each project.

<files>
  <file name="swablr.eps"     size="4313" project="mars"/>
  <file name="batboy.wks"     size="424"  project="neptune"/>
  <file name="potrzebie.dbf"  size="1102" project="jupiter"/>
  <file name="kwatz.xom"      size="43"   project="jupiter"/>
  <file name="paisley.doc"    size="988"  project="neptune"/>
  <file name="ummagumma.zip"  size="2441" project="mars"/>
  <file name="schtroumpf.txt" size="389"  project="mars"/>
  <file name="mondegreen.doc" size="1993" project="neptune"/>
  <file name="gadabout.pas"   size="685"  project="jupiter"/>
</files>
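For comparison, here is a minimal sketch of the relational grouping described above. It is written in Python using only the standard library. Python, SQLite, and the table and function names are assumptions for illustration and are not part of the original column. The sketch loads each file element as a row of an in-memory SQLite table and uses GROUP BY to compute each project's average and total file size.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The files document from the column, as a string.
FILES_XML = """<files>
  <file name="swablr.eps"     size="4313" project="mars"/>
  <file name="batboy.wks"     size="424"  project="neptune"/>
  <file name="potrzebie.dbf"  size="1102" project="jupiter"/>
  <file name="kwatz.xom"      size="43"   project="jupiter"/>
  <file name="paisley.doc"    size="988"  project="neptune"/>
  <file name="ummagumma.zip"  size="2441" project="mars"/>
  <file name="schtroumpf.txt" size="389"  project="mars"/>
  <file name="mondegreen.doc" size="1993" project="neptune"/>
  <file name="gadabout.pas"   size="685"  project="jupiter"/>
</files>"""


def project_stats(xml_text):
    """Hypothetical helper: store each file element as a table row,
    then return (project, average size, total size) for each project."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (name TEXT, size INTEGER, project TEXT)")
    rows = [(f.get("name"), int(f.get("size")), f.get("project"))
            for f in ET.fromstring(xml_text).iter("file")]
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)
    # GROUP BY yields one result row per distinct project value.
    result = conn.execute(
        "SELECT project, AVG(size), SUM(size) FROM files "
        "GROUP BY project ORDER BY project").fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for project, avg_size, total in project_stats(FILES_XML):
        print(project, avg_size, total)
```

The ORDER BY clause is included because SQL does not guarantee any particular order for GROUP BY results. XSLT 2.0's group-by, by contrast, returns groups in the order in which their keys first appear.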


While XSLT 1.0 lets you sort elements (see the July 2002 column for an introduction), it still forces you to jump through several hoops to do anything extra with the groups that result from the sort. Oracle's lead XML Technical Evangelist Steve Muench developed an approach using the xsl:key element, and this became so popular that it's known as the "Muenchian Method." Jeni Tennison has a fine explanation of it on her site.

XSLT 2.0 makes grouping even easier than Steve's method does. The XSLT 2.0 xsl:for-each-group instruction iterates across a series of groups, with the criteria for grouping specified by its attributes. The required select attribute identifies the items to group, and exactly one of the group-by, group-adjacent, group-starting-with, or group-ending-with attributes describes how to group them.

Let's look at a simple example. The single template rule in the following XSLT 2.0 stylesheet tells the XSLT processor that when it finds a files element, it should select all the file children of that element and collect them into groups based on the value of each file element's project attribute. (All examples in this column are available in this zip file. To run them, use Saxon 7, the only XSLT processor currently offering support for 2.0.)

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="2.0">

  <xsl:output method="text"/>

  <xsl:template match="files">

    <xsl:for-each-group select="file" group-by="@project">
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>
</xsl:text>
    </xsl:for-each-group>

  </xsl:template>

</xsl:stylesheet>

Just as the XSLT 1.0 xsl:for-each instruction iterates across a node set, with the children of the xsl:for-each element specifying what you want done to each node in the set, the xsl:for-each-group instruction iterates across the groups, with the children of the xsl:for-each-group element specifying what you want done to each group. The example above does two simple things as it processes each group:

  • It outputs the value of the current-grouping-key() function, which returns the grouping key value shared by the members of the group.
  • It outputs a newline.

Using the XML document shown earlier as a source document, the stylesheet creates this result:

mars
neptune
jupiter

It lists each grouping value once, in the order in which that value first appears in the source document. This ability to list all the different project values with no repeats may seem simple, but it would have taken a lot more code in XSLT 1.0.
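The following small Python model makes the group-by semantics concrete. It is an illustration only, not a description of how Saxon or any other processor implements grouping. The hypothetical group_by() function collects items into one list per distinct key. Like xsl:for-each-group, it keeps the groups in the order in which their keys first appear rather than sorting them. The model assumes each item has exactly one key. In XSLT 2.0, a group-by expression that returns several values can place one item in several groups.

```python
import xml.etree.ElementTree as ET

# The files document from the column, as a string.
FILES_XML = """<files>
  <file name="swablr.eps"     size="4313" project="mars"/>
  <file name="batboy.wks"     size="424"  project="neptune"/>
  <file name="potrzebie.dbf"  size="1102" project="jupiter"/>
  <file name="kwatz.xom"      size="43"   project="jupiter"/>
  <file name="paisley.doc"    size="988"  project="neptune"/>
  <file name="ummagumma.zip"  size="2441" project="mars"/>
  <file name="schtroumpf.txt" size="389"  project="mars"/>
  <file name="mondegreen.doc" size="1993" project="neptune"/>
  <file name="gadabout.pas"   size="685"  project="jupiter"/>
</files>"""


def group_by(items, key):
    """Collect items into one list per distinct key value. Dicts keep
    insertion order (Python 3.7+), so the groups come out in the order in
    which their keys first appear; nothing is sorted."""
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return groups


files = ET.fromstring(FILES_XML).findall("file")
groups = group_by(files, key=lambda f: f.get("project"))
for grouping_key in groups:  # like outputting current-grouping-key()
    print(grouping_key)
```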

Let's replace the template rule with one that does a bit more:

<xsl:template match="files">

  <xsl:for-each-group select="file" group-by="@project">
    
    <xsl:for-each select="current-group()">
      <xsl:value-of select="@name"/>, <xsl:value-of select="@size"/>
      <xsl:text>
</xsl:text>
    </xsl:for-each>

    <xsl:text>average size for </xsl:text>
    <xsl:value-of select="current-grouping-key()"/>
    <xsl:text> group: </xsl:text>
    <xsl:value-of select="avg(current-group()/@size)"/>
    <xsl:text>

</xsl:text>
  </xsl:for-each-group>

</xsl:template>

The contents of this xsl:for-each-group element begin with an xsl:for-each element, which, as I mentioned, iterates across a set of nodes. Because it selects the current-group() sequence, the xsl:for-each element iterates over the nodes of the "mars" group in the first xsl:for-each-group pass, the nodes of the "neptune" group in the second pass, and those of the "jupiter" group in the final pass. Each iteration of the xsl:for-each instruction outputs the value of the name attribute of the context node (the node being processed by the loop), a comma, and the value of the context node's size attribute, finishing with a newline added with an xsl:text element.

After the xsl:for-each element iterates across the group being processed by the xsl:for-each-group element, the template outputs a message about the average size value within that group. To do this, it uses the current-grouping-key() function that we saw in our first stylesheet to name the group and the avg() function to compute the average. The argument to the avg() function is the sequence of size attributes of all the nodes in the current group.

Applied to the same source document, this second stylesheet produces this result:

swablr.eps, 4313
ummagumma.zip, 2441
schtroumpf.txt, 389
average size for mars group: 2381

batboy.wks, 424
paisley.doc, 988
mondegreen.doc, 1993
average size for neptune group: 1135

potrzebie.dbf, 1102
kwatz.xom, 43
gadabout.pas, 685
average size for jupiter group: 610
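The same report can be sketched in Python. Again, this is a model, not a description of any processor's internals. The dictionary plays the role of the group-by grouping, and the inner loop stands in for xsl:for-each over current-group(). The average is the sum of the sizes divided by their count. The hypothetical report() function and the {avg:g} formatting, which drops the trailing ".0" from whole-number averages, are assumptions made so that the output matches the result above.

```python
import xml.etree.ElementTree as ET

# The files document from the column, as a string.
FILES_XML = """<files>
  <file name="swablr.eps"     size="4313" project="mars"/>
  <file name="batboy.wks"     size="424"  project="neptune"/>
  <file name="potrzebie.dbf"  size="1102" project="jupiter"/>
  <file name="kwatz.xom"      size="43"   project="jupiter"/>
  <file name="paisley.doc"    size="988"  project="neptune"/>
  <file name="ummagumma.zip"  size="2441" project="mars"/>
  <file name="schtroumpf.txt" size="389"  project="mars"/>
  <file name="mondegreen.doc" size="1993" project="neptune"/>
  <file name="gadabout.pas"   size="685"  project="jupiter"/>
</files>"""


def report(xml_text):
    """Build the per-project listing and average shown above."""
    groups = {}  # one list per project, in order of first appearance
    for f in ET.fromstring(xml_text).findall("file"):
        groups.setdefault(f.get("project"), []).append(f)

    lines = []
    for key, group in groups.items():  # key plays current-grouping-key()
        for f in group:                # group plays current-group()
            lines.append(f"{f.get('name')}, {f.get('size')}")
        sizes = [int(f.get("size")) for f in group]
        avg = sum(sizes) / len(sizes)  # what avg(current-group()/@size) computes
        lines.append(f"average size for {key} group: {avg:g}")
        lines.append("")               # blank line between groups
    return "\n".join(lines)


print(report(FILES_XML))
```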

If the xsl:for-each-group element uses a group-adjacent attribute instead of a group-by attribute, it doesn't gather every element with a given key value into a single group. Instead, it leaves the selected elements in their original order and groups together only adjacent elements that have the same key value. For example, if we revise the previous stylesheet's template to look like this (note also the removal of the instructions that compute average file sizes),

<xsl:template match="files">

  <xsl:for-each-group select="file" group-adjacent="@project">
    
    <xsl:for-each select="current-group()">
      <xsl:value-of select="@name"/>, <xsl:value-of select="@size"/>
      <xsl:text>
</xsl:text>
    </xsl:for-each>

    <xsl:text>
</xsl:text>
  </xsl:for-each-group>

</xsl:template>

it only groups together the potrzebie.dbf/kwatz.xom pair and the ummagumma.zip/schtroumpf.txt pair. Those are the only adjacent file elements in our source document that share a project attribute value: "jupiter" for potrzebie.dbf and kwatz.xom, and "mars" for ummagumma.zip and schtroumpf.txt. Each of the other file elements ends up in a group of its own:

swablr.eps, 4313

batboy.wks, 424

potrzebie.dbf, 1102
kwatz.xom, 43

paisley.doc, 988

ummagumma.zip, 2441
schtroumpf.txt, 389

mondegreen.doc, 1993

gadabout.pas, 685
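Python's itertools.groupby() happens to behave like group-adjacent. It never sorts its input, and it starts a new group each time the key changes. The following minimal sketch, which assumes the same files document, reproduces the grouping shown above.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# The files document from the column, as a string.
FILES_XML = """<files>
  <file name="swablr.eps"     size="4313" project="mars"/>
  <file name="batboy.wks"     size="424"  project="neptune"/>
  <file name="potrzebie.dbf"  size="1102" project="jupiter"/>
  <file name="kwatz.xom"      size="43"   project="jupiter"/>
  <file name="paisley.doc"    size="988"  project="neptune"/>
  <file name="ummagumma.zip"  size="2441" project="mars"/>
  <file name="schtroumpf.txt" size="389"  project="mars"/>
  <file name="mondegreen.doc" size="1993" project="neptune"/>
  <file name="gadabout.pas"   size="685"  project="jupiter"/>
</files>"""

files = ET.fromstring(FILES_XML).findall("file")

# groupby() does not sort: it starts a new group whenever the next item's
# key differs from the previous item's key, just as group-adjacent does.
adjacent_groups = [
    [f.get("name") for f in group]
    for _project, group in groupby(files, key=lambda f: f.get("project"))
]

for names in adjacent_groups:
    print("\n".join(names))
    print()
```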

The group-starting-with attribute specifies a pattern, and the xsl:for-each-group element treats each node that matches that pattern as the beginning of a new group. This can add depth to a flat list of elements by enclosing groups of those elements in container elements. HTML documents, in which h1, h2, h3, and the p elements that follow these headers are usually siblings, can benefit a lot from this. Their flat structure makes it difficult for a stream-based parser to know when each section of a document ends, and container elements make this much easier. To add some depth to the following HTML document, the group-starting-with attribute lets us specify that each h1 element starts a new chapter:

<html><body>
<h1>Loomings</h1>
<p>par 1</p>
<p>par 2</p>
<p>par 3</p>
<h1>The Whiteness of the Whale</h1>
<p>par 4</p>
<p>par 5</p>
<p>par 6</p>
</body>
</html>

The following template rule does this to elements within a body element by specifying "h1" as the node starting each group that the XSLT processor should enclose in a chapter element. Note how the select attribute doesn't specify one kind of element to group, but all (*) children of the body element:

<xsl:template match="body">
  <body>
    <xsl:for-each-group select="*" group-starting-with="h1">
      <chapter>
      <xsl:for-each select="current-group()">
        <xsl:copy>
          <xsl:apply-templates/>
        </xsl:copy>
      </xsl:for-each>
      </chapter>
    </xsl:for-each-group>
  </body>
</xsl:template>

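Applying it to the HTML document shown above gives us the following result. The stylesheet must also include a rule that copies the html element itself, such as an identity template rule.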

<html>

   <body>
      <chapter>
         <h1>Loomings</h1>
         <p>par 1</p>
         <p>par 2</p>
         <p>par 3</p>
      </chapter>
      <chapter>
         <h1>The Whiteness of the Whale</h1>
         <p>par 4</p>
         <p>par 5</p>
         <p>par 6</p>
      </chapter>
   </body>
</html>
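The group-starting-with logic can also be modeled in a few lines of Python. In this sketch, the hypothetical group_starting_with() function opens a new group at each item that passes the test. It also opens one at the very first item, so any elements before the first h1 would still form a group of their own. As I understand the XSLT 2.0 rules, the first item always starts a group, so the model follows that rule. The code uses xml.etree.ElementTree from Python's standard library to build body and chapter elements like those in the result above.

```python
import xml.etree.ElementTree as ET

# The HTML document from the column.
HTML = """<html><body>
<h1>Loomings</h1>
<p>par 1</p>
<p>par 2</p>
<p>par 3</p>
<h1>The Whiteness of the Whale</h1>
<p>par 4</p>
<p>par 5</p>
<p>par 6</p>
</body>
</html>"""


def group_starting_with(items, starts_group):
    """Start a new group at the first item and at every item for which
    starts_group(item) is true; return the groups as lists."""
    groups = []
    for item in items:
        if not groups or starts_group(item):
            groups.append([])
        groups[-1].append(item)
    return groups


body = ET.fromstring(HTML).find("body")
new_body = ET.Element("body")
for group in group_starting_with(list(body), lambda e: e.tag == "h1"):
    chapter = ET.SubElement(new_body, "chapter")  # one chapter per group
    chapter.extend(group)

print(ET.tostring(new_body, encoding="unicode"))
```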

The fourth and last way to specify a grouping is the group-ending-with attribute, which names a pattern that identifies the nodes that should end each group. The following template rule specifies that a group ends when it finds an element with any name (*) whose position, modulo 3, equals 0. In other words, a group ends at any element whose position among its parent's child elements is a multiple of 3. The template rule also encloses the whole result in a book element.


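<xsl:template match="files">
  <book>
    <xsl:for-each-group select="*"
               group-ending-with="*[position() mod 3 = 0]">
      <chapter>
        <xsl:for-each select="current-group()">
          <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
          </xsl:copy>
        </xsl:for-each>
      </chapter>
    </xsl:for-each-group>
  </book>
</xsl:template>

<!-- Identity template rule: copies attributes and other nodes as they
     are. Without it, the built-in template rules would output the file
     elements' attribute values as text instead of copying the attributes. -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>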

A stylesheet with this template rule creates this result when using the files document we saw earlier:

<book>
   <chapter>
      <file name="swablr.eps" size="4313" project="mars"/>
      <file name="batboy.wks" size="424" project="neptune"/>
      <file name="potrzebie.dbf" size="1102" project="jupiter"/>
   </chapter>
   <chapter>
      <file name="kwatz.xom" size="43" project="jupiter"/>
      <file name="paisley.doc" size="988" project="neptune"/>
      <file name="ummagumma.zip" size="2441" project="mars"/>
   </chapter>
   <chapter>
      <file name="schtroumpf.txt" size="389" project="mars"/>
      <file name="mondegreen.doc" size="1993" project="neptune"/>
      <file name="gadabout.pas" size="685" project="jupiter"/>
   </chapter>
</book>
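Here is a Python model of group-ending-with, with the same caveats as the earlier models. The hypothetical group_ending_with() function closes the current group after each item that passes the test and collects any leftover items into a final group. It counts positions from 1, as XPath does. In the XSLT pattern, position() refers to an element's position among its sibling elements in the source tree. This sketch uses the position within the selected items instead. The two agree here only because every child of files is a file element.

```python
import xml.etree.ElementTree as ET

# The files document from the column, as a string.
FILES_XML = """<files>
  <file name="swablr.eps"     size="4313" project="mars"/>
  <file name="batboy.wks"     size="424"  project="neptune"/>
  <file name="potrzebie.dbf"  size="1102" project="jupiter"/>
  <file name="kwatz.xom"      size="43"   project="jupiter"/>
  <file name="paisley.doc"    size="988"  project="neptune"/>
  <file name="ummagumma.zip"  size="2441" project="mars"/>
  <file name="schtroumpf.txt" size="389"  project="mars"/>
  <file name="mondegreen.doc" size="1993" project="neptune"/>
  <file name="gadabout.pas"   size="685"  project="jupiter"/>
</files>"""


def group_ending_with(items, ends_group):
    """Close the current group after each item for which
    ends_group(position, item) is true; positions start at 1."""
    groups, current = [], []
    for position, item in enumerate(items, start=1):
        current.append(item)
        if ends_group(position, item):
            groups.append(current)
            current = []
    if current:  # items after the last match form a final group
        groups.append(current)
    return groups


files = ET.fromstring(FILES_XML).findall("file")
chapters = group_ending_with(files, lambda pos, f: pos % 3 == 0)
for number, chapter in enumerate(chapters, start=1):
    print(number, [f.get("name") for f in chapter])
```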
    


The group-by, group-adjacent, group-starting-with, and group-ending-with attributes can all simply name an element or attribute that determines the grouping. As this last example shows, however, you can be more creative than that, using functions and XPath predicates to identify the source tree nodes that should be treated as group boundaries. The Examples section of the XSLT 2.0 Working Draft's section on grouping has more good demonstrations of how these attributes can customize the xsl:for-each-group element's treatment of your documents.
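As one illustration of a computed grouping key, this Python sketch groups the same files by file name extension. For names like these, which contain a single period, that is roughly what a group-by expression such as substring-after(@name, '.') would do. The extension() helper is hypothetical. Like the earlier models, the sketch keeps the groups in order of first appearance.

```python
import xml.etree.ElementTree as ET

# The files document from the column, as a string.
FILES_XML = """<files>
  <file name="swablr.eps"     size="4313" project="mars"/>
  <file name="batboy.wks"     size="424"  project="neptune"/>
  <file name="potrzebie.dbf"  size="1102" project="jupiter"/>
  <file name="kwatz.xom"      size="43"   project="jupiter"/>
  <file name="paisley.doc"    size="988"  project="neptune"/>
  <file name="ummagumma.zip"  size="2441" project="mars"/>
  <file name="schtroumpf.txt" size="389"  project="mars"/>
  <file name="mondegreen.doc" size="1993" project="neptune"/>
  <file name="gadabout.pas"   size="685"  project="jupiter"/>
</files>"""


def extension(file_element):
    """Hypothetical computed key: the text after the last period in @name."""
    return file_element.get("name").rsplit(".", 1)[-1]


groups = {}  # insertion-ordered, so groups appear in first-appearance order
for f in ET.fromstring(FILES_XML).findall("file"):
    groups.setdefault(extension(f), []).append(f.get("name"))

for key, names in groups.items():
    print(key, names)
```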

Demonstrating XSLT 2.0's grouping capability is easiest with simple, flat data that would fit easily into a normalized relational table. Remember, however, that you can take advantage of it with all kinds of data, as long as you can count on finding the elements and attributes you need where you need them. After all, sometimes the whole point of using XML is that you have data that won't fit easily into normalized tables. It's nice to see more and more of the tricks for manipulating relational tables coming to the world of XML development.


Reader Comments
  • Group-starting-with problem in XSLT 2.0
    2005-04-21 03:38:06 dearlk

    Hi folks!

    I am using XSLT 2.0 with Altova XML Spy 2005 Professional Edition, which has a built-in XSLT 2.0 processor, to transform one XML file into another XML file.

    I have an XML file that contains a large amount of data with lots of tags but without nesting/hierarchies.

    For example:


    ---------------------------------------------------------------------------------------------------

    <book>
      <Story>
        <h1>chapter1</h1>
        <h3>article1</h3>
        <text>text text</text>
        <h2>heading1</h2>
        <text>text text</text>
        <h3>article1</h3>
        <text>text text</text>
        <h2>heading2</h2>
        <text>text text</text>
        <h3>article1</h3>
        <text>text text</text>
        <h1>chapter2</h1>
        <h2>heading1</h2>
        <text>text text</text>
        <h3>article1</h3>
        <text>text text</text>
      </Story>
    </book>

    And the XSLT file looks like this:


    ------------------------------------------------------------------------------------------------------------

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2004/07/xpath-functions" xmlns:xdt="http://www.w3.org/2005/02/xpath-datatypes">

      <xsl:template match="book">
        <book>
          <xsl:apply-templates />
        </book>
      </xsl:template>

      <xsl:template match="Story">
        <xsl:for-each-group select="node()" group-starting-with="h1">
          <chapter>
            <xsl:call-template name="h2">
              <xsl:with-param name="child.elements" select="current-group()"></xsl:with-param>
            </xsl:call-template>
          </chapter>
        </xsl:for-each-group>
      </xsl:template>

      <xsl:template name="h2">
        <xsl:param name="child.elements"></xsl:param>
        <xsl:for-each-group select="$child.elements" group-starting-with="h2">
          <xsl:choose>
            <xsl:when test="current-group()[1][self::h1]">
              <xsl:apply-templates select="current-group()"/>
            </xsl:when>
            <xsl:otherwise>
              <heading>
                <xsl:call-template name="h3">
                  <xsl:with-param name="child.elements" select="current-group()"></xsl:with-param>
                </xsl:call-template>
              </heading>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each-group>
      </xsl:template>

      <xsl:template name="h3">
        <xsl:param name="child.elements"></xsl:param>
        <xsl:for-each-group select="$child.elements" group-starting-with="h3">
          <xsl:choose>
            <xsl:when test="current-group()[1][self::h1 or self::h2]">
              <xsl:apply-templates select="current-group()"/>
            </xsl:when>
            <xsl:otherwise>
              <article>
                <xsl:apply-templates select="current-group()"/>
              </article>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each-group>
      </xsl:template>

      <xsl:template match="h1 | h2 | h3">
        <title><xsl:apply-templates /></title>
      </xsl:template>

      <xsl:template match="*">
        <xsl:copy>
          <xsl:apply-templates/>
        </xsl:copy>
      </xsl:template>

    </xsl:stylesheet>

    If the first <h3>article1</h3> line (the one directly after <h1>chapter1</h1>) is not in the source XML file, the stylesheet gives exactly the result I want.

    For example: (result WITHOUT <h3>article1</h3> line in source XML file)

    -----------------------------------------------------------------------------------------------

    <?xml version="1.0" encoding="UTF-8"?>
    <book xmlns:fn="http://www.w3.org/2004/07/xpath-functions" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xdt="http://www.w3.org/2005/02/xpath-datatypes" xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <chapter>
        <title>chapter1</title>
        <text>text text</text>
        <heading>
          <title>heading1</title>
          <text>text text</text>
          <article>
            <title>article1</title>
            <text>text text</text>
          </article>
        </heading>
        <heading>
          <title>heading2</title>
          <text>text text</text>
          <article>
            <title>article1</title>
            <text>text text</text>
          </article>
        </heading>
      </chapter>
      <chapter>
        <title>chapter2</title>
        <heading>
          <title>heading1</title>
          <text>text text</text>
          <article>
            <title>article1</title>
            <text>text text</text>
          </article>
        </heading>
      </chapter>
    </book>

    But if that <h3>article1</h3> line is in the XML file as shown above, the stylesheet gives me the wrong result:

    For example: (result WITH <h3>article1</h3> line in source XML file)

    -------------------------------------------------------------------------------------------------------

    <?xml version="1.0" encoding="UTF-8"?>
    <book xmlns:fn="http://www.w3.org/2004/07/xpath-functions" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xdt="http://www.w3.org/2005/02/xpath-datatypes" xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <chapter>
        <title>chapter1</title>
        <title>article1</title>
        <text>text text</text>
        <heading>
          <title>heading1</title>
          <text>text text</text>
          <article>
            <title>article1</title>
            <text>text text</text>
          </article>
        </heading>
        <heading>
          <title>heading2</title>
          <text>text text</text>
          <article>
            <title>article1</title>
            <text>text text</text>
          </article>
        </heading>
      </chapter>
      <chapter>
        <title>chapter2</title>
        <heading>
          <title>heading1</title>
          <text>text text</text>
          <article>
            <title>article1</title>
            <text>text text</text>
          </article>
        </heading>
      </chapter>
    </book>

    In other words, if all my source elements come in the proper order, it works fine. But if any one of them is placed out of order in the source XML file, whether by mistake or on purpose, it gives an incorrect result.

    Can anybody help me out?

    Best,
    Lalit
    ------------------------
    Impossible is nothing!

  • Issue in grouping with XSLT 2.0
    2005-03-23 01:17:11 dearlk

    I am trying to group the content in my XML file as shown below using XSLT 2.0, but I am not getting the desired output, also shown below.

    Any help from anybody would be great for me.

    Thank you.

    ----------------------------------------
    Source XML
    +++++++++++++++++++++++++
    <?xml version="1.0" encoding="UTF-8"?>
    <Root>
    <Story>
    <chaptertitle>Chapter name</chaptertitle>
    <para>This is a para</para>
    <para>This is a para</para>
    <sectiontitle>Section name</sectiontitle>
    <para>This is a para</para>
    <para>This is a para</para>
    <subsectiontitle>Subsection name</subsectiontitle>
    <para>This is a para</para>
    <articletitle>Article name</articletitle>
    <para>This is a para</para>
    <para>This is a para</para>
    <articletitle>Article name</articletitle>
    <para>This is a para</para>
    <sectiontitle>Section name</sectiontitle>
    <para>This is a para</para>
    <para>This is a para</para>
    </Story>
    </Root>


    +++++++++++++++++++++++++++++
    Desired Output
    ++++++++++++++++++++++++++++++

    <?xml version="1.0" encoding="UTF-8"?>
    <chapter>
      <chaptertitle>Chapter name</chaptertitle>
      <para>This is a para</para>
      <para>This is a para</para>
      <section>
        <sectiontitle>Section name</sectiontitle>
        <para>This is a para</para>
        <para>This is a para</para>
        <subsection>
          <subsectiontitle>Subsection name</subsectiontitle>
          <para>This is a para</para>
          <article>
            <articletitle>Article name</articletitle>
            <para>This is a para</para>
            <para>This is a para</para>
          </article>
          <article>
            <articletitle>Article name</articletitle>
            <para>This is a para</para>
          </article>
        </subsection>
      </section>
      <section>
        <sectiontitle>Section name</sectiontitle>
        <para>This is a para</para>
        <para>This is a para</para>
      </section>
    </chapter>

    • Issue in grouping with XSLT 2.0
      2005-03-23 05:17:58 Bob DuCharme

      In general, the XSL-list at http://www.mulberrytech.com/xsl/xsl-list/ is the best place to get answers about problems with specific stylesheets and source files. Many people, including leading XSLT experts, will see your questions very quickly there.

  • Last template example is incorrect
    2003-11-09 13:23:33 TT Online [Reply]

    The last template (the group-ending-with example) will not copy the file elements correctly. It will just concatenate all attribute values of the 'file' element.


    I assume the author meant to use xsl:copy-of here.




    • Last template example is incorrect
      2003-11-09 14:36:03 Bob DuCharme [Reply]


      I realized that my full example for the last one had this template rule in it, which makes it work as-is:


      <xsl:template match="@*|node()">
      <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
      </xsl:template>


      There was also no output="text" at the top.


      I try to only show the code that matters in an example, but I was apparently slashing too much here. You can see it work in the zip file of examples as grouping5.xsl. Until the link from the article to the zip file is fixed, you can get a copy at www.snee.com/bob/temp/trxml42code.zip.


      Bob


  • Link to zip file is broken
    2003-11-07 22:10:05 Larry Seltzer

    On http://www.xml.com/pub/a/2003/11/05/tr.html, there's a link to a zip file containing example code. The link is apparently not valid.

    Thanks.