Using XSLT to Assist Regression Testing

December 4, 2002

Sal Mangano is the author of XSLT Cookbook.

Regression testing is an important software-testing technique in which the output of a program before a change is compared to the output after the change in order to determine whether the change introduced bugs. This sort of testing is useful after refactoring, when code is changed to improve its structure or performance without altering its behavior. It is also useful when new features are added to software, as a way to test whether the old features have been affected.

Recently, colleagues of mine asked if I knew of a tool that could help them regression-test some code which outputs XML. The problem was, they explained, that their changes may affect the order of elements. However, for their application, the order did not matter as long as the hierarchical structure and element content remained the same. Example 1 demonstrates just such a case.

Example 1: Equivalent documents ignoring order

<doc>                    |          <doc>
 <a>10</a>               |            <b>17</b>
 <b>17</b>               |            <a>10</a>
</doc>                   |          </doc>

I did not know of a tool that did this sort of comparison explicitly; however, I quickly told them that they did not need such a tool. "All you need to do is to normalize the output XML using a tiny bit of XSLT," I said. "Then you can simply use a standard Unix diff to check for differences."

Before exploring the solution motivated by my colleagues' problem, it's worth investigating solutions to other common normalization problems. The simplest case arises when the only expected differences between two XML documents are in the whitespace between elements. Example 2 shows two documents which are equivalent, irrespective of whitespace. Example 3 shows a simple stylesheet for normalizing such documents. The idea is to copy the input to the output, stripping all of the document's original whitespace-only text nodes and inserting new whitespace using indent="yes".

Example 2: Equivalent documents ignoring whitespace

<doc>                     | <doc>
<a>10</a> <b>17</b>       |   <a>10</a>
</doc>                    | 
                          |   <b>17</b>
                          | </doc>

Example 3: Normalizing whitespace-only nodes

<xsl:stylesheet 
  version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  
  <!-- Normalize whitespace by stripping space and indenting -->
  <xsl:output method="xml" version="1.0" indent="yes"/>
  <xsl:strip-space elements="*"/>	
  
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>
    	
</xsl:stylesheet>

Although the solution in Example 3 efficiently performs the desired normalization, it's worth mentioning an alternative implementation using the identity transformation, as shown in Example 4.

Example 4: An alternate whitespace normalizer using the identity transformation

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  
  <!-- Normalize whitespace by stripping space and indenting -->
  <xsl:output method="xml" version="1.0" indent="yes"/>
  <xsl:strip-space elements="*"/>	
 
  <!-- Identity transformation: copy every node, including attributes -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
    	
</xsl:stylesheet>

The identity transformation is one of the most useful XSLT idioms. Why would a transformation that simply copies its input to its output be so useful? Because of XSLT's ability to override template rules via xsl:import. Consider the stylesheet in Example 5, which imports the stylesheet in Example 4.

Example 5: Extending the functionality of Example 4 using xsl:import

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:import href="Figure4.xslt"/>

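  <!-- Comma-separated list of element names to drop, e.g. "a,b" -->
  <xsl:param name="ignore"/>

  <!-- Remove whitespace and wrap the list in commas so that each
       name is matched as a whole token, not as a substring -->
  <xsl:variable name="ignore2" 
       select="concat(',',translate(normalize-space($ignore),' ',''),',')"/>
  
  <xsl:template match="*">
    <xsl:if test="not(contains($ignore2,concat(',',name(),',')))">
      <xsl:apply-imports/>
    </xsl:if>
  </xsl:template>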

</xsl:stylesheet>

This stylesheet extends the functionality of the space-normalizing stylesheet to include the ability to strip certain elements from the input document. This is useful for comparing two documents that are identical, when both whitespace and certain specified elements are ignored. The elements are specified in a parameter as a comma-separated list. We normalize the list for easy element-membership checking. If we detect an element node that is not in the list, we copy it by invoking the template in the imported stylesheet using xsl:apply-imports. In the XSLT Cookbook, I explore several ways to exploit the identity transformation.
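As a hedged illustration of how the list might be supplied, the sketch below wraps the stylesheet from Example 5 and gives the ignore parameter a default value. The file name Figure5.xslt and the element names timestamp and id are assumptions made for this illustration; they are not taken from the application described above. Because the importing stylesheet has higher import precedence, its binding of ignore is the one that the ignore2 variable in Example 5 sees.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: Example 5 has been saved as Figure5.xslt -->
  <xsl:import href="Figure5.xslt"/>

  <!-- Hypothetical default: drop <timestamp> and <id> elements.
       This binding overrides the empty param in the imported file. -->
  <xsl:param name="ignore" select="'timestamp,id'"/>

</xsl:stylesheet>
```

A value passed in by the XSLT processor at run time still replaces the default, so the wrapper only saves typing for the common case.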

Turning back to the original problem, we have to transform documents whose elements may come in any order into a normalized form for the purpose of comparison. One obvious normalization technique is to sort the elements by name within each level of the document hierarchy. We also want to retain the whitespace normalization of Example 3, which leads to the XSLT in Example 6.

Example 6: A simple normalizer using xsl:sort

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  
  <!-- Normalize whitespace by stripping space and indenting -->
  <xsl:output method="xml" version="1.0" indent="yes"/>
  <xsl:strip-space elements="*"/>	
  
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates>
        <xsl:sort select="name()"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
    	
</xsl:stylesheet>

This solution is adequate for some cases, but we can generalize it further. The first improvement addresses duplicate element names occurring at the same level in the hierarchy, as shown in Example 7. Sorting by name alone leaves such siblings in their original relative order, so the two documents still differ after normalization. We can handle this case by extending the sort criteria to include the element content, as shown in Example 8.

Example 7: Documents that will not normalize correctly due to duplicate element names

<doc>                    |          <doc>
 <a>10</a>               |            <a>17</a>
 <a>17</a>               |            <a>10</a>
</doc>                   |          </doc>

Example 8: Normalization via sort using node name and node content

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  
  <!-- Normalize whitespace by stripping space and indenting -->
  <xsl:output method="xml" version="1.0" indent="yes"/>
  <xsl:strip-space elements="*"/>	
  
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates>
        <xsl:sort select="name()"/>
        <xsl:sort select="."/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
    	
</xsl:stylesheet>
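
To see the effect, here is what both documents from Example 7 should look like after passing through Example 8. This is a hand-worked sketch rather than captured processor output: the XML declaration is omitted, and the exact indentation produced by indent="yes" varies from one processor to another.

```xml
<doc>
  <a>10</a>
  <a>17</a>
</doc>
```

Note that the second key is compared as text, which is the default for xsl:sort, so a value such as 9 would typically sort after 10. That is harmless here, because both documents are reordered by the same rule.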

Alas, Example 8 is not a completely general solution. Can you think of a counterexample that will not necessarily normalize correctly? Example 9 provides one.

Example 9: A counterexample that won't compare equal under our sort-based normalization strategy

<doc>                     | <doc>
  <a>                     |   <a>               
    <b/>                  |     <c/>
  </a>                    |   </a>
  <a>                     |   <a> 
    <c/>                  |     <b/>
  </a>                    |   </a>
</doc>                    | </doc>

The problem is that if duplicate elements contain nested structure rather than text, our sort will not necessarily place them in a normalized order. In Example 9, both a elements have the same name and the same (empty) string value, so the sort keys tie and the elements stay in their original document order. Another problem is that we have not considered attributes: the templates in Examples 6 and 8 never select attribute nodes, so attributes are dropped from the output and play no part in the comparison. Both of these problems can be overcome at the cost of added complexity. Space does not permit me to explore this topic further here; however, I will revisit it in a future article for XML.com.

As it turns out, the solution in Example 8 works just fine for many cases, including the ones of interest to my colleagues. It would be useful, however, to add the capability to ignore specific elements that we introduced in Example 5. In Example 10, we use the same strategy of importing and overriding the template rule for element nodes.

Example 10: Importing and overriding the template rule for element nodes

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  
  <xsl:import href="Figure8.xslt"/>
  
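  <!-- Comma-separated list of element names to drop, e.g. "a,b" -->
  <xsl:param name="ignore"/>

  <!-- Remove whitespace and wrap the list in commas so that each
       name is matched as a whole token, not as a substring -->
  <xsl:variable name="ignore2" 
       select="concat(',',translate(normalize-space($ignore),' ',''),',')"/>
  
  <xsl:template match="*">
    <xsl:if test="not(contains($ignore2,concat(',',name(),',')))">
      <xsl:apply-imports/>
    </xsl:if>
  </xsl:template>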
  
</xsl:stylesheet>
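
Returning briefly to attributes: the sort-based stylesheets above never select attribute nodes, so any attributes in the input are silently dropped. The following is a minimal sketch, not the fuller treatment promised for a later article, of one way Example 8 might be adapted to keep them. It assumes that attribute values should take part in the comparison, and it leaves the nested-structure problem of Example 9 unsolved.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" version="1.0" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Elements: copy attributes first, then children in sorted order -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*">
        <xsl:sort select="name()"/>
      </xsl:apply-templates>
      <xsl:apply-templates select="node()">
        <xsl:sort select="name()"/>
        <xsl:sort select="."/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

  <!-- Attributes, text, comments and processing instructions:
       copy unchanged -->
  <xsl:template match="@* | text() | comment() | processing-instruction()">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

Attributes are copied before any children because XSLT 1.0 treats adding an attribute after a child node as an error. Sorting them by name is a best-effort step: the order in which a processor serializes attributes is not guaranteed, although many write them in the order they were created.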

This article discussed some important XSLT features and techniques that extend beyond the immediate problem of normalizing documents for comparison. These include the handling of document whitespace, the identity transform, the overriding of template rules via xsl:import, and the use of xsl:sort.

There are many excellent resources, both online and in print, which will enhance your understanding of these facilities. I provide a few of them below. And if you've enjoyed the approach taken in this article, then you may also enjoy my XSLT Cookbook.

Recommended resources:

XSL FAQ.
Robert DuCharme's Transforming XML column.
XSLT, by Doug Tidwell (O'Reilly, 2001).
XSLT and XPath on the Edge, by Jeni Tennison (M&T Books, 2001).

O'Reilly & Associates will soon release (December 2002) XSLT Cookbook.
