XML.com: XML From the Inside Out

Comparing and Replacing Strings

June 05, 2002

In last month's column we looked at XSLT techniques for splitting up strings of text, for checking whether strings had certain substrings, and for normalizing white space out of an element. This month we'll learn more ways to gain control over strings in your source document, as we see how to compare strings for equality and what kind of search-and-replace operations are possible in XSLT.

To see if two elements are the same, XSLT compares their string values using the equals sign ("="). To demonstrate several variations on this, our next stylesheet compares the a element in the following with its sibling elements. (All stylesheets, input documents, and output documents shown in this article are in this zip file.)

<poem>
  <a>full of Pomp and Gold</a>
  <b>full of Pomp and Gold</b>
  <c>full of pomp and gold</c>
  <d>
full of Pomp     and   Gold

</d>
</poem>

The stylesheet has a template rule for the a element with a series of xsl:if instructions. Each of these instructions compares the a element's content with something and reports whether the test is true.

<!-- xq327.xsl: converts xq326.xml into xq328.txt -->

<xsl:template match="a">

  <xsl:if test=". = 'full of Pomp and Gold'">
    1. a = "full of Pomp and Gold"
  </xsl:if>

  <xsl:if test=". = ../b">
    2. a = ../b
  </xsl:if>

  <xsl:if test=". = ../c">
    3. a = ../c
  </xsl:if>

  <xsl:if test=". != ../c">
    4. a != ../c
  </xsl:if>

  <xsl:if 
   test="translate(.,'abcdefghijklmnopqrstuvwxyz',
                     'ABCDEFGHIJKLMNOPQRSTUVWXYZ') = 
         translate(../c,'abcdefghijklmnopqrstuvwxyz',
                     'ABCDEFGHIJKLMNOPQRSTUVWXYZ')">
    5. a = ../c (ignoring case)
  </xsl:if>

  <xsl:if test=". = ../d">
    6. a = ../d
  </xsl:if>

  <xsl:if test=". = normalize-space(../d)">
    7. a = normalize-space(../d)
  </xsl:if>

</xsl:template>

As the result shows, xsl:if elements 1, 2, 4, 5, and 7 are true for the document above:

    1. a = "full of Pomp and Gold"
  
    2. a = ../b
  
    4. a != ../c
  
    5. a = ../c (ignoring case)
  
    7. a = normalize-space(../d)

Test number 1 in this stylesheet compares the a element (represented by ".") with the literal string "full of Pomp and Gold". They're equal, as the message added to the result tree tells us. Test 2 compares the a element with its sibling b element, and as the result shows, they too are equal. (If you're unfamiliar with the ../b notation to point to the b sibling, see the "Transforming XML" column Finding Relatives.)

Test 3 compares element a with element c, and they're not equal: two characters differ in case. XML is case-sensitive, and so is this string comparison, so this xsl:if instruction adds nothing to the result.

Test 4 compares element a and c again, but using the != comparison operator to check for inequality. This test is true, so a message about Test 4 gets added to the result.

The fifth test uses the translate() function that we looked at last month to map the a and c elements to upper-case versions and compares those. Because upper-case versions of these two elements are the same, Test 5 is true, and the appropriate message gets added to the result.

XSLT offers no built-in way to automatically convert a string's case because the mapping often depends on the language being used and sometimes even on where it's being used. For example, the upper-case form of "é" at the start of a word is often written "E" in France but "É" in Canada.
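
The repeated alphabet strings in Test 5 can be factored into variables. The following is a minimal sketch and is not one of the article's example files; the variable names lower and upper are hypothetical, and the alphabets cover only unaccented ASCII letters, so accented characters would pass through translate() unchanged.

<!-- sketch: Test 5 with the alphabets held in top-level variables -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>

<!-- hypothetical names; ASCII letters only -->
<xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
<xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

<xsl:template match="a">
  <xsl:if test="translate(.,$lower,$upper) = translate(../c,$lower,$upper)">
    5. a = ../c (ignoring case)
  </xsl:if>
</xsl:template>

<!-- keep the text of the sibling elements out of the output -->
<xsl:template match="b|c|d"/>

</xsl:stylesheet>

Run against the poem document above, this should report the same Test 5 message as the longer version.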

Test 6 compares element a with element d, which has the same text plus some additional white space: line breaks before and after the text and runs of extra spaces between some of the words. As the result document shows, the two elements are not equal.

Test 7 compares a and d again, but it compares a to a version of the d element returned by the normalize-space() function. This time, the equality test is true.

The normalize-space() function has been the savior of many string equality tests. XML's treatment of white space can be a complex topic, because it's not always clear which white space it ignores and which it recognizes. Any automated process that creates XML elements may put white space between elements or it may not, so a way to say "get rid of extraneous white space before comparing this string to something" is very useful in XSLT. In fact, the seventh xsl:if instruction above would be even better if both sides of the comparison in the xsl:if element's test attribute were passed to this function, like this:

<!-- xq329.xsl -->

  <xsl:if test="normalize-space(.) = normalize-space(../d)">
    7. a = normalize-space(../d)
  </xsl:if>
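
To see what normalize-space() actually hands to the comparison, it can help to print the value between brackets. This is a small sketch, not one of the article's example files, and it assumes the poem document above as input; the brackets are only there to make leading and trailing white space visible.

<!-- sketch: show the d element before and after normalize-space() -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>

<xsl:template match="/">
  <!-- raw string value: line breaks and runs of spaces are preserved -->
  <xsl:value-of select="concat('[', /poem/d, ']')"/>
  <xsl:text>&#10;</xsl:text>
  <!-- leading and trailing white space stripped, inner runs collapsed -->
  <xsl:value-of select="concat('[', normalize-space(/poem/d), ']')"/>
  <xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>

The first bracketed value spans several lines, while the second is [full of Pomp and Gold], which is why Test 7 succeeds where Test 6 fails.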

Search and Replace

The translate() function can replace specific characters with other characters, but XSLT offers no built-in method for globally replacing one string of text with another.

Global replacement is a basic text transformation task, and XSLT is a language for transforming XML documents, which are text, so string replacement is closely related to the work that stylesheet developers often do with XSLT. Fortunately, existing XSLT techniques can be combined to give a stylesheet a search-and-replace capability. The most important technique is the use of parameters with recursive named templates; see the "Transforming XML" column Getting Loopy if you're unfamiliar with it.

As an example, we'll look at a stylesheet that converts the string "finish" to "FINISH" throughout the following XML document.

<winelist>

  <wine grape="Chardonnay">
    <winery>Benziger</winery>
    <product>Carneros</product>
    <year>1997</year>
    <desc>Well-textured flavors, good finish.</desc>
    <prices>
      <list>10.99</list>
      <discounted>9.50</discounted>
      <case>114.00</case>
    </prices>
  </wine>

  <wine grape="Cabernet">
    <winery>Duckpond</winery>
    <product>Merit Selection</product>
    <year>1996</year>
    <desc>Sturdy and generous flavors, long finish.</desc>
    <prices>
      <list>13.99</list>
      <discounted>11.99</discounted>
      <case>143.50</case>
    </prices>
  </wine>

</winelist>

The stylesheet has three template rules. The third one just copies the source tree's elements and attributes to the result tree.

The second template rule handles text nodes. It calls the first template, the named "globalReplace" template, to add each text node's contents, with any replacements made, to the result tree.

<!-- xq332.xsl: converts xq331.xml into xq333.xml -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes"/>

<xsl:template name="globalReplace">
  <xsl:param name="outputString"/>
  <xsl:param name="target"/>
  <xsl:param name="replacement"/>
  <xsl:choose>
    <xsl:when test="contains($outputString,$target)">
   
      <xsl:value-of select=
        "concat(substring-before($outputString,$target),
               $replacement)"/>
      <xsl:call-template name="globalReplace">
        <xsl:with-param name="outputString" 
             select="substring-after($outputString,$target)"/>
        <xsl:with-param name="target" select="$target"/>
        <xsl:with-param name="replacement" 
             select="$replacement"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$outputString"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<xsl:template match="text()">
  <xsl:call-template name="globalReplace">
  <xsl:with-param name="outputString" select="."/>
  <xsl:with-param name="target" select="'finish'"/>
  <xsl:with-param name="replacement" select="'FINISH'"/>
  </xsl:call-template>
</xsl:template>

<xsl:template match="@*|*">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

The "globalReplace" named template is a general-purpose string replacement template based on one posted to the XSL-List mailing list by Mike J. Brown. As the example shows, it gets called with three parameters:

  • outputString is the string on which it will perform the global replacement.

  • target is the string that it will look for in outputString—the string that will be replaced.

  • replacement is the new string that will be substituted for any occurrence of target in outputString.

The template must add outputString to the result tree unchanged if it has no occurrence of the target string, so first it checks whether the target string is there or not. An if-else construction would be great for this, but XSLT offers no equivalent of an "else" condition to go with its xsl:if instruction. However, an xsl:choose instruction can perform the same logic with a single xsl:when element followed by an xsl:otherwise element. In the template, the xsl:when condition uses the contains() function to check whether outputString has target in it. If it does, an xsl:value-of instruction uses a concat() function to put together two strings for the result tree: everything in outputString before the first target and then the replacement string.

What about the rest of outputString, after the target that was found and replaced by the replacement string? The "globalReplace" named template makes a recursive call to itself to make any further substitutions needed in the remaining part of the string, passing substring-after($outputString,$target) (that is, everything in outputString after the found occurrence of target) as the value of outputString for this new invocation of the template. If that new invocation finds another occurrence of the target string, it adds everything up to it, followed by the replacement string, to the result tree and then calls the template again for the remainder of that string if necessary. Because recursive calls handle the remainder of the string, it really is a global replace: multiple occurrences of the target all get replaced.

If the xsl:when instruction's test attribute doesn't find the target string in outputString, the xsl:otherwise element's xsl:value-of instruction just adds the value of outputString to the result tree. This is the crucial stopping condition that any recursive template needs to ensure that it doesn't call itself forever. Whether outputString has zero occurrences of target or fifty of them, eventually this xsl:otherwise part of the xsl:choose instruction will get chosen and the "globalReplace" named template will not call itself again for this source tree text node.
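
One case the stopping condition does not cover is an empty target string. XPath 1.0 processors generally treat the empty string as contained in every string, and substring-after() with an empty second argument then returns the whole first argument, so the template would keep calling itself with the same outputString. The following variation is a sketch of one way to guard against that; it is not part of the original posting, and only the test attribute of xsl:when differs. The template rule for the document root is a hypothetical driver that calls it on a literal string with two occurrences of the target.

<!-- sketch: globalReplace with a guard against an empty target -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>

<xsl:template name="globalReplace">
  <xsl:param name="outputString"/>
  <xsl:param name="target"/>
  <xsl:param name="replacement"/>
  <xsl:choose>
    <!-- recurse only when there is a non-empty target to find -->
    <xsl:when test="$target != '' and contains($outputString,$target)">
      <xsl:value-of select=
        "concat(substring-before($outputString,$target),
               $replacement)"/>
      <xsl:call-template name="globalReplace">
        <xsl:with-param name="outputString"
             select="substring-after($outputString,$target)"/>
        <xsl:with-param name="target" select="$target"/>
        <xsl:with-param name="replacement"
             select="$replacement"/>
      </xsl:call-template>
    </xsl:when>
    <!-- stopping condition: nothing left to replace -->
    <xsl:otherwise>
      <xsl:value-of select="$outputString"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<!-- hypothetical driver: replace "-" with "+" in a literal string -->
<xsl:template match="/">
  <xsl:call-template name="globalReplace">
    <xsl:with-param name="outputString" select="'a-b-c'"/>
    <xsl:with-param name="target" select="'-'"/>
    <xsl:with-param name="replacement" select="'+'"/>
  </xsl:call-template>
</xsl:template>

</xsl:stylesheet>

With any XML document as input, the driver produces a+b+c: the first call emits "a+", the second emits "b+", and the third falls through to xsl:otherwise and emits "c". With an empty target, the first call goes straight to xsl:otherwise and the string comes back unchanged.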

The result of calling this stylesheet with the document above has both occurrences of the string "finish" replaced with "FINISH":

<winelist>

  <wine grape="Chardonnay">
    <winery>Benziger</winery>
    <product>Carneros</product>
    <year>1997</year>
    <desc>Well-textured flavors, good FINISH.</desc>
    <prices>
      <list>10.99</list>
      <discounted>9.50</discounted>
      <case>114.00</case>
    </prices>
  </wine>

  <wine grape="Cabernet">
    <winery>Duckpond</winery>
    <product>Merit Selection</product>
    <year>1996</year>
    <desc>Sturdy and generous flavors, long FINISH.</desc>
    <prices>
      <list>13.99</list>
      <discounted>11.99</discounted>
      <case>143.50</case>
    </prices>
  </wine>

</winelist>

One nice thing about this "globalReplace" named template is that it's general purpose: it works unchanged when called in other situations. For example, the following template rule also calls it, but it only replaces the one-character string "9" with "0" in text nodes that are children of year elements, because those are the nodes specified by the template rule's match condition.

<!-- xq334.xsl: converts xq331.xml into xq335.xml -->
<xsl:template match="year/text()">
  <xsl:call-template name="globalReplace">
  <xsl:with-param name="outputString" select="."/>
  <xsl:with-param name="target" select="'9'"/>
  <xsl:with-param name="replacement" select="'0'"/>
  </xsl:call-template>
</xsl:template>

When run with the same source document as the previous example, this template replaces the nines in the year elements and leaves the nines in the prices elements alone:

<?xml version="1.0" encoding="UTF-8"?>
<winelist>

  <wine grape="Chardonnay">
    <winery>Benziger</winery>
    <product>Carneros</product>
    <year>1007</year>
    <desc>Well-textured flavors, good finish.</desc>
    <prices>
      <list>10.99</list>
      <discounted>9.50</discounted>
      <case>114.00</case>
    </prices>
  </wine>

  <wine grape="Cabernet">
    <winery>Duckpond</winery>
    <product>Merit Selection</product>
    <year>1006</year>
    <desc>Sturdy and generous flavors, long finish.</desc>
    <prices>
      <list>13.99</list>
      <discounted>11.99</discounted>
      <case>143.50</case>
    </prices>
  </wine>

</winelist>

(If you really want to replace one character with another like this, the translate() function would be more efficient.) This demonstrates how customizing the stylesheet's use of the "globalReplace" template doesn't have to mean tinkering with the template itself. Instead, being more selective about the outputString value passed to the template allows the stylesheet to focus the template's power. The named template can be used in multiple situations exactly as it is.
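
The translate() alternative mentioned above can be sketched as follows. This is not one of the article's example files; it assumes the same wine list source document and reuses the copying template rule from the earlier stylesheet, with no recursion needed.

<!-- sketch: the year example using translate() instead of globalReplace -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

<xsl:template match="year/text()">
  <!-- maps every "9" to "0"; all other characters pass through -->
  <xsl:value-of select="translate(.,'9','0')"/>
</xsl:template>

<xsl:template match="@*|*">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

Text nodes outside the year elements fall through to XSLT's built-in rule for text, which copies them as they are, so the prices keep their nines.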

These two columns have provided a tour of XSLT 1.0's string manipulation functions. XSLT 2.0 promises us some more, partly inspired by the string manipulation extension functions available in some XSLT processors. Check your XSLT engine's documentation to see what else you may have available to you; also see my book XSLT Quickly for descriptions of more functions that can add power to your XSLT stylesheets.


