Writing Your Own Functions in XSLT 2.0

September 3, 2003

Bob DuCharme

Most XSLT 1.0 processors, particularly the ones written in Java, let you write extension functions in the processor's host language, link them in, and then call those functions from stylesheets. The XSLT 1.0 spec spells out specific ways to check whether a particular extension function is available and how to recover gracefully if not. In the September 2001 "Transforming XML" column, I presented examples of extension elements and functions.

If you wanted to write your own functions within a stylesheet, there were ways to fake it with named templates, but faking it won't be necessary with XSLT 2.0, which lets you write your own functions using XSLT syntax. These functions return values that can be used all over your stylesheet, even in XPath expressions.

Let's look at a simple example. The following stylesheet creates a result tree upon seeing the root of any document, so you can run it with itself as input. It declares a function called foo:compareCI, which does a case-insensitive comparison of two strings and returns the same values as the XPath 2.0 compare() function described in last month's column.

<xsl:stylesheet version="2.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:foo="http://whatever">

  <!-- Compare two strings ignoring case, returning same
       values as compare(). -->
  <xsl:function name="foo:compareCI">
    <xsl:param name="string1"/>
    <xsl:param name="string2"/>
    <xsl:value-of select="compare(upper-case($string1),upper-case($string2))"/>
  </xsl:function>

  <xsl:template match="/">
compareCI red,blue: <xsl:value-of select="foo:compareCI('red','blue')"/>
compareCI red,red: <xsl:value-of select="foo:compareCI('red','red')"/>
compareCI red,Red: <xsl:value-of select="foo:compareCI('red','Red')"/>
compareCI red,Yellow: <xsl:value-of select="foo:compareCI('red','Yellow')"/>
  </xsl:template>

</xsl:stylesheet>
    

The first thing to notice is that the declared function must come from a namespace outside of the XSLT namespace. In the example I assigned a namespace prefix of foo to the http://whatever URL to make it clear that you can use any namespace, as long as it's not the XSLT namespace. The URL I specified wasn't serious, but works anyway. You'll probably want to pick a URL associated with your company or project.

The actual function declaration in the sample stylesheet is in an xsl:function element. Its structure is pretty straightforward: a name attribute stores the function's name, and optional xsl:param child elements name parameters that can be passed to the function, just like xsl:param elements do in XSLT 1.0's xsl:template elements. In the example above, the two parameters passed are the two strings to be compared.

The function's only remaining line is an xsl:value-of instruction, which uses XPath 2.0's compare() and upper-case() functions to perform its comparison and output the result. The return value of the function is the sequence of nodes that it outputs. If you want, you can add an as attribute to the xsl:function element to indicate a specific data type that the function returns. Because my foo:compareCI() function returns the integer returned by its call to the compare() function, I could have added an as="xs:integer" attribute value to the xsl:function element (which would have required declaration of the http://www.w3.org/2001/XMLSchema namespace to go with that "xs" prefix), but I wanted to keep my first example function as simple as possible.
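A typed variant of that function might look like the following sketch. It assumes the xs prefix is bound to the XML Schema namespace, as described above. The as="xs:string" types on the two parameters and the use of xsl:sequence in place of xsl:value-of (so that the function hands back the integer itself rather than a text node) are assumptions added for illustration, not part of the original example, and a draft-era processor may treat them differently.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:foo="http://whatever">

  <!-- Typed variant (illustrative): both arguments must be strings
       and the result is declared as an integer, as with compare(). -->
  <xsl:function name="foo:compareCI" as="xs:integer">
    <xsl:param name="string1" as="xs:string"/>
    <xsl:param name="string2" as="xs:string"/>
    <!-- xsl:sequence returns the value of compare() unchanged -->
    <xsl:sequence select="compare(upper-case($string1),upper-case($string2))"/>
  </xsl:function>

  <xsl:template match="/">
compareCI red,Red: <xsl:value-of select="foo:compareCI('red','Red')"/>
  </xsl:template>

</xsl:stylesheet>
```

Declaring the types costs two extra attributes and one namespace declaration, and in exchange the processor can reject a call such as foo:compareCI('red', 3) instead of quietly comparing the wrong kind of value.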

When run with Saxon 7's experimental XSLT 2.0 support, this stylesheet creates the following output:

<?xml version="1.0" encoding="UTF-8"?>
compareCI red,blue: 1
compareCI red,red: 0
compareCI red,Red: 0
compareCI red,Yellow: -1

The third comparison is the most important here because it shows that the function considers "red" and "Red" to be equal. (See last month's column for the meaning of the various return values.)

XSLT 2.0 functions can be recursive. The following stylesheet includes a substring function that expects you to pass it a string (inString) and the length of a substring to pull from that string (length), starting at its first character. Instead of always breaking after length characters, though, this function only breaks there if the next character is a word boundary character. Otherwise, it breaks at the last word boundary before that. It does this by calling itself with the same inString value and a length value of length - 1.

Before making each recursive call, the function's xsl:choose element works through three xsl:when tests. The first checks whether $length is less than or equal to 0 and returns the entire string if so, because if $length was decremented that far, the string has no usable word boundary and there's no point in continuing. The second checks whether the passed string is already no longer than the requested length, in which case it just returns the whole string. The third checks whether the character just after position $length in $inString (character number $length + 1) is a member of the list of delimiter characters defined near the beginning of the stylesheet, and if so, returns the first $length characters, because its job is done. If none of these conditions is true, the xsl:otherwise element makes the recursive call.

<xsl:stylesheet version="2.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:foo="http://whatever">

  <xsl:output method="text"/>

  <xsl:variable name="delimiters"> ,."!?()</xsl:variable>

  <xsl:function name="foo:substrWordBoundary">
    <xsl:param name="inString"/>
    <xsl:param name="length"/>
    <xsl:choose>
      <xsl:when test="$length &lt;= 0">
        <xsl:value-of select="$inString"/>
      </xsl:when>
      <xsl:when test="string-length($inString) &lt;= $length">
        <xsl:value-of select="$inString"/>
      </xsl:when>
      <xsl:when test="contains($delimiters,substring($inString,$length + 1,1))">
        <xsl:value-of select="substring($inString,1,$length)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="foo:substrWordBoundary($inString,$length - 1)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <xsl:template match="/">
20 chars: <xsl:value-of select="foo:substrWordBoundary('This is a test.Right? Yes.',20)"/>
19 chars: <xsl:value-of select="foo:substrWordBoundary('This is a test.Right? Yes.',19)"/>
already short enough: <xsl:value-of select="foo:substrWordBoundary('catatonic',15)"/>
no boundaries: <xsl:value-of select="foo:substrWordBoundary('catatonic',5)"/>
  </xsl:template>

</xsl:stylesheet>

The four strings passed to the function test several possible outcomes. With any source document, the stylesheet creates this result:

20 chars: This is a test.Right
19 chars: This is a test
already short enough: catatonic
no boundaries: catatonic

What happens if we pass a bad parameter to the function? For example, what if we added this new line after the "no boundaries" line, passing the string "five" instead of a number as the second parameter?

bad parameter: <xsl:value-of select="foo:substrWordBoundary('catatonic','five')"/>

Without executing the function on any of the legitimate input, Saxon 7 immediately tells us about the following problem:

Error at xsl:choose on line 13 of file:/C:/dat/writing/trxml/temp/sswb1.xsl:
  Cannot compare xs:string to xs:integer
Transformation failed: Run-time errors were reported

The stronger typing offered by XSLT 2.0 lets us plan for this a little better. By adding an as attribute to the function's declaration for the length parameter, like this,

<xsl:param name="length" as="xs:integer"/>

we tell the XSLT processor to check the types of the parameters when they're passed, instead of waiting for the bad data to blow up in some line of the stylesheet that doesn't know what to do with it. (Don't forget to add xmlns:xs="http://www.w3.org/2001/XMLSchema" to the other namespace declarations in the stylesheet's start-tag.) With length declared using this typing, Saxon 7 catches the error sooner and delivers a more informative error message:
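Put together, the typed version of the stylesheet might look like the sketch below. It assumes the same function body as before. The as="xs:string" on inString is an extra assumption added for symmetry, not something the error message above depends on. The comparison operators are written as &lt;= because a literal < is not allowed inside an XML attribute value.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:foo="http://whatever">

  <xsl:output method="text"/>

  <xsl:variable name="delimiters"> ,."!?()</xsl:variable>

  <xsl:function name="foo:substrWordBoundary">
    <!-- Typed parameters: a non-integer length is rejected at the call -->
    <xsl:param name="inString" as="xs:string"/>
    <xsl:param name="length" as="xs:integer"/>
    <xsl:choose>
      <!-- Ran out of characters without finding a boundary -->
      <xsl:when test="$length &lt;= 0">
        <xsl:value-of select="$inString"/>
      </xsl:when>
      <!-- String is already short enough -->
      <xsl:when test="string-length($inString) &lt;= $length">
        <xsl:value-of select="$inString"/>
      </xsl:when>
      <!-- The character after position $length is a delimiter -->
      <xsl:when test="contains($delimiters,substring($inString,$length + 1,1))">
        <xsl:value-of select="substring($inString,1,$length)"/>
      </xsl:when>
      <!-- Otherwise try again one character shorter -->
      <xsl:otherwise>
        <xsl:value-of select="foo:substrWordBoundary($inString,$length - 1)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <xsl:template match="/">
20 chars: <xsl:value-of select="foo:substrWordBoundary('This is a test.Right? Yes.',20)"/>
  </xsl:template>

</xsl:stylesheet>
```

Legitimate calls behave as before; only a call that supplies something other than an integer for length is affected.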

Error at xsl:value-of on line 35 of sswb2.xsl:
  Required type of second argument of *** call to user function ***()
is xs:integer; supplied value has type xs:string
Transformation failed: Failed to compile stylesheet. 1 error detected.

Nearly all serious programming languages offer the ability to declare and use your own functions; most programmers have become accustomed to the modularity and scalability advantages that this gives them. Now XSLT 2.0 developers will have these advantages as well.

Although you can declare and use your own functions in popular programming languages ranging from C to JavaScript, they don't quite count as functional languages. (Warning: readers with no trace of LISP/Scheme geek in them may want to stop reading now.) If DSSSL is XSLT's parent, that makes Scheme its grandparent and LISP its great-grandparent. Between XSLT's xsl:function element and its idea of node sequences, I realized that I could implement the classic car and cdr functions, which return the first item of a list and the remainder of the list, respectively. LISP does stand for "LISt Processing," after all, and not "Lots of Irritating Silly Parentheses". These two functions don't do much by themselves, but as two of the basic building blocks of LISP and later Scheme, they've provided the foundation for useful applications for over 40 years. (The origin of the names "car" and "cdr," pronounced "could-er," is one of the classic twisted stories from the history of computer science.)

After the following stylesheet declares these two functions, it outputs the sample input list delimited by pipe characters. It then tests the functions individually and combines them into a more complex expression to extract the third member of the list sequence:

<xsl:stylesheet version="2.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:foo="http://whatever">

  <xsl:output method="text"/>

  <xsl:variable name="seq1" select="('a','b','c','d')"/>

  <xsl:function name="foo:cdr">
    <xsl:param name="seq"/>
    <xsl:for-each select="subsequence($seq,2)">
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:function>

  <xsl:function name="foo:car">
    <xsl:param name="seq"/>
    <xsl:value-of select="item-at($seq,1)"/>
  </xsl:function>

  <xsl:template match="/">
seq1: <xsl:value-of select="string-join($seq1,'|')"/>
car(seq1): <xsl:value-of select="string-join(foo:car($seq1),'|')"/>
cdr(seq1): <xsl:value-of select="string-join(foo:cdr($seq1),'|')"/>
car(cdr(cdr(seq1))): <xsl:value-of select=
               "string-join(foo:car(foo:cdr(foo:cdr($seq1))),'|')"/>

  </xsl:template>

</xsl:stylesheet>

The output shows that it works. It may not look particularly useful, but it should provoke a smirk from some of the grayer-haired developers out there:

seq1: a|b|c|d
car(seq1): a
cdr(seq1): b|c|d
car(cdr(cdr(seq1))): c
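The stylesheet above reflects the draft function library that Saxon 7 implemented. The item-at() function does not appear in the final XPath 2.0 function library, so a processor built against the finished specification will likely reject it. The following sketch shows how the same two functions might be written for such a processor, using a positional predicate and xsl:sequence. The as attributes are assumptions added for illustration.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:foo="http://whatever">

  <xsl:output method="text"/>

  <xsl:variable name="seq1" select="('a','b','c','d')"/>

  <!-- car: the first item of the sequence, or nothing if it is empty -->
  <xsl:function name="foo:car" as="item()?">
    <xsl:param name="seq" as="item()*"/>
    <xsl:sequence select="$seq[1]"/>
  </xsl:function>

  <!-- cdr: every item after the first -->
  <xsl:function name="foo:cdr" as="item()*">
    <xsl:param name="seq" as="item()*"/>
    <xsl:sequence select="subsequence($seq,2)"/>
  </xsl:function>

  <xsl:template match="/">
seq1: <xsl:value-of select="string-join($seq1,'|')"/>
car(seq1): <xsl:value-of select="foo:car($seq1)"/>
cdr(seq1): <xsl:value-of select="string-join(foo:cdr($seq1),'|')"/>
car(cdr(cdr(seq1))): <xsl:value-of select="foo:car(foo:cdr(foo:cdr($seq1)))"/>
  </xsl:template>

</xsl:stylesheet>
```

Because xsl:sequence returns the selected items themselves rather than copies of them as text nodes, these versions hand back the original strings, and the nested call still walks from (b, c, d) to (c, d) to c.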