Comparing XSLT and XQuery

March 9, 2005

XSLT has been the main XML technology for transformations for some time now, but it’s not the only player in the game. Although XQuery is designed for retrieving and interpreting information, it is also, according to the specification, “flexible enough to query a broad spectrum of XML information sources, including both databases and documents.”

In this article, we’ll be transforming the following XML source information from Cathy Kost, a beginning XML student who helps with a pot-bellied pig rescue organization.

<animal>

    <species>pot belly pig</species>

    <name>Molly II</name>

    <birth>February, 1998</birth>

    <in-date>January, 2000</in-date>

    <from>Middle Ave.</from>

    <gender spay-neuter="yes">F</gender>

    <info>

    She is a sweet, friendly pig who likes to hang

    out on Cathy&#8217;s porch on the lounge pad.

    </info>

    <picture>

        <file>images/molly_th.jpg</file>

        <description>Black pig</description>

        <caption>Molly in the Pasture</caption>

    </picture>

</animal>

We will develop a transformation in both XSLT and XQuery. The transformations will change the XML into several HTML pages with four pigs per page, and an index page with links to the pig description pages. Both transformations will use built-in extensions to create multiple output files.

Each pig’s <picture> element will become an <img> element in the resulting file. It’s good practice to put width and height attributes into image elements, but it’s a lot of work to look up each image’s dimensions. This is a perfect place for a user-defined extension function that, given an image’s file name, returns the image’s width and height.

Which Tools to Use?

For the XSLT transformation, we use the Apache Xalan XSLT processor. For XQuery, we use Qizx/open, which implements all features of the language except Schema import and validation.

The Main Differences

XSLT has a “processing engine” that automatically goes through the document tree and applies templates as it finds nodes; with XQuery the programmer has to direct the process. It’s almost like the difference between RPG (the business programming language, not role playing games) and procedure-oriented programming languages like C. In RPG, there is an implicit processing cycle, and you just set up the actions that you want to occur when certain conditions are met; in C, you are responsible for directing the algorithm.

XSLT is to XQuery as JavaScript is to Java. XSLT is untyped; conversions between nodes and strings and numbers are handled pretty much transparently. XQuery is a typed language which uses the types defined by XML Schema. XQuery will complain when it gets input that isn't of the appropriate type.

Global Setup

We want the number of pigs per page to be a global, user-settable parameter with a default value of four. In XSLT, we define this outside of any templates:

<xsl:param name="perPage" select="'4'"/>

In XQuery the following declaration appears as the first line in our query file:

declare variable $perPage as xs:integer := 4;

Both of these can be overridden by options on the command line. However, here is our first difference between XSLT and XQuery: any XSLT template may contain an <xsl:param>; that is how information gets passed among templates. XQuery’s declare variable defines global variables only, and cannot appear within a user-defined function.

We also want the output file to be XHTML transitional. In XSLT we accomplish this with the following element:

<xsl:output

 method="xml"

 indent="yes"

 omit-xml-declaration="yes"

 doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"

 doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />

In XQuery, we add these options to the UNIX shell script that will run Qizx/open:

-Xindent='yes' \

-Xmethod='XHTML' \

-X'doctype-public'='-//W3C//DTD XHTML 1.0 Transitional//EN' \

-X'doctype-system'=\

'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd' \

Creating the Index Page

Here is what the index page looks like when we have four pigs per page:

Figure 1: Logo with list items

In XSLT, we provide a template to match the root <pig-rescue> element. (To save space, we’re not showing the code that generates the logo image on the index page.) Since we have to process the <animal> elements in two different ways—once for the index page and once for the display pages—we need to use a mode. The template will be applied only to every fourth (perPage) entry; this ensures that we get the correct number of list items in the unordered list.

<xsl:template match="pig-rescue">

<html>

<head>

    <link rel="stylesheet" type="text/css" href="bdr.css" />

    <title>The Pigs</title>

</head>

<body>

<div align="center">

<h1>The Pigs At Belly Draggers Ranch</h1>

</div>



<ul>

    <xsl:apply-templates

        select="animal[position() mod $perPage = 1]"

        mode="indexList" />

</ul>



</body>

</html>

</xsl:template>

In XQuery, processing the document becomes our single XQuery statement; in this case, an XQuery FLWOR expression. This acronym stands for the clauses in the expression:

for, which allows you to step through a sequence of items or nodes.
let, which allows you to declare and initialize variables.
where (optional), which allows you to specify under which conditions an item or node should be chosen.
order by (optional), which sorts the selected items.
return, which returns the specified values for each of the selected items.

A FLWOR expression must have at least one for or let; ours has just a let which assigns the root element from the input document to the $doc variable. The return returns an <html> element. The parentheses aren’t really necessary as only one item is being returned, but we want to use them for the sake of consistency.

let $doc := fn:input()/pig-rescue

return

(

<html>

  <head>

    <link rel="stylesheet" type="text/css" href="bdr.css" />

    <title>The Pigs</title>

  </head>

  <body>

    <div align="center">

      <h1>The Pigs At Belly Draggers Ranch</h1>

    </div>

    

    <ul>

    {

        local:make-name-list( $doc/animal )

    }

    </ul>

    

    </body>

    </html>

)

The fn:input() in the preceding code is a Qizx/open extension that takes the input file name from the command line.

The text starting with the <html> tag is called a Direct Element Constructor, and it must be well-formed. Within one of these constructors, you may embed XQuery expressions by enclosing them in braces. In this case, we switch back to XQuery to call the local:make-name-list function, passing it all the <animal> nodes within the document. If the function name looks like it has a namespace prefix, that’s because it does. XQuery predefines the namespace prefix local and reserves it for use in defining local functions.

Creating a List Item for the Index Page

Let us now turn our attention to the XSLT that provides the list items with four pig names per entry. The notes that follow the listing walk through it from top to bottom.

<xsl:template match="animal" mode="indexList">

<xsl:variable name="start" 

                  select="(position()-1)*$perPage + 1"/> 



    <xsl:variable name="end"> 

        <xsl:choose>

          <xsl:when test="$start + $perPage &gt;

           count(/pig-rescue/animal)">

          <xsl:value-of select="count(/pig-rescue/animal)"/>

            </xsl:when>

            <xsl:otherwise>

              <xsl:value-of select="$start + $perPage - 1"/>

            </xsl:otherwise>

        </xsl:choose>

    </xsl:variable>



    <xsl:variable name="filename">animals<xsl:value-of

       select="position()"/>.html</xsl:variable> 



      <li>

      <xsl:for-each

      select="/pig-rescue/animal[position() &gt;= $start and

      position() &lt;= $end]">

            

            <xsl:variable name="url"><xsl:value-of 

            select="$filename"/>#a<xsl:value-of

            select="$start+position()-1"/></xsl:variable>

            

            <a href="{$url}">

                <xsl:value-of select="name"/>

            </a>

            <xsl:call-template name="seriesSeparator"> 

              <xsl:with-param name="start" select="$start"/>

                <xsl:with-param name="end" select="$end"/>

            </xsl:call-template>

        </xsl:for-each>

    </li>

    

    <xsl:call-template name="makeSubfile"> 

        <xsl:with-param name="start" select="$start"/>

        <xsl:with-param name="end" select="$end"/>

        <xsl:with-param name="filename" select="$filename"/>

    </xsl:call-template>

</xsl:template>

1. The start variable: we can’t just use position() for this; though the calling template selected every fourth item, this template sees the resulting nodes in that list as being numbered one, two, three, etc.
2. The end variable: the last animal to be processed is the starting animal plus the number per page, minus one, or the total number of animals, whichever is less.
3. The filename variable: the pigs’ names have to link to the page where their full descriptions will be. For the first four pigs, this is animals1.html, for the next four it is animals2.html, etc.
4. The url variable: construct the destination URL for each pig.
5. Listing the names separated by commas in a grammatically correct manner is tricky business, so we hand that off to a named template, seriesSeparator.
6. Finally, as long as we have figured out which pigs to process, we pass that information to a template, makeSubfile, that will construct the file we named in note 3 above.

The XQuery equivalent for this is the local:make-name-list function. The logic is the same, so the notes will concentrate on the XQuery-specific aspects.

declare function local:make-name-list( $animalList as element()* ) as item()+

{

  for $pig at $pos in 

    $animalList[position() mod $perPage = 1] 

    let

        $n := count($animalList), 

        $filename := fn:concat("animals", $pos, ".html"),

        $start := ($pos - 1) * $perPage + 1,

        $end := if ($start + $perPage > $n)

            then 

                $n

            else

                $start + $perPage - 1

    return

    (

        <li> 

        {

            for $animal at $pos in $animalList[position() >= $start and
                position() <= $end]

            return (

                <a href="{$filename}#a{$start + $pos - 1}">

                    {$animal/name/text()} 

                </a>,

                local:series-separator( $start, $pos, $end )

            )

        }

        </li>,

        local:make-subfile( $animalList, $start, $end, $filename )

    )

};

1. The function signature: in an XQuery function, you should always specify the types of function parameters and return values. In this case, we need to specify that the $animalList parameter will consist of element()*, which means zero or more elements. The function returns item()+, which means one or more items. If you do not specify a type for the parameter or return value, XQuery assumes item()*, meaning zero or more items, where an item is any node or atomic value. This is normally not what you want.
2. The for clause steps through every fourth animal in the list. The at $pos modifier has the same effect as <xsl:value-of select="position()"/>. In XQuery, you can use the position() function only inside a predicate of an XPath expression.
3. The let clause: you may do several different assignments within a let clause by separating them with commas. Notice the assignment to $end, which uses an if expression. Since this is an expression and not a statement, you must always have both a then and an else so that it always yields a value.
4. The return’s first value is the list item. The <li> puts us into direct constructor mode, so we need braces to re-enter XQuery mode to create the contents.
5. The $animal/name/text() path is the reason we needed to declare $animalList as element()*. You cannot use an anonymous item as a path step; you must have a node. Also, we can’t just say $animal/name. Unlike <xsl:value-of select="animal/name"/>, which yields a text value, $animal/name puts a copy of the <name> element, tags and all, into the return value. If we want just the text, we have to explicitly add the extra text() step to the XPath expression.
6. The call to local:make-subfile: making the page with the pigs’ descriptions is a task that we hand off to another local function. Its output will be the second item in this function’s return value (note the comma on the preceding line), and that value will eventually make its way into the output, so the function will have to return the empty string as its value.

Notice that the return expression switches between direct element constructor mode and XQuery expression evaluation mode several times. In XSLT, the difference between commands to the XSLT processor and elements destined for output is fairly easy to distinguish due to the leading xsl: prefix. When you first start writing XQuery, it can be difficult to see—but always important to remember—which mode you are in.
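
The page arithmetic is the same in both versions, so it can be checked in isolation. What follows is a minimal sketch in Java and is not part of the article's download; the class name PageRanges and its Page helper are made up for illustration. It mirrors the start, end, and filename calculations for each group of perPage animals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: reproduces the start/end/filename
// arithmetic used by the XSLT template and the XQuery function.
public class PageRanges {

    /** One index-page entry: a 1-based range of animals and its target file. */
    static final class Page {
        final int start;
        final int end;
        final String filename;

        Page(int start, int end, String filename) {
            this.start = start;
            this.end = end;
            this.filename = filename;
        }
    }

    static List<Page> pages(int total, int perPage) {
        List<Page> result = new ArrayList<>();
        // pos counts the selected animals (every perPage-th one): 1, 2, 3, ...
        int pos = 0;
        for (int first = 1; first <= total; first += perPage) {
            pos++;
            int start = (pos - 1) * perPage + 1;
            // Same result as the if/then/else used for the end value.
            int end = Math.min(start + perPage - 1, total);
            result.add(new Page(start, end, "animals" + pos + ".html"));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Page p : pages(10, 4)) {
            System.out.println(p.filename + ": " + p.start + "-" + p.end);
        }
    }
}
```

With ten animals and four per page, this prints animals1.html: 1-4, animals2.html: 5-8, and animals3.html: 9-10, which matches what the stylesheet and the query should produce for the same input size.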

Putting the correct separator after a pig’s name boils down to one of four cases:

last pig in the series: no comma
next to last pig in a group of two: “ and ”
next to last pig in a group of three or more: “, and ”
other pig in a series: a comma followed by a blank

In XSLT, this is a simple <xsl:choose>, and we won’t show it here. In XQuery, it is a simple nested if. The types in the following declaration are based on XML Schema’s predefined types, which means you also get all the quirks and non-extensibility of the XML Schema type list. The function doesn’t need a FLWOR expression; the result of the nested if is the function’s return value.

declare function local:series-separator( $start as xs:integer,
    $pos as xs:integer, $end as xs:integer ) as xs:string

{

    if (($start + $pos < $end) and ($end - $start > 1))

    then

        ", "

   else if (($start + $pos = $end) and ($end - $start >= 2))

    then

        ", and "

    else if (($start + $pos = $end) and ($end - $start = 1))

    then

        " and "

    else

        ""

};
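
To convince yourself that the nested if covers all four cases, it can help to run the same conditions outside the query. This is a rough Java transcription of the function above, offered as a sketch only; the class name SeriesSeparator, the join helper, and every pig name other than Molly II are invented for the example.

```java
// Hypothetical transcription of local:series-separator, for checking the cases.
public class SeriesSeparator {

    /**
     * start and end are the 1-based indexes of the first and last pig in the
     * group; pos is the 1-based position of the current pig within the group.
     */
    static String separator(int start, int pos, int end) {
        if (start + pos < end && end - start > 1) {
            return ", ";
        } else if (start + pos == end && end - start >= 2) {
            return ", and ";
        } else if (start + pos == end && end - start == 1) {
            return " and ";
        } else {
            return "";
        }
    }

    /** Joins the names of one group, appending the separator after each name. */
    static String join(String[] names, int start) {
        int end = start + names.length - 1;
        StringBuilder sb = new StringBuilder();
        for (int pos = 1; pos <= names.length; pos++) {
            sb.append(names[pos - 1]).append(separator(start, pos, end));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Names other than "Molly II" are placeholders.
        System.out.println(join(new String[] {"Molly II", "Petunia", "Daisy"}, 1));
        System.out.println(join(new String[] {"Molly II", "Petunia"}, 5));
    }
}
```

The first call prints "Molly II, Petunia, and Daisy" and the second prints "Molly II and Petunia". The starting index does not change the wording, only the size of the group does.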

Intermission

Before proceeding to the extensions for XSLT and XQuery, let’s pause for a brief summary that will help you translate from XSLT to XQuery.

XSLT	XQuery
<xsl:param name="x" select="10"/> (global parameter)	declare variable $x as xs:integer := 10;
Parameters to <xsl:output/>	Command line parameters to Qizx/open
<xsl:template match="p"> <!-- template body --> </xsl:template> invoked by: <xsl:apply-templates select="XPath/to/p"/>	declare function local:process-p( $pList as element()* ) { for $p in $pList return (: function body :) }; with a call: local:process-p( XPath/to/p )
<xsl:call-template name="action"> <xsl:with-param name="p1" select="value"/> </xsl:call-template>	let $p1 := value return local:action($p1)
<xsl:variable name="x" select="value"/>	let $x := value
position() outside a predicate	for $item at $pos in $sequence
<xsl:if>	No direct equivalent; every if expression must have an else (the else branch can yield the empty sequence, ()).
<xsl:choose> <xsl:when test="cond 1"> <!-- value 1 --> </xsl:when> <xsl:when test="cond 2"> <!-- value 2 --> </xsl:when> <xsl:otherwise> <!-- value 3 --> </xsl:otherwise> </xsl:choose>	if (cond 1) then (: value 1 :) else if (cond 2) then (: value 2 :) else (: value 3 :)
“Counting loops” implemented by recursion	for $i in 1 to n

Built-in Extensions

We are now in a position to make the subfiles that display the information about each group of pigs. Using Xalan, we must add the namespace declaration xmlns:redirect="org.apache.xalan.xslt.extensions.Redirect" to the root <xsl:stylesheet> element. The template that makes the subfile follows. To save space, we do not show the code that generates the next/previous page links at the bottom of each page.

<xsl:template name="makeSubfile">

    <xsl:param name="start"/>

    <xsl:param name="end"/>

    <xsl:param name="filename"/>



    <!-- calculate this once, for use in next/back links -->

    <xsl:variable name="currentPage"

        select="(($start - 1) div $perPage) + 1"/>



    <redirect:write select="$filename">  

        <html>

        <head>

    <link rel="stylesheet" type="text/css" href="bdr.css" />

<title>Animals Page <xsl:value-of select="$currentPage"/>

</title>

        </head>

        <body>

        <div align="center">

        <h1>Animals <xsl:value-of select="$start"/> - 

        <xsl:value-of

        select="$end"/></h1>

        </div>

        <table border="0">

                <xsl:apply-templates select="self::animal | 

                    following-sibling::animal[position()

                    &lt; $perPage]" mode="display"> 

              <xsl:with-param name="start" select="$start"/>

                </xsl:apply-templates>

        </table>

        </body>

        </html>

    </redirect:write>

</xsl:template>

1. Everything between the <redirect:write> tag and the closing </redirect:write> will be output to the file named in $filename.
2. The select expression on <xsl:apply-templates> is ugly, but it works out to the current animal and all the remaining ones on the page. It needs a mode because it is processing the <animal> elements again, and the mode tells XSLT which template to invoke.

Now, the equivalent XQuery. Our strategy is to create the output page in a variable, and then use Qizx/open’s x:serialize extension function to direct it to a file.

declare function local:make-subfile( $animalList as element()*,

    $start as xs:integer,

    $end as xs:integer,

    $filename as xs:string) as xs:string

{

    let

        $currentPage := (($start - 1) div $perPage) + 1 



    let

        $htmlPage := 

            <html>

            <head>

    <link rel="stylesheet" type="text/css" href="bdr.css" />

                <title>Animals Page {$currentPage}</title>

            </head>

            <body>

            <div align="center">

            <h1>Animals {$start} - {$end}</h1>

            </div>

            

            <table border="0">

            {

               local:display-animals( 

   $animalList[position() >= $start and position() <= $end],

                 $start

                )

            }

            </table>

            </body>

            </html>

        

    let

        $outputFilename:=

            x:serialize( $htmlPage, 

                <options output="animals{$currentPage}.html"

                indent="yes"

                omit-xml-declaration="yes"

     doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"

                doctype-system=

"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>)

    return

    (

        ""

    )

};

1. In this function, we use multiple let clauses rather than a series of variables separated by commas; it makes the code easier to read.
2. The call to local:display-animals: we can’t use the same XPath expression that we used in the XSLT to handle the individual animals. In XSLT, makeSubfile was called in the context of an <animal> element. XQuery does not automatically pass the context on to called functions, which is why we must pass the entire $animalList.
3. Qizx/open’s x:serialize function takes two arguments. The first is an XML tree you want serialized. The second is an element patterned along the lines of XSLT’s <xsl:output> element.

A Comment about Comments

In order to place a comment into XQuery, you enclose it in smiley faces (: and :), which works fine when you are in XQuery expression mode:

let $pi := 3.14159 (: just a quick approximation :)

Unfortunately, this doesn’t work well when you are in direct element constructor mode. The first of the three following examples will simply place text into the XML tree, smiley faces and all. The second, which encloses the comment in braces to enter XQuery expression mode, gives a syntax error because an expression in braces must yield a value. The way around this is to provide the empty string as the value of the expression, as shown in the third example.

<a href="#">Main page</a> (: activate link later :)

<a href="#">Main page</a> { (: activate link later :) }

<a href="#">Main page</a> { (: activate link later :) ""}

Writing a Xalan Extension

In order to retrieve the width and height of an image given its file name, we will write a Xalan extension function in Java. It will be in a class named XImageSize (X for Xalan). This function, named getDimensions, will take a file name as string input and return an empty XML element with attributes containing the file name and the image’s width and height. The return value “cleans up” the file name by removing leading and trailing whitespace. The general model for this element is:

<imageSize fileName="fileName"

    width="width" height="height" />
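
For readers who have not built DOM nodes by hand, here is a small, self-contained Java sketch of how an element of this shape can be created with the standard javax.xml.parsers and org.w3c.dom APIs. It is an illustration only: the class name ImageSizeElement is hypothetical, and the width and height values are placeholders passed in by the caller, not measurements of a real image.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration: builds an <imageSize> element with three attributes.
public class ImageSizeElement {

    static Element make(String fileName, int width, int height) {
        try {
            // An empty Document is needed only as a factory for the element.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element result = doc.createElement("imageSize");
            // Trim the name, as the model above calls for a cleaned-up file name.
            result.setAttribute("fileName", fileName.trim());
            result.setAttribute("width", Integer.toString(width));
            result.setAttribute("height", Integer.toString(height));
            return result;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 120 and 90 are placeholder dimensions.
        Element e = make("  images/molly_th.jpg ", 120, 90);
        System.out.println(e.getTagName() + " " + e.getAttribute("fileName")
                + " " + e.getAttribute("width") + "x" + e.getAttribute("height"));
    }
}
```

Creating a fresh Document on every call keeps the sketch short; an extension that is called once per image may prefer to create the Document once and reuse it.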

In order to use this extension, we need to add some information to the XSL stylesheet. We need to establish a namespace for the extension and register that prefix as one belonging to an extension. We also want to make sure that this prefix never makes it into the output document.

<xsl:stylesheet

    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

    version="1.0"

    xmlns:img="info.evccit.utils"

    extension-element-prefixes="img"

    exclude-result-prefixes="img">

Once this is set up, the XSL stylesheet can call the function and extract the width and height directly from the returned <imageSize> element as follows:

<xsl:template match="picture">

    <xsl:variable name="dimensions"

        select="img:XImageSize.getDimensions(string(file))"/>

    <img src="{$dimensions/@fileName}"

        width="{$dimensions/@width}"

        height="{$dimensions/@height}"

        alt="{description}"

        title="{caption}" />

</xsl:template>

In order to return an element, the function must have access to a Document and its createElement() method. We know that it is possible for an extension function to do this; the tokenize() extension in org.apache.xalan.lib.Extensions does it. Because Xalan is Open Source, we can look at the code and copy it wholesale into ours. We also need to put in the appropriate attribution and include a copy of the Apache License information along with the source code.

/**

 * This class is not loaded until first referenced

 * (see Java Language Specification by Gosling/Joy/Steele,

 * section 12.4.1)

 *

 * The static members are created when this class is

 * first referenced, as a lazy initialization not needing

 * checking against null or any synchronization.

 *

 * This function Copyright 1999-2004

 * The Apache Software Foundation.

 */

private static class DocumentHolder 

{

    // Reuse the Document object to reduce memory usage.

    private static final Document m_doc;

    static {

        try

        {

            m_doc =

            DocumentBuilderFactory.newInstance().

                newDocumentBuilder().newDocument();

        }

       

        catch(ParserConfigurationException pce)

        {

              throw new org.apache.xml.utils.

                WrappedRuntimeException(pce);

        }



    }

}

This class will go into the main XImageSize class, which looks like this.

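package info.evccit.utils;

import java.awt.Dimension;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;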



public class XImageSize

{

    static char fileSep; 

    

    static {

        char[] carr =

         System.getProperty("file.separator").toCharArray();

        fileSep = carr[0];

    }



    public static Node getDimensions( String fileName )

    {

        Document doc = DocumentHolder.m_doc;

        Element result = null;                                    

        fileName = fileName.trim();

        try

        {

            Dimension d = 

      ImageFileDimensions.getFileDimensions( fileName ); 

            result = doc.createElement("imageSize");

            result.setAttribute( "fileName",

                fileName.replace( fileSep,  '/' ) ); 

            result.setAttribute( "width",

                Integer.toString((int) d.getWidth() ));

            result.setAttribute( "height",

                Integer.toString((int) d.getHeight() ));

        }

        catch (Exception e)

        {

            result = null;

        }

        return result;

    }

}

1. The static initialization of the class saves the system’s file separator character.
2. The call to ImageFileDimensions.getFileDimensions() opens the file and reads the first few bytes to determine whether it is a PNG, JPG, or GIF file. Depending upon the file type, it does the appropriate work to extract the width and height and returns them in a Dimension object. The exact details aren’t relevant to this article, so the code isn’t shown here.
3. We have to replace the file separator character with a slash, which is the standard separator for URLs.
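
For the curious, the header-reading idea can be sketched in a few lines of Java. This is not the ImageFileDimensions class from the download, whose internals are not shown in this article; the class name HeaderDimensions, its methods, and the sample byte arrays are all assumptions made for illustration. The sketch handles only PNG and GIF, whose dimensions sit at fixed offsets near the start of the file; JPG needs a scan through its segments and is left out.

```java
import java.awt.Dimension;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch: reads width and height from a PNG or GIF header.
public class HeaderDimensions {

    // Hand-made headers for the demo: a 120x90 PNG and a 300x200 GIF.
    static final byte[] PNG_SAMPLE = {
        (byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A, // signature
        0, 0, 0, 0x0D, 'I', 'H', 'D', 'R',                  // IHDR length and type
        0, 0, 0, 120, 0, 0, 0, 90                           // width, height
    };
    static final byte[] GIF_SAMPLE = {
        'G', 'I', 'F', '8', '9', 'a', 0x2C, 0x01, (byte) 0xC8, 0x00
    };

    static Dimension read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] head = new byte[10];
        data.readFully(head);
        if (head[0] == 'G' && head[1] == 'I' && head[2] == 'F') {
            // GIF: two little-endian 16-bit values at offsets 6 and 8.
            int w = (head[6] & 0xFF) | ((head[7] & 0xFF) << 8);
            int h = (head[8] & 0xFF) | ((head[9] & 0xFF) << 8);
            return new Dimension(w, h);
        }
        if ((head[0] & 0xFF) == 0x89 && head[1] == 'P'
                && head[2] == 'N' && head[3] == 'G') {
            // PNG: skip to offset 16, then two big-endian 32-bit values.
            data.readFully(new byte[6]);
            int w = data.readInt();
            int h = data.readInt();
            return new Dimension(w, h);
        }
        throw new IOException("unrecognized image header");
    }

    /** Convenience wrapper for in-memory data; wraps the checked exception. */
    static Dimension fromBytes(byte[] bytes) {
        try {
            return read(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Dimension png = fromBytes(PNG_SAMPLE);
        Dimension gif = fromBytes(GIF_SAMPLE);
        System.out.println("PNG: " + png.width + "x" + png.height);
        System.out.println("GIF: " + gif.width + "x" + gif.height);
    }
}
```

Run as is, this prints "PNG: 120x90" and "GIF: 300x200". To read a real file, pass a java.io.FileInputStream to read() instead of the in-memory samples.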

The source XML file sets the base path for all the images with the <image-base> element. Rather than do a complicated normalize-space() and concat() to join the base path to the image file name in the XSLT, we create a second version of getDimensions() that accepts two strings and does the heavy lifting:

public static Node 

           getDimensions( String pathName, String fileName )

{

String fileSeparator = System.getProperty("file.separator");

    String combinedName;

    

    pathName = pathName.trim();

    fileName = fileName.trim();



    if (pathName.endsWith( fileSeparator ))

    {

        combinedName = pathName + fileName;

    }

    else

    {

        combinedName = pathName + fileSeparator + fileName;

    }



    return getDimensions( combinedName );

}

If you download the code, you will see that we have heavily overloaded the getDimensions() function by allowing it to accept a Node or NodeList for either or both parameters, but that isn’t the point of this article. Onward to...

Writing an XQuery Extension

The code for this extension is almost identical to the Xalan extension. Instead of returning an <imageSize> element, however, we will return a vector of three items: the filename, the width, and the height. Qizx/open will interpret this as a sequence of items.

The XQuery file must connect the class, which is named QImageSize, with a namespace. This statement goes at the head of the XQuery file. Note carefully! This assignment uses a single equal sign, not the := used for a let clause. We will also have to pass the class name to Qizx/open on the command line when we run the query; this lets Qizx/open know that this is an authorized extension and no security exception needs to be raised.

declare namespace imgsize =

                        "java:info.evccit.utils.QImageSize";

Once the namespace is established, XQuery can extract the information as part of the pig display code:

for $animal at $pos in $animalList

    let

        $basePath := $animal/../image-base,

        $dimensions := imgsize:getDimensions($basePath,

            $animal/picture/file/text() )

    return

    (

        <img

            src="{$dimensions[1]}" 

            width="{$dimensions[2]}"

            height="{$dimensions[3]}"

            alt="{$animal/picture/description/text()}"

            title="{$animal/picture/caption/node()}"

            hspace="4" />

    )

Here’s the code for the function that takes the entire filename as one string parameter:

package info.evccit.utils;



import net.xfra.qizxopen.xquery.dm.Node;



import java.awt.Dimension;

import java.util.Vector;



public class QImageSize

{

    static char fileSeparator;

    

    static {

        char[] carr =

         System.getProperty("file.separator").toCharArray();

        fileSeparator = carr[0];

    }

    

    public static Vector getDimensions( String fileName )

    {

        Vector result = new Vector(3);

        fileName = fileName.trim();

        try

        {

          Dimension d = 

          ImageFileDimensions.getFileDimensions( fileName );

            result.add( fileName.replace( fileSeparator, '/' ) );

            result.add( new Integer( (int) d.getWidth() ) );

            result.add( new Integer( (int) d.getHeight() ) );

        }

        catch (Exception e)

        {

            result = null;

        }

        return result;

    }

}

The two-string version of getDimensions() is exactly the same as the Xalan version, except that it returns a Vector instead of a Node. This function has also been heavily overloaded to accept a Qizx/open Node (which is not the same as a DOM node) so that the caller doesn’t have to dig down to the text() step in the path.

Getting the Code

You can download the sample pig rescue file, XSLT stylesheet, XQuery file, extension sources, and Apache License here. The Java source files are in the info directory, and the API documentation is in the doc directory. Make sure you put the ImageSize.jar file in your classpath when invoking Xalan and/or Qizx/open.

The shell files xcompile.sh and qcompile.sh will compile the Xalan and Qizx/open extensions. Files make_javadoc.sh and make_jar.sh create the Javadoc and ImageSize.jar files. Files run_xalan.sh and run_qizx.sh run the transformation and XQuery.

Thanks to Xavier Franc, author of Qizx/open, for his advice and information on using XQuery.