Comparing XSLT and XQuery
XSLT has been the main XML technology for transformations for some time now, but it’s not the only player in the game. Although XQuery is designed for retrieving and interpreting information, it is also, according to the specification, “flexible enough to query a broad spectrum of XML information sources, including both databases and documents.”
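In this article, we’ll be transforming XML source information from Cathy Kost, a beginning XML student who helps with a pot-bellied pig rescue organization. The file's root <pig-rescue> element contains an <image-base> element and one <animal> element per pig, like this one: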
<animal>
  <species>pot belly pig</species>
  <name>Molly II</name>
  <birth>February, 1998</birth>
  <in-date>January, 2000</in-date>
  <from>Middle Ave.</from>
  <gender spay-neuter="yes">F</gender>
  <info>
    She is a sweet, friendly pig who likes to hang
    out on Cathy’s porch on the lounge pad.
  </info>
  <picture>
    <file>images/molly_th.jpg</file>
    <description>Black pig</description>
    <caption>Molly in the Pasture</caption>
  </picture>
</animal>
We will develop a transformation in both XSLT and XQuery. The transformations will change the XML into several HTML pages with four pigs per page, and an index page with links to the pig description pages. Both transformations will use built-in extensions to create multiple output files.
Each pig’s <picture> element will become an
<img> element in the resulting file.
It’s good practice to put width and
height attributes on image elements, but it’s
a lot of work to have to look up each image’s dimensions.
This is a perfect place for a user-defined extension function that, given an
image’s file name, returns the image’s width and height.
For the XSLT transformation, we use the Apache Xalan XSLT processor. For XQuery, we use Qizx/open, which implements all features of the language except Schema import and validation.
XSLT has a “processing engine” that automatically goes through the document tree and applies templates as it finds nodes; with XQuery the programmer has to direct the process. It’s almost like the difference between RPG (the business programming language, not role playing games) and procedure-oriented programming languages like C. In RPG, there is an implicit processing cycle, and you just set up the actions that you want to occur when certain conditions are met; in C, you are responsible for directing the algorithm.
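The contrast between the two processing models can be sketched in Java with the standard DOM API. The class and method names below are our own, purely for illustration. push() walks the whole tree and hands each element to whatever handler is registered for its name, roughly the way XSLT applies templates as it meets nodes; pullNames() decides for itself which elements to visit and in what order, the way an XQuery program does. Real XSLT is richer than this sketch: a matching template takes over and decides whether to descend further, whereas push() always recurses.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ProcessingModels {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // "Push" style (XSLT-like): a generic walker visits every node and
    // dispatches each element to the handler registered for its tag name.
    static void push(Node node, Map<String, Consumer<Element>> handlers) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element e = (Element) node;
            Consumer<Element> handler = handlers.get(e.getTagName());
            if (handler != null) {
                handler.accept(e);
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            push(c, handlers);
        }
    }

    // "Pull" style (XQuery-like): the program itself says which nodes
    // to visit and in what order.
    static List<String> pullNames(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList animals = doc.getElementsByTagName("animal");
        for (int i = 0; i < animals.getLength(); i++) {
            Element animal = (Element) animals.item(i);
            names.add(animal.getElementsByTagName("name").item(0).getTextContent());
        }
        return names;
    }
}
```

Both approaches reach the same <name> elements; the difference is who drives the traversal.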
XSLT is to XQuery as JavaScript is to Java. XSLT 1.0 is essentially untyped; conversions between nodes, strings, and numbers are handled pretty much transparently. XQuery is a typed language that uses the types defined by XML Schema, and it raises a type error when it gets input that isn't of the appropriate type.
We want the number of pigs per page to be a global, user-settable parameter with a default value of four. In XSLT, we define this outside of any templates:
<xsl:param name="perPage" select="4"/>
In XQuery the following declaration appears as the first line in our query file:
declare variable $perPage as xs:integer := 4;
Both of these can be overridden by options on the command line. However, here is
our first difference between XSLT and XQuery: any XSLT template may contain an
<xsl:param>; that is how information gets passed among
templates. XQuery’s declare variable defines global
variables only, and cannot appear within a user-defined function.
We also want the output file to be XHTML transitional. In XSLT we accomplish this with the following element:
<xsl:output
  method="xml"
  indent="yes"
  omit-xml-declaration="yes"
  doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
  doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />
In XQuery, we add these options to the UNIX shell script that will run Qizx/open:
-Xindent='yes' \
-Xmethod='XHTML' \
-X'doctype-public'='-//W3C//DTD XHTML 1.0 Transitional//EN' \
-X'doctype-system'=\
'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd' \
Here is what the index page looks like when we have four pigs per page:
Figure 1: Logo with list items
In XSLT, we provide a template to match the root <pig-rescue>
element. (To save space, we’re not showing the code that generates
the logo image on the index page.) Since we have to process the
<animal> elements in two different ways—once for
the index page and once for the display pages—we need to use a
mode. The template will be applied only to the first animal in each
group of $perPage (every fourth animal, by default); this ensures that we get the
correct number of list items in the unordered list.
<xsl:template match="pig-rescue">
  <html>
    <head>
      <link rel="stylesheet" type="text/css" href="bdr.css" />
      <title>The Pigs</title>
    </head>
    <body>
      <div align="center">
        <h1>The Pigs At Belly Draggers Ranch</h1>
      </div>
      <ul>
        <xsl:apply-templates
          select="animal[position() mod $perPage = 1]"
          mode="indexList" />
      </ul>
    </body>
  </html>
</xsl:template>
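The predicate position() mod $perPage = 1 is what picks out the first animal of each page. A minimal Java sketch of the same test (the class and method names are ours, for illustration only) shows which 1-based positions it selects. It also exposes an edge case: if perPage is set to 1, n mod 1 is always 0, so the predicate selects nothing. Testing (position() - 1) mod $perPage = 0 instead gives the same answer for larger values and also handles 1.

```java
import java.util.ArrayList;
import java.util.List;

class GroupStarts {
    // Mirrors animal[position() mod $perPage = 1]: the 1-based positions
    // that begin a page.
    static List<Integer> modOneStarts(int count, int perPage) {
        List<Integer> starts = new ArrayList<>();
        for (int pos = 1; pos <= count; pos++) {
            if (pos % perPage == 1) {
                starts.add(pos);
            }
        }
        return starts;
    }

    // Equivalent for perPage >= 2, and still correct when perPage is 1.
    static List<Integer> safeStarts(int count, int perPage) {
        List<Integer> starts = new ArrayList<>();
        for (int pos = 1; pos <= count; pos++) {
            if ((pos - 1) % perPage == 0) {
                starts.add(pos);
            }
        }
        return starts;
    }
}
```

With ten animals and four per page, both methods pick positions 1, 5, and 9.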
In XQuery, processing the document becomes our single XQuery expression; in this case, an XQuery FLWOR expression. This acronym stands for the clauses in the expression:
- for, which allows you to step through a sequence of items or nodes.
- let, which allows you to declare and initialize variables.
- where (optional), which allows you to specify under which conditions an item or node should be chosen.
- order by (optional), which sorts the selected items.
- return, which returns the specified values for each of the selected items.

A FLWOR expression must have at least one for or
let; ours has just a let, which binds
the root element from the input document to the $doc
variable. The return clause returns an
<html> element. The parentheses aren’t really
necessary, as only one item is being returned, but we use them for
the sake of consistency.
let $doc := fn:input()/pig-rescue
return
(
  <html>
    <head>
      <link rel="stylesheet" type="text/css" href="bdr.css" />
      <title>The Pigs</title>
    </head>
    <body>
      <div align="center">
        <h1>The Pigs At Belly Draggers Ranch</h1>
      </div>
      <ul>
      {
        local:make-name-list( $doc/animal )
      }
      </ul>
    </body>
  </html>
)
The fn:input() in the preceding code
is a Qizx/open extension that takes the input
file name from the command line.
The text starting with the <html> tag
is called a Direct Element Constructor, and it must
be well-formed. Within one of these constructors,
you may embed XQuery expressions by enclosing them in braces. In this
case, we switch back to XQuery to call the
local:make-name-list function, passing it all the
<animal> nodes within the document.
If the function name
looks like it has a namespace prefix, that’s because it does.
XQuery predefines the namespace prefix local and reserves
it for use in defining local functions.
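Readers coming from Java may find it helpful to see the FLWOR clauses described earlier next to a stream pipeline. The correspondence is loose (a let clause corresponds roughly to a local variable and has no single stream step), and the Animal class below is a hypothetical stand-in for an <animal> element, not part of the article's code.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class FlworAnalogy {
    // Hypothetical stand-in for an <animal> element.
    static final class Animal {
        final String name;
        final String gender;

        Animal(String name, String gender) {
            this.name = name;
            this.gender = gender;
        }
    }

    // Roughly the XQuery
    //   for $a in $animals
    //   where $a/gender = "F"
    //   order by $a/name
    //   return string($a/name)
    static List<String> femaleNamesSorted(List<Animal> animals) {
        return animals.stream()                                      // for
                .filter(a -> a.gender.equals("F"))                   // where
                .sorted(Comparator.comparing((Animal a) -> a.name))  // order by
                .map(a -> a.name)                                    // return
                .collect(Collectors.toList());
    }
}
```

As in a FLWOR expression, filtering happens before sorting, and the return step decides what each selected item contributes to the result.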
Creating a List Item for the Index Page
Let us now turn our attention to the XSLT that provides the list items with four pig names per entry. The notes that follow the listing explain its key points.
<xsl:template match="animal" mode="indexList">
  <xsl:variable name="start"
    select="(position()-1)*$perPage + 1"/>
  <xsl:variable name="end">
    <xsl:choose>
      <xsl:when test="$start + $perPage >
          count(/pig-rescue/animal)">
        <xsl:value-of select="count(/pig-rescue/animal)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$start + $perPage - 1"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:variable name="filename">animals<xsl:value-of
    select="position()"/>.html</xsl:variable>
  <li>
    <xsl:for-each
      select="/pig-rescue/animal[position() >= $start and
              position() &lt;= $end]">
      <xsl:variable name="url"><xsl:value-of
        select="$filename"/>#a<xsl:value-of
        select="$start+position()-1"/></xsl:variable>
      <a href="{$url}">
        <xsl:value-of select="name"/>
      </a>
      <xsl:call-template name="seriesSeparator">
        <xsl:with-param name="start" select="$start"/>
        <xsl:with-param name="end" select="$end"/>
      </xsl:call-template>
    </xsl:for-each>
  </li>
  <xsl:call-template name="makeSubfile">
    <xsl:with-param name="start" select="$start"/>
    <xsl:with-param name="end" select="$end"/>
    <xsl:with-param name="filename" select="$filename"/>
  </xsl:call-template>
</xsl:template>
Both $start and $filename rely on position(). Although the calling template selected only every fourth animal, this template sees the nodes in that selected list as numbered one, two, three, and so on.
The file name for the first four pigs is animals1.html, for the next four it is animals2.html, and so on.
The XQuery equivalent for this is the
local:make-name-list function.
The logic is the same, so the notes will concentrate on the XQuery-specific
aspects.
declare function local:make-name-list( $animalList as element()* ) as item()+
{
  for $pig at $pos in
      $animalList[position() mod $perPage = 1]
  let
    $n := count($animalList),
    $filename := fn:concat("animals", $pos, ".html"),
    $start := ($pos - 1) * $perPage + 1,
    $end := if ($start + $perPage > $n)
            then
              $n
            else
              $start + $perPage - 1
  return
  (
    <li>
    {
      for $animal at $pos in $animalList[position() >= $start and
                                         position() <= $end]
      return (
        <a href="{$filename}#a{$start + $pos - 1}">
          {$animal/name/text()}
        </a>,
        local:series-separator( $start, $pos, $end )
      )
    }
    </li>,
    local:make-subfile( $animalList, $start, $end, $filename)
  )
};
In an XQuery function, you should always specify the types of function
parameters and return values. In this case, we need to specify that the
$animalList parameter will consist of element()*,
which means zero or more elements. The function returns item()+,
which means one or more items.
If you do not specify a type for a parameter
or return value, XQuery assumes item()*,
meaning zero or more items, where an
item() is any single node or atomic value.
This is normally not what you want.
The outer for clause steps through every fourth
animal in the list. The at $pos modifier binds $pos to each item's
position in the sequence, much as position() does inside an
<xsl:for-each>. In XQuery, you
can use the position() function only inside a predicate of
an XPath expression.
You may do several different assignments within a let clause by
separating them with commas. Notice the assignment to $end,
which uses an if expression. Since this is an expression and
not a statement, you must always have both a then and
else so that it always yields a value.
The return’s first value is the list item.
The <li> puts us into direct constructor mode,
so we need braces to re-enter XQuery mode to create the contents.
The expression $animal/name/text() is the reason we needed to declare $animalList
as element()*: a path step can be applied only to nodes,
not to an arbitrary item such as an atomic value.
Also, we can’t just say
$animal/name. Unlike
<xsl:value-of select="animal/name"/>,
which yields a text
value, $animal/name puts a copy
of the <name>
element, tags and all, into the return value.
If we want just the text, we have to explicitly
add the extra text() step to the XPath expression.
Making the page with the pigs’ descriptions is a task that we hand off to another local function, local:make-subfile. Its result will be the second item in this function’s return value (note the comma after the closing </li> tag), and that value will eventually make its way into the output, so the function has to return the empty string as its value.
Notice that the return expression switches between
direct element constructor mode and XQuery expression evaluation mode
several times.
In XSLT, the difference between commands to the XSLT processor and
elements destined for output is fairly easy to distinguish
due to the leading xsl: prefix. When you first
start writing XQuery, it can be
difficult to see—but always important to remember—which mode
you are in.
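The page arithmetic shared by the XSLT indexList template and local:make-name-list is easy to check in isolation. This Java sketch (class and method names are ours, for illustration only) reproduces the $start, $end, and $filename calculations and the #aN fragment used in each link; pagePos plays the role of position() in the XSLT and $pos in the XQuery.

```java
class PageMath {
    // $start := ($pos - 1) * $perPage + 1
    static int start(int pagePos, int perPage) {
        return (pagePos - 1) * perPage + 1;
    }

    // $end: the last animal on the page, clipped to the total count n
    static int end(int start, int perPage, int n) {
        return (start + perPage > n) ? n : start + perPage - 1;
    }

    // $filename := concat("animals", $pos, ".html")
    static String fileName(int pagePos) {
        return "animals" + pagePos + ".html";
    }

    // Link for the k-th animal (1-based) on a page, as in
    // href="{$filename}#a{$start + $pos - 1}"
    static String url(int pagePos, int perPage, int k) {
        return fileName(pagePos) + "#a" + (start(pagePos, perPage) + k - 1);
    }
}
```

For ten animals at four per page, the pages cover animals 1-4, 5-8, and 9-10, and the third animal on page two links to animals2.html#a7.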
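Putting the correct separator after a pig’s name boils down to one of four cases:
- a comma, after every name except the last two in a group of three or more;
- a comma followed by "and" (", and "), after the next-to-last name in a group of three or more;
- " and ", after the first name in a group of exactly two;
- nothing, after the last name in the group (including a group of one).

In XSLT, this is a simple <xsl:choose>, and
we won’t show it here. In XQuery,
it is a simple nested if.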
The types in the following declaration are based on XML Schema’s
predefined types, which means you also get all the quirks and
non-extensibility of the XML Schema type list. The function doesn’t
need a FLWOR expression; the result of the nested if is the
function’s return value.
declare function local:series-separator( $start as xs:integer, $pos as xs:integer,
                                         $end as xs:integer) as xs:string
{
  if (($start + $pos < $end) and ($end - $start > 1))
  then
    ", "
  else if (($start + $pos = $end) and ($end - $start >= 2))
  then
    ", and "
  else if (($start + $pos = $end) and ($end - $start = 1))
  then
    " and "
  else
    ""
};
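A direct Java port of local:series-separator makes the four cases easy to test. The class name and the join() helper are ours, for illustration only; join() assembles one list item's text the way the inner for loop in local:make-name-list does.

```java
import java.util.List;

class SeriesSeparator {
    // Port of local:series-separator: pos is the 1-based position of a name
    // within its group; start and end are the group's overall bounds.
    static String separator(int start, int pos, int end) {
        if (start + pos < end && end - start > 1) {
            return ", ";
        } else if (start + pos == end && end - start >= 2) {
            return ", and ";
        } else if (start + pos == end && end - start == 1) {
            return " and ";
        } else {
            return "";
        }
    }

    // Builds the text of one list item from a group of names that begins
    // at overall position start.
    static String join(List<String> names, int start) {
        int end = start + names.size() - 1;
        StringBuilder sb = new StringBuilder();
        for (int pos = 1; pos <= names.size(); pos++) {
            sb.append(names.get(pos - 1)).append(separator(start, pos, end));
        }
        return sb.toString();
    }
}
```

Because every test compares start + pos with end, the result depends only on a name's place within its group, not on which page the group is on.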
Intermission
Before proceeding to the extensions for XSLT and XQuery, let’s pause for a brief summary that will help you translate from XSLT to XQuery.
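| XSLT | XQuery |
|---|---|
| <xsl:param name="x" select="10"/> (global parameter) | declare variable $x as xs:integer := 10; |
| Parameters to <xsl:output/> | Command-line parameters to Qizx/open |
| <xsl:template name="makeSubfile"> with <xsl:param> elements, invoked by <xsl:call-template name="makeSubfile"> with <xsl:with-param> | declare function local:make-subfile(...), invoked with a call: local:make-subfile(...) |
| position() outside a predicate | for $item at $pos in $sequence |
| <xsl:if> | No direct equivalent; every if expression must have an else (use else () when there is nothing to return). |
| “Counting loops” implemented by recursion | for $i in 1 to n |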
We are now in a position to make the subfiles that display the information about
each group of pigs. Using Xalan, we must add the namespace declaration
xmlns:redirect="org.apache.xalan.xslt.extensions.Redirect"
to the root <xsl:stylesheet> element, and list redirect in its
extension-element-prefixes attribute so that redirect:write is treated as an
extension element. The template that makes the subfile follows. To save space,
we do not show the code that generates the next/previous page links at the
bottom of each page.
<xsl:template name="makeSubfile">
  <xsl:param name="start"/>
  <xsl:param name="end"/>
  <xsl:param name="filename"/>
  <!-- calculate this once, for use in next/back links -->
  <xsl:variable name="currentPage"
    select="(($start - 1) div $perPage) + 1"/>
  <redirect:write select="$filename">
    <html>
      <head>
        <link rel="stylesheet" type="text/css" href="bdr.css" />
        <title>Animals Page <xsl:value-of select="$currentPage"/>
        </title>
      </head>
      <body>
        <div align="center">
          <h1>Animals <xsl:value-of select="$start"/> -
            <xsl:value-of
              select="$end"/></h1>
        </div>
        <table border="0">
          <xsl:apply-templates select="self::animal |
              following-sibling::animal[position()
              &lt; $perPage]" mode="display">
            <xsl:with-param name="start" select="$start"/>
          </xsl:apply-templates>
        </table>
      </body>
    </html>
  </redirect:write>
</xsl:template>
Everything between <redirect:write> and </redirect:write> will be output to the file named
in $filename.
The <xsl:apply-templates> uses the display mode because
it is processing the
<animal> elements again, and the mode
tells XSLT which template to invoke.
Now, the equivalent XQuery. Our strategy is to create the output
page in a variable, and then use Qizx/open’s
x:serialize extension function to direct it to a file.
declare function local:make-subfile( $animalList as element()*,
                                     $start as xs:integer,
                                     $end as xs:integer,
                                     $filename as xs:string) as xs:string
{
  let
    $currentPage := (($start - 1) div $perPage) + 1
  let
    $htmlPage :=
      <html>
        <head>
          <link rel="stylesheet" type="text/css" href="bdr.css" />
          <title>Animals Page {$currentPage}</title>
        </head>
        <body>
          <div align="center">
            <h1>Animals {$start} - {$end}</h1>
          </div>
          <table border="0">
          {
            local:display-animals(
              $animalList[position() >= $start and position() <= $end],
              $start
            )
          }
          </table>
        </body>
      </html>
  let
    $outputFilename :=
      x:serialize( $htmlPage,
        <options output="animals{$currentPage}.html"
                 indent="yes"
                 omit-xml-declaration="yes"
                 doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
                 doctype-system=
                   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>)
  return
  (
    ""
  )
};
We use separate let clauses rather
than a series of variables separated by commas; we find it makes the code clearer
to read.
In XSLT, makeSubfile
was called in the context of an <animal> element. XQuery
does not automatically pass the context on to called functions, which is why
we must pass the entire $animalList.
The x:serialize function takes two arguments.
The first is the XML tree you want serialized. The second
is an element patterned along the lines of XSLT’s
<xsl:output> element.
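For comparison, the same serialization options exist in Java's standard JAXP API as javax.xml.transform.OutputKeys. The sketch below is our own illustration, not part of the Xalan or Qizx/open code: it runs an identity transform whose output properties mirror the <xsl:output> attributes and x:serialize options used in this article, and toFile() writes one page to its own file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PageSerializer {
    // Identity transform with XHTML 1.0 Transitional output settings.
    static void serialize(String xml, Writer out) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC,
                "-//W3C//DTD XHTML 1.0 Transitional//EN");
        t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM,
                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd");
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
    }

    // Writes one page to its own file, e.g. "animals2.html".
    static void toFile(String xml, String fileName)
            throws IOException, TransformerException {
        try (Writer w = Files.newBufferedWriter(Paths.get(fileName), StandardCharsets.UTF_8)) {
            serialize(xml, w);
        }
    }

    // Convenience wrapper that returns the serialized page as a string.
    static String asString(String xml) {
        StringWriter w = new StringWriter();
        try {
            serialize(xml, w);
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
        return w.toString();
    }
}
```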
In order to place a comment into XQuery, you enclose it in
smiley faces (: and :), which works
fine when you are in XQuery expression mode:
let $pi := 3.14159 (: just a quick approximation :)
Unfortunately, this doesn’t work well when you are in direct element constructor mode. The first of the three following examples simply places the text into the XML tree, smiley faces and all. Enclosing the comment in braces to enter XQuery expression mode gives a syntax error, because the braces must contain an expression and a comment by itself is not one. The workaround is to give the braces an expression that adds nothing visible to the output, such as the empty string in the third example (the empty sequence, (), works as well).
<a href="#">Main Page</a> (: activate link later :)
<a href="#">Main page</a> { (: activate link later :) }
<a href="#">Main page</a> { (: activate link later :) ""}
In order to retrieve the width and height of an image given its
file name, we will write a Xalan extension function in Java. It will be in
a class named XImageSize (X for Xalan).
This function,
named getDimensions, will take a file name as string input
and return an empty XML element
with attributes containing the
file name and the image’s width and height. The return
value “cleans up” the file name by removing
leading and trailing whitespace. The general model for
this element is:
<imageSize fileName="fileName"
width="width" height="height" />
In order to use this extension, we need to add some information to the XSL stylesheet. We need to establish a namespace for the extension and register that prefix as one belonging to an extension. We also want to make sure that this prefix never makes it into the output document.
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0"
xmlns:img="info.evccit.utils"
extension-element-prefixes="img"
exclude-result-prefixes="img">
Once this is set up, the XSL stylesheet can
call the function and extract the width and height
directly from the returned <imageSize> element
as follows:
<xsl:template match="picture">
  <xsl:variable name="dimensions"
    select="img:XImageSize.getDimensions(string(file))"/>
  <img src="{$dimensions/@fileName}"
       width="{$dimensions/@width}"
       height="{$dimensions/@height}"
       alt="{description}"
       title="{caption}" />
</xsl:template>
In order to return an element, the function must have access to a
Document and its createElement() method.
We know that it is possible for an extension function to do this;
the tokenize()
extension in org.apache.xalan.lib.Extensions does it.
Because Xalan is Open Source, we can look at the code and copy it
wholesale into ours. We also need to put in the appropriate attribution
and include a copy of the
Apache License
information along with the source code.
/**
 * This class is not loaded until first referenced
 * (see Java Language Specification by Gosling/Joy/Steele,
 * section 12.4.1)
 *
 * The static members are created when this class is
 * first referenced, as a lazy initialization not needing
 * checking against null or any synchronization.
 *
 * This function Copyright 1999-2004
 * The Apache Software Foundation.
 */
private static class DocumentHolder
{
  // Reuse the Document object to reduce memory usage.
  private static final Document m_doc;
  static {
    try
    {
      m_doc =
        DocumentBuilderFactory.newInstance().
          newDocumentBuilder().newDocument();
    }
    catch(ParserConfigurationException pce)
    {
      throw new org.apache.xml.utils.WrappedRuntimeException(pce);
    }
  }
}
This class will go into the main XImageSize class, which looks like this.
package info.evccit.utils;

import java.awt.Dimension;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class XImageSize
{
  static char fileSep;
  static {
    char[] carr =
      System.getProperty("file.separator").toCharArray();
    fileSep = carr[0];
  }

  // The nested DocumentHolder class goes here.

  public static Node getDimensions( String fileName )
  {
    Document doc = DocumentHolder.m_doc;
    Element result = null;
    fileName = fileName.trim();
    try
    {
      Dimension d =
        ImageFileDimensions.getFileDimensions( fileName );
      result = doc.createElement("imageSize");
      result.setAttribute( "fileName",
        fileName.replace( fileSep, '/' ) );
      result.setAttribute( "width",
        Integer.toString((int) d.getWidth() ));
      result.setAttribute( "height",
        Integer.toString((int) d.getHeight() ));
    }
    catch (Exception e)
    {
      result = null;
    }
    return result;
  }
}
ImageFileDimensions.getFileDimensions()
opens the file and
reads the first few bytes to determine which image format it is
(for example, JPG or GIF). Depending upon the file type, it does the appropriate
work to extract the width and height and returns them in a
Dimension object. The exact details aren’t relevant to this
article, so the code isn’t shown here.
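Since ImageFileDimensions isn't shown, here is a minimal sketch of the header-sniffing idea under our own assumptions: the class name HeaderDimensions is hypothetical, it handles only GIF and PNG (JPEG requires scanning through the file's segment markers and is left out), and it returns null for anything else. It relies on two well-documented header layouts: a GIF stores its logical screen width and height as 16-bit little-endian values at byte offsets 6 and 8, and a PNG's first chunk, IHDR, stores width and height as 32-bit big-endian values at offsets 16 and 20.

```java
import java.awt.Dimension;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

class HeaderDimensions {
    // Returns the image size encoded in the first bytes of a GIF or PNG
    // file, or null if the header is not recognized.
    static Dimension fromBytes(byte[] b) {
        if (b.length >= 10 && b[0] == 'G' && b[1] == 'I' && b[2] == 'F') {
            // GIF: 16-bit little-endian width and height at offsets 6 and 8
            int w = (b[6] & 0xFF) | ((b[7] & 0xFF) << 8);
            int h = (b[8] & 0xFF) | ((b[9] & 0xFF) << 8);
            return new Dimension(w, h);
        }
        if (b.length >= 24 && (b[0] & 0xFF) == 0x89
                && b[1] == 'P' && b[2] == 'N' && b[3] == 'G') {
            // PNG: 32-bit big-endian width and height inside the IHDR chunk
            return new Dimension(readIntBE(b, 16), readIntBE(b, 20));
        }
        return null;
    }

    private static int readIntBE(byte[] b, int off) {
        return ((b[off] & 0xFF) << 24) | ((b[off + 1] & 0xFF) << 16)
                | ((b[off + 2] & 0xFF) << 8) | (b[off + 3] & 0xFF);
    }

    // Reads only the first 24 bytes of the file, enough for either header.
    static Dimension fromFile(String fileName) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(fileName))) {
            byte[] head = new byte[24];
            int n = in.readNBytes(head, 0, head.length);
            return fromBytes(Arrays.copyOf(head, n));
        }
    }
}
```

Reading only the header keeps the cost per image tiny compared with decoding the whole file.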
The source XML file sets the base path for all the images with the
<image-base> element. Rather than do a complicated
normalize-space() and
concat() to join the base path to the image file name in the
XSLT, we create a second version of getDimensions() that accepts
two strings and does the heavy lifting:
public static Node
getDimensions( String pathName, String fileName )
{
  String fileSeparator = System.getProperty("file.separator");
  String combinedName;
  pathName = pathName.trim();
  fileName = fileName.trim();
  if (pathName.endsWith( fileSeparator ))
  {
    combinedName = pathName + fileName;
  }
  else
  {
    combinedName = pathName + fileSeparator + fileName;
  }
  return getDimensions( combinedName );
}
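As an alternative design (not what the downloadable code does), java.nio.file can do the separator bookkeeping. This sketch, with the hypothetical class name PathJoin, trims both strings as the two-argument getDimensions() does and lets Path.resolve() decide whether a separator is needed. One difference to be aware of: if the file name is itself an absolute path, resolve() returns it unchanged instead of appending it to the base.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class PathJoin {
    // Joins a base directory and a file name after trimming whitespace;
    // a trailing separator on the base directory makes no difference.
    static Path combine(String pathName, String fileName) {
        return Paths.get(pathName.trim()).resolve(fileName.trim());
    }
}
```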
If you download the code, you will see that we have heavily overloaded the
getDimensions()
function by allowing it to accept a Node or NodeList
for either or both parameters, but that isn’t the point of this
article. Onward to...
The code for this extension is almost identical to the Xalan extension.
Instead of returning an <imageSize> element, however,
we will return a vector of three items: the filename, the width, and
the height. Qizx/open will interpret this as a sequence of items.
The XQuery file must connect the class, which is named QImageSize,
with a namespace. This statement goes at the head of the XQuery file.
Note carefully! This assignment uses a single equal sign, not
the := used for a let clause. We will also have
to pass the class name to Qizx/open on the command line when we run the
query; this lets Qizx/open know that this is an authorized extension and
no security exception needs to be raised.
declare namespace imgsize =
"java:info.evccit.utils.QImageSize";
Once the namespace is established, XQuery can extract the information as part of the pig display code:
for $animal at $pos in $animalList
let
  $basePath := $animal/../image-base,
  $dimensions := imgsize:getDimensions($basePath,
                   $animal/picture/file/text() )
return
(
  <img
    src="{$dimensions[1]}"
    width="{$dimensions[2]}"
    height="{$dimensions[3]}"
    alt="{$animal/picture/description/text()}"
    title="{$animal/picture/caption/node()}"
    hspace="4" />
)
Here’s the code for the function that takes the entire filename as one string parameter:
package info.evccit.utils;

import net.xfra.qizxopen.xquery.dm.Node;  // used by the Node overloads (not shown)
import java.awt.Dimension;
import java.util.Vector;

public class QImageSize
{
  static char fileSeparator;
  static {
    char[] carr =
      System.getProperty("file.separator").toCharArray();
    fileSeparator = carr[0];
  }

  public static Vector getDimensions( String fileName )
  {
    Vector result = new Vector(3);
    fileName = fileName.trim();
    try
    {
      Dimension d =
        ImageFileDimensions.getFileDimensions( fileName );
      result.add( fileName.replace( fileSeparator, '/' ) );
      result.add( Integer.valueOf( (int) d.getWidth() ) );
      result.add( Integer.valueOf( (int) d.getHeight() ) );
    }
    catch (Exception e)
    {
      result = null;
    }
    return result;
  }
}
The two-string version of getDimensions() is exactly the
same as the Xalan version, except that it returns a Vector
instead of a Node. This function has also been heavily
overloaded to accept a Qizx/open Node (which is not
the same as a DOM node) so that the caller doesn’t have to dig down
to the text() step in the path.
You can download the sample pig rescue file, XSLT stylesheet,
XQuery file, extension
sources, and Apache License
here. The Java source
files are in the info directory, and the
API documentation is in the doc directory.
Make sure you put the ImageSize.jar file in your
classpath when invoking Xalan and/or Qizx/open.
The shell files
xcompile.sh and qcompile.sh will compile the
Xalan and Qizx/open extensions. Files make_javadoc.sh
and make_jar.sh create the Javadoc and ImageSize.jar files.
Files run_xalan.sh and run_qizx.sh run the
transformation and XQuery.
Thanks to Xavier Franc, author of Qizx/open, for his advice and information on using XQuery.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.