What Is XQuery

October 16, 2002

Per Bothner

The W3C is finalizing the XQuery specification, aiming for a final release in late 2002. XQuery is a powerful and convenient language designed for processing XML data. That means not only files in XML format, but also other data including databases whose structure -- nested, named trees with attributes -- is similar to XML.

XQuery is an interesting language with some unusual ideas. This article provides a high-level view of XQuery, introducing the main ideas you should understand before you go deeper or actually try to use it.

An Expression Language

The first thing to note is that in XQuery everything is an expression which evaluates to a value. An XQuery program or script is just an expression, together with some optional function and other definitions. So 3+4 is a complete, valid XQuery program which evaluates to the integer 7.

There are no side-effects or updates in the XQuery standard, though they will probably be added at a future date. The standard specifies the result value of an expression or program, but it does not specify how it is to be evaluated. An implementation has considerable freedom in how it evaluates an XQuery program, and what optimizations it does.

Here is a conditional expression that evaluates to a string:

if (3 < 4) then "yes!" else "no!"

You can define local variables using a let-expression:

let $x := 5 let $y := 6 return 10*$x+$y

Primitive Data Types

The primitive data types in XQuery are the same as for XML Schema.

  • Numbers, including integers and floating-point numbers.
  • The boolean values true and false.
  • Strings of characters, for example: "Hello world!". These are immutable - i.e. you cannot modify a character in a string.
  • Various types to represent dates, times, and durations.
  • A few XML-related types. For example, a QName is a pair of a local name (like template) and a namespace URI; it is used to represent a tag name like xsl:template after it has been namespace-resolved.

Derived types are variations or restrictions of other types. Primitive types and the types derived from them are known as atomic types, because an atomic value does not contain other values. Thus a string is considered atomic because XQuery does not have character values.

Node Values and Expressions

XQuery also has the data types needed to represent XML values. It does this using node values, of which there are 7 kinds: element, attribute, namespace, text, comment, processing-instruction, and document (root) nodes. These are very similar to the corresponding DOM classes such as Node, Element and so on. Some XQuery implementations use DOM objects to implement node values, though implementations may use other representations.

Various standard XQuery functions create or return nodes. The document function reads an XML file specified by a URL argument and returns a document root node. (The root element is a child of the root node.)

You can also create new node objects directly in the program. The most convenient way to do that is to use an element constructor expression, which looks just like regular XML data:

<p>See <a href="index.html"><i>here</i></a> for info.</p>

You can use {curly braces} to embed XQuery expressions inside element constructors. Thus,

let $i := 2 return
let $r := <em>Value </em> return
  <p>{$r} of 10*{$i} is {10*$i}.</p>

creates

<p><em>Value </em> of 10*2 is 20.</p>
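For readers who think more readily in a general-purpose language, the same result can be sketched in Python with the standard xml.etree.ElementTree module. This is only an analogy for what the element constructor produces, not a description of how an XQuery processor works; the function name build_paragraph is made up for the illustration.

```python
import xml.etree.ElementTree as ET


def build_paragraph(i):
    """Build <p><em>Value </em> of 10*{i} is {10*i}.</p> as an element tree."""
    p = ET.Element("p")
    em = ET.SubElement(p, "em")  # plays the role of $r
    em.text = "Value "
    # ElementTree stores text that follows a child element in that child's "tail".
    em.tail = f" of 10*{i} is {10 * i}."
    return p


result = ET.tostring(build_paragraph(2), encoding="unicode")
print(result)
```

Note how much more bookkeeping the Python version needs: in XQuery the markup itself is the expression.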

Popular template processors, like JSP, ASP, and PHP, allow you to embed expressions in a programming language into HTML content. XQuery gives you that ability, plus the ability to embed XML/HTML forms inside expressions, and to have them be the value of variables and parameters.

XQuery node values are immutable (you cannot modify a node after it has been created).

Sequences

We've seen atomic values (numbers, strings, etc.) and node values (elements, attributes, etc.). Together these are known as simple values. XQuery expressions actually evaluate to sequences of simple values. The comma operator can be used to concatenate two values or sequences. For example,

3,4,5

is a sequence consisting of 3 integers. Note that a sequence containing just a single value is the same as that value by itself. You cannot nest sequences. To illustrate this, we'll use the count function, which takes one argument, a sequence, and returns the number of values in it. So the expression

let $a := (3,4)
let $b := ($a, $a)
let $c := 99
let $d := ()
return (count($a), count($b), count($c), count($d))

evaluates to (2, 4, 1, 0) because $b is the same as (3,4,3,4).
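The flattening rule can be modelled in a few lines of Python. This is a sketch of the behaviour only: the helper names seq and count are assumptions chosen to mirror the XQuery example, with a Python tuple standing in for a sequence.

```python
def seq(*items):
    """Concatenate items into one flat tuple, like the XQuery comma operator."""
    flat = []
    for item in items:
        if isinstance(item, tuple):
            flat.extend(item)  # a sequence is spliced in, never nested
        else:
            flat.append(item)  # a single value acts as a one-item sequence
    return tuple(flat)


def count(value):
    """Number of values in a sequence; a lone value counts as 1."""
    return len(seq(value))


a = seq(3, 4)
b = seq(a, a)  # becomes (3, 4, 3, 4), not ((3, 4), (3, 4))
c = 99
d = seq()
counts = (count(a), count(b), count(c), count(d))
print(counts)
```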

Many of the standard functions for working with nodes return sequences. The children function returns a sequence of the child nodes of the argument. Thus,

children(<p>This is <em>very</em> cool.</p>)

returns this sequence of 3 values:

"This is ", <em>very</em>, " cool."

Path Expressions and Relationship to XPath

XQuery borrows path expressions from XPath. XQuery can be viewed as a generalization of XPath. Except for some obscure forms (mostly unusual "axis specifiers"), all XPath expressions are also XQuery expressions. For this reason the XPath specification is also being revised by the XQuery committee, with the plan that XQuery 1.0 and XPath 2.0 will be released about the same time.

The following simple example assumes an XML file "mybook.xml" whose root element is a <book>, containing some <chapter> children:

let $book := document("mybook.xml")/book
return $book/chapter

The document function returns the root node of a document. The /book expression selects the child elements of the root that are named book, so $book gets set to the single root element.

The expression $book/chapter selects the chapter child elements of the top-level book element, which results in a sequence of the second-level chapter nodes in document order.

The next example includes a predicate:

$book//para[@class="warning"]

The double slash is a convenience syntax to select all descendants (rather than just children) of $book. The rest of the expression keeps only <para> element nodes that have an attribute node named class whose value is "warning".
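Python's xml.etree.ElementTree supports a small subset of XPath, which is enough to try out both path expressions. The sample document below is invented for the illustration, and the variable doc plays the role of $book.

```python
import xml.etree.ElementTree as ET

# A made-up stand-in for mybook.xml; fromstring returns the root <book> element.
doc = ET.fromstring(
    "<book>"
    "<chapter><title>One</title><para class='warning'>Hot!</para></chapter>"
    "<chapter><title>Two</title><para>Plain.</para>"
    "<section><para class='warning'>Sharp!</para></section></chapter>"
    "</book>"
)

# Counterpart of $book/chapter: direct children named chapter.
chapters = doc.findall("chapter")

# Counterpart of $book//para[@class="warning"]: descendants at any depth.
warnings = doc.findall(".//para[@class='warning']")
warning_texts = [w.text for w in warnings]
print(len(chapters), warning_texts)
```

The second query finds the warning nested inside a section as well as the one directly under a chapter, which is what the double slash is for.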

One difference to note between XPath and XQuery is that XPath expressions may return a node set, whereas the same XQuery expression will return a node sequence. For compatibility these sequences will be in document order and with duplicates removed, which makes them equivalent to sets.

XSLT is very useful for expressing very simple transformations, but more complicated stylesheets (especially anything with non-trivial logic or programming) can often be written more concisely using XQuery.

Iterating Over Sequences

A for expression lets you "loop" over the elements of a sequence:

for $x in (1 to 3) return ($x,10+$x)

The for expression first evaluates the expression following the in. Then for each value of the resulting sequence, the variable (in this case $x) is bound to the value, and the return expression is evaluated using that variable binding. The value of the entire for expression is the concatenation of all values of the return expression, in order. So the example evaluates to this 6-element sequence: 1,11,2,12,3,13.
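The "evaluate, then concatenate" rule is what distinguishes a for expression from a loop that merely collects one result per iteration. A minimal Python model, where for_each is a hypothetical helper name:

```python
def for_each(items, body):
    """Evaluate body for each item and concatenate the results in order."""
    out = []
    for x in items:
        out.extend(body(x))  # splice each result in, rather than nesting it
    return out


# Counterpart of: for $x in (1 to 3) return ($x, 10+$x)
result = for_each(range(1, 4), lambda x: (x, 10 + x))
print(result)
```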

Here is a more useful example. Assume again that mybook.xml is a <book> that contains some <chapter> elements. Each <chapter> has a <title>. The following will create a simple page that lists the titles:

<html>{
  let $book := document("mybook.xml")/book
  for $ch in $book/chapter
    return <h2>{$ch/title}</h2>
}</html>
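For comparison, here is roughly the same page built in Python with xml.etree.ElementTree. The two-chapter book is invented sample data, and the sketch assumes each chapter has exactly one title child.

```python
import copy
import xml.etree.ElementTree as ET

# Made-up stand-in for the contents of mybook.xml.
book = ET.fromstring(
    "<book>"
    "<chapter><title>Getting Started</title><para>Text.</para></chapter>"
    "<chapter><title>Going Deeper</title><para>Text.</para></chapter>"
    "</book>"
)

html = ET.Element("html")
for ch in book.findall("chapter"):
    h2 = ET.SubElement(html, "h2")
    # Copy the <title> element into the new tree, leaving the source untouched.
    title = copy.deepcopy(ch.find("title"))
    title.tail = None
    h2.append(title)

page = ET.tostring(html, encoding="unicode")
print(page)
```

As in the XQuery version, each h2 contains the title element itself, not just its text.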

The term "FLWR expressions" includes both for and let expressions. The acronym FLWR refers to the fact that such an expression consists of one or more for and/or let clauses, an optional where clause, and a return clause. A where clause causes the return clause to be evaluated only when the where expression is true.

The next example has a nested loop, allowing us to combine two sequences: one of customer elements and the other of order elements. We want to find the name(s) of customers who have ordered the part whose part_id is "xx".

for $c in customers
for $o in orders
where $c/cust_id=$o/cust_id and $o/part_id="xx"
return $c/name

This is essentially a join of two tables as commonly performed using relational databases. An important goal for XQuery is that it should be usable as a query language for XML databases. Compare the corresponding SQL statement:

select customers.name
from customers, orders
where customers.cust_id=orders.cust_id
  and orders.part_id='xx'
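The nested-loop reading of the query translates directly into a Python comprehension. The customer and order records below are invented sample data, and plain dictionaries stand in for the XML elements.

```python
# Hypothetical sample data; field names follow the query above.
customers = [
    {"cust_id": 1, "name": "Ada"},
    {"cust_id": 2, "name": "Grace"},
    {"cust_id": 3, "name": "Alan"},
]
orders = [
    {"cust_id": 1, "part_id": "xx"},
    {"cust_id": 2, "part_id": "yy"},
    {"cust_id": 3, "part_id": "xx"},
    {"cust_id": 3, "part_id": "zz"},
]

# Two nested "for" clauses, a "where" condition, and a "return" expression.
names = [
    c["name"]
    for c in customers
    for o in orders
    if c["cust_id"] == o["cust_id"] and o["part_id"] == "xx"
]
print(names)
```

This naive version compares every customer with every order. A query processor is free to pick a better strategy, such as using an index, because only the result is specified, not the evaluation order.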

Functions

XQuery wouldn't be much of a programming language without user-defined functions. Such function definitions appear in the query prologue of an XQuery program. It's worth noting that function parameters and function results can be primitive values, nodes, or sequences of either.

The following is a recursive utility function. It returns all the descendant nodes of the argument, including the argument node itself. It does a depth-first traversal of the argument, returning the argument, and then looping over the argument node's children, recursively calling itself for each child.

define function descendant-or-self ($x)
{
  $x,
  for $y in children($x)
    return descendant-or-self($y)
}

descendant-or-self(<a>X<b>Y</b></a>)

This evaluates to the following sequence of length 4:

<a>X<b>Y</b></a>, "X", <b>Y</b>, "Y"
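The same depth-first traversal can be sketched in Python. ElementTree has no separate text nodes, so this model represents a text node as a plain string; the helper children is an assumption written to imitate the XQuery function of the same name.

```python
import xml.etree.ElementTree as ET


def children(node):
    """Child text and elements of node, in document order."""
    if isinstance(node, str):
        return []  # a text node has no children
    result = []
    if node.text:
        result.append(node.text)
    for child in node:
        result.append(child)
        if child.tail:
            result.append(child.tail)
    return result


def descendant_or_self(node):
    """Return node followed by all of its descendants, depth-first."""
    result = [node]
    for child in children(node):
        result.extend(descendant_or_self(child))
    return result


nodes = descendant_or_self(ET.fromstring("<a>X<b>Y</b></a>"))
print(len(nodes))
```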

Sorting and Context

If you want to sort a sequence you can use a sortby expression. To sort a sequence of books in order of author name you can do:

$books sortby (author/name)

The sortby takes an input sequence (in this case $books) and one or more ordering expressions. During sorting the implementation needs to compare two values from the input sequence to determine which comes first. It does that by evaluating the ordering expression(s) in the context of a value from the input sequence. So the path expression author/name is evaluated many times, each time relative to a different book as the context (or current) item.

Path expressions also use and set the context. In author/name the name children that are returned are those of the context item, which is an author item.
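In Python terms, the ordering expression behaves like a key function: it is evaluated once per item, with that item as the starting point of the path. The library document below is invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample data: three books, each with an author/name.
library = ET.fromstring(
    "<library>"
    "<book><title>First</title><author><name>Nguyen</name></author></book>"
    "<book><title>Second</title><author><name>Adams</name></author></book>"
    "<book><title>Third</title><author><name>Silva</name></author></book>"
    "</library>"
)
books = library.findall("book")

# The key function evaluates author/name relative to each book in turn,
# much as sortby evaluates its ordering expression with each item as context.
sorted_books = sorted(books, key=lambda book: book.findtext("author/name"))
titles = [b.findtext("title") for b in sorted_books]
print(titles)
```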

Type Specification

XQuery is a strongly typed programming language. Like Java and C#, for example, it's a mix of static typing (type consistency checked at compile-time) and dynamic typing (run-time type tests). However, the types in XQuery are different from the classes familiar from object-oriented programming. Instead, it has types to match XQuery's data model, and it allows you to import types from XML Schema.

if ($child instance of element section)
then process-section($child)
else ( ) {--nothing--}

This invokes the process-section function if the value of $child is an element whose tag name is section. XQuery has a convenient typeswitch shorthand for matching a value against a number of types. The following converts a set of tag names to a different set.

define function convert($x) {
  typeswitch ($x)
    case element para return <p>{process-children($x)}</p>
    case element emph return <em>{process-children($x)}</em>
    default return process-children($x)
}

define function process-children($x) {
  for $ch in children($x) return convert($ch)
}
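The pair of mutually recursive functions can be approximated in Python, with a dictionary lookup on the tag name standing in for typeswitch. This is a loose sketch, not a translation: it produces a string instead of nodes, and it copies text through to the output, which is an assumption about the intended result.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Tag names to rewrite; any other element is replaced by its converted children.
RENAME = {"para": "p", "emph": "em"}


def convert(node):
    """Convert one node (an element, or a string standing for text) to markup."""
    if isinstance(node, str):
        return escape(node)  # assumption: text is copied to the output
    inner = process_children(node)
    new_tag = RENAME.get(node.tag)
    if new_tag is None:
        return inner  # the "default" case
    return f"<{new_tag}>{inner}</{new_tag}>"


def process_children(elem):
    """Convert each child of elem in document order and join the results."""
    parts = []
    if elem.text:
        parts.append(convert(elem.text))
    for child in elem:
        parts.append(convert(child))
        if child.tail:
            parts.append(convert(child.tail))
    return "".join(parts)


source = ET.fromstring("<para>This is <emph>very</emph> <b>cool</b>.</para>")
converted = convert(source)
print(converted)
```

The unknown b element falls through to the default case, so its tags disappear while its content is kept.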

Resources

The primary XQuery resource is www.w3.org/XML/Query, which has links to the draft standards, mailing lists, and implementations.

There's only one XQuery book so far, Early Adopter XQuery from Wrox, mainly because there are significant loose ends in the specification. I am co-authoring (with James McGovern, Kurt Cagle, James Linn and Vaidyanathan Nagarjan) XQuery Kick Start for Sams Publishing, due to be released in 2003.

There are no complete standards-conforming implementations either, but the XQuery site lists known implementations, some of which have executable demos. The only open-source implementation currently available seems to be my Qexo. (The Qexo implementation is interesting in that it compiles XQuery programs on the fly directly to Java bytecodes.) I recommend considering XQuery when you need a powerful and convenient tool for analyzing or generating XML.