An XQuery Update

September 10, 2003

Per Bothner

The XQuery/XSLT working group released another set of Working Drafts on August 22, 2003. This article is my attempt to summarize the significant changes in the new drafts. Note that there is no new version of either the Data Model or Functions and Operators specifications, which were released to Last Call in May.

Some text and examples in this summary are quoted from the working drafts. The full drafts can be found at the W3C's XML Query home page.

Full Axis Feature

XQuery is mostly a superset of XPath 1, but one significant difference was in path expressions; XQuery left out support for some of the less common axes: ancestor, ancestor-or-self, following, following-sibling, preceding, preceding-sibling, and namespace.

The newest draft still leaves out the namespace axis. The argument against namespace is that it complicates the implementation of nodes, and it may be difficult to avoid overhead even for XQuery code that does not use the namespace axis. This is a high cost for a rarely-used feature, so it is not included in XQuery. It's deprecated and optional in XPath 2.0. The standard functions fn:get-namespace-uri-for-prefix and fn:get-in-scope-namespaces provide alternatives to the namespace axis.

The arguments against the other axes are not as strong. One argument is that they are redundant, since they can be expressed using other axes. For example,

following-sibling::NodeTest

is equivalent to

let $e := . return parent::node()/child::NodeTest[. >> $e]

In fact the XQuery/XPath formal semantics defines following-sibling this way. But these alternative formulations are both inconvenient for users and harder for an implementation to optimize, which suggests that these axes should be standard. On the other hand, the other axes may be very inefficient in some reasonable implementations, and programmers may not understand this if the axes are standard. (See Issue 114.)

So the new draft makes ancestor, ancestor-or-self, following, following-sibling, preceding, and preceding-sibling optional by tying them to the Full Axis Feature. An implementation is free to support the Full Axis Feature or not; if it does, it must implement all of these extra axes.
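
For a query that has to run on implementations without the Full Axis Feature, one possible workaround is to walk up the tree with the parent step. The following is only a sketch: the function local:ancestors and the sample data are hypothetical, not taken from the drafts, and it uses the prolog syntax described later in this article.

```xquery
(: Hypothetical helper, not from the drafts: returns the ancestors
   of $n, outermost first, using only the parent step. :)
declare function local:ancestors($n as node()) as node()* {
  let $p := $n/..
  return
    if (fn:empty($p)) then ()
    else (local:ancestors($p), $p)
};

(: Made-up sample data, used only to exercise the helper. :)
let $doc := <book><chapter><section><para>text</para></section></chapter></book>
return
  for $a in local:ancestors($doc//para)
  return fn:local-name($a)
```

For the para element this should produce the names book, chapter, and section. An implementation is unlikely to optimize such a recursive function as well as a built-in ancestor axis, which is the same trade-off discussed above.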

Node Constructors

There are new computed constructors for processing instructions, comments, and namespaces. In earlier drafts you could write an XML comment directly:

<!-- This is an XML comment.-->

This is convenient, but doesn't allow you to calculate the exact comment text at runtime: the comment text is a literal, fixed when the query is written. With a computed comment constructor you can calculate the text using an expression:

let $r := "XQuery" return (
  comment {"The next section relates to", $r},
  element section { whatever() } )

An alternative approach would be to allow enclosed expressions in direct comment constructors, as in <!--The next section relates to {$r}-->. I'm not sure why the committee didn't go this route. Perhaps no one suggested it. Perhaps they felt it was more consistent to have a "complete" set of computed constructors.

There are also new computed processing instruction constructors:

let $target := "audio-output",
    $content := "beep" return
    pi {$target} {$content}

This is equivalent to

<?audio-output beep?>

This example uses a computed namespace constructor:

let $nsURI := "http://example.org/metric-system",
    $attrname := "metric:unit",
    $attrvalue := "meter" 
    return
       element {"altitude"} {
          namespace metric {$nsURI},
          attribute {$attrname} {$attrvalue},
          "10000"
    }

This is equivalent to

<altitude
   xmlns:metric = "http://example.org/metric-system"
   metric:unit = "meter">10000</altitude>

The new section 2.7.4 Namespace Nodes on Constructed Elements describes how namespace nodes are created for the result of element constructors. But note that since there is no namespace axis in XQuery, there is no way you can actually observe a namespace node. So what this section really specifies is which namespace prefixes may be used when elements are written in text form (serialized) and in the result of the fn:name function.
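
As a small illustration of that indirect visibility, the prefix recorded for a constructed element shows up in the result of fn:name. This is a sketch only; the element and namespace URI are borrowed from the altitude example above.

```xquery
(: The prefix "metric" cannot be reached through a namespace axis,
   but it is visible in the lexical name returned by fn:name. :)
let $e := <metric:altitude
             xmlns:metric="http://example.org/metric-system">10000</metric:altitude>
return (fn:name($e), fn:namespace-uri($e))
```

This should yield the name metric:altitude followed by the namespace URI http://example.org/metric-system.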

The base URI of a constructed element node, and of any copied descendant nodes, is taken from the static context, even if the original nodes have some other base URI.

Query Prolog and Modules

Each declaration in the module prolog must now be followed by a semicolon. Default namespace declarations now require the keyword declare, in addition to default, to be consistent with other declarations. Thus, you now write

declare default element namespace "http://example.org/names";

rather than

default element namespace "http://example.org/names" (: OLD :)

Similarly, define function has become declare function and define variable becomes declare variable. Also a validation declaration must start with declare, and default collation = "namespace" becomes declare default collation "namespace". An xmlspace declaration no longer includes the = token:

declare xmlspace preserve;

There is a new Base URI declaration, which is used when resolving relative URIs in the module:

declare base-uri "http://example.org";

The standard fn:doc resolves a relative URI using the base URI of the calling module. This means that the function call fn:doc($uri) isn't a function call in the C programmer's sense, but is more like a "macro invocation", since it depends on the current module's static base-uri. In other words, it really means

fn:doc(fn:resolve-uri($uri, "http://example.org"))

There is a new pre-defined namespace local bound to http://www.w3.org/2003/08/xquery-local-functions.

One major change is that the qname of the function being defined in a function definition must have an explicit namespace prefix. You can use the predefined local prefix, but only in main modules.
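
A minimal main module in the new style might look like the following. This is a sketch that assumes the syntax described in this article; the function local:square is hypothetical.

```xquery
(: The function name carries the predefined "local" prefix, and the
   declaration ends with a semicolon, as the new draft requires. :)
declare function local:square($x as xs:double) as xs:double {
  $x * $x
};

(: Query body: call the locally declared function. :)
local:square(3.0e0)
```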

The syntax of the module declaration at the start of a library module has changed from

module "http://example.org/math-functions"

to

module math = "http://example.org/math-functions";

Both variables and functions declared in a library module must be explicitly qualified by the target namespace prefix of the module. So following the above declaration, you could write

declare function math:acos ($x as xs:double) as xs:double external;
declare variable $math:PI as xs:double := math:acos(-1);

Errors and error codes

For each error that an implementation is required to detect, there is now a numeric error code. For example,

err:XP0020
It is a type error if in an axis expression, the context item is not a node.

These are listed in Appendix F. It is not clear how these are supposed to be used. (See Issue 340.)

The May draft said:

If an implementation can determine by static analysis that an expression will necessarily raise a dynamic error...the implementation is allowed to report this error during the analysis phase

The August draft restricts this to the case of constant folding:

If any expression (at any level) can be evaluated during the analysis phase (because all its explicit operands are known and it has no dependencies on the dynamic context), then any error in performing this evaluation may be reported as a static error.
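
To make the distinction concrete, here is a sketch under the assumption that an external variable is declared as shown; the variable $n is hypothetical. One branch contains a constant expression and the other depends on a run-time value.

```xquery
declare variable $n as xs:integer external;

if ($n > 100)
then 1 idiv 0      (: both operands are literals, so an implementation
                      may report the division by zero as a static
                      error, even if this branch would never run :)
else 100 idiv $n   (: depends on $n, which is only known at run time,
                      so any division by zero here is a dynamic error :)
```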

Presentation changes

Some of the changes don't change the XQuery language itself, but clarify or improve the documents. Many terms are now explicitly defined, and summarized in a new Glossary section. For example, section 2.5 now defines the terms "static error", "dynamic error", "type error", and "error value".

Section 2 has been reorganized and a new subsection "Processing Model" has been introduced. This is useful reading for understanding how XQuery works, though it uses a lot of terms and concepts. A new Appendix "Context Components" summarizes how the static and dynamic context are initialized.

Other smaller changes

A / or // at the start of a path expression sets the context to the root of the tree containing the original context node. In the new draft, there is a cast (using treat) to force the root to be a document node; otherwise an error is raised, for example if the context node belongs to a standalone element node.
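
The practical consequence can be sketched as follows. The sample data is made up, and the example assumes that fn:root is available in the implementation's function library.

```xquery
(: $standalone is an element constructed without a document node,
   so the root of each <item> is the <list> element itself. :)
let $standalone := <list><item>1</item><item>2</item></list>
return
  for $i in $standalone/item
  (: fn:root has no document-node requirement, so this is fine. :)
  return fn:count(fn:root($i)/item)

(: By contrast, a path such as $i/(/item) would be expected to raise
   an error under the new rule, because the root of $i is an element
   node rather than a document node. :)
```

Code that navigates from the root of constructed fragments may therefore prefer an explicit fn:root call over a leading /.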

In the treat expression X treat as T, there is no longer a requirement that the static type of X be "derived by restriction" from T.

The input() function has been deleted. I assume the reason is that a variable declaration with an external value provides similar functionality, without the extra concept of an implementation-defined input sequence.
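
A query that previously relied on input() could presumably be written along these lines. This is a sketch only: the variable name $input and the title elements are hypothetical, and how the value is actually supplied is up to the implementation.

```xquery
(: The implementation binds $input before evaluation, taking over
   the role of the old implementation-defined input sequence. :)
declare variable $input as node()* external;

for $t in $input//title
return fn:string($t)
```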

There are also various minor changes to the grammar or the formal semantics. For example, the context item expression "." is now classified as a Primary Expression rather than as an Abbreviated Forward Step.