XML.com: XML From the Inside Out


What's New in XPath 2.0

March 20, 2002

This article provides a brief tour through some of the new features in XPath 2.0. It assumes that you already have a basic understanding of XPath 1.0, and that you've most likely used it in the context of XSLT. It is by no means an exhaustive overview but merely points out some of the most noteworthy features.

Relationship between XPath 1.0 and XPath 2.0

Both the XPath 1.0 recommendation and the latest XPath 2.0 working draft say that "XPath is a language for addressing parts of an XML document". This was a fairly appropriate characterization of XPath 1.0. (Of course, it doesn't mention that you can have arithmetic expressions and string, number, and boolean expressions, but those features were kept to a minimum.) On the other hand, as a characterization of XPath 2.0, it leaves a lot to be desired. XPath 2.0 is a much more powerful language that operates on a much larger domain of data types. A better way of describing XPath 2.0 is as an expression language for processing sequences, with built-in support for querying XML documents. Querying? Isn't that XQuery's job?

Relationship between XPath 2.0 and XQuery 1.0

For over a year now, the W3C XSL and XML Query Working Groups have been working closely together. The goal has been to share as much between XSLT 2.0 and XQuery 1.0 as is technically and politically feasible and to give that common subset the name "XPath 2.0". This effectively means that the driving forces behind XPath 2.0 include not only the XPath 2.0 Requirements document but also many of the XML Query language requirements.

XPath 2.0 is a strict syntactic subset of XQuery 1.0. In fact, both working drafts and language grammars were automatically generated from a common source (using XML and XSLT, of course). While it is not strictly true that one working draft is a subset of the other (because some paragraphs were devoted exclusively to XPath 2.0), it is nearly true. In any case, about 80% of the text in the XQuery draft is common to both drafts. As subsets go, the XPath 2.0 subset of XQuery 1.0 is a rather large one. The optimistic upshot is that, once you've gone through the trouble of learning XPath 2.0, you'll be pleased to discover that you're almost done learning XQuery.

In fact, the XQuery-specific features consist mostly of top-level query wrapper mechanisms, such as function definitions, namespace declarations, and schema imports, as well as element constructors. XSLT 2.0, the other primary context in which XPath 2.0 is meant to be used, doesn't need these mechanisms, since it generally provides its own versions, so they are not included in the common subset.

XML Schema support

If you recall, XPath 1.0 supported only four expression types:

  • node-set
  • boolean
  • number (floating-point)
  • string

This had the virtue of simplicity, but it was limiting when it came to processing typed values such as dates. XPath 2.0, on the other hand, introduces support for the XML Schema primitive types, which immediately gives the user access to 19 simple types, including dates, years, months, and URIs. In addition, a number of functions and operators are provided for processing and constructing values of these types. These are found in the "XQuery 1.0 and XPath 2.0 Functions and Operators" document.
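To make the limitation concrete: in XPath 1.0, the relational operators such as `<` convert their operands to numbers before comparing, and a date string such as "2002-03-20" converts to NaN, so the comparison is false no matter which date is earlier. The Python sketch below mimics that rule with a hypothetical `xpath1_number` helper (a simplified model of XPath 1.0's string-to-number conversion, not a real XPath engine) and contrasts it with comparing typed date values, which is the kind of comparison that XML Schema date support makes possible.

```python
import math
import re
from datetime import date

# Simplified model of XPath 1.0 string-to-number conversion: optional
# whitespace, optional minus sign, digits with an optional fraction.
# Anything else (such as a date string) becomes NaN.
_XPATH1_NUMBER = re.compile(r"[ \t\r\n]*-?(\d+(\.\d*)?|\.\d+)[ \t\r\n]*")

def xpath1_number(s: str) -> float:
    return float(s) if _XPATH1_NUMBER.fullmatch(s) else math.nan

def xpath1_less_than(a: str, b: str) -> bool:
    # XPath 1.0 converts both operands of < to numbers before comparing.
    return xpath1_number(a) < xpath1_number(b)

start, end = "2002-03-20", "2002-11-01"

print(xpath1_less_than(start, end))  # False: both sides are NaN
print(date.fromisoformat(start) < date.fromisoformat(end))  # True: typed comparison
```

Numeric strings still compare sensibly under the old rule (`xpath1_less_than("3", "10")` is true); it is only values like dates, which have no numeric form, that lose their ordering.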

For exhaustive coverage of what kinds of values XPath 2.0 expressions can return, see the "XQuery 1.0 and XPath 2.0 Data Model" document. For our purposes, it suffices to say that expressions can return simple-typed values, nodes, or sequences of nodes or simple-typed values. Actually, every expression returns a sequence, as we will see.

Nodes in XPath 2.0 have the same basic definition as in XPath 1.0, except that certain kinds (elements and attributes) can now be associated with XML Schema types and processed as such. As in XPath 1.0, there are seven node types: document nodes, elements, attributes, namespace nodes, processing instructions, comments, and text nodes. The one difference in terminology here is that "root nodes" are now called, perhaps more appropriately, "document nodes".

Sequences, sequences, sequences

Since XPath 2.0 is a language for processing sequences, it makes sense to look at what XPath 2.0 considers a sequence to be and how sequences behave. What follows is a set of rules to keep in mind. These cardinal truths are fundamental to the way XPath 2.0 works, and understanding them is a prerequisite to a deeper appreciation of the ways in which XPath 2.0 can be used.

Cardinal rule #1: Everything is a sequence.


If you want to impress your friends immediately, point to any given XPath 2.0 expression (or XQuery expression for that matter) and casually observe that the expression clearly returns a sequence. You don't have to spoil the fun by letting them in on the secret that all expressions in fact return sequences.

If you think about the fact that everything is a sequence, you'll realize that there is no way to distinguish between a simple-typed value (or node) and a sequence containing just that one value (or node). For that reason, the XPath 2.0 Working Draft, and colloquial usage in general, often speak of an expression as returning "a decimal" or "a string", when what is meant is "a sequence of one decimal value" or "a sequence of one string". Since there is no distinction between the two, both usages are acceptable. Just remember that it's still true that everything is a sequence.
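One way to picture this rule is a model in which every value is normalized to a sequence before anything else happens. In the Python sketch below, tuples stand in for sequences and the hypothetical `as_sequence` helper wraps a lone item in a one-item tuple; this only illustrates the rule and is not how any XPath processor actually stores values.

```python
from typing import Any, Tuple

Seq = Tuple[Any, ...]

def as_sequence(value: Any) -> Seq:
    """Treat a lone item as a sequence of length one; leave sequences alone."""
    return value if isinstance(value, tuple) else (value,)

# "A decimal" and "a sequence of one decimal" normalize to the same thing.
print(as_sequence(3.5) == as_sequence((3.5,)))  # True
print(len(as_sequence("hello")))                # 1
```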

Cardinal rule #2: Sequences are shallow.

You cannot have a sequence of sequences. If you try to nest a sequence within a sequence, which is quite possible syntactically, you'll get a "flattened" sequence with the members of the sub-sequence included alongside the members of the containing sequence.

For example, this expression,

(2, 4, (1,2,3), 6)

evaluates to exactly the same sequence as this expression,

(2, 4, 1, 2, 3, 6)

or this expression, for that matter,

( (((2))), (4,1,2,3,(6)) )
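The flattening rule can be modeled in a few lines of Python, with tuples standing in for sequences. The hypothetical `comma` function below plays the role of the comma operator: an operand that is itself a sequence contributes its members rather than being nested. Each parenthesized XPath expression above is translated into a call to `comma`, since in Python `(((2)))` is simply the integer 2 and would not show the structure.

```python
from typing import Any, Tuple

def comma(*operands: Any) -> Tuple[Any, ...]:
    """Concatenate operands into one flat sequence (tuples stand in for sequences)."""
    result = []
    for operand in operands:
        if isinstance(operand, tuple):
            # A nested sequence contributes its members, never itself.
            # Recursing keeps the result flat even for hand-built nested tuples.
            result.extend(comma(*operand))
        else:
            result.append(operand)
    return tuple(result)

a = comma(2, 4, comma(1, 2, 3), 6)                               # (2, 4, (1,2,3), 6)
b = comma(2, 4, 1, 2, 3, 6)                                      # (2, 4, 1, 2, 3, 6)
c = comma(comma(comma(comma(2))), comma(4, 1, 2, 3, comma(6)))  # ( (((2))), (4,1,2,3,(6)) )

print(a == b == c)  # True
print(a)            # (2, 4, 1, 2, 3, 6)
```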

Cardinal rule #3: Sequences are ordered.

Unlike node-sets in XPath 1.0, sequences are ordered. Consider the following expression.

(/foo/bar, /foo)

As you may have gathered, the comma (,) is an operator for constructing (concatenating) sequences. By putting /foo after /foo/bar, I construct a sequence in which the bar elements come before the foo element, regardless of the order in which they occur in the source document. Later, we will see how XPath 2.0 sequences are able to replace XPath 1.0 node-sets, without loss of functionality or compatibility.
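The ordering rule is easiest to see by contrast with XPath 1.0, where a node-set such as `/foo/bar | /foo` has no order of its own and is processed (for example by `xsl:for-each`) in document order. The Python sketch below uses the standard library's `xml.etree.ElementTree` to stand in for the two path expressions; the `sequence_concat` and `nodeset_union` helpers are hypothetical models of the two behaviors, not calls into an XPath engine.

```python
import xml.etree.ElementTree as ET

# A small document whose root element is <foo>.
doc = ET.fromstring("<foo><bar n='1'/><baz/><bar n='2'/></foo>")

foo_bar = doc.findall("bar")  # stands in for /foo/bar
foo = [doc]                   # stands in for /foo

def sequence_concat(*operands):
    """Model of the XPath 2.0 comma: keep items in the order the operands are written."""
    result = []
    for operand in operands:
        result.extend(operand)
    return result

def nodeset_union(root, *operands):
    """Model of XPath 1.0 union: drop duplicates, then use document order."""
    position = {id(e): i for i, e in enumerate(root.iter())}  # pre-order walk
    unique = {id(e): e for operand in operands for e in operand}
    return sorted(unique.values(), key=lambda e: position[id(e)])

print([e.tag for e in sequence_concat(foo_bar, foo)])    # ['bar', 'bar', 'foo']
print([e.tag for e in nodeset_union(doc, foo_bar, foo)])  # ['foo', 'bar', 'bar']
```

The same bar elements and foo element appear in both results; only the concatenated sequence preserves the order in which the operands were written.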
