What's New in XPath 2.0
by Evan Lenz
Cardinal rule #4: Sequences may contain duplicates.
Also unlike XPath 1.0 node-sets (and sets in general), sequences may contain duplicates. For example, we can modify our expression above slightly:
(/foo/bar, /foo, /foo/bar)
This sequence consists of the bar element(s), followed
by the foo element(s), followed again by the same
bar element(s). In XPath 1.0, it was impossible to
construct such a collection, because, by definition, node-sets may not
contain the same node more than once.
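As a rough analogy outside XPath, the difference resembles that between a list and a set in a general-purpose language. The following Python sketch uses the standard library's ElementTree. The sample document and variable names are assumptions made for illustration, and the list only mimics sequence semantics rather than evaluating XPath 2.0.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one <foo> root with two <bar> children.
foo = ET.fromstring("<foo><bar>a</bar><bar>b</bar></foo>")
bars = foo.findall("bar")

# Mimics (/foo/bar, /foo, /foo/bar): order is kept and the bars repeat.
sequence = bars + [foo] + bars

# Mimics an XPath 1.0 node-set: each node appears at most once.
node_set = set(sequence)

print(len(sequence))  # 5
print(len(node_set))  # 3
```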
The rise and fall of the node-set
In XPath 1.0, if you wanted to process a collection of nodes, you had to deal with node-sets. In XPath 2.0, the concept of the node-set has been generalized and extended. As we've seen, sequences may contain simple-typed values as well as nodes. We've also seen that sequences differ from node-sets in that they are ordered and may contain duplicates. The question naturally arises: how can you do away with node-sets without breaking XPath?
How to emulate sets in a sequence-only world
Indeed, XPath 1.0 node-sets were unordered. However, in XPath's most common context, XSLT, the nodes within the node-set are always processed in some order. The default order used to process node-sets was document order (since there is a document order that is always defined for all nodes). In XSLT 2.0, the default order used to process a node collection (i.e. sequence) is not necessarily document order but, rather, the order of the sequence. To maintain backward compatibility with XPath 1.0, path expressions (and other 1.0 expressions such as union expressions) are defined to always return in document order. Specifically, whenever "/" is used in the immediate expression, you can expect the result to be in document order. In addition, duplicates are automatically removed from the result. XPath 2.0 is thus able to emulate node-sets in a sequence-only world.
If you didn't follow all of that, don't worry. You may not have even realized before now that XPath 1.0 node-sets were unordered. It's mostly for the benefit of specification writers, who like to reassure themselves that everything is consistent and well-defined. Just rest assured that sequences are in fact ordered and path expressions pretty much behave the way they used to.
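For readers who prefer to see the rule in code, the "remove duplicates, then return in document order" behavior of path expressions can be approximated in Python. This is only a sketch under assumptions: the sample document, the as_node_set helper, and the doc_position table are hypothetical names introduced here, and ElementTree stands in for a real XPath 2.0 processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring("<foo><bar>a</bar><bar>b</bar></foo>")
first, second = root.findall("bar")

# Position of every element in document order (the root comes first).
doc_position = {id(node): i for i, node in enumerate(root.iter())}

def as_node_set(sequence):
    """Drop duplicate nodes by identity, then sort into document order."""
    unique = {id(node): node for node in sequence}
    return sorted(unique.values(), key=lambda node: doc_position[id(node)])

# An arbitrary sequence: out of document order, with one duplicate.
arbitrary = [second, root, first, second]
result = as_node_set(arbitrary)

print([node.tag + ":" + (node.text or "") for node in result])
# ['foo:', 'bar:a', 'bar:b']
```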
Some good keywords to learn
In addition to introducing many new datatypes and functions, XPath 2.0 introduces a number of new keyword-based operators, some of which we'll look at below.
Operations on sequences
Perhaps the most powerful new operator in XPath 2.0 for processing
sequences is the for expression. It enables iteration
over sequences, returning a new value for each member in the argument
sequence. This is similar to what can be done with
xsl:for-each, but it is different in that it is an actual
expression that returns a sequence which can, in turn, be processed as
such.
Consider the following example, which returns a sequence of simple-typed values, one per item in a purchase order, each being the total cost of that item.
for $x in /order/item return $x/price * $x/quantity
We could then get the total cost of the order by using the
sum() function.
sum(for $x in /order/item return $x/price * $x/quantity)
Cases such as these are much easier to solve using sequences in
XPath 2.0 than they were in XSLT/XPath 1.0. Without sequences, the
usual solution involves constructing a temporary "result tree
fragment" and then applying the node-set() extension function to it.
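To make the shape of the computation concrete, here is a minimal Python sketch of the same two steps: map each item to its line total, then sum the resulting sequence. The purchase order content is an assumption made up for illustration, and a list comprehension over ElementTree nodes only mirrors the for expression rather than evaluating XPath 2.0.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase order; element names follow the example above.
order = ET.fromstring(
    "<order>"
    "<item><price>2.50</price><quantity>4</quantity></item>"
    "<item><price>10.00</price><quantity>1</quantity></item>"
    "</order>"
)

# for $x in /order/item return $x/price * $x/quantity
line_totals = [
    float(item.findtext("price")) * int(item.findtext("quantity"))
    for item in order.findall("item")
]

# sum(for $x in /order/item return $x/price * $x/quantity)
order_total = sum(line_totals)

print(line_totals)  # [10.0, 10.0]
print(order_total)  # 20.0
```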
Conditional expressions
Among the more powerful (and oft-requested) constructs added to XPath 2.0 is the conditional expression. Here's an example that's included in the XPath 2.0 working draft.
if ($widget1/unit-cost < $widget2/unit-cost) then $widget1 else $widget2
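The conditional is an expression, not a statement, so it yields a value that can be used wherever a value is expected. Python's conditional expression behaves the same way, as this sketch suggests. The two widget documents and the unit_cost helper are hypothetical, introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical widgets standing in for $widget1 and $widget2.
widget1 = ET.fromstring(
    "<widget><name>A</name><unit-cost>3.25</unit-cost></widget>"
)
widget2 = ET.fromstring(
    "<widget><name>B</name><unit-cost>2.75</unit-cost></widget>"
)

def unit_cost(widget):
    """Read the unit-cost child as a number."""
    return float(widget.findtext("unit-cost"))

# if ($widget1/unit-cost < $widget2/unit-cost) then $widget1 else $widget2
cheaper = widget1 if unit_cost(widget1) < unit_cost(widget2) else widget2

print(cheaper.findtext("name"))  # B
```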
Quantifiers
The XPath 1.0 equals operator (=) was one of the more
powerful aspects of the language. It was powerful because it could
compare node-sets. Consider the following expression.
/students/student/name = "Fred"
In XPath 1.0, this expression returns true if any student name is equal to "Fred". This might be called existential quantification because it tests for the existence of a member satisfying some condition. XPath 2.0 preserves this functionality but also provides a more explicit way of testing:
some $x in /students/student/name satisfies $x = "Fred"
This formulation is more powerful because you can replace the
$x = "Fred" with any comparison you want, not just
equality comparisons. Also, XPath 1.0 provides no direct way to test
whether every student is named "Fred"; the closest equivalent is a
double negation such as not(/students/student/name != "Fred"). XPath 2.0
introduces an explicit form of universal quantification, using a
syntax similar to the above.
every $x in /students/student/name satisfies $x = "Fred"
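The two quantifiers correspond closely to the any() and all() built-ins found in Python, which may help readers coming from that language. The student list below is an assumption invented for the sketch, and the code mirrors the semantics without running an XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical student list.
students = ET.fromstring(
    "<students>"
    "<student><name>Fred</name></student>"
    "<student><name>Wilma</name></student>"
    "</students>"
)
names = [name.text for name in students.findall("student/name")]

# some $x in /students/student/name satisfies $x = "Fred"
some_fred = any(x == "Fred" for x in names)

# every $x in /students/student/name satisfies $x = "Fred"
every_fred = all(x == "Fred" for x in names)

print(some_fred)   # True
print(every_fred)  # False
```

Like every over an empty sequence in XPath 2.0, all() over an empty list is true, so the analogy holds in that edge case as well.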
Intersections, differences, unions
In XPath 1.0, the only real set operator was the union operator
(|). This meant that it was very awkward to determine
whether a given node was in a given node-set. For example, to
determine whether the node $x is included in the
/foo/bar node-set, we'd have to write something like
/foo/bar[generate-id(.)=generate-id($x)]
or like
count(/foo/bar)=count(/foo/bar | $x)
XPath 2.0's introduction of the intersect operator
alleviates some of the pain. Instead of going through the above
gyrations, we can simply write
$x intersect /foo/bar
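The key point is that intersect compares nodes by identity, not by value. A minimal Python sketch of that idea follows. The intersect function, the sample document, and the variable names are hypothetical, and the sketch omits the duplicate removal and document ordering that the real operator also performs.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
foo = ET.fromstring("<foo><bar/><bar/><baz/></foo>")
bars = foo.findall("bar")

def intersect(left, right):
    """Keep the nodes of `left` that also occur in `right`, by identity."""
    right_ids = {id(node) for node in right}
    return [node for node in left if id(node) in right_ids]

x = bars[1]
other = foo.find("baz")

# $x intersect /foo/bar: a non-empty result means $x is one of the bars.
print(bool(intersect([x], bars)))      # True
print(bool(intersect([other], bars)))  # False
```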
XPath 2.0 also introduces the except operator, which
can be very handy when we need to select all of a given node-set,
except for certain nodes. In XPath 1.0, if we wanted to, for example,
select all attributes except for the one with a given
namespace-qualified name, we'd have to write
@*[not(namespace-uri()='http://example.com' and local-name()='foo')]
or
@*[not(generate-id(.)=generate-id(../@exc:foo))]
Once again, XPath 2.0 comes to our rescue with the following pleasant alternative:
@* except @exc:foo
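For comparison, the same filtering can be sketched in Python. ElementTree does not model attributes as nodes, so this version filters the attribute dictionary by expanded name instead. It is an approximation of the result, not of the operator itself, and the sample element and the binding of the exc prefix to http://example.com are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical element; the exc prefix is bound to http://example.com.
elem = ET.fromstring(
    '<item xmlns:exc="http://example.com" '
    'id="7" exc:foo="skip" exc:bar="keep"/>'
)

# ElementTree stores a namespaced attribute under its "{uri}local" name.
excluded = "{http://example.com}foo"

# @* except @exc:foo
kept = {name: value for name, value in elem.attrib.items() if name != excluded}

print(sorted(kept))  # ['id', '{http://example.com}bar']
```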
Worrying about data types
If you take a peek at the XPath 2.0 spec, you'll see that I've left
out a lot of keywords, including things like cast,
treat, assert, and instance
of. These are important parts of the language, but their
importance partially depends on which context you're using XPath 2.0
in. If you will be using XPath in the context of XSLT 2.0, you may not
need to use these every day. You certainly will want to use them in
certain cases (for example, when casting a string to a date), but you
won't be required to use them. In the context of XQuery 1.0, however,
you may need to become intimately familiar with them.
The reason is that XQuery 1.0 is designed to be a statically typed language. Query analysis and optimization are aided by knowing, before the query is ever executed, what datatypes its expressions will return. This is only possible if the user explicitly specifies what type each of her expressions is to return. The other advantage of this approach is that errors can be caught early, thereby helping to enforce the correctness of queries.
There is certainly a tradeoff between usability and type safety. To serve the needs of both communities (sometimes artificially divided into the document-oriented and data-oriented worlds), XPath 2.0 provides a means by which the context can decide where it would like to stand in this tradeoff. Effectively, XPath 2.0 can be parameterized by its context. This may sound like a recipe for non-interoperability. However, it is important to identify the guiding principle behind the approach that has been taken. The principle is that any XPath 2.0 expression that does not raise an error in a given context will return the same result there as in any other context where it also succeeds. Thus, while an expression may produce an error in one context and not in another, it will never produce two different results. In other words, you always get either a right answer or an error. There is never more than one right answer.
The intended upshot for XSLT users is that they won't have to worry about most of this stuff, most of the time. A given XPath 2.0 expression may throw an "exception" in the XQuery context, but the same expression results in a silently invoked fallback conversion when in the context of XSLT.
Conclusion
It will likely become clear that XPath 2.0 represents a very significant upgrade to XPath 1.0. Its growth has been driven both by the demands of the XPath 1.0 user community and by the requirements for XQuery 1.0. Even if you don't agree with the entire outcome, it's hard to deny that it represents a remarkable collaboration. With any luck, it will also represent a very powerful, standard tool for several user communities.