XML.com: XML From the Inside Out

What's New in XPath 2.0
by Evan Lenz

Cardinal rule #4: Sequences may contain duplicates.

Also, unlike XPath 1.0 node-sets (and sets in general), sequences may contain duplicates. For example, we can modify our earlier expression slightly:

(/foo/bar, /foo, /foo/bar)

This sequence consists of the bar element(s), followed by the foo element(s), followed again by the same bar element(s). In XPath 1.0, it was impossible to construct such a collection, because, by definition, node-sets may not contain the same node more than once.
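To make the difference concrete, here is a small Python sketch (not XPath). It models the sequence above as a Python list of ElementTree elements, and a node-set as a collection keyed on node identity. The document `<foo><bar/><bar/></foo>` is a made-up example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the root element plays the role of /foo.
foo = ET.fromstring("<foo><bar/><bar/></foo>")
bars = foo.findall("bar")  # models /foo/bar

# A sequence is ordered and keeps duplicates: (/foo/bar, /foo, /foo/bar)
sequence = bars + [foo] + bars
print(len(sequence))  # 5 items

# A node-set is keyed on node identity, so the repeated bar elements collapse.
node_set = {id(node): node for node in sequence}
print(len(node_set))  # 3 distinct nodes
```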

The rise and fall of the node-set

In XPath 1.0, if you wanted to process a collection of nodes, you had to deal with node-sets. In XPath 2.0, the concept of the node-set has been generalized and extended. As we've seen, sequences may contain simple-typed values as well as nodes. We've also seen that sequences differ from node-sets in that they are ordered and may contain duplicates. The question naturally arises: how can you do away with node-sets without breaking XPath?

How to emulate sets in a sequence-only world

XPath 1.0 node-sets were indeed unordered. However, in XPath's most common context, XSLT, the nodes in a node-set are always processed in some order. The default order used to process node-sets was document order (since a document order is always defined for all nodes).

In XSLT 2.0, the default order used to process a node collection (that is, a sequence) is not necessarily document order but, rather, the order of the sequence. To maintain backward compatibility with XPath 1.0, path expressions (and other expressions carried over from 1.0, such as union expressions) are defined to always return their results in document order. Specifically, whenever "/" is the outermost operator of an expression, you can expect the result to be in document order. In addition, duplicates are automatically removed from the result. XPath 2.0 is thus able to emulate node-sets in a sequence-only world.
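The normalization that path expressions perform on their node results can be modeled in a few lines of Python. This is only a sketch of the behavior described above, written against the standard library's xml.etree.ElementTree. The function name in_document_order and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

def in_document_order(root, nodes):
    """Model of how XPath 2.0 treats the node result of a path expression:
    duplicates (by node identity) are dropped and the survivors are sorted
    into document order."""
    # A pre-order traversal (root.iter()) visits elements in document order.
    position = {id(node): i for i, node in enumerate(root.iter())}
    unique = {id(node): node for node in nodes}
    return sorted(unique.values(), key=lambda node: position[id(node)])

foo = ET.fromstring('<foo><bar n="1"/><bar n="2"/></foo>')
bar1, bar2 = foo.findall("bar")

# An arbitrary sequence: out of document order, with a duplicate.
sequence = [bar2, foo, bar1, bar2]
result = in_document_order(foo, sequence)
print([node.get("n", node.tag) for node in result])  # ['foo', '1', '2']
```

A plain sequence keeps whatever order and duplicates it was built with; only the path-style step applies this normalization, which is what lets sequences stand in for node-sets.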

If you didn't follow all of that, don't worry. You may not even have realized until now that XPath 1.0 node-sets were unordered. The distinction mostly benefits us specification writers, who like to reassure ourselves that everything is consistent and well-defined. Just rest assured that sequences are in fact ordered and that path expressions behave pretty much the way they used to.

Some good keywords to learn

In addition to introducing many new datatypes and functions, XPath 2.0 introduces a number of new keyword-based operators, some of which we'll look at below.

Operations on sequences

Perhaps the most powerful new operator in XPath 2.0 for processing sequences is the for expression. It iterates over a sequence, evaluating its return clause once for each item and concatenating the results into a new sequence. This is similar to what can be done with xsl:for-each, but unlike xsl:for-each, it is an actual expression: the sequence it returns can, in turn, be processed further by other expressions.

Consider the following example, which returns a sequence of simple-typed values, each one the total cost (price times quantity) of an item in a purchase order.

for $x in /order/item return $x/price * $x/quantity

We could then get the total cost of the order by using the sum() function.

sum(for $x in /order/item return $x/price * $x/quantity)

Cases such as these are much easier to solve with sequences in XPath 2.0 than they were in XSLT/XPath 1.0. Without sequences, the usual workaround is to construct a temporary "result tree fragment" and then convert it with a node-set() extension function.
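For readers who want to see the arithmetic, here is the same computation sketched in Python with the standard library's ElementTree. The purchase-order document and its values are made up; the list comprehension plays the role of the for expression, and Python's built-in sum() plays the role of XPath's sum(). Converting the price and quantity text with float() loosely mirrors how XPath 2.0 treats untyped element content in arithmetic (as a double).

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase order.
ORDER = """<order>
  <item><price>2.50</price><quantity>4</quantity></item>
  <item><price>10.00</price><quantity>1</quantity></item>
  <item><price>0.75</price><quantity>8</quantity></item>
</order>"""

root = ET.fromstring(ORDER)  # root is the order element

# for $x in /order/item return $x/price * $x/quantity
line_totals = [float(x.findtext("price")) * float(x.findtext("quantity"))
               for x in root.findall("item")]
print(line_totals)  # [10.0, 10.0, 6.0]

# sum(for $x in /order/item return $x/price * $x/quantity)
total = sum(line_totals)
print(total)  # 26.0
```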

Conditional expressions

Among the more powerful (and oft-requested) constructs added to XPath 2.0 is the conditional expression. Here's an example that's included in the XPath 2.0 working draft.

if ($widget1/unit-cost < $widget2/unit-cost) 
  then $widget1
  else $widget2 
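Like Python's `x if cond else y`, the XPath 2.0 conditional is an expression that yields a value, and in both languages the else branch is required. Below is a Python sketch of the working-draft example; the widget elements, their name and unit-cost children, and the values are assumptions for illustration, and the sketch assumes unit-cost holds a number so it compares numerically.

```python
import xml.etree.ElementTree as ET

def cheaper(widget1, widget2):
    # if ($widget1/unit-cost < $widget2/unit-cost) then $widget1 else $widget2
    # Assumption: unit-cost holds a number, so the costs are compared as floats.
    cost1 = float(widget1.findtext("unit-cost"))
    cost2 = float(widget2.findtext("unit-cost"))
    return widget1 if cost1 < cost2 else widget2

# Hypothetical widgets.
widget1 = ET.fromstring("<widget><name>A</name><unit-cost>9.50</unit-cost></widget>")
widget2 = ET.fromstring("<widget><name>B</name><unit-cost>12.00</unit-cost></widget>")
print(cheaper(widget1, widget2).findtext("name"))  # A
```

As in the original expression, a tie goes to the second widget, because the test uses a strict less-than.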

Quantifiers

The XPath 1.0 equals operator (=) was one of the more powerful aspects of the language. It was powerful because it could compare node-sets. Consider the following expression.

/students/student/name = "Fred"

In XPath 1.0, this expression returns true if any student name is equal to "Fred". This might be called existential quantification because it tests for the existence of a member satisfying some condition. XPath 2.0 preserves this functionality but also provides a more explicit way of testing:

some $x in /students/student/name satisfies $x = "Fred"

This formulation is more powerful because you can replace $x = "Fred" with any condition you want, not just an equality comparison. Also, XPath 1.0 provides no direct way to test whether every student is named "Fred". XPath 2.0 introduces this ability to do universal quantification, using a similar syntax.

every $x in /students/student/name satisfies $x = "Fred"
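The two quantifiers correspond directly to Python's any() and all(). In the sketch below, student names are modeled as plain strings (an assumption standing in for the string values of the name elements), and the function names are hypothetical. The last function shows the double-negation trick XPath 1.0 users can fall back on, not(/students/student/name != "Fred"), which gives the same answer as every for this simple equality test.

```python
def some_named(names, target):
    # some $x in names satisfies $x = target
    return any(x == target for x in names)

def every_named(names, target):
    # every $x in names satisfies $x = target
    return all(x == target for x in names)

def every_named_xpath1(names, target):
    # XPath 1.0 workaround: not(names != target), i.e. "no name differs"
    return not any(x != target for x in names)

students = ["Fred", "Wilma", "Fred"]
print(some_named(students, "Fred"))   # True
print(every_named(students, "Fred"))  # False
print(every_named([], "Fred"))        # True: vacuously true for an empty sequence
```

Note the empty case: some over an empty sequence is false, while every over an empty sequence is true.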

Intersections, differences, unions

In XPath 1.0, the only real set operator was the union operator (|). This meant that it was very awkward to determine whether a given node was in a given node-set. For example, to determine whether the node $x is included in the /foo/bar node-set, we'd have to write something like

/foo/bar[generate-id(.)=generate-id($x)]

or like

count(/foo/bar)=count(/foo/bar | $x)

XPath 2.0's introduction of the intersect operator alleviates some of the pain. Instead of going through the above gyrations, we can simply write

$x intersect /foo/bar

XPath 2.0 also introduces the except operator, which can be very handy when we need to select all of a given node-set, except for certain nodes. In XPath 1.0, if we wanted to, for example, select all attributes except for the one with a given namespace-qualified name, we'd have to write

@*[not(namespace-uri()='http://example.com' and local-name()='foo')]

or

@*[not(generate-id(.)=generate-id(../@exc:foo))]

Once again, XPath 2.0 comes to our rescue with the following pleasant alternative:

@* except @exc:foo
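Both operators compare nodes by identity, not by value, and return their results in document order without duplicates. The Python sketch below models that with ElementTree. Because ElementTree stores attributes as a plain dict rather than as nodes with identity, the sketch uses child elements instead of attributes; the set logic is the same. The helper names (doc_order, intersect, except_) and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def doc_order(root, nodes):
    # Deduplicate by node identity and sort into document order.
    position = {id(n): i for i, n in enumerate(root.iter())}
    unique = {id(n): n for n in nodes}
    return sorted(unique.values(), key=lambda n: position[id(n)])

def intersect(root, a, b):
    # a intersect b: nodes (by identity) that occur in both operands
    in_b = {id(n) for n in b}
    return doc_order(root, [n for n in a if id(n) in in_b])

def except_(root, a, b):
    # a except b: nodes of a that do not occur in b ("except" is a Python keyword)
    in_b = {id(n) for n in b}
    return doc_order(root, [n for n in a if id(n) not in in_b])

foo = ET.fromstring('<foo><bar n="1"/><baz n="2"/><bar n="3"/></foo>')
bars = foo.findall("bar")   # models /foo/bar
children = list(foo)        # models /foo/*
x = children[1]             # the baz element, which is not a bar

print(bool(intersect(foo, [x], bars)))                     # False: x is not in /foo/bar
print([n.get("n") for n in except_(foo, children, bars)])  # ['2']

# The XPath 1.0 workaround, count(/foo/bar) = count(/foo/bar | $x), in the same model:
print(len(doc_order(foo, bars + [x])) == len(bars))        # False
```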

Worrying about data types

If you take a peek at the XPath 2.0 spec, you'll see that I've left out a lot of keywords, including things like cast, treat, assert, and instance of. These are important parts of the language, but how important they are to you depends partly on the context in which you're using XPath 2.0. If you will be using XPath in the context of XSLT 2.0, you may not need these every day. You will certainly want to use them in some cases (for example, when casting a string to a date), but you won't be required to. In the context of XQuery 1.0, however, you may need to become intimately familiar with them.

The reason is that XQuery 1.0 is designed to be a statically typed language. Query analysis and optimization are aided by knowing what datatypes query expressions will return before the query is ever executed. This is only possible if the user explicitly specifies what type each of her expressions is to return. Another advantage of this approach is that errors can be caught early, which helps enforce the correctness of queries.

There is certainly a tradeoff between usability and type safety. To serve the needs of both communities (sometimes artificially divided into the document-oriented and data-oriented worlds), XPath 2.0 lets the context decide where it would like to stand in this tradeoff. Effectively, XPath 2.0 can be parameterized by its context. This may sound like a recipe for non-interoperability, so it is important to identify the guiding principle behind the approach that has been taken: any XPath 2.0 expression that evaluates without an error returns the same result it would return in any other context where it also evaluates without an error. Thus, while an expression may produce an error in one context and not in another, it will never produce two different results. In other words, you always get either the right answer or an error. There is never more than one right answer.

The intended upshot for XSLT users is that they won't have to worry about most of this, most of the time. A given XPath 2.0 expression may throw an "exception" in the XQuery context, while the same expression triggers a silent fallback conversion in the context of XSLT.
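The "right answer or an error" principle can be illustrated with a toy model in Python: one expression, left + right, evaluated in a hypothetical strict (XQuery-like) context and a hypothetical lenient (XSLT-like) context. Everything here is an assumption for illustration (strings stand for untyped text, floats for typed numbers, and the names add and StrictTypeError are made up); these are not the actual XPath 2.0 typing rules.

```python
class StrictTypeError(Exception):
    """Raised by the hypothetical strict (XQuery-like) context."""

def add(left, right, strict):
    """Toy model of one expression, left + right, evaluated in two contexts.

    Operands are either floats (typed numbers) or strs (untyped text).
    These rules are illustrative assumptions, not the XPath 2.0 type rules.
    """
    if isinstance(left, str) or isinstance(right, str):
        if strict:
            # Strict context: refuse untyped operands instead of converting them.
            raise StrictTypeError("untyped operand; an explicit cast is required")
        # Lenient context: silently convert to numbers and carry on.
        left, right = float(left), float(right)
    return left + right

# Where both contexts succeed, they agree...
assert add(1.5, 2.0, strict=True) == add(1.5, 2.0, strict=False) == 3.5
# ...and where they differ, one of them reports an error rather than a different answer.
print(add("41", 1.0, strict=False))  # 42.0
try:
    add("41", 1.0, strict=True)
except StrictTypeError as err:
    print("error:", err)
```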

Conclusion

It will likely become clear that XPath 2.0 represents a very significant upgrade to XPath 1.0. Its growth has been driven both by the demands of the XPath 1.0 user community and by the requirements of XQuery 1.0. Even if you don't agree with the entire outcome, it's hard to deny that it represents a remarkable collaboration. With any luck, it will also prove to be a very powerful, standard tool for several user communities.
