Top Ten Tips to Using XPath and XPointer
John Simpson is the author of XPath and XPointer.
XPath and XPointer allow XML developers and document authors to find and manipulate specific needles of content in an XML document's haystack. From mindful use of predicates, to processor efficiency, to exploring both the standards themselves and extensions to them, this article offers ten tips -- techniques and gotchas -- to bear in mind as you use XPath and XPointer in your own work.
Consider a simple document, such as the following:
<year>
  <month monthnum="4">April</month>
</year>
Ask yourself: how many children does the year element have? If you think the answer is one, the month element, you're wrong. Answering the question with XPath might look something like this:
count(/year/node())
If you submit this XPath expression to a "pure" XPath-aware processor,
such as the one built into the Saxon XSLT engine, you're told that
year has three children. What's going on? The first bit of
content that follows the opening of the year element isn't the
month element, although it looks that way to the human
eye. Rather, the first bit of content within the year element
is a text node (a number of blank spaces, a newline, and a few more blank
spaces), which precedes the opening of the month
element. There's also a child text node (a simple newline) following the
month element's close, just before the close of the
year element. That is, to an XPath-aware processor, this
document resembles Figure 1.
Figure 1: An XPath processor's-eye view of a document with "invisible" whitespace
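The three-children answer is easy to check outside of XSLT as well. Here is a minimal sketch using Python's standard xml.dom.minidom module, which (like a "pure" XPath processor) keeps whitespace-only text nodes. The choice of Python and of minidom is mine, not the article's, and the document string assumes the line breaks and indentation shown in the sample.

```python
from xml.dom import minidom

# The sample document, with its line breaks and indentation left in place.
DOC = """<year>
  <month monthnum="4">April</month>
</year>"""

year = minidom.parseString(DOC).documentElement

# childNodes includes whitespace-only text nodes as well as elements.
kinds = [child.nodeName for child in year.childNodes]
print(len(year.childNodes), kinds)
```

The two "#text" entries in the list are the whitespace before and after the month element.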
What do I mean by a "pure" XPath-aware processor? The one to look out for is the MSXML processor, freely provided by Microsoft both for use as a stand-alone product and embedded in Internet Explorer (MSIE). When you use the preceding XPath expressions to view the document in MSIE, via an XSLT transformation, you get the mistaken (albeit, perhaps, more common sense) answer: one child.
MSXML includes an XML parser, an XPath-compliant XSLT
engine, and an interface to the outside world (like MSIE). The parser
and the XSLT engine are both excellent, standards-compliant
components. It's the latter which produces the seemingly non-conformant
behavior when dealing with whitespace-only text nodes. This behavior is
controlled by a property,
preserveWhiteSpace, with true or
false values. The default is false, which causes MSIE to display the
document incorrectly. In order to change this behavior -- and make MSIE
behave "purely" -- you must use scripting to set
preserveWhiteSpace to true explicitly.
The predicate is such a powerful, valuable piece of a location step's real estate that you may be reluctant to try anything beyond the simplest ones. Don't be. The predicate is there to enable you to grab exactly the node(s) you need from among all those candidates visible along a given axis from the context node. Why limit yourself to selecting, say, just the nth child or elements with a particular attribute? Stretch your wings by experimenting with multiple predicates in a given location step or path.
Here's a simple XML document:
<roofers>
  <roofing-material>
    <manufacturer>Salitieri</manufacturer>
    <type>thatch</type>
  </roofing-material>
  <roofing-material>
    <manufacturer>Nash</manufacturer>
    <type>shingles</type>
  </roofing-material>
  <roofing-material>
    <manufacturer>DeBrutus</manufacturer>
    <type>fiberglass</type>
  </roofing-material>
  <roofing-material>
    <manufacturer>Short</manufacturer>
    <type>shingles</type>
  </roofing-material>
</roofers>
Suppose you want to locate a node-set consisting of any
roofing-material element whose type is "shingles", but only if the
manufacturer is "Nash". Either of the following approaches
works (pay special attention to the predicates):
//roofing-material[type="shingles"][manufacturer="Nash"]
//roofing-material[type="shingles" and manufacturer="Nash"]
//roofing-material[type[preceding-sibling::manufacturer="Nash"]="shingles"]
While the results of these three approaches are identical for this document, in other documents they might be quite different. And all three -- including that weird-looking "nested predicate" in the third example -- are perfectly legal XPath 1.0.
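If you want to try the stacked form without an XSLT processor, here is a minimal sketch using Python's standard xml.etree.ElementTree module. ElementTree implements only a limited subset of XPath, so only the first, stacked path is run through it; the plain-Python filter is my own stand-in for the "and" form, not something a processor would produce.

```python
import xml.etree.ElementTree as ET

DOC = """<roofers>
  <roofing-material><manufacturer>Salitieri</manufacturer><type>thatch</type></roofing-material>
  <roofing-material><manufacturer>Nash</manufacturer><type>shingles</type></roofing-material>
  <roofing-material><manufacturer>DeBrutus</manufacturer><type>fiberglass</type></roofing-material>
  <roofing-material><manufacturer>Short</manufacturer><type>shingles</type></roofing-material>
</roofers>"""

root = ET.fromstring(DOC)

# Stacked predicates: the second one filters what the first one left over.
stacked = root.findall(".//roofing-material[type='shingles'][manufacturer='Nash']")

# The same test spelled out in plain Python, standing in for the "and" form.
spelled_out = [
    material
    for material in root.iter("roofing-material")
    if material.findtext("type") == "shingles"
    and material.findtext("manufacturer") == "Nash"
]

print([material.findtext("manufacturer") for material in stacked])
print(stacked == spelled_out)
```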
Beware of one trap when using the approach employed in the first of the three preceding location paths -- I think of it as a "stacked" predicate. The order in which predicates appear on the stack can affect the final result. Each succeeding predicate is evaluated in terms of the narrowed context provided by the preceding one(s), and not just in terms of the general context in which a single predicate would be evaluated. Here's a sample document to illustrate this point.
<tosses>
  <toss result="heads"/>
  <toss result="heads"/>
  <toss result="tails"/>
  <toss result="heads"/>
</tosses>
Now consider the following two location paths into this document, each using a stacked predicate:
//toss[@result="heads"][3]
//toss[3][@result="heads"]
See the difference? The first path locates (a) all toss elements whose result attribute equals "heads", and then (b) the third one of those toss elements. Therefore, it selects the fourth toss element in the document.
The second path selects the third toss element, and then the stacked predicate applies a further screen, selecting the third toss element only if its result attribute has a value of "heads". Because the third toss element's result attribute is "tails", this location path returns an empty node-set.
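The order dependence can be imitated in a few lines of Python. This is only a sketch of the semantics, assuming the two paths are //toss[@result="heads"][3] and //toss[3][@result="heads"]. It deliberately uses plain list filtering rather than an XPath engine, and the helper names is_heads and nth are made up for the illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<tosses>
  <toss result="heads"/>
  <toss result="heads"/>
  <toss result="tails"/>
  <toss result="heads"/>
</tosses>"""

tosses = list(ET.fromstring(DOC))  # the four toss elements, in document order

def is_heads(toss):
    return toss.get("result") == "heads"

def nth(nodes, n):
    """Keep only the n-th node (1-based, as in XPath), or nothing if there are too few."""
    return nodes[n - 1:n]

# Filter on "heads" first, then take the third survivor.
heads_then_third = nth([toss for toss in tosses if is_heads(toss)], 3)

# Take the third toss first, then keep it only if it is "heads".
third_then_heads = [toss for toss in nth(tosses, 3) if is_heads(toss)]

print([tosses.index(toss) + 1 for toss in heads_then_third])
print(third_then_heads)
```

The first print shows the document position of the selected toss; the second shows an empty list.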
Consider another simple document:
<quotation>
  <source>
    <author>Firesign Theatre</author>
    <work year="1970">Don't Crush that Dwarf, Hand Me The Pliers</work>
  </source>
  <text>And there's hamburger all over the highway in Mystic, Connecticut.</text>
</quotation>
Ignoring the "invisible" whitespace (as discussed in tip #1), what's the
string-value of the quotation element? In XPath terms,
we're seeking the string-value of the element node identified by this
location path:
/quotation
Like the string-value of any other element, it's the concatenated values of all text nodes in the element's scope -- that is:
Firesign TheatreDon't Crush that Dwarf, Hand Me The PliersAnd there's hamburger all over the highway in Mystic, Connecticut.
That old devil, common sense, might lead you to conclude that the following XPath location path has the exact same string-value:
/quotation/*
Run this path through an XPath processor, though, and what you get is simply
Firesign TheatreDon't Crush that Dwarf, Hand Me The Pliers
The behavior here is summed up in the rule of thumb, "The
string-value of any node-set is the string-value of only the
first node in the set." Thus, the second location path returns a
node-set consisting of two nodes, the source and text elements. And the
first of these, source, is the only one whose string-value counts as the
string-value of the node-set.
So what about the first case? Does that imply an exception to the general rule of thumb, an exception for root elements? No. The first location path selects a node-set which just happens to consist of a single node, and that node, of course, is thus the "first" node in the node-set. Both location paths obey the general rule.
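The rule of thumb is simple enough to imitate directly. The sketch below is not an XPath engine: the two helper functions are hypothetical names that restate the rule in Python, applied first to a node-set holding only the quotation element and then to a node-set holding its two child elements. The document is written without line breaks so that "invisible" whitespace stays out of the picture.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<quotation><source><author>Firesign Theatre</author>"
    "<work year=\"1970\">Don't Crush that Dwarf, Hand Me The Pliers</work></source>"
    "<text>And there's hamburger all over the highway in Mystic, Connecticut.</text>"
    "</quotation>"
)

def string_value(element):
    """Concatenate every text node inside the element, in document order."""
    return "".join(element.itertext())

def node_set_string_value(nodes):
    """String-value of a node-set: that of its first node, or empty if there is none."""
    return string_value(nodes[0]) if nodes else ""

quotation = ET.fromstring(DOC)

whole = node_set_string_value([quotation])          # a one-node node-set
children = node_set_string_value(list(quotation))   # source and text; only source counts

print(whole)
print(children)
```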
This one can be very confounding if you're not careful. Take a look at these two sample location paths:
//employee[@id != "emp1002"]
//employee[not(@id = "emp1002")]
The first example selects each employee element node which has an
id attribute whose value does not equal "emp1002" --
note that this excludes those with no id attribute at
all. The second selects all employee element nodes which do
not have an id attribute whose value is "emp1002". So
assume, then, a document with an employee element that carries no
id attribute at all, such as <employee/>.
The first location path will not select this employee element: since it
has no id attribute at all, it does not have an id attribute whose value
is not emp1002 (or anything else, for that matter). The second, on
the other hand, will select this employee element: it does not have an
id attribute whose value is emp1002.
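To see the two tests side by side, here is a small Python sketch. The staff document, including the ids and the bare employee element, is invented for the illustration, and the two list comprehensions are my own restatements of @id != "emp1002" and not(@id = "emp1002") rather than output from an XPath processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: two employees with ids, one with no id attribute at all.
DOC = """<staff>
  <employee id="emp1001"/>
  <employee id="emp1002"/>
  <employee/>
</staff>"""

employees = list(ET.fromstring(DOC))

# @id != "emp1002": there must BE an id attribute, and it must differ.
has_other_id = [e for e in employees
                if e.get("id") is not None and e.get("id") != "emp1002"]

# not(@id = "emp1002"): anything that lacks an id equal to "emp1002",
# including employees with no id attribute at all.
lacks_that_id = [e for e in employees if not (e.get("id") == "emp1002")]

print([e.get("id") for e in has_other_id])
print([e.get("id") for e in lacks_that_id])
```

In the second list, None marks the employee without an id attribute, which only the not() form picks up.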
After a while, you may become so self-assured when slinging XPath that you never need to test the results: you'll instinctively know the effect of one expression versus slightly different ones. Still, it's hard to imagine you'll be prepared for absolutely every eventuality, every nuance in a given document's node tree, every wrinkle in the XPath spec. At such a time, you absolutely must have a good tool to reassure you that you're on the right, well, path.
One common class of XPath testing tools, naturally, is comprised of all the production-grade XSLT processors: Saxon, Xalan, MSXML, and so on. In order to interpret and act on the template rules and other code in XSLT stylesheets, these processors must first "know" XPath. If you've coded your location paths properly, the transformation cranked through the processor will work properly; if you haven't, it won't -- properly or maybe at all.
This is kind of indirect confirmation, though. Wouldn't it be more useful to have a tool which, say, "lit up" the selected node(s) for a given XPath expression, especially if you're using XPath in a non-XSLT context like XPointer or XQuery?
My own work takes place mostly on Microsoft Windows platforms. Even if yours doesn't, you probably have access to a Windows PC. If you do, you're in luck: there's a great tool written by Dimitre Novatchev, "XPath Visualiser". You can obtain it from the Top XML (formerly VBXML) site.
XPath Visualiser isn't a standalone software package. Instead, it's a frameset HTML document which you open in MSIE. The top frame includes some form elements, such as fields for entering the name of the document you want to view and the XPath expression you want to test, and a "Select Nodes" button to highlight all nodes selected by the XPath expression you've entered. The default expression, as shown in Figure 2, selects all element nodes in the document.
Figure 2: XPath Visualiser (default XPath expression).
Figure 3 shows the default expression altered to select all attributes. (Note that entire attributes are selected -- their names, the equal signs, and their values.)
Figure 3: XPath Visualiser, selecting all attributes.
Finally, as you can see in Figure 4, you can use any conformant XPath 1.0 expression to select document content. The expression in Figure 4 says to select the text nodes representing the names of all relics whose prices begin with the digit 3.
Figure 4: XPath Visualiser, selecting the names of relics with particular kinds of prices.
I encourage you to download XPath Visualiser and dig into it. Don't forget to read the documentation and perhaps participate in the discussion group at the site; both will help you understand the specific system requirements for installing and running the package -- as well as its few limitations.
This tip is not, strictly speaking, XPath-related. But in the context of XSLT stylesheets, it can significantly enhance your ability to locate and process XML content.
EXSLT -- an acronym for Extensions to XSLT -- is an unofficial but well-organized community effort to fill some of the gaps in the XSLT 1.0 Recommendation. These gaps include, for example, the ability to process dates in various ways and the ability to transform source trees into multiple result-tree documents in a single pass. There are also quite a few commonly needed mathematical operations which the XPath numeric functions and operators don't address: given a node-set whose members all have numeric values, it's frequently necessary to select the element(s) with the largest or smallest values.
While EXSLT extension functions and elements have no official standing, support for them is built into numerous XSLT processors. To use the extensions, you need to know the name of the EXSLT module in question, for example "math" for the numeric processing features. Then include a namespace declaration in your stylesheet, as follows:
xmlns:math="http://exslt.org/math"
Of course, you have to replace "math" with the appropriate module name, if you're using a different one. Then just use the function or extension element as you would any other. For instance, you might use an XPath expression such as:
//product[price = math:min(//product/price)]
to locate the product with the lowest price.
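For readers without an EXSLT-capable processor at hand, the same selection can be imitated in Python. This is only a sketch of what a lowest-price lookup computes: the catalog document, its product names, and its prices are all invented, and nothing here calls EXSLT itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog, made up purely for this illustration.
DOC = """<catalog>
  <product><name>widget</name><price>4.50</price></product>
  <product><name>gadget</name><price>2.25</price></product>
  <product><name>gizmo</name><price>9.00</price></product>
</catalog>"""

products = ET.fromstring(DOC).findall("product")

# The minimum is taken over ALL prices in the document, not per product.
lowest = min(float(product.findtext("price")) for product in products)

# Then each product is compared against that single document-wide minimum.
cheapest = [product for product in products
            if float(product.findtext("price")) == lowest]

print([product.findtext("name") for product in cheapest])
```

Note that the minimum has to be computed over every product's price; taking it over a single product's own price would match every product.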
Detailed instructions for using EXSLT extensions, including links to the community mailing list, can be found at the EXSLT site.
In a perfect world, or at least a perfect Web, hyperlinks would never fail. Targeted resources would never move or disappear, Web servers and the network itself would never encounter any interruptions, and -- most importantly -- document authors would never make any errors (typographical or cognitive) in identifying resources in the first place.
The world and the Web being what they are, though, broken links are a fact of life. Luckily, the W3C Working Group (WG) responsible for the XPointer specs has taken this into consideration; they've provided us with a means to make XPointer-based hyperlinks a little less fragile. This solution involves "chaining" XPointers together, one after another. The XPointers are evaluated left to right; the first one which does not fail becomes, for all practical purposes, the XPointer. Consider an XML document like the following:
<villain>
  <name>Blofeld</name>
  <film>Thunderball</film>
</villain>
An XLink into this document might attempt to locate a film element, but
only if the villain's name is Goldfinger, using an XPointer:
xlink:href="xpointer(//film[../name = 'Goldfinger'])"
While XLink-, let alone XPointer-aware applications are not exactly thick on the ground at the moment, we can guess that such an application would return the equivalent of a classic HTTP 404 "Document not found" error, given the above document. We could make the link more robust by "chaining" a fallback:
xlink:href="xpointer(//film[../name = 'Goldfinger']) xpointer(//film)"
This instructs the application that if it can't locate the villain
Goldfinger's film, it should try to locate the first film
element in the document, whatever it is.
Note that this works only with scheme-type XPointers (such as the
xpointer() XPointers above). If you're using shorthand XPointers, you're limited to
the familiar "get it right on the first try" form.
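The left-to-right fallback rule can be sketched in Python as follows. This is not an XPointer processor: each pointer part is represented by an ordinary function (goldfinger_films and any_films are hypothetical stand-ins for the two xpointer() parts in the link above), and resolve_chain simply returns the result of the first part that locates anything.

```python
import xml.etree.ElementTree as ET

DOC = """<villain>
  <name>Blofeld</name>
  <film>Thunderball</film>
</villain>"""

def goldfinger_films(root):
    # Stand-in for xpointer(//film[../name = 'Goldfinger'])
    return [film
            for parent in root.iter()
            if parent.findtext("name") == "Goldfinger"
            for film in parent.findall("film")]

def any_films(root):
    # Stand-in for xpointer(//film)
    return list(root.iter("film"))

def resolve_chain(root, parts):
    """Try each pointer part left to right; the first non-empty result wins."""
    for part in parts:
        nodes = part(root)
        if nodes:
            return nodes
    return []  # every part failed

root = ET.fromstring(DOC)
located = resolve_chain(root, [goldfinger_films, any_films])
print([film.text for film in located])
```

With only the first part in the chain, the result is empty; adding the fallback part is what rescues the link.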
While namespaces in XML continue to make for spirited discussion in some circles, they're for the foreseeable future an inevitable feature of the XML landscape.
In XPath applications, you need to understand three terms: the qualified name, the local name, and the expanded name.
The qualified name (often abbreviated "qname") of an element or attribute
is the node's name, including the namespace prefix (if any), as it
appears in the document being accessed via XPath. If an element's name
in a source document is
book:section, that's its qname.
A node's local name is the name as it appears in the source document,
shorn of any namespace prefix. (You might call this the un-qname.) For
an element in a document named book:section, the local name
is section.
The expanded name of an element or attribute is what most namespace-aware
applications really care about (unless you instruct them not to). It's a
combination of the URI associated with the namespace prefix, plus the
local name. Assume the book: namespace prefix is declared as follows:
xmlns:book="http://my.example.org/namespaces/book"
Then the expanded name of the book:section element is the combination
of the strings "http://my.example.org/namespaces/book" and
"section". Exactly how the application builds the expanded name is up to
the application's developer. Most seem to follow a de-facto standard of
enclosing the namespace URI in curly braces, the { and } characters,
followed by the local name. Such an
application would thus represent the expanded name of the
book:section element as follows:
{http://my.example.org/namespaces/book}section
The important thing to you as a user of XPath isn't the exact algorithm for
building an expanded name (which in any case is directly accessible only
within the processor itself, not to XPath expressions). The important
thing is that the processor will in general use only the expanded name
if it needs to disambiguate element or attribute names. Consider a
document which has two elements, xsl:template and
transform:template. Their local names are identical; the
only way to tell if the element names really are identical is to
examine their namespace URIs as well. If both the xsl: and
transform: prefixes are bound to the namespace URI
"http://www.w3.org/1999/XSL/Transform", then the two elements have the
same "name" even though their prefixes are different.
One implication of all this is that using the XPath name() function
to return an element or attribute's "name" is a little deceptive: it
returns the qname. And no matter how unique its qname, a given element
or attribute may in fact have a name identical to others' simply
because their namespace URIs match, even when the prefixes are
different.
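Python's standard xml.etree.ElementTree module happens to be one of the applications that uses the curly-brace convention, which makes it a handy way to watch expanded names at work. In this sketch the wrapper element and the two prefixes (xsl and transform) are assumptions chosen for the illustration; both prefixes are bound to the same namespace URI.

```python
import xml.etree.ElementTree as ET

# Two different prefixes bound to one and the same namespace URI.
DOC = """<root xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:transform="http://www.w3.org/1999/XSL/Transform">
  <xsl:template/>
  <transform:template/>
</root>"""

root = ET.fromstring(DOC)

# ElementTree exposes each element's expanded name as {namespace-URI}local-name.
tags = [child.tag for child in root]

print(tags)
print(tags[0] == tags[1])
```

The prefixes have disappeared from the tags entirely; only the URI and the local name remain, so the two elements compare as having the same name.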
When coding XPointers, remember that the vocabulary -- and hence the namespaces
-- of the document containing the XPointers will probably be quite different
from those of the document(s) being pointed to. To be absolutely sure
that the XPointer processor keeps it all straight, identify the
namespace(s) for target document vocabularies using the
xmlns() XPointer scheme. For instance, reusing the book namespace from above purely as an illustration:
xmlns(book=http://my.example.org/namespaces/book) xpointer(//book:section)
The authors of XML books and articles have it easy, in one respect: the XML documents they use for examples don't generally need to be very long or complex.
Real applications rarely have that luxury. Source documents may contain many thousands of elements, just to cite the obvious case; throw in a mixture of comments, PIs, and voluminous text nodes, and you may find the streets of your location paths paved with molasses. Controlling this is to a large extent out of your hands. You can't rejigger the processor's internals, after all. (On the other hand, some processors may allow you to use parameters or command-line arguments to encourage them to behave in ways optimized for particular source document structures.) But one XPath optimization is easy -- it just requires you to surrender a particularly lazy habit.
The habit in question is excessive use of the descendant-or-self
axis when you know the name of the target element (the node test) which
follows it. It's particularly tempting to fall back on this habit
because of the XPath
// shortcut (technically a shortcut for the /descendant-or-self::node()/
step). Considering a document even as simple as this should make the
point:
<dictionary>
  <letter>
    <forms>
      <form type="upper">A</form>
      <form type="lower">a</form>
    </forms>
    <word>
      <spelling>aardvark</spelling>
      <part_of_speech>noun</part_of_speech>
      <definition>a nocturnal mammal of southern Africa with a tubular snout and a long tongue</definition>
    </word>
  </letter>
</dictionary>
Both of the following location paths locate the definition element:
//definition
/dictionary/letter/word/definition
The second is a much more direct route to the desired result. It
leads the processor down the tree with no side trips, right to the
definition element. The first, in contrast, takes a
leisurely stroll through all descendants of the root node -- picking up
each one in turn and mulling it over ("Hmm, is this descendant a
definition element...?") before proceeding even further
through the tree. This includes irrelevant detours into the
forms branch of the tree and to the spelling and
part_of_speech siblings of the definition element.
Of course, for this extremely simple example document, the difference in processing time will be negligible. Turn this document into an entire dictionary, though, and the difference will be considerable. It's true that coding yard-long location paths into large documents can be both tedious and error-prone, certainly no one's idea of fun; but if huge gains in performance result from it, well, it's hard to argue in favor of fun.
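A quick way to convince yourself that the two routes arrive at the same place is to run both against the sample dictionary. The sketch below uses Python's standard xml.etree.ElementTree module, an assumption of mine rather than a tool the article uses. It also counts how many elements a full descendant scan has to look at, as a rough stand-in for the extra work; it makes no claim about how any particular processor is implemented.

```python
import xml.etree.ElementTree as ET

DOC = """<dictionary>
  <letter>
    <forms>
      <form type="upper">A</form>
      <form type="lower">a</form>
    </forms>
    <word>
      <spelling>aardvark</spelling>
      <part_of_speech>noun</part_of_speech>
      <definition>a nocturnal mammal of southern Africa</definition>
    </word>
  </letter>
</dictionary>"""

dictionary = ET.fromstring(DOC)

# Descendant search: every element below the root is a candidate.
by_scan = dictionary.findall(".//definition")

# Direct route: one named child step at a time, starting from the root element.
by_path = dictionary.findall("letter/word/definition")

# Elements a naive descendant scan would have to inspect (root included).
elements_in_tree = sum(1 for _ in dictionary.iter())

print(by_scan == by_path, elements_in_tree)
```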
As for XPointer, not only can you minimize (if not eliminate) your use of the
// shortcut; you can also fall back on alternative ways of
seeking content which aren't dependent on XPath at all. These are
so-called shorthand and child-sequence XPointers.
The former look like familiar (X)HTML named resources, as in a URI reference that ends in #someid, where "someid" (the shorthand XPointer) matches the value of some ID-type attribute in the target document. (Of course, in order to use this kind of XPointer, the target document must have some ID-type attribute declared, via DTD or schema.)
Child-sequence XPointers use the new
element() XPointer scheme
to walk the processor down into the node tree without referencing
element names at all; it can simply count children. For instance,
element(/1/4/3/15)
locates (hold your breath) the fifteenth child of the third child of the fourth
child of the root element. This can foster huge performance gains in
processors equipped to handle the
element() scheme: an
XPath-based XPointer processor needs potentially to read in the entire
target resource in order to ensure that it's gotten every last bit of
matching content, while a child sequence-smart processor can simply
stream through the target document, taking only the designated forks in
the road and ignoring all others. (The downside, of course, is that you
can access only elements this way, and are restricted to navigating only
in a manner equivalent to XPath's child axis.)
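Counting children really is all a child sequence does, as this Python sketch suggests. The function resolve_child_sequence is a hypothetical helper, not part of any XPointer library. It handles only a bare child sequence that starts at the root element with /1, and it walks an in-memory tree rather than streaming. The sample document is the small dictionary from the previous tip.

```python
import xml.etree.ElementTree as ET

DOC = """<dictionary>
  <letter>
    <forms>
      <form type="upper">A</form>
      <form type="lower">a</form>
    </forms>
    <word>
      <spelling>aardvark</spelling>
      <part_of_speech>noun</part_of_speech>
      <definition>a nocturnal mammal of southern Africa</definition>
    </word>
  </letter>
</dictionary>"""

def resolve_child_sequence(root, sequence):
    """Follow a child sequence such as "/1/1/2/3"; return None if any step fails."""
    steps = [int(step) for step in sequence.split("/") if step]
    if not steps or steps[0] != 1:
        return None  # this sketch only handles sequences that start at the root element
    node = root
    for position in steps[1:]:
        children = list(node)  # element children only; text nodes are not counted
        if not 1 <= position <= len(children):
            return None
        node = children[position - 1]
    return node

dictionary = ET.fromstring(DOC)

# Root element, its 1st child (letter), that element's 2nd child (word), then the 3rd child.
target = resolve_child_sequence(dictionary, "/1/1/2/3")
print(target.tag)
```

No element names appear anywhere in the lookup; a sequence that runs off the end of a child list simply fails to resolve.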
The XPath 1.0 spec attained W3C Recommendation status in late 1999 and has been hugely successful in the three years since. But it has its shortcomings, and XPath 2.0 -- aimed at filling in the gaps -- is already on the horizon. You can find the version 2.0 Working Draft (WD) at http://www.w3.org/TR/xpath20/. The current list of known "incompatibilities" between XPath 1.0 and 2.0 appears as Appendix F, at http://www.w3.org/TR/xpath20/#id-backwards-compatibility. If you're going to be using XPath for a while, I encourage you to visit this list, in order to minimize the surprises you may have to deal with downstream.
For XPointer, the situation is a little more complicated. Until very recently, XPointer was a single WD spec (most recently attaining Candidate Recommendation status, in September of 2001). While to some observers it seemed as though it would be frozen there forever, the XML Linking Working Group in July, 2002, made a huge change: they split the one spec into four.
There's now a central "root" spec, called XPointer Framework and bumped backwards a little to WD status. This is the specification that outlines general XPointer syntax rules, levels of processor conformance, and so on.
There are also three new offshoot specs, defining the use of specific XPointer
schemes: element(), xmlns(), and xpointer(). The first two of these are
Candidate Recommendations; the third (like the Framework) is back to WD
status. You can find these new specs on the W3C site.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.