Top Ten Tips for Using XPath and XPointer

August 21, 2002

John Simpson is the author of XPath and XPointer

XPath and XPointer allow XML developers and document authors to find and manipulate specific needles of content in an XML document's haystack. From mindful use of predicates, to processor efficiency, to exploring both the standards themselves and extensions to them, this article offers ten tips -- techniques and gotchas -- to bear in mind as you use XPath and XPointer in your own work.

1. Beware of whitespace when counting nodes.

Consider a simple document, such as the following:

   <year>
      <month monthnum="4">April</month>
   </year>

Ask yourself: how many children does the year element have?

If you think the answer is one, the month element, you're wrong. Answering the question with XPath might look something like this:

   count(/year/node())

If you submit this XPath expression to a "pure" XPath-aware processor, such as the one built into the Saxon XSLT engine, you're told that year has three children. What's going on? The first bit of content that follows the opening of the year element isn't the month element, although it looks that way to the human eye. Rather, the first bit of content within the year element is a text node (a number of blank spaces, a newline, and a few more blank spaces), which precedes the opening of the month element. There's also a child text node (a simple newline) following the month element's close, just before the close of the year element. That is, to an XPath-aware processor, this document resembles Figure 1.

Figure 1: An XPath processor's-eye view of a document with "invisible" whitespace
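You can see the same effect outside of XSLT. Here is a minimal sketch using Python's standard xml.dom.minidom module (my choice for illustration, not one of the processors discussed here), which keeps whitespace-only text nodes just as a "pure" XPath processor does. The first count corresponds roughly to count(/year/node()), the second to count(/year/*).

```python
from xml.dom import minidom

# The sample document, with its "invisible" whitespace intact.
DOC = """<year>
   <month monthnum="4">April</month>
</year>"""

year = minidom.parseString(DOC).documentElement

# Every child node counts, including whitespace-only text nodes.
all_children = list(year.childNodes)

# Keep only the element children, ignoring text nodes.
element_children = [n for n in all_children if n.nodeType == n.ELEMENT_NODE]

print(len(all_children))      # text node, month element, text node
print(len(element_children))  # the month element only
```

If what you really want is the number of child elements, say so in the expression (with * rather than node()) instead of relying on the processor to discard whitespace for you.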

What do I mean by a "pure" XPath-aware processor? One that counts whitespace-only text nodes as just described. The one to look out for is the MSXML processor, freely provided by Microsoft both for use as a stand-alone product and embedded in Internet Explorer (MSIE). When you use the preceding XPath expression to view the document in MSIE, via an XSLT transformation, you get the mistaken (albeit, perhaps, more common sense) answer: one child.

MSXML includes an XML parser, an XPath-compliant XSLT engine, and an interface to the outside world (like MSIE). The parser and the XSLT engine are both excellent, standards-compliant components. The seemingly non-conformant behavior when dealing with whitespace-only text nodes is controlled by a property, preserveWhiteSpace, with true or false values. The default is false, which causes MSIE to display the document incorrectly. In order to change this behavior -- and make MSIE behave "purely" -- you must use scripting to set preserveWhiteSpace to true explicitly.

2. Keep an open mind about predicates: nested, "compound," and so on.

The predicate is such a powerful, valuable piece of a location step's real estate that you may be reluctant to try anything beyond the simplest ones. Don't be. The predicate is there to enable you to grab exactly the node(s) you need from among all those candidates visible along a given axis from the context node. Why limit yourself to selecting, say, just the nth child or elements with a particular attribute? Stretch your wings by experimenting with multiple predicates in a given location step or path.

Here's a simple XML document:

   <roofers>
      <roofing-material>
         <manufacturer>Salitieri</manufacturer>
         <type>thatch</type>
      </roofing-material>
      <roofing-material>
         <manufacturer>Nash</manufacturer>
         <type>shingles</type>
      </roofing-material>
      <roofing-material>
         <manufacturer>DeBrutus</manufacturer>
         <type>fiberglass</type>
      </roofing-material>
      <roofing-material>
         <manufacturer>Short</manufacturer>
         <type>shingles</type>
      </roofing-material>
   </roofers>

Suppose you want to locate a node-set consisting of any roofing-material element whose type is "shingles", but only if the manufacturer is "Nash". Any of the following three approaches works (pay special attention to the predicates):

//roofing-material[type="shingles"][manufacturer="Nash"] 

//roofing-material[type="shingles" and manufacturer="Nash"] 

//roofing-material[type[preceding-sibling::manufacturer="Nash"]="shingles"]

While the results of these three approaches are identical for this document, in other documents they might be quite different. And all three -- including that weird-looking "nested predicate" in the third example -- are perfectly legal XPath 1.0.

Beware of one trap when using the approach employed in the first of the three preceding location paths -- I think of it as a "stacked" predicate. The order in which predicates appear on the stack can affect the final result. Each succeeding predicate is evaluated in terms of the narrowed context provided by the preceding one(s), and not just in terms of the general context in which a single predicate would be evaluated. Here's a sample document to illustrate this point.

   <tosses>
      <toss result="heads"/>
      <toss result="heads"/>
      <toss result="tails"/>
      <toss result="heads"/>
   </tosses>

Now consider the following two location paths into this document, each using a stacked predicate:

   (//toss)[@result="heads"][3] 

   (//toss)[3][@result="heads"]

See the difference? The first path locates (a) all toss elements whose result attribute equals "heads", and then (b) the third one of those toss elements. Therefore, it selects the fourth toss element in the document.

The second path selects the third toss element, and then the stacked predicate applies a further screen, selecting the third toss element only if its result attribute has a value of "heads". Because the third toss element's result attribute is "tails", this location path returns an empty node-set.
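If the ordering still feels slippery, it can help to model the two paths as plain list operations. The following is only a sketch in Python, not an XPath engine: the helper names is_heads and nth are my own, and each stacked predicate is modeled as one filtering pass over whatever the previous pass left behind.

```python
import xml.etree.ElementTree as ET

DOC = """<tosses>
   <toss result="heads"/>
   <toss result="heads"/>
   <toss result="tails"/>
   <toss result="heads"/>
</tosses>"""

# Stand-in for (//toss): all toss elements, in document order.
tosses = ET.fromstring(DOC).findall("toss")

def is_heads(toss):
    """Model of the predicate [@result="heads"] (hypothetical helper)."""
    return toss.get("result") == "heads"

def nth(nodes, n):
    """Model of a positional predicate [n]: the n-th node (1-based), if any."""
    return nodes[n - 1:n]

# (//toss)[@result="heads"][3]: filter first, then take the third survivor.
filter_then_position = nth([t for t in tosses if is_heads(t)], 3)

# (//toss)[3][@result="heads"]: take the third toss, then filter it.
position_then_filter = [t for t in nth(tosses, 3) if is_heads(t)]

# Report document positions (1-based) of whatever each path selected.
print([tosses.index(t) + 1 for t in filter_then_position])
print([tosses.index(t) + 1 for t in position_then_filter])
```

The first print shows the fourth toss; the second shows nothing at all, because the third toss was already discarded by the time anyone asked whether it came up heads.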

3. The string-value of a node is just a special case of the string-value of a node-set.

Consider another simple document,

   <quotation>
      <source>
       <author>Firesign Theatre</author>
       <work year="1970">
        Don't Crush that Dwarf, Hand Me The Pliers
       </work>
      </source>
      <text>
       And there's hamburger all over the highway in Mystic, Connecticut.
      </text>
   </quotation>

Ignoring the "invisible" whitespace (as discussed in tip #1), what's the string-value of the quotation element? In XPath terms, we're seeking the string-value of the element node identified by this location path:

   /quotation

Like the string-value of any other element, it's the concatenated values of all text nodes in the element's scope -- that is:

Firesign TheatreDon't Crush that Dwarf, Hand Me The PliersAnd there's hamburger all over the highway in Mystic, Connecticut.

That old devil, common sense, might lead you to conclude that the following XPath location path has the exact same string-value:

   /quotation/*

Run this path through an XPath processor, though, and what you get is simply

Firesign TheatreDon't Crush that Dwarf, Hand Me The Pliers

The behavior here is summed up in the rule of thumb, "The string-value of any node-set is the string-value of only the first node in the set." The second location path returns a node-set consisting of two nodes, the source and text elements. And the first of these, source, is the only one whose string-value counts as the string-value of the entire node-set.

So what about the first case? Does that imply an exception to the general rule of thumb, an exception for root elements? No. The first location path selects a node-set which just happens to consist of a single node, and that node, of course, is thus the "first" node in the node-set. Both location paths obey the general rule.
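The rule of thumb is short enough to write down as code. This is a sketch in Python under a few assumptions of mine: a node-set is modeled as a list of elements already in document order, the helper name string_value is hypothetical, and the quotation document is written without its whitespace-only text so that the results match the strings shown above.

```python
import xml.etree.ElementTree as ET

# Same content as the quotation document, minus the "invisible" whitespace.
DOC = (
    "<quotation><source><author>Firesign Theatre</author>"
    "<work year='1970'>Don't Crush that Dwarf, Hand Me The Pliers</work>"
    "</source>"
    "<text>And there's hamburger all over the highway in Mystic, "
    "Connecticut.</text></quotation>"
)

def string_value(node_set):
    """String-value of a node-set: that of its first node, or '' if empty."""
    if not node_set:
        return ""
    # An element's string-value is all of its descendant text, concatenated.
    return "".join(node_set[0].itertext())

root = ET.fromstring(DOC)

whole = string_value([root])       # models /quotation
children = string_value(list(root))  # models /quotation/*

print(whole)
print(children)
```

Both calls go through exactly the same function; the only difference is how many nodes are in the list, and everything after the first one is ignored.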

4. Remember the difference between value1 != value2 and not(value1 = value2).

This one can be very confounding if you're not careful. Take a look at these two sample location paths:

   //employee[@id != "emp1002"] 

   //employee[not(@id = "emp1002")]

The first example selects each employee element node which has an id attribute whose value does not equal "emp1002" -- note that this excludes those with no id attribute at all. The second selects all employee element nodes which do not have an id attribute whose value is "emp1002". So assume, then, a document with an employee element such as this:

   <employee>...</employee>

The first location path will not select this employee element: since it has no id attribute at all, it does not have an id attribute whose value is not emp1002 (or anything else, for that matter). The second, on the other hand, will select this employee element: it has no id attribute whose value is emp1002.
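One way to keep the two straight is to remember that a comparison involving a node-set asks "is there some node for which this is true?" Here is a sketch of that idea in Python; the staff document, the employee names, and the id_values helper are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: two employees with ids, one without.
DOC = """<staff>
   <employee id="emp1001">Ann</employee>
   <employee id="emp1002">Bob</employee>
   <employee>Cho</employee>
</staff>"""

employees = ET.fromstring(DOC).findall("employee")

def id_values(employee):
    """The node-set @id as a list: one value if present, empty otherwise."""
    value = employee.get("id")
    return [] if value is None else [value]

# @id != "emp1002": true if SOME id attribute node differs from the value.
not_equal = [e.text for e in employees
             if any(v != "emp1002" for v in id_values(e))]

# not(@id = "emp1002"): true if NO id attribute node equals the value.
not_of_equal = [e.text for e in employees
                if not any(v == "emp1002" for v in id_values(e))]

print(not_equal)
print(not_of_equal)
```

With an empty node-set, any() has nothing to work with and is false either way, so the first test rejects the id-less employee while the negated second test accepts it.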

5. Find and use a good tool for testing your XPath expressions.

After a while, you may become so self-assured when slinging XPath that you never need to test the results: you'll instinctively know the effect of one expression versus slightly different ones. Still, it's hard to imagine you'll be prepared for absolutely every eventuality, every nuance in a given document's node tree, every wrinkle in the XPath spec. At such a time, you absolutely must have a good tool to reassure you that you're on the right, well, path.

One common class of XPath testing tools, naturally, consists of all the production-grade XSLT processors: Saxon, Xalan, MSXML, and so on. In order to interpret and act on the template rules and other code in XSLT stylesheets, these processors must first "know" XPath. If you've coded your location paths properly, the transformation cranked through the processor will work properly; if you haven't, it won't -- properly or maybe at all.

This is kind of indirect confirmation, though. Wouldn't it be more useful to have a tool which, say, "lit up" the selected node(s) for a given XPath expression, especially if you're using XPath in a non-XSLT context like XPointer or XQuery?

My own work takes place mostly on Microsoft Windows platforms. Even if yours doesn't, you probably have access to a Windows PC. If you do, you're in luck: there's a great tool written by Dimitre Novatchev, "XPath Visualiser". You can obtain it from the Top XML (formerly VBXML) site.

XPath Visualiser isn't a standalone software package. Instead, it's a frameset HTML document which you open in MSIE. The top frame includes some form elements, such as fields for entering the name of the document you want to view and the XPath expression you want to test, and a "Select Nodes" button to highlight all nodes selected by the XPath expression you've entered. The default expression, as shown in Figure 2, selects all element nodes in the document.

Figure 2: XPath Visualiser (default XPath expression).

Figure 3 shows the default expression altered to select all attributes. (Note that entire attributes are selected -- their names, the equal signs, and their values.)

Figure 3: XPath Visualiser, selecting all attributes.

Finally, as you can see in Figure 4, you can use any conformant XPath 1.0 expression to select document content. The expression in Figure 4 says to select the text nodes representing the names of all relics whose prices begin with the digit 3.

Figure 4: XPath Visualiser, selecting the names of relics with particular kinds of prices.

I encourage you to download XPath Visualiser and dig into it. Don't forget to read the documentation and perhaps participate in the discussion group at the site; both will help you understand the specific system requirements for installing and running the package -- as well as its few limitations.

6. Explore EXSLT.

This tip is not, strictly speaking, XPath-related. But in the context of XSLT stylesheets, it can significantly enhance your ability to locate and process XML content.

EXSLT -- an acronym for Extensions to XSLT -- is an unofficial but well-organized community effort to fill some of the gaps in the XSLT 1.0 Recommendation. These gaps include, for example, the ability to process dates in various ways and the ability to transform source trees into multiple result-tree documents in a single pass. There are also quite a few commonly needed mathematical operations which the XPath numeric functions and operators don't address: given a node-set whose members all have numeric values, it's frequently necessary to select the element(s) with the largest or smallest values.

While EXSLT extension functions and elements have no official standing, support for them is built into numerous XSLT processors. To use the extensions, you need to know the name of the EXSLT module in question, for example "math" for the numeric processing features. Then include a namespace declaration in your stylesheet, as follows:

   xmlns:math="http://www.exslt.org/math"

Of course, you have to replace "math" with the appropriate module name, if you're using a different one. Then just use the function or extension element as you would any other. For instance, you might use an XPath expression such as:

   //product[price = math:min(//product/price)]

to locate the product with the lowest price.
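To be clear about what such an expression is meant to compute, here is the same selection sketched in plain Python. This is not EXSLT and does not call any EXSLT implementation; the catalog document and its values are made up. The point it illustrates is that the minimum has to be taken over the prices of all products, not over the price of the one product currently being tested.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; two products tie for the lowest price.
DOC = """<catalog>
   <product><name>thatch</name><price>12.50</price></product>
   <product><name>shingles</name><price>9.75</price></product>
   <product><name>fiberglass</name><price>9.75</price></product>
</catalog>"""

products = ET.fromstring(DOC).findall("product")

# The minimum is computed once, across the whole set of products.
lowest = min(float(p.findtext("price")) for p in products)

# Then each product's own price is compared against that minimum.
cheapest = [p.findtext("name") for p in products
            if float(p.findtext("price")) == lowest]

print(lowest)
print(cheapest)
```

Note that "the product with the lowest price" may turn out to be more than one product.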

Detailed instructions for using EXSLT extensions, including links to the community mailing list, can be found at the EXSLT site.

7. Fail-safe your XPointers.

In a perfect world, or at least a perfect Web, hyperlinks would never fail. Targeted resources would never move or disappear, Web servers and the network itself would never encounter any interruptions, and -- most importantly -- document authors would never make any errors (typographical or cognitive) in identifying resources in the first place.

The world and the Web being what they are, though, broken links are a fact of life. Luckily, the W3C Working Group (WG) responsible for the XPointer specs has taken this into consideration; they've provided us with a means to make XPointer-based hyperlinks a little less fragile. This solution involves "chaining" XPointers together, one after another. The XPointers are evaluated left to right; the first one which does not fail becomes, for all practical purposes, the XPointer. Consider an XML document like the following:

   <villain>
      <name>Blofeld</name>
      <film>Thunderball</film>
   </villain>

An XLink into this document might attempt to locate a film element only if the villain's name is Goldfinger, using an XPointer:

    xlink:href="xpointer(//film[../name = 'Goldfinger'])"

While XLink-, let alone XPointer-aware applications are not exactly thick on the ground at the moment, we can guess that such an application would return the equivalent of a classic HTTP 404 "Document not found" error, given the above document. We could make the link more robust by "chaining" a fallback:

   xlink:href="xpointer(//film[../name = 'Goldfinger']) 

xpointer(//film[1])"

Note: code above is a single line, split across two for formatting reasons.

This instructs the application that if it can't locate the villain Goldfinger's film, it should try to locate the first film element in the document, whatever it is.

Note that this works only with scheme-type XPointers (such as the xpointer()-schemed XPointers above). If you're using shorthand XPointers, you're limited to the familiar "get it right on the first try" form.
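The left-to-right fallback logic is simple enough to sketch. The Python below is a model of the idea only, not an XPointer processor: each pointer part is stood in for by an ordinary function (films_of and first_films are names I made up), and a part is treated as having failed when it locates nothing.

```python
import xml.etree.ElementTree as ET

DOC = """<villain>
   <name>Blofeld</name>
   <film>Thunderball</film>
</villain>"""

def films_of(villain_name):
    """Stand-in for xpointer(//film[../name = '...'])."""
    def select(root):
        return [film
                for parent in root.iter()
                if any(n.text == villain_name for n in parent.findall("name"))
                for film in parent.findall("film")]
    return select

def first_films(root):
    """Stand-in for xpointer(//film[1]): each parent's first film child."""
    return [parent.find("film") for parent in root.iter()
            if parent.find("film") is not None]

def resolve(root, chain):
    """Evaluate the parts left to right; the first non-empty result wins."""
    for part in chain:
        found = part(root)
        if found:
            return found
    return []  # every part failed

root = ET.fromstring(DOC)
result = resolve(root, [films_of("Goldfinger"), first_films])
print([film.text for film in result])
```

Because Goldfinger isn't in the document, the first part comes up empty and the second part supplies the result. Had the first part succeeded, the second would never have been evaluated.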

8. Remember to keep namespaces straight (in both XPath and XPointer applications).

While namespaces in XML continue to make for spirited discussion in some circles, they're for the foreseeable future an inevitable feature of the XML landscape.

In XPath applications, you need to understand three terms: the qualified name, the local name, and the expanded name.

The qualified name (often abbreviated "qname") of an element or attribute is the node's name, including the namespace prefix (if any), as it appears in the document being accessed via XPath. If an element's name in a source document is book:section, that's its qname.

A node's local name is the name as it appears in the source document, shorn of any namespace prefix. (You might call this the un-qname.) For an element named book:section in a document, the local name is simply section.

The expanded name of an element or attribute is what most namespace-aware applications really care about (unless you instruct them not to). It's a combination of the URI associated with the namespace prefix, plus the local name. Assume the book: namespace prefix is declared like this:

   xmlns:book="http://my.example.org/namespaces/book"

Then the expanded name of the book:section element is the combination of the strings "http://my.example.org/namespaces/book" and "section". Exactly how the application builds the expanded name is up to the application's developer. Most seem to follow a de-facto standard of enclosing the namespace URI in curly braces, { and } characters, followed by the local name. Such an application would thus represent the expanded name of the book:section element as follows:

   {http://my.example.org/namespaces/book}section

The important thing to you as a user of XPath isn't the exact algorithm for building an expanded name (which in any case is directly accessible only within the processor itself, not to XPath expressions). The important thing is that the processor will in general use only the expanded name if it needs to disambiguate element or attribute names. Consider a document which has two elements, xsl:template and transform:template. Their local names are identical; the only way to tell if the element names really are identical is to examine their namespace URIs as well. If both the xsl: and the transform: prefix are bound to the namespace URI "http://www.w3.org/1999/XSL/Transform", then the two elements have the same "name" even though their prefixes are different.

One implication of all this is that using the XPath name() function to return an element or attribute's "name" is a little deceptive: it returns the qname. And no matter how unique its qname, a given element or attribute may in fact have a name identical to others' simply because their namespace URIs match, even when the prefixes are different.
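Python's standard xml.etree.ElementTree module happens to be one of the tools that uses the curly-brace convention for expanded names, which makes it handy for seeing all this in action. In the sketch below, the doc wrapper element and the second prefix, b, are my own inventions; both prefixes are bound to the same namespace URI.

```python
import xml.etree.ElementTree as ET

# Two different prefixes, one namespace URI.
DOC = """<doc xmlns:book="http://my.example.org/namespaces/book"
     xmlns:b="http://my.example.org/namespaces/book">
   <book:section>one</book:section>
   <b:section>two</b:section>
</doc>"""

root = ET.fromstring(DOC)
first, second = list(root)

# ElementTree stores each element's expanded name, not its qname.
print(first.tag)
print(first.tag == second.tag)

# When searching, the prefix you use is just a local alias for the URI;
# it need not match any prefix that appears in the document.
ns = {"anyprefix": "http://my.example.org/namespaces/book"}
matches = root.findall("anyprefix:section", ns)
print([m.text for m in matches])
```

The two section elements print as the same name and are both found by a single search, even though their qnames in the source document differ.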

When coding XPointers, remember that the vocabulary -- and hence the namespaces -- of the document containing the XPointers will probably be quite different from those of the document(s) being pointed to. To be absolutely sure that the XPointer processor keeps it all straight, identify the namespace(s) for target document vocabularies using the xmlns() XPointer scheme. For instance:

   xmlns(book=http://my.example.org/namespaces/book) 

xpointer(//book:section)

Note: code above is a single line, split across two for formatting reasons.

9. Don't forget processor efficiency in XPath and XPointer.

The authors of XML books and articles have it easy, in one respect: the XML documents they use for examples don't generally need to be very long or complex.

Real applications rarely have that luxury. Source documents may contain many thousands of elements, just to cite the obvious case; throw in a mixture of comments, PIs, and voluminous text nodes, and you may find the streets of your location paths paved with molasses. Controlling this is to a large extent out of your hands. You can't rejigger the processor's internals, after all. (On the other hand, some processors may allow you to use parameters or command-line arguments to encourage them to behave in ways optimized for particular source document structures.) But one XPath optimization is easy -- it just requires you to surrender a particularly lazy habit.

The habit in question is excessive use of the descendant-or-self:: axis when you know the name of the target element (the node test) which follows it. It's particularly tempting to fall back on this habit because of the XPath // shortcut (technically a shortcut for the /descendant-or-self::node()/ location step). Considering a document even as simple as this should make the point:

   <dictionary>
      <letter>
         <forms>
            <form type="upper">A</form>
            <form type="lower">a</form>
         </forms>
         <word>
            <spelling>aardvark</spelling>
            <part_of_speech>noun</part_of_speech>
            <definition>a nocturnal mammal of southern Africa
               with a tubular snout and a long tongue</definition>
         </word>
      </letter>
   </dictionary>

Both of the following location paths locate the definition element:

   //definition 

   /dictionary/letter/word/definition

The second is a much more direct route to the desired result. It leads the processor down the tree with no side trips, right to the definition element. The first, in contrast, takes a leisurely stroll through all descendants of the root node -- picking up each one in turn and mulling it over ("Hmm, is this descendant a definition element...?") before proceeding even further through the tree. This includes irrelevant detours into the forms branch of the tree and to the spelling and part_of_speech siblings of the definition node.

Of course, for this extremely simple example document, the difference in processing time will be negligible. Turn this document into an entire dictionary, though, and the difference will be considerable. It's true that coding yard-long location paths into large documents can be both tedious and error-prone, certainly no one's idea of fun; but if huge gains in performance result from it, well, it's hard to argue in favor of fun.
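For a rough feel for the difference, you can count how many elements each strategy has to look at. The Python sketch below is a naive model written for this purpose, with function names of my own choosing; real processors may index or otherwise optimize their searches, so treat the counts as an illustration of the principle rather than a benchmark.

```python
import xml.etree.ElementTree as ET

DOC = """<dictionary><letter>
  <forms><form type="upper">A</form><form type="lower">a</form></forms>
  <word><spelling>aardvark</spelling><part_of_speech>noun</part_of_speech>
  <definition>a nocturnal mammal of southern Africa</definition></word>
</letter></dictionary>"""

def descendant_scan(root, name):
    """Naive model of //name: examine every element in the tree."""
    examined, found = 0, []
    for element in root.iter():
        examined += 1
        if element.tag == name:
            found.append(element)
    return found, examined

def child_steps(root, names):
    """Naive model of /root/a/b/c: examine only children of matched nodes."""
    examined, current = 0, [root]
    for name in names:
        matched = []
        for node in current:
            for child in node:
                examined += 1
                if child.tag == name:
                    matched.append(child)
        current = matched
    return current, examined

root = ET.fromstring(DOC)
scan_found, scan_cost = descendant_scan(root, "definition")
path_found, path_cost = child_steps(root, ["letter", "word", "definition"])

print(scan_found == path_found)  # both locate the same element
print(scan_cost, path_cost)      # elements examined by each strategy
```

In this tiny document the explicit path never looks inside the forms branch. Add a few thousand words under each letter, each with its own subtree, and the number of elements the explicit path gets to skip grows accordingly.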

As for XPointer, not only can you minimize (if not eliminate) your use of the // shortcut; you can also fall back on alternative ways of seeking content which aren't dependent on XPath at all. These are so-called shorthand and child-sequence XPointers.

The former look like familiar (X)HTML named resources, as in:

   xlink:href="somedoc.xml#someid"

where "someid" (the shorthand XPointer) matches the value of some ID-type attribute in the target document. (Of course, in order to use this kind of XPointer, the target document must have some ID-type attribute declared, via DTD or schema.)

Child-sequence XPointers use the new element() XPointer scheme to walk the processor down into the node tree without referencing element names at all; the processor can simply count children. For instance,

   element(/1/4/3/15)

locates (hold your breath) the fifteenth child of the third child of the fourth child of the root element. This can foster huge performance gains in processors equipped to handle the element() scheme: an XPath-based XPointer processor needs potentially to read in the entire target resource in order to ensure that it's gotten every last bit of matching content, while a child sequence-smart processor can simply stream through the target document, taking only the designated forks in the road and ignoring all others. (The downside, of course, is that you can access only elements this way, and are restricted to navigating only in a manner equivalent to XPath's child:: axis.)
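Walking a child sequence takes very little machinery, which is the whole point. Here is a sketch in Python of the counting logic only. It assumes the sequence has already been parsed into a list of integers, the function name is hypothetical, and it works on an in-memory tree rather than a stream, so it shows the navigation but not the streaming benefit.

```python
import xml.etree.ElementTree as ET

DOC = """<dictionary><letter>
  <forms><form type="upper">A</form><form type="lower">a</form></forms>
  <word><spelling>aardvark</spelling><part_of_speech>noun</part_of_speech>
  <definition>a nocturnal mammal of southern Africa</definition></word>
</letter></dictionary>"""

def follow_child_sequence(root, steps):
    """Follow 1-based child positions; the first step names the root element.

    Returns the element reached, or None if any step has nowhere to go.
    """
    if not steps or steps[0] != 1:
        return None  # a document has exactly one root element
    node = root
    for position in steps[1:]:
        children = list(node)  # element children only, in document order
        if position < 1 or position > len(children):
            return None
        node = children[position - 1]
    return node

root = ET.fromstring(DOC)

# Root element, its 1st child, that element's 2nd child, then its 3rd child.
target = follow_child_sequence(root, [1, 1, 2, 3])
print(target.tag)

# A step that points past the last child simply fails.
print(follow_child_sequence(root, [1, 4]))
```

No element names are consulted anywhere in the walk. That also makes child sequences brittle: insert one new element ahead of the target, and the same sequence quietly points somewhere else.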

10. Keep an eye out for spec changes.

The XPath 1.0 spec attained W3C Recommendation status in late 1999 and has been hugely successful in the three years since. But it has its shortcomings, and XPath 2.0 -- aimed at filling in the gaps -- is already on the horizon. You can find the version 2.0 Working Draft (WD) at http://www.w3.org/TR/xpath20/. The current list of known "incompatibilities" between XPath 1.0 and 2.0 appears as Appendix F, at http://www.w3.org/TR/xpath20/#id-backwards-compatibility. If you're going to be using XPath for a while, I encourage you to visit this list, in order to minimize the surprises you may have to deal with downstream.

For XPointer, the situation is a little more complicated. Until very recently, XPointer was a single spec (most recently attaining Candidate Recommendation status, in September of 2001). While to some observers it seemed as though it would be frozen there forever, the XML Linking Working Group made a huge change in July 2002: they split the one spec into four.

There's now a central "root" spec, called XPointer Framework and bumped backwards a little to WD status. This is the specification that outlines general XPointer syntax rules, levels of processor conformance, and so on.

There are also three new offshoot specs, defining the use of specific XPointer schemes: XPointer element(), XPointer xmlns(), and XPointer xpointer(). The first two of these are Candidate Recommendations; the third (like the Framework) is back to WD status. You can find these new specs at, respectively, www.w3.org/TR/xptr-element/, www.w3.org/TR/xptr-xmlns/, and www.w3.org/TR/xptr-xpointer/.