Little Back Corners

February 25, 2004

John E. Simpson

Q: I can't find what I'm looking for in my GML document.

Whenever I try to specify an XPath location path in a GML document, I receive a message saying XPath returned no results. Queries I have used include //FeatureCollection and //FeatureCollection/gml:featureMember. The document in question looks like this:

<?xml version="1.0" encoding="UTF-8"?>

A: I'll get to your question in a moment. First, though, allow me a bit of a digression.

One of the things I like most about writing this column is the opportunity to poke into little back corners of the XML universe. Of course, these aren't little to the people who deal with them every day; they are little only in the sense that they're well-known only among the relative handful of people who deal with them every day. We all know about the W3C standards (although I daresay nobody knows everything about every one of them); we all know about the big corporate and open-source players and tools. But only a few of us get to deal with some of the truly intriguing uses of XML.

Your question deals with one of these interesting niches. GML -- the Geography Markup Language -- is a specification of the Open GIS Consortium (OGC). According to the GML FAQ, the markup language:

provides an XML-based encoding of geospatial data; it can be viewed as a basic application framework for handling geographic information in an open and non-proprietary way. By leveraging related XML technologies (e.g. XML Schema, XSLT, XLink, XPointer, SVG) a GML dataset becomes easier to process in heterogeneous environments, and it can be readily intermixed with other types of data: text, video, imagery, etc.

The GML specification is now at version 3.1, most recently updated in June, 2003. Included in the mix are specifications for use of XLink and SMIL with "pure" GML. You can find the various schemas at the Open GIS site. An excellent, albeit slightly overwhelming, source of information is the GML (version 3) Implementation Specification, a 548-page PDF behemoth.

For more information about GML, check the indispensable Robin Cover's "Cover Pages" entry on GML; the above-mentioned GML FAQ; and the GML Central site. A company called Snowflake Software offers a GML viewer called OS MasterMap Viewer, which purports to read not only raw GML documents but also those compressed using WinZip/gzip formats.

Your document includes five elements in the GML namespace: gml:boundedBy, gml:featureMember, gml:Box, gml:coordinates, and gml:Point. These elements are used to assert the characteristics of a given geographic feature; taken together with the elements in the default namespace (such as siteCode, period, and so on), they seem to describe an archaeological site from the Roman era. (This is all presumably just "play data," right down to the coordinates defined in gml:Box -- the coordinate pairs of 100,100 (x) and 100,100 (y) just define a single point in space.)

The fault in the default

On to your question, which is not really about GML per se but rather about how to find, using XPath, some content in a GML document. (I hope the irony is not lost on you of not being able to locate something in a document whose very purpose is to locate something in the physical world...) The problem with your XPath location paths, it turns out, is not XPath syntax as such but the way XPath interacts with namespace declarations. In particular, the problem is your declaration for the default namespace:


I don't know why you need that declaration, since the namespace URI is clearly a dummy or placeholder. If, in any case, you remove that namespace declaration, you'll find that the location paths //FeatureCollection and //FeatureCollection/gml:featureMember work just fine.

What's going on here?

In the XPath spec, we learn that

a node test that is a QName is true if and only if the type of the node (see [5 Data Model]) is the principal node type and has an expanded-name equal to the expanded-name specified by the QName.

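What this means in practice is that an XPath processor doesn't compare element names as they are written. It compares expanded-names: pairs made up of a namespace URI and a local name. (The qualified name, or QName, is the name as it appears in the markup, with or without a prefix; the expanded-name is what that QName resolves to once the namespace declarations in scope are applied.) An element for which no namespace is in effect has an expanded-name with an empty namespace part. If a namespace declaration applies to the element, including one for the default (unprefixed) namespace, its expanded-name carries that declaration's URI.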

While there's no formal requirement for how to write an expanded-name as a string, a de facto standard seems to exist among XPath processors: replace the namespace prefix with the namespace URI enclosed in "curly braces," the { and } characters. In the case of your elements in the non-default namespace, such as gml:Box, the XPath processor is therefore expanding both the element name in the XPath expression and the element name(s) as they appear in the source document, so that each becomes the GML namespace URI in curly braces followed by the local name Box.

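To make the expansion concrete, here is a small sketch in Python, whose standard xml.etree.ElementTree module happens to report element names in this same curly-brace form. It assumes the gml prefix is bound to http://www.opengis.net/gml, the customary GML namespace URI (your document's declaration may differ), and it uses a made-up two-element fragment rather than your actual data.

```python
import xml.etree.ElementTree as ET

# Assumption: the customary GML namespace URI; the questioner's document may declare another.
GML_NS = "http://www.opengis.net/gml"

# Hypothetical fragment: two different prefixes (gml and g) bound to the same URI.
fragment = (
    f'<gml:Box xmlns:gml="{GML_NS}" xmlns:g="{GML_NS}">'
    "<g:coordinates>100,100 100,100</g:coordinates>"
    "</gml:Box>"
)

box = ET.fromstring(fragment)
coords = box[0]

# ElementTree stores each name in expanded form: {namespace URI}local-name.
print(box.tag)
print(coords.tag)

# Once names are expanded the prefix no longer matters; only the URI does.
same_namespace = box.tag.split("}")[0] == coords.tag.split("}")[0]
print(same_namespace)
```

The two elements use different prefixes in the markup, yet their expanded names share the same namespace part.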

This works marvelously to solve the problems associated with "real" namespaces -- in particular, it allows you to use more than one namespace prefix to represent the same namespace, should you want to do that. But it introduces a very weird problem of its own when dealing with element names in the declared default namespace. In essence, the expanded name of your FeatureCollection element as it appears in the document carries the default namespace's URI inside the curly braces, whereas the unprefixed FeatureCollection in your XPath expression expands to {}FeatureCollection, a name with no namespace URI at all.

The real difficulty is that XPath syntax needs to satisfy two seemingly irreconcilable requirements: handling elements in a declared but default (unprefixed) namespace, and handling elements in no namespace at all, whose expanded names have an empty namespace part. To resolve this dilemma, the XPath spec says that an unprefixed name in an XPath expression is always taken to be in no namespace, even when the same unprefixed name in the instance document has (as a result of a default namespace declaration) an expanded name that includes a namespace URI. Thus, your //FeatureCollection "query" is instructing XPath to locate an element which does not exist: a FeatureCollection element in no namespace.

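The mismatch is easy to reproduce. The sketch below again uses Python's xml.etree.ElementTree, which implements only a subset of XPath but treats an unprefixed name in a path the same way when no default is supplied on the query side. The document is a hypothetical stand-in for yours: the default namespace URI http://example.org/sites and the site and siteCode content are invented for illustration, and the GML URI is assumed to be the customary one.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the questioner's document; both URIs are assumptions.
DEFAULT_NS = "http://example.org/sites"
GML_NS = "http://www.opengis.net/gml"

doc = ET.fromstring(
    f'<FeatureCollection xmlns="{DEFAULT_NS}" xmlns:gml="{GML_NS}">'
    "<gml:featureMember><site><siteCode>A1</siteCode></site></gml:featureMember>"
    "</FeatureCollection>"
)

# Unprefixed name in the path: asks for siteCode in no namespace, so nothing matches.
unqualified = doc.findall(".//siteCode")

# The same element written with its expanded name is found.
qualified = doc.findall(f".//{{{DEFAULT_NS}}}siteCode")

# A prefixed name works once the prefix is bound to the right URI for the query.
members = doc.findall("gml:featureMember", {"gml": GML_NS})

print(len(unqualified), len(qualified), len(members))
```

The unprefixed lookup comes back empty even though the element is plainly there, which is the same symptom you reported.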

The same holds true for the //FeatureCollection/gml:featureMember location path, by the way. Since -- to the XPath processor's squinty eyes -- there is no FeatureCollection element, it has no children at all, named "gml:featureMember" or anything else. If you want to locate the gml:featureMember element, just remove the reference to its non-existent FeatureCollection parent: //gml:featureMember.

Suppose you can't, for some reason, simply strip out the default namespace declaration? In this case you have to jump through a minor hoop: instruct XPath to locate all elements in the document, and then refine (via a predicate and the local-name() function) the node-set of candidates to those with a local name of "FeatureCollection". (The local name is the element name sans namespace prefix, and it is not subject to expansion even if the element is in the declared default namespace. Bear in mind that this test ignores namespaces altogether, so it matches a FeatureCollection element in any namespace.) Your location path will now look like this:

//*[local-name()="FeatureCollection"]


You can also use this technique to locate the gml:featureMember element:

//*[local-name()="featureMember"]

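If you are working from a script rather than an XPath tool, the same idea can be sketched in Python. The standard xml.etree.ElementTree module does not support the local-name() function, so this is an analog of the technique rather than XPath itself: it visits every element and keeps those whose name, minus the namespace part, matches. The helper names local_name and find_by_local_name are my own, and the document and its default namespace URI are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the questioner's document; both URIs are assumptions.
DEFAULT_NS = "http://example.org/sites"
GML_NS = "http://www.opengis.net/gml"

doc = ET.fromstring(
    f'<FeatureCollection xmlns="{DEFAULT_NS}" xmlns:gml="{GML_NS}">'
    "<gml:featureMember><site><siteCode>A1</siteCode></site></gml:featureMember>"
    "</FeatureCollection>"
)


def local_name(element):
    """Return the element's name with any {namespace URI} part removed."""
    return element.tag.rsplit("}", 1)[-1]


def find_by_local_name(root, name):
    """Scan the root and all its descendants, keeping elements whose local name matches."""
    return [el for el in root.iter() if local_name(el) == name]


collections = find_by_local_name(doc, "FeatureCollection")
members = find_by_local_name(doc, "featureMember")

print(len(collections), len(members))
```

As with the XPath version, the match pays no attention to namespaces, so an element with the same local name in some other namespace would be picked up too.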

Don't feel chagrined by not having previously picked up on XPath's treatment of expanded names in a default namespace. While it does make sense of a sort -- they needed to reconcile the irreconcilable somehow -- it remains one of the strangest little back corners of the XPath universe, even to people who deal with XPath every day!