Little Back Corners
February 25, 2004
Q: I can't find what I'm looking for in my GML document.
Whenever I try to specify an XPath location path in a GML document, I receive a message
saying XPath returned no results. Queries I have used include //FeatureCollection and
//FeatureCollection/gml:featureMember.
The document in question looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<FeatureCollection xmlns="http://mydomain/schemas"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:gml="http://www.opengis.net/gml"
    xsi:schemaLocation="http://mydomain/schemas event3.xsd">
  <gml:boundedBy>
    <gml:Box>
      <gml:coordinates>100,100,100,100</gml:coordinates>
    </gml:Box>
  </gml:boundedBy>
  <gml:featureMember>
    <event>
      <geometryProperty>
        <gml:Point>
          <gml:coordinates>100,100</gml:coordinates>
        </gml:Point>
      </geometryProperty>
      <siteCode>AL1234</siteCode>
      <date>2004-10-10</date>
      <locationDescription>Somewhere</locationDescription>
      <eventType>Excavation</eventType>
      <period>Roman</period>
    </event>
  </gml:featureMember>
</FeatureCollection>
A: I'll get to your question in a moment. First, though, allow me a bit of a digression.
One of the things I like most about writing this column is the opportunity to poke into little back corners of the XML universe. Of course, these aren't little to the people who deal with them every day; they are little only in the sense that they're well-known only among the relative handful of people who deal with them every day. We all know about the W3C standards (although I daresay nobody knows everything about every one of them); we all know about the big corporate and open-source players and tools. But only a few of us get to deal with some of the truly intriguing uses of XML.
Your question deals with one of these interesting niches. GML -- the Geography Markup Language -- is a specification of the Open GIS Consortium (OGC). According to the GML FAQ, the markup language:
provides an XML-based encoding of geospatial data; it can be viewed as a basic application framework for handling geographic information in an open and non-proprietary way. By leveraging related XML technologies (e.g. XML Schema, XSLT, XLink, XPointer, SVG) a GML dataset becomes easier to process in heterogeneous environments, and it can be readily intermixed with other types of data: text, video, imagery, etc.
The GML specification is now at version 3.1, most recently updated in June 2003. Included in the mix are specifications for use of XLink and SMIL with "pure" GML. You can find the various schemas at the Open GIS site. An excellent, albeit slightly overwhelming, source of information is the GML (version 3) Implementation Specification, a 548-page PDF behemoth.
For more information about GML, check the indispensable Robin Cover's "Cover Pages" entry on GML; the above-mentioned GML FAQ; and the GML Central site. A company called Snowflake Software offers a GML viewer called OS MasterMap Viewer, which purports to read not only raw GML documents but those compressed using WinZip/gzip formats.
Your document includes five elements in the GML namespace: gml:boundedBy, gml:Box,
gml:coordinates, gml:featureMember, and gml:Point. These elements are used to assert
the characteristics of a given geographic feature; taken together with the elements
in the default namespace (such as event, siteCode, period, and so on), they seem to
describe an archaeological site from the Roman era. (This is all presumably just
"play data," to judge from the coordinates defined in gml:Box -- the coordinate pairs
of 100,100 (x) and 100,100 (y) just define a single point in space.)
The fault in the default
On to your question, which is not really about GML per se but rather about how to find, using XPath, some content in a GML document. (I hope the irony is not lost on you of not being able to locate something in a document whose very purpose is to locate something in the physical world...) The problem with your XPath location paths, it turns out, is not XPath syntax as such but how XPath treats namespace declarations. In particular, the problem is your declaration for the default namespace:
    xmlns="http://mydomain/schemas"
I don't know why you need that declaration, since the namespace URI is clearly a dummy
placeholder. If, in any case, you remove that namespace declaration, you'll find that
both //FeatureCollection and //FeatureCollection/gml:featureMember work just fine.
What's going on here?
In the XPath spec, we learn that
a node test that is a QName is true if and only if the type of the node (see [5 Data Model]) is the principal node type and has an expanded-name equal to the expanded-name specified by the QName.
What this means in practice is that an XPath processor compares plain old element names only for elements that are in no namespace at all. If a namespace declaration applies to an element, including a declaration for the default (unprefixed) namespace, the processor identifies the element by its expanded-name: the namespace URI together with the local part of the qualified name (QName).
While there's no formal requirement for how to write out an expanded-name, a de facto
standard seems to exist among XPath processors: replace the namespace prefix with the
namespace URI enclosed in "curly braces," the { and } characters. In the case of your
elements in the non-default namespace, such as gml:Box, the XPath processor is therefore
expanding both the element name in the XPath expression and the element name(s) as they
appear in the source document, as follows:
    {http://www.opengis.net/gml}Box
This works marvelously to solve the problems associated with "real" namespaces -- in
particular, it allows you to use more than one namespace prefix to represent the same
namespace, should you want to do that. But it introduces a very weird problem of its
own when dealing with element names in the declared default namespace. In essence the
expanded name of your original FeatureCollection element (in the default namespace) is
    {http://mydomain/schemas}FeatureCollection
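To see those expanded names for yourself, here is a minimal sketch that assumes Python's standard xml.etree.ElementTree module, which happens to report element names in the same curly-brace notation. The GML_DOC string is a trimmed copy of your document, and the variable names are my own.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document from the question (an assumption for brevity).
GML_DOC = """\
<FeatureCollection xmlns="http://mydomain/schemas"
                   xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <event>
      <siteCode>AL1234</siteCode>
      <period>Roman</period>
    </event>
  </gml:featureMember>
</FeatureCollection>
"""

root = ET.fromstring(GML_DOC)

# ElementTree stores each element's expanded name as "{namespace-URI}local-name".
expanded_names = [element.tag for element in root.iter()]
for name in expanded_names:
    print(name)
# The first line printed is {http://mydomain/schemas}FeatureCollection
```

Note that even the unprefixed elements (FeatureCollection, event, and so on) come back with a namespace URI attached.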
The real difficulty is that XPath syntax needs to satisfy two irreconcilable requirements:
handling elements in a declared but default (unprefixed) namespace and handling elements
in no namespace at all, whose expanded names carry no namespace URI. In resolving this
dilemma, the XPath spec says that an unprefixed name in an XPath expression is assumed
to be in no namespace, even when the name as it appears in the instance document has
(as a result of a default namespace declaration) an expanded name that includes a
namespace URI. Thus, your //FeatureCollection "query" is instructing XPath to locate an
element which does not exist, a FeatureCollection element in no namespace.
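The mismatch can be reproduced in a few lines. This is only a sketch, again assuming Python's xml.etree.ElementTree, which implements a subset of XPath rather than the full language. Because its searches are relative to the root element, the sketch looks for the event element (also in your default namespace) instead of FeatureCollection itself.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document from the question (an assumption for brevity).
GML_DOC = """\
<FeatureCollection xmlns="http://mydomain/schemas"
                   xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <event>
      <siteCode>AL1234</siteCode>
    </event>
  </gml:featureMember>
</FeatureCollection>
"""

root = ET.fromstring(GML_DOC)

# An unprefixed name in the path means "an element in no namespace".
unprefixed = root.findall(".//event")

# The same element, addressed by its expanded name.
expanded = root.findall(".//{http://mydomain/schemas}event")

print(len(unprefixed), len(expanded))  # prints: 0 1
```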
The same holds true for the //FeatureCollection/gml:featureMember path, by the way.
Since -- to the XPath processor's squinty eyes -- there is no FeatureCollection element,
it has no children at all, named "gml:featureMember" or anything else. If you want to
locate the gml:featureMember element, just remove the reference to its non-existent
parent:
    //gml:featureMember
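As a quick check that a prefixed name on its own does match, here is a sketch under the same assumptions as before (Python's xml.etree.ElementTree and a trimmed copy of your document). The NAMESPACES mapping is my own; it tells the library which URI the gml prefix stands for in the path.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document from the question (an assumption for brevity).
GML_DOC = """\
<FeatureCollection xmlns="http://mydomain/schemas"
                   xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <event>
      <siteCode>AL1234</siteCode>
    </event>
  </gml:featureMember>
</FeatureCollection>
"""

# Prefix-to-URI bindings used when interpreting the path expression.
NAMESPACES = {"gml": "http://www.opengis.net/gml"}

root = ET.fromstring(GML_DOC)

# No FeatureCollection step: search for the prefixed element anywhere below the root.
members = root.findall(".//gml:featureMember", NAMESPACES)
print(len(members))  # prints: 1
```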
Suppose you can't, for some reason, simply strip out the default namespace declaration?
In this case you have to jump through a minor hoop: instruct XPath to locate all elements
in the document, and then refine (via a predicate and the local-name() function) the
node-set of candidates to those with a local name of "FeatureCollection". (The local
name is the element name sans namespace prefix, and it is not subject to expansion
even if in the declared default namespace.) Your location path will now look like this:
    //*[local-name()='FeatureCollection']
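The same "select everything, then filter by local name" idea can be sketched outside of XPath. The example below assumes Python's xml.etree.ElementTree, which has no local-name() function of its own, so the local_name helper is a hypothetical stand-in that strips the {namespace-URI} part from the expanded name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document from the question (an assumption for brevity).
GML_DOC = """\
<FeatureCollection xmlns="http://mydomain/schemas"
                   xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <event>
      <siteCode>AL1234</siteCode>
    </event>
  </gml:featureMember>
</FeatureCollection>
"""


def local_name(element):
    """Return the element name without its {namespace-URI} part (hypothetical helper)."""
    return element.tag.rsplit("}", 1)[-1]


root = ET.fromstring(GML_DOC)

# Visit every element (the root included), keeping those whose local name matches.
matches = [
    element
    for element in root.iter()
    if isinstance(element.tag, str) and local_name(element) == "FeatureCollection"
]
print(len(matches))  # prints: 1
```

Filtering on the local name ignores the namespace entirely, so it would also match a FeatureCollection element from some other namespace if the document contained one.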
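You can also use this technique to locate the gml:featureMember child of FeatureCollection:
    //*[local-name()='FeatureCollection']/gml:featureMember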
Don't feel chagrined by not having previously picked up on XPath's treatment of expanded names in a default namespace. While it does make sense of a sort -- they needed to reconcile the irreconcilable somehow -- it remains one of the strangest little back corners of the XPath universe, even to people who deal with XPath every day!