Little Back Corners
Q: I can't find what I'm looking for in my GML document.
Whenever I try to specify an XPath location path in a GML document,
I receive a message saying XPath returned no results. Queries I have
tried include //FeatureCollection and
//FeatureCollection/gml:featureMember. The document
in question looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<FeatureCollection xmlns="http://mydomain/schemas"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:gml="http://www.opengis.net/gml"
    xsi:schemaLocation="http://mydomain/schemas event3.xsd">
  <gml:boundedBy>
    <gml:Box>
      <gml:coordinates>100,100,100,100</gml:coordinates>
    </gml:Box>
  </gml:boundedBy>
  <gml:featureMember>
    <event>
      <geometryProperty>
        <gml:Point>
          <gml:coordinates>100,100</gml:coordinates>
        </gml:Point>
      </geometryProperty>
      <siteCode>AL1234</siteCode>
      <date>2004-10-10</date>
      <locationDescription>Somewhere</locationDescription>
      <eventType>Excavation</eventType>
      <period>Roman</period>
    </event>
  </gml:featureMember>
</FeatureCollection>
A: I'll get to your question in a moment. First, though, allow me a bit of a digression.
One of the things I like most about writing this column is the opportunity to poke into little back corners of the XML universe. Of course, these aren't little to the people who deal with them every day; they are little only in the sense that they're well-known only among the relative handful of people who deal with them every day. We all know about the W3C standards (although I daresay nobody knows everything about every one of them); we all know about the big corporate and open-source players and tools. But only a few of us get to deal with some of the truly intriguing uses of XML.
Your question deals with one of these interesting niches. GML -- the Geography Markup Language -- is a specification of the Open GIS Consortium (OGC). According to the GML FAQ, the markup language:
provides an XML-based encoding of geospatial data; it can be viewed as a basic application framework for handling geographic information in an open and non-proprietary way. By leveraging related XML technologies (e.g. XML Schema, XSLT, XLink, XPointer, SVG) a GML dataset becomes easier to process in heterogeneous environments, and it can be readily intermixed with other types of data: text, video, imagery, etc.
The GML specification is now at version 3.1, most recently updated in June 2003. Included in the mix are specifications for use of XLink and SMIL with "pure" GML. You can find the various schemas at the Open GIS site. An excellent, albeit slightly overwhelming, source of information is the GML (version 3) Implementation Specification, a 548-page PDF behemoth.
For more information about GML, check the indispensable Robin Cover's "Cover Pages" entry on GML; the above-mentioned GML FAQ; and the GML Central site. A company called Snowflake Software offers a GML viewer called OS MasterMap Viewer, which purports to read not only raw GML documents but also those compressed in WinZip/gzip formats.
Your document includes five elements in the GML
namespace (gml:boundedBy, gml:Box, gml:coordinates,
gml:featureMember, and gml:Point). These
elements are used to assert the characteristics of a given geographic
feature; taken together with the elements in the default namespace
(event, siteCode, eventType,
period, and so on), they seem to describe an
archaeological site from the Roman era. (This is all presumably just
"play data," right down to the coordinates defined in
gml:Box -- the coordinate pairs of 100,100 (x) and
100,100 (y) just define a single point in space.)
On to your question, which is not really about GML per se but rather about how to find, using XPath, some content in a GML document. (I hope the irony is not lost on you of not being able to locate something in a document whose very purpose is to locate something in the physical world...) The only problem with your XPath location paths, it turns out, is not XPath syntax as such, but its use when working with namespace declarations. In particular, the problem is your declaration for the default namespace:
xmlns="http://mydomain/schemas"
I don't know why you need that declaration, since the namespace URI
is clearly a dummy or placeholder. If, in any case, you remove that
namespace declaration, you'll find that the location paths
//FeatureCollection and
//FeatureCollection/gml:featureMember work just
fine.
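If you'd like to check that claim without touching your real data, here is a minimal Python sketch using the standard library's xml.etree.ElementTree module. Some assumptions to keep in mind: ElementTree implements only a subset of XPath and starts every search at an element rather than at the document node, so the hypothetical select() helper hangs the root under a throwaway wrapper element to imitate a leading //. The document is a trimmed copy of yours with the default namespace declaration removed.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"

# Trimmed copy of the GML document, WITHOUT the default namespace declaration.
DOC = f"""<FeatureCollection xmlns:gml="{GML}">
  <gml:featureMember>
    <event><siteCode>AL1234</siteCode></event>
  </gml:featureMember>
</FeatureCollection>"""


def select(xml_text, path):
    """Evaluate a '//...' style path against the whole document (hypothetical helper)."""
    # ElementTree searches begin at an element, not at the document node,
    # so place the root under a throwaway wrapper to make it reachable via './/'.
    wrapper = ET.Element("wrapper")
    wrapper.append(ET.fromstring(xml_text))
    return wrapper.findall("." + path, {"gml": GML})


print(len(select(DOC, "//FeatureCollection")))
print(len(select(DOC, "//FeatureCollection/gml:featureMember")))
```

With no default namespace in play, each path should select exactly one element.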
What's going on here?
In the XPath spec, we learn that
a node test that is a QName is true if and only if the type of the node (see [5 Data Model]) is the principal node type and has an expanded-name equal to the expanded-name specified by the QName.
What this means in practice is that an XPath processor doesn't deal with plain old element names, except those element names for which no namespace has been declared. If there's any namespace declaration at all, including one for the default (unprefixed) namespace, the processor uses the expanded-name (that is, the element's local name paired with its namespace URI, rather than the prefixed "qualified name," or qname, that you see in the markup) to identify the element.
While there's no formal requirement for how to write an
expanded-name, a de facto standard seems to exist among XPath
processors: replace the namespace prefix with the namespace URI
enclosed in "curly braces," the { and }
characters. In the case of your elements in the non-default namespace,
such as gml:Box, the XPath processor is therefore
expanding both the element name in the XPath expression and the
element name(s) as they appear in the source document, as follows:
{http://www.opengis.net/gml}Box
This works marvelously to solve the problems associated with "real"
namespaces -- in particular, it allows you to use more than one
namespace prefix to represent the same namespace, should you want to
do that. But it introduces a very weird problem of its own when
dealing with element names in the declared default namespace. In
essence, the expanded name of your original
FeatureCollection element (in the default namespace)
is {http://mydomain/schemas}FeatureCollection.
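Python's xml.etree.ElementTree module happens to store every element name in this same curly-brace form, which makes it a handy way to peek at expanded names. The sketch below is only an illustration, not a picture of what any particular XPath processor does internally; the second prefix, g, is a hypothetical addition of mine, bound to the same GML namespace URI to show that the prefix itself doesn't matter.

```python
import xml.etree.ElementTree as ET

# Two prefixes ("gml" and the hypothetical "g") bound to the same namespace URI,
# plus the default namespace from the original document.
DOC = """<FeatureCollection xmlns="http://mydomain/schemas"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:g="http://www.opengis.net/gml">
  <gml:boundedBy><g:Box/></gml:boundedBy>
</FeatureCollection>"""

root = ET.fromstring(DOC)
bounded_by = root[0]   # the gml:boundedBy element
box = bounded_by[0]    # the g:Box element

# ElementTree reports each name as {namespace-uri}local-name.
print(root.tag)
print(bounded_by.tag)
print(box.tag)
```

The unprefixed root comes back with the default namespace's URI attached, and both prefixed elements come back with the GML URI regardless of which prefix was used.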
The real difficulty is that XPath syntax needs to satisfy two
irreconcilable requirements: handling elements in a declared but
default (unprefixed) namespace and handling elements in no namespace
at all, whose expanded names carry no namespace URI. To resolve this
dilemma, the XPath spec says that an unprefixed name in an XPath
expression is assumed to be in no namespace, even when
the name as it appears in the instance document has (as a
result of a default namespace declaration) an expanded name that
includes a namespace URI. Thus, your
//FeatureCollection "query" is instructing XPath to
locate an element which does not exist: a
FeatureCollection element in no namespace.
The same holds true for the
//FeatureCollection/gml:featureMember location path, by
the way. Since -- to the XPath processor's squinty eyes -- there is
no FeatureCollection element, it has no children at all,
named "gml:featureMember" or anything else. If you want to locate
the gml:featureMember element, just remove the reference
to its non-existent parent: //gml:featureMember.
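Here is a rough way to watch this happen, again sketched with Python's xml.etree.ElementTree. The same caveats apply: ElementTree supports only a subset of XPath, and the hypothetical select() helper wraps the root in a throwaway element so that a leading // can reach it. The document is a trimmed copy of yours with the default namespace declaration left in place.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"

# Trimmed copy of the GML document, default namespace declaration included.
DOC = f"""<FeatureCollection xmlns="http://mydomain/schemas" xmlns:gml="{GML}">
  <gml:featureMember>
    <event><siteCode>AL1234</siteCode></event>
  </gml:featureMember>
</FeatureCollection>"""


def select(xml_text, path):
    """Evaluate a '//...' style path against the whole document (hypothetical helper)."""
    # Wrap the root so that './/' can reach it; ElementTree has no document node.
    wrapper = ET.Element("wrapper")
    wrapper.append(ET.fromstring(xml_text))
    # Only the gml prefix is bound; unprefixed names stay in no namespace.
    return wrapper.findall("." + path, {"gml": GML})


no_parent = select(DOC, "//FeatureCollection")
no_child = select(DOC, "//FeatureCollection/gml:featureMember")
direct = select(DOC, "//gml:featureMember")

print(len(no_parent), len(no_child), len(direct))
```

The first two paths should come back empty, while the path that skips the unprefixed parent finds the one gml:featureMember element.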
Suppose you can't, for some reason, simply strip out the default
namespace declaration? In this case you have to jump through a minor
hoop: instruct XPath to locate all elements in the document, and then
refine (via a predicate and the local-name() function)
the node-set of candidates to those with a local name of
"FeatureCollection". (The local name is the element name sans
namespace prefix, and it is not subject to expansion even if in the
declared default namespace.) Your location path will now look like
this:
//*[local-name()='FeatureCollection']
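You can also use this technique to locate the
gml:featureMember element by way of its parent:
//*[local-name()='FeatureCollection']/gml:featureMember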
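ElementTree's XPath subset does not include the local-name() function, so the following sketch imitates the predicate in plain Python rather than evaluating the location path itself. Treat it as an illustration of the idea only; the helper names local_name() and by_local_name() are my own inventions.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the GML document, default namespace declaration included.
DOC = """<FeatureCollection xmlns="http://mydomain/schemas"
    xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <event><siteCode>AL1234</siteCode></event>
  </gml:featureMember>
</FeatureCollection>"""


def local_name(element):
    """Return the element name without its '{namespace-uri}' part."""
    return element.tag.rpartition("}")[2]


def by_local_name(root, name):
    """Rough stand-in for //*[local-name()='name']: test every element in the tree."""
    return [el for el in root.iter() if local_name(el) == name]


root = ET.fromstring(DOC)
print(len(by_local_name(root, "FeatureCollection")))
print(len(by_local_name(root, "event")))
print(len(by_local_name(root, "Box")))
```

Because the comparison ignores the namespace URI entirely, it finds FeatureCollection and event even though both sit in the default namespace. It would also match a same-named element from any other namespace, which is the price of this workaround.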
Don't feel chagrined by not having previously picked up on XPath's treatment of expanded names in a default namespace. While it does make sense of a sort -- they needed to reconcile the irreconcilable somehow -- it remains one of the strangest little back corners of the XPath universe, even to people who deal with XPath every day!
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.