Mapping and Markup, Part 2
December 29, 2004
In Part 1 of this "XML Tourist" feature, I discussed some basic ingredients of a Geographic Information System (GIS) and introduced you to an XML-based, web-delivered application for representing GIS data: the Geography Markup Language, or GML. This month, we'll delve deeper into GML itself, starting with a closer look at what distinguishes a true GIS from other tools for rendering two- or three-dimensional spaces on a computer monitor.
The problem of representing geography (graphically, in text, or by any other means) seems, on the face of it, pretty elementary. A region's geography, after all, seems more or less fixed. Discounting such effects as erosion and artificial reshaping (damming rivers, strip-mining mountains, and the like), the land is the land, a river today is pretty much the same river tomorrow, and so on. Once you've got a high-resolution aerial photograph of a given area, that's it, right?
Well, no. What's depicted in such a photograph (or CAD drawing, for that matter) will mean different things to different viewers. One person will "see" political boundaries (countries, states or provinces, cities, and so on); another sees plant hardiness zones; yet another sees the start of a water table map; and a fourth sees the location of municipal parks (perhaps complete with more details, such as the facilities available at each). All of these meanings are equally true and equally real, no matter how mutually exclusive they may seem. And the ability to represent each one is the work of a real GIS: connecting data (meanings) to visual representations of geographic features.
Two terms are commonly used for these superimpositions of human meaning onto geography: layers and coverages. Historically, the two terms were not identical, and different vendors may choose one term over another. But for practical purposes at this level, they're pretty much interchangeable. GML, for its part, uses "coverage," and that's the one I'll use as well.
Perhaps the most important thing to understand about GML (as opposed to other XML applications you might think of as "mapping" applications, such as SVG, VML, and so on) is that GML does not, in itself, specify the presentation of anything like "maps." Rather, it specifies data related to maps. It's a language of mapping descriptors that can be used standalone or, more often, embedded in ("extended by") other XML applications. In this and in other respects, a "GML document" resembles not a simple XML document in some other vocabulary, in which all the markup belongs to one namespace, but rather one annotated, say, via RDF. (Indeed, to paraphrase a portion of the RDF Primer, you could say that GML is intended to provide a simple way to make statements about geographic features.)
Rather than "pure" GML documents, therefore, what you tend to see are documents conforming to schemas that have themselves imported one or more of the more than thirty schemas that constitute the bulk of the GML spec. To illustrate, the following is a simple GML document. This document has been included in one way or another with every version of GML (which is currently at version 3.1); for easy reference, note that all the markup specifically related to GML carries the gml: prefix.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- File: cambridge.xml -->
<CityModel xmlns="http://www.opengis.net/examples"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.opengis.net/examples city.xsd">
  <gml:name>Cambridge</gml:name>
  <gml:boundedBy>
    <gml:Box srsName="http://www.opengis.net/gml/srs/epsg.xml#4326">
      <!-- lower-left and upper-right corners, as space-separated x,y tuples -->
      <gml:coordinates>0.0,0.0 100.0,100.0</gml:coordinates>
    </gml:Box>
  </gml:boundedBy>
  <cityMember>
    <River>
      <gml:description>The river that runs through Cambridge.</gml:description>
      <gml:name>Cam</gml:name>
      <gml:centerLineOf>
        <gml:LineString srsName="http://www.opengis.net/gml/srs/epsg.xml#4326">
          <gml:coordinates>0,50 70,60 100,50</gml:coordinates>
        </gml:LineString>
      </gml:centerLineOf>
    </River>
  </cityMember>
  <cityMember>
    <Road>
      <gml:name>M11</gml:name>
      <linearGeometry>
        <gml:LineString srsName="http://www.opengis.net/gml/srs/epsg.xml#4326">
          <gml:coordinates>0,5.0 20.6,10.7 80.5,60.9</gml:coordinates>
        </gml:LineString>
      </linearGeometry>
      <classification>motorway</classification>
      <number>11</number>
    </Road>
  </cityMember>
  <!-- a feature held remotely, referenced via XLink/XPointer -->
  <cityMember xlink:type="simple" xlink:title="Trinity Lane"
      xlink:href="http://www.foo.net/cgi-bin/wfs?FeatureID=C10239"
      gml:remoteSchema="city.xsd#xpointer(//complexType[@name='RoadType'])"/>
  <dateCreated>2000-11</dateCreated>
</CityModel>
```
Note the intermingling of GML elements and attributes with those of the "base" application, and note especially that, structurally, the GML markup is subordinate to the base markup in this simple document. This makes sense when you remember that "all" GML does is describe the features established by some other (usually graphic) application. Here, for instance, the River element (in the unprefixed http://www.opengis.net/examples namespace) is assigned a gml:name element: a string identifying this squiggly line (that is, a graphic representation of a river) as the river named "Cam."
This example also introduces the notion of features as that term is used in GIS contexts. A feature in this sense is a "thing in a landscape" which has one or more properties. And when you assign properties to geographic features, you end up with the fundamental building blocks of a coverage.
A coverage, in effect, is the GIS answer to an imperative of the form, "Show me all [insert chosen features, or feature sets] that share [some property]." Maybe you want to see all city-maintained roadways in a region (as opposed to those maintained by the state/province and the federal government). If each Road element had an ownership property, say, then "city-maintained roads" could constitute one coverage available from a data set such as this one.
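To make that concrete, here is a minimal sketch of how such a property might look. The ownership element, its values, and both road names are invented for illustration; none of them is part of GML or of the sample's city.xsd. Geometry is omitted for brevity, and the fragment assumes the same namespace declarations as the CityModel root above.

```xml
<!-- Hypothetical fragment: "ownership" is an assumed property, not part of
     GML or city.xsd; road names are invented; geometry omitted for brevity. -->
<cityMember>
  <Road>
    <gml:name>Example Lane</gml:name>
    <classification>street</classification>
    <ownership>city</ownership>
  </Road>
</cityMember>
<cityMember>
  <Road>
    <gml:name>Example Highway</gml:name>
    <classification>motorway</classification>
    <number>99</number>
    <ownership>national</ownership>
  </Road>
</cityMember>
```

A GIS, or even a plain XPath processor, could then in principle assemble the "city-maintained roads" coverage with an expression along the lines of //ex:Road[ex:ownership = 'city'], with the ex prefix bound to http://www.opengis.net/examples.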
When you think of a mapping application, it's natural under a scenario like the one just described to think that all city-maintained roads might be depicted in blue, all nationally maintained ones in green, and so on. But GML says nothing at all about how a given feature is to be depicted. The gml:coordinates elements in the sample above do define lines and planes, but they provide no instructions for representing ("drawing") these geometric objects. That depiction is the job of a graphic language, such as SVG. The GML markup simply provides one facet of the source tree for a transformation into that resulting graphic.
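As a rough illustration of where the depiction decisions end up, here is a minimal XSLT 1.0 sketch that turns the two LineString geometries in the Cambridge sample into SVG polylines. It is not the Ordnance Survey stylesheet or any other published one; the colors, the 400-pixel canvas, and a viewBox matching the sample's 0-to-100 bounding box are all assumptions. It accepts either a single gml:coordinates element holding space-separated x,y tuples or several gml:coordinates elements holding one tuple each.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: renders River and Road centerlines from cambridge.xml as SVG.
     Every styling choice (colors, widths, canvas size) lives here, not in the GML. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:ex="http://www.opengis.net/examples"
    xmlns="http://www.w3.org/2000/svg"
    exclude-result-prefixes="gml ex">

  <xsl:output method="xml" indent="yes"/>

  <!-- The root feature collection becomes the SVG canvas; the viewBox is
       assumed to match the sample's 0..100 bounding box. -->
  <xsl:template match="/ex:CityModel">
    <svg width="400" height="400" viewBox="0 0 100 100">
      <!-- Inline features only; the XLink-referenced cityMember has no children. -->
      <xsl:apply-templates select="ex:cityMember/*"/>
    </svg>
  </xsl:template>

  <xsl:template match="ex:River">
    <xsl:call-template name="polyline">
      <xsl:with-param name="line" select="gml:centerLineOf/gml:LineString"/>
      <xsl:with-param name="stroke" select="'blue'"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template match="ex:Road">
    <xsl:call-template name="polyline">
      <xsl:with-param name="line" select="ex:linearGeometry/gml:LineString"/>
      <xsl:with-param name="stroke" select="'red'"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Joins the text of every gml:coordinates child into one space-separated
       list of x,y tuples, which is also valid syntax for SVG's points attribute. -->
  <xsl:template name="polyline">
    <xsl:param name="line"/>
    <xsl:param name="stroke"/>
    <polyline fill="none" stroke="{$stroke}" stroke-width="0.5">
      <xsl:attribute name="points">
        <xsl:for-each select="$line/gml:coordinates">
          <xsl:value-of select="normalize-space(.)"/>
          <xsl:if test="position() != last()">
            <xsl:text> </xsl:text>
          </xsl:if>
        </xsl:for-each>
      </xsl:attribute>
    </polyline>
  </xsl:template>
</xsl:stylesheet>
```

Because SVG's y axis points down while map coordinates usually increase upward, this output is vertically mirrored, and the sketch ignores the srsName entirely; a real stylesheet would have to deal with both. Note, too, that recoloring the river requires no change at all to the GML, which is exactly the separation of data from depiction described above.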
A Commingled Specification
One thing which grabs you at once about the GML spec is its aggressive--one could almost say ebullient--embracing of other XML-based technologies, particularly XLink and XPointer but also Schematron for schema validation. This enthusiasm can be a little unnerving, but it's also exhilarating to see in action. (Sometimes angels rush in where fools fear to tread.)
As with other applications that incorporate XLink/XPointer within their frameworks, GML uses those two standards to establish references to resources outside (or elsewhere in) the current document. In many cases, a URI (the value of an xlink:href attribute) can be used in place of the string value of some description. Here's an example from the GML recommendation document. First, using a string:
```xml
<gml:direction>
  <gml:DirectionString>Towards the lighthouse</gml:DirectionString>
</gml:direction>
```
And next, using an XLink reference:
```xml
<gml:direction>
  <gml:DirectionString xlink:href="http://my.big.org/logbook/20021127/paragraph6"/>
</gml:direction>
```
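The same mechanism works for references within a single document. The following hypothetical fragment, written GML 3 style, defines the river once, gives it a gml:id, and lets another feature's property point at it with a bare-fragment xlink:href instead of repeating the river's description. The Bridge feature, its crosses property, and the bridge name are invented for this sketch, and the gml and xlink prefixes are assumed to be declared on an enclosing element, as in the Cambridge sample.

```xml
<!-- Hypothetical fragment: Bridge, crosses, and "Example Bridge" are invented. -->
<cityMember>
  <!-- the river is described once, and given a document-unique gml:id -->
  <River gml:id="cam">
    <gml:name>Cam</gml:name>
  </River>
</cityMember>
<cityMember>
  <Bridge>
    <gml:name>Example Bridge</gml:name>
    <!-- property by reference: points at the River above rather than copying it -->
    <crosses xlink:href="#cam"/>
  </Bridge>
</cityMember>
```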
Also interesting, almost surprising, is that the GML spec includes Schematron as a normative reference. (On the other hand, Schematron is, of course, popular and on track to become an ISO standard.) No Schematron markup would normally appear in a GML-based document instance, but it may be included as desired in GML-based schemas. As the GML spec says, "Some XML validators will process the Schematron constraints automatically. Otherwise, the Schematron code can be treated merely as a formal description of the required constraint." (In other words, you can use Schematron markup to state constraints on the GML descriptions of geographic features, whether or not your validator enforces them.)
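For a sense of what that looks like, here is a hypothetical excerpt from an application schema like the sample's city.xsd, with a Schematron rule tucked into xs:appinfo. It expresses a co-occurrence constraint that XML Schema 1.0 cannot: any Road classified as a motorway must also carry a number (the Cambridge sample's M11 would pass). RoadType is assumed to be declared elsewhere in the same schema, the feature.xsd location is an assumption, and the sketch uses the pre-ISO Schematron 1.5 namespace; exactly how a given validator picks up embedded rules varies.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical excerpt from an application schema such as city.xsd.
     RoadType is assumed to be declared elsewhere in this schema. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:sch="http://www.ascc.net/xml/schematron"
    xmlns:ex="http://www.opengis.net/examples"
    targetNamespace="http://www.opengis.net/examples"
    elementFormDefault="qualified">

  <!-- assumed location of GML's feature schema -->
  <xs:import namespace="http://www.opengis.net/gml" schemaLocation="feature.xsd"/>

  <xs:element name="Road" type="ex:RoadType" substitutionGroup="gml:_Feature">
    <xs:annotation>
      <xs:appinfo>
        <!-- XML Schema 1.0 cannot make one element's presence depend on
             another element's value; Schematron can. -->
        <sch:ns prefix="ex" uri="http://www.opengis.net/examples"/>
        <sch:pattern name="Motorways must be numbered">
          <sch:rule context="ex:Road[ex:classification = 'motorway']">
            <sch:assert test="ex:number">A motorway must have a number.</sch:assert>
          </sch:rule>
        </sch:pattern>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>
```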
The Human Face of GML
You can't get very far in learning about GML without finding one name popping up everywhere. Ron Lake, President and CEO of Galdos Systems, Inc., has been, as he told me in an e-mail exchange, "involved in geographic encoding and data sharing mechanisms for a long time [20 years]." In 1998, roughly concurrent with (and inspired by) the finalization of XML 1.0, Lake founded Galdos--a consulting and educational services company built almost entirely around what became known as GML.
I asked Lake when the light bulb went on over his head, that is, when he realized that GML would succeed as a broadly supported medium for carrying rich information about geographic features. He identified several such moments; perhaps the most critical to GML's ultimate success was its adoption, in 2001, as a key component in the British Ordnance Survey's Digital National Framework (DNF). Currently, at the Ordnance Survey site, among other resources, you'll find an XSLT stylesheet for transforming their flavor of GML, called OS MasterMap, into SVG. There's also a set of interactive SVG examples produced by applying XSLT transformations to GML source data.
Which brings up another issue: what kinds of processing applications exist--or soon will--for dealing with GML, alone or together with some "base" vocabulary?
One primary motivation for developing GML in the first place was to simplify delivery of maps over the web. Thus, you could be forgiven for thinking, as I did, that you could simply open a "GML document," perhaps with a browser plug-in like Adobe's SVG Viewer, and see some sort of interactive map in a form not unlike one of the proprietary GIS and CAD systems from such firms as ESRI, Intergraph, MapInfo, Autodesk, and so on. This doesn't seem to be the case, though, except for small, relatively toy-like demonstrators (as on the Ordnance Survey examples site). Rather, application developers seem to be focusing (for now) on using GML as a common data interchange vehicle. Here are some links to developers, commercial and otherwise, who are in the thick of this activity (the list is illustrative, not exhaustive, and it's in addition to the GIS/CAD vendors mentioned earlier):
- Safe Software: Their full product line supports GML version 2 (so far): "FME [Feature Manipulation Engine] Suite users can perform GML 2 translations and sophisticated processing tasks with point-and-click ease. FME Objects (which includes both API and OEM versions) provides a gateway that allows third-party applications to support GML 2 data. SpatialDirect allows organizations to distribute data in GML 2 format over the Internet."
- The SourceForge GML4J project: Originally developed by Galdos Systems, this open-source API provides a mechanism for "read[ing] GML documents and interpret[ing] the XML elements inside as features, geometries, their collections, feature properties, geometry properties, complex properties, coordinates or other GML constructs."
- Another SourceForge project, GeoTools: "The aim of the project is to develop a core set of Java objects in a framework which makes it easy for others to implement OGC [Open GIS Consortium]-compliant, server-side services or provide OGC compatibility in standalone applications or applets. The GeoTools 2 project comprises a core API of interfaces and default implementations of those interfaces."
- BBN Technologies' OpenMap: an open-source "Java Beans™ based toolkit for building applications and applets needing geographic information. Using OpenMap components, you can access data from legacy applications, in-place, in a distributed setting. At its core, OpenMap is a set of Swing components that understand geographic coordinates. These components help you show map data, and help you handle user input events to manipulate that data."
I asked Ron Lake about some of these applications; my question focused on performance issues such as those raised by an astute reader's comment on last month's column. (Essentially, the point is that text-based data sets in geographic applications tend to be big and slow to process, especially for real-time systems.)
He conceded that for most purposes, "I would expect that most GIS vendors will convert GML into their own internal format before they do anything with the data." He pointed out, though, that a couple of movements afoot may make it easier to process GML natively. First, he said, there's the W3C flirtation (my word, not his) with so-called "binary XML." (There is some dissension about the utility of this effort.) Second, he noted, not everyone is waiting for the W3C to resolve its ambivalence about binary data in an XML (that is, text-based) world: "[C]ompanies like ExpWay (France) have developed binary codecs for XML which provide significant (much better than GZIP) reduction in memory footprint and CPU resource requirements."
The bottom line when it comes to the significance of GML--whether it remains text-based or not--may have been captured in something Lake said, almost as an aside, when I asked why he'd been drawn to what eventually became GML: "I have long felt that geographic information is only useful if it can all be integrated--that the problems in the world do not respect the boundaries we draw between disciplines and jurisdictions," which is a perfect sentiment, I think, to carry us into the new year.