Going to Extremes

September 13, 2000

Liora Alschuler

Table of Contents

The Search for Intelligent Life

Approaches and Applications

The Free-For-All in Montreal

The Next Round

GCA's Extreme Markup Languages 2000, held in August in Montreal, was billed as "not frisbee, not skateboarding, not lounging -- just no-holds-barred tech talk," and so it was. It was the kind of conference where the code and ideas presented in the sessions were light years ahead of any commercial widgets that one can actually buy. Extreme Markup offered the best mix of geekdom and academia, sort of "geeks in tweed." One might be amused by the attendees' quaint ways and love of abstraction, were these not the same folks who built, promoted, and implemented SGML, XML, XSLT, the DOM, and more.

While it might seem a world away from business architectures to ponder whether "The Descriptive/Procedural Distinction is Flawed" [1], keep in mind that such digging has fueled the creation of markup language technology thus far. If we are to move further toward a truly interoperable, semantic Web, these tweedy techies are the "extremists" who will take us there. While there were many areas explored at the conference -- including general n-tiered XML architectures, end-user friendly tools for writers and designers, neat stuff to do with DTDs, groves, and the DOM, and XML-izing Eiffel -- the preponderance of papers and discussions zeroed in on one mission: the search for intelligence and meaning in markup.

The Search for Intelligent Life

XML has to date achieved a degree of syntactic, but not semantic, interoperability. On the Web, you still can't find what you need easily, and when you do, you may not know what it means. Grammars, that is, DTDs and schemas, don't supply meaning any more than the Elements of Style can tell you the size, shape, and location of a certain white whale. (The draft W3C schemas do include a type system, a necessary but not sufficient component of "meaning." W3C schemas figured remarkably little in discussion, although a pre-conference tutorial was well attended.)

As Jeff Heflin and James Hendler put it, "To achieve semantic interoperability, systems must be able to exchange data in such a way that the precise meaning of the data is readily accessible and the data itself can be translated by any system into a form that it understands." [2] The problem is that XML itself has, by design, no semantics. Or, as John Heintz and W. Eliot Kimber said, DTDs constrain syntax, not data models; they don't capture abstraction across models, but are simply an implementation view of a higher abstraction. [3]
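A small hypothetical fragment (element names and values invented here) makes the point concrete: a DTD can validate structure while leaving meaning entirely open.

```xml
<!-- Hypothetical example: the DTD below validates this instance,
     but says nothing about whether weight is in kilograms or pounds.
     Two conforming systems can parse it identically and still
     disagree completely about what it means. -->
<!DOCTYPE patient [
  <!ELEMENT patient (name, weight)>
  <!ELEMENT name    (#PCDATA)>
  <!ELEMENT weight  (#PCDATA)>
]>
<patient>
  <name>Ahab</name>
  <weight>70</weight>
</patient>
```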

In fact, the structural definition that XML can supply isn't even a universally adequate representation of the structure of text. [4] The ability to convey meaning through current markup applications was systematically shredded in "The Meaning and Interpretation of Markup" [5], which claimed that the ordered hierarchy of content objects (OHCO) is not only insufficiently descriptive of structure, but also an inadequate rack on which to hang semantic vestments.

So what is an implementor to do?

Approaches and Applications

The conference program was rich in reports of real-world, large-scale implementations actively engaged in the search for meaning, and they were not all focused on Topic Maps or RDF -- although these specs (ISO and W3C respectively) were the most prevalent form of semantic representation addressed. One paper described the mapping to abstract data structures in the Perseus Project, where the semantic layer is used to manage a large digital library [6]. Another paper described an XML-based AI language for the web developed at the University of Maryland. [2]

Kimber and Heintz described in detail their use of UML as the semantic and structural constraint language for XML instances. The model serves double duty: it is an ideal way to communicate with users and implementors, and it integrates the document model into system design. UML is not only a great graphical front end for a document model; its packaging mechanism also allows modularization of document types. There is the further advantage of readily available tools and standard design methods.

Topic maps, ISO/IEC 13250 (see our report on Topic Maps from XML Europe earlier this year) remain a hot topic on the GCA conference circuit, with several papers describing Topic Map implementations.

Topic Maps separate semantic structure from data, so that a single map can apply to multiple resources, and multiple maps can be layered over a single resource. Maps can be contextually scoped and can, but need not, be associated with existing taxonomies through the 'facet' facility.
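The layering idea can be sketched in markup. The fragment below loosely follows the draft XTM syntax; the element names, IDs, and URLs are invented for illustration, not taken from any paper at the conference.

```xml
<!-- Hypothetical sketch: one abstract topic layered over two
     independent resources, with a scoped name. A second map could
     point at the same resources without touching this one. -->
<topicMap xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="moby-dick">
    <baseName>
      <scope><topicRef xlink:href="#english"/></scope>
      <baseNameString>Moby-Dick</baseNameString>
    </baseName>
    <occurrence>
      <resourceRef xlink:href="http://example.org/texts/moby.xml"/>
    </occurrence>
    <occurrence>
      <resourceRef xlink:href="http://example.org/notes/whales.html"/>
    </occurrence>
  </topic>
</topicMap>
```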

A paper by Helka Folch and Benoit Habert [7] explained how textual data analysis software operates on a corpus of 8 million words including book extracts, press releases, meeting minutes, and transcripts. The documents are first tagged down to the paragraph level, then analyzed for topics that can then be used for navigation by text mining software. The application creates classes of information "in opposition to the rest, not in terms of an absolute criteria." Hence, it can discover topics not previously classified. They then use Topic Maps to semantically tag the documents using labels chosen by humans from a "bag of words" supplied by the analysis.

Nikita Ogievetsky described how to build a dynamic web site with Topic Maps and XSLT, where the Topic Map functions as the site map [8]. In his implementation, every topic becomes a page; TM occurrences supply metadata, text, objects, and images; occurrence role types determine the XSLT rendering of referenced resources. Topic names become titles and links, and properties are used for natural-language generation. The TM associations build the site map with recursive references and subject reuse. Sites built on such Topic Maps can be merged.
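The topic-becomes-page idea might look something like the following. This is a hedged sketch assuming an XTM-like source vocabulary, not Ogievetsky's actual stylesheet.

```xml
<!-- Sketch: each topic element in an XTM-like source becomes an
     HTML page, with the base name as title and heading, and
     occurrences rendered as links in the page body. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <xsl:template match="topic">
    <html>
      <head>
        <title><xsl:value-of select="baseName/baseNameString"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="baseName/baseNameString"/></h1>
        <!-- occurrences supply the page content -->
        <xsl:apply-templates select="occurrence"/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="occurrence">
    <p><a href="{resourceRef/@xlink:href}">resource</a></p>
  </xsl:template>
</xsl:stylesheet>
```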

Hans Holger Rath laid out a proposal for Topic Map templates, type hierarchies, association properties, inference rules, and consistency constraints that, in aggregate, create a schema for Topic Maps. [9] The proposal would provide a further link to existing semantics and hooks for text retrieval.

The Free-For-All in Montreal

Facing the conflict between Topic Maps and RDF head-on, the conference staged a debate between Eric "RDF" Miller of OCLC and Eric "Topic Maps" Freese of ISOGEN [10]. Freese and Miller provided this comparison between the two specs:

Similarities between RDF and Topic Maps

Both specifications

  • are hard to read
  • share a goal: to tie semantics to document structures
  • provide a systematic way to declare a vocabulary and basic integrity constraints
  • provide a typing system
  • provide entity relationships
  • work well with established ontologies

The correspondences between the specs look something like this:

RDF                   Topic Maps
-------------------   ----------------------------
Resource              Topics
RDF schema            TM templates (proposed)
Properties            Facets and association roles
URIs                  Topic identity, scope
Reification           Association IDs

Differences between the two specifications

  • Topic Maps are not XML-specific and have so far been standardized for SGML only. The XML Topic Map activity under the GCA's IDEAlliance is drafting a proposal for such an implementation.
  • RDF is also not XML-specific, but to date has been implemented only in XML
  • RDF now has a schema specification which provides a standard way to express and link an ontology; such a schema is proposed for Topic Maps
  • RDF uses XML linking, Topic Maps use HyTime linking
  • Topic Maps have explicit scoping
  • Topic Maps start with the abstract layer and (optionally) link to resources; RDF starts at the resource layer and (optionally) creates an abstract layer
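The last difference, the opposite starting points, can be shown with two hypothetical fragments (names and URLs invented; the RDF follows the 1999 Model and Syntax serialization).

```xml
<!-- RDF starts from the resource and describes it: -->
<rdf:Description rdf:about="http://example.org/texts/moby.xml"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Moby-Dick</dc:title>
</rdf:Description>

<!-- A topic map starts from the abstract topic and only
     optionally points down at a resource: -->
<topic id="moby-dick" xmlns:xlink="http://www.w3.org/1999/xlink">
  <occurrence>
    <resourceRef xlink:href="http://example.org/texts/moby.xml"/>
  </occurrence>
</topic>
```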

Modeling Topic Maps with RDF "loses the distinction between topics and resources," according to Freese.

In preparation for Montreal, he put out a call for suggestions on how to combine the two to end up with the best that each has to offer. Here are some of the suggestions:

  • consider topics as collections of resources (anchors) or links such that one object can be a link by a link interpreter and a topic by a Topic Map interpreter
  • add RDF's frame-based notation to Topic Maps to attach properties to resources
  • model RDF as a Topic Map application, gaining the scoping, merging, and inheritance mechanisms

David Dodds provided one view of an RDF/Topic Map alliance in his paper, "Simultaneous Topic Maps and RDF Metadata Structures in SVG." [11] In this application, he embedded Topic Map constructs in RDF metadata within SVG resources. With this notation, a graphics application would then know that a bar chart is a bar chart, and that each bar represents a certain scale and quantity. Since the RDF is embedded in a map, an external Topic Map processor can also manipulate the image.
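The pattern might be sketched as follows; this is a hedged illustration of the embedding, not Dodds' actual markup, and the property names are invented stand-ins for the Topic Map constructs.

```xml
<!-- Sketch: RDF metadata embedded in an SVG document describes
     what the graphic's parts mean, so both an SVG renderer and an
     external metadata processor can work with the same file. -->
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     width="200" height="120">
  <metadata>
    <rdf:RDF>
      <rdf:Description rdf:about="#bar1"
          xmlns:chart="http://example.org/chart#">
        <!-- invented property standing in for a TM construct -->
        <chart:represents>Q1 revenue, scale 1px = $1000</chart:represents>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
  <rect id="bar1" x="10" y="20" width="20" height="80"/>
</svg>
```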

Freese's example of the best of both worlds would look like this:

<topic xlink:type="extended" ...etc...>
  <resource xlink:type="locator" xlink:href="...etc...">
    <dc:author>Dr Livingstone</dc:author>
  </resource>
</topic>
This example attaches a set of properties to a locator, which is a link. The topic could also be an RDF frame and, therefore, could contain any kind of property.

The reaction of the user community in Montreal was strong and unequivocal: merge the two or at least make them compatible. Among the desirable outcomes that were mentioned was a new syntax for RDF that would retain the graph notation but be less difficult to use.

The Next Round

Before SGML, there were GML and GenCode, which had no doctypes, only generalized instance syntax. Then SGML and XML provided a means to declare shared syntax for a type of document. There are several schema languages for XML that provide even stronger typing. Now we are seeing the binding of instances to even more powerful and more abstract models of information. As others have said, you always need one level of indirection more than you have.

As a result of Montreal, representatives from the W3C RDF activity and the XML Topic Maps group have committed to a series of unofficial joint teleconferences to begin this month. One of the first topics of discussion will be an RDF schema definition of a topic map. According to Freese and Miller, it looks like this is a viable and sound basis on which to open discussion on the possible convergence of the two specifications.

But as C. M. Sperberg-McQueen reminded us in his closing keynote, meaning is always in the instance. It would be reassuring to think that the Topic Map and RDF folks will hold this in mind as they convene their joint meetings and deliberate on the future of angle brackets with metadata. Reducing Tosca to a Topic Map, or a set of directed graphs, and calling the libretto "mere information," while calling the metadata schema "knowledge," misses a very large and important boat. Again, as Sperberg-McQueen put it, we should all "resist the temptation to be more meta than thou," and not lose sight of the importance of the instance itself.


Extreme Markup Papers

[1] The descriptive/procedural distinction is flawed
Allen Renear, Brown University

[2] Semantic interoperability on the Web
Jeff Heflin and James Hendler, both of University of Maryland

[3] Using UML to define XML document types
W. Eliot Kimber and John Heintz, both of DataChannel, Inc

[4] Markup's current imbalance
Paul Caton, Brown University

[5] Meaning and interpretation of markup
C. M. Sperberg-McQueen, W3C, MIT Laboratory for Computer Science, Claus Huitfeldt, University of Bergen, and Allen Renear, Brown University

[6] Management of XML documents in an integrated digital library
David Smith, Anne Mahoney, and Jeffrey A. Rydberg-Cox, Perseus Project

[7] Constructing a navigable Topic Map by inductive semantic acquisition methods
Helka Folch, Électricité de France, and Benoit Habert

[8] Building dynamic Web sites with Topic Maps and XSLT
Nikita Ogievetsky, Cogitech Inc.

[9] Validating Topic Maps with constraints
Hans Holger Rath, STEP Electronic Publishing Solutions GmbH

[10] Topic Maps and RDF
Eric Freese, ISOGEN/DataChannel

RDF and Topic Maps
Eric Miller, Online Computer Library Center

[11] Simultaneous Topic Maps and RDF metadata structures in SVG
David Dodds, Open Text

Other presentations available online.