Going to Extremes
GCA's Extreme Markup Languages 2000, held in August in Montreal, was billed as "not frisbee, not skateboarding, not lounging -- just no-holds-barred tech talk," and so it was. It was the kind of conference where the code and ideas presented in the sessions were light years ahead of any commercial widgets one can actually buy. Extreme Markup offered the best mix of geekdom and academia, a sort of "geeks in tweed." One might be amused by the attendees' quaint ways and love of abstraction, were these not the same folks who built, promoted, and implemented SGML, XML, XSLT, the DOM, and the rest.
While it might seem a world away from business architectures to ponder whether "The Descriptive/Procedural Distinction is Flawed" [1], keep in mind that such digging has fueled the creation of markup language technology thus far. If we are to move further toward a truly interoperable, semantic Web, these tweedy techies are the "extremists" who will take us there. While there were many areas explored at the conference -- including general n-tiered XML architectures, end-user friendly tools for writers and designers, neat stuff to do with DTDs, groves, and the DOM, and XML-izing Eiffel -- the preponderance of papers and discussions zeroed in on one mission: the search for intelligence and meaning in markup.
The Search for Intelligent Life
XML has to date achieved a degree of syntactic, but not semantic, interoperability. On the Web, you still can't find what you need easily, and when you do, you may not know what it means. Grammars, that is, DTDs and schemas, don't supply meaning any more than the Elements of Style can tell you the size, shape, and location of a certain white whale. (The draft W3C schemas do include a type system, a necessary but not sufficient component of "meaning." W3C schemas figured remarkably little in discussion, although a pre-conference tutorial was well attended.)
As Jeff Heflin and James Hendler put it, "To achieve semantic interoperability, systems must be able to exchange data in such a way that the precise meaning of the data is readily accessible and the data itself can be translated by any system into a form that it understands." [2] The problem is that XML itself has, by design, no semantics. Or, as John Heintz and W. Eliot Kimber said, DTDs constrain syntax, not data models. They don't capture abstraction across models; they are simply an implementation view of a higher abstraction. [3]
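To see the distinction in miniature, consider a document-type fragment like the following (our own illustration, not drawn from any of the papers):

    <!-- A grammar for purchase orders: structure only, no meaning -->
    <!ELEMENT order    (customer, item+)>
    <!ELEMENT customer (#PCDATA)>
    <!ELEMENT item     (part, quantity, price)>
    <!ELEMENT part     (#PCDATA)>
    <!ELEMENT quantity (#PCDATA)>
    <!ELEMENT price    (#PCDATA)>
    <!ATTLIST price currency CDATA #IMPLIED>

A validating parser will accept <price currency="USD">9.99</price> and <price currency="none">cheap</price> with equal equanimity. The grammar dictates what may nest where, but nothing in it tells a processor what a part or a price is, and the same abstract order could just as legally be serialized under entirely different element names. A schema's datatypes would catch the second price, but would still not say what a price means.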
In fact, the structural definition that XML can supply isn't even a universally adequate representation of the structure of text. [4] The ability to convey meaning through current markup applications was systematically shredded in "The Meaning and Interpretation of Markup" [5], which claimed that the ordered hierarchy of content objects (OHCO) is not only insufficiently descriptive of structure, but also not an adequate rack on which to hang semantic vestments.
So what is an implementor to do?
Approaches and Applications
The conference program was rich in reports of real-world, large-scale implementations actively engaged in the search for meaning, and they were not all focused on Topic Maps or RDF -- although these specs (ISO and W3C respectively) were the most prevalent form of semantic representation addressed. One paper described the mapping to abstract data structures in the Perseus Project, where the semantic layer is used to manage a large digital library [6]. Another paper described an XML-based AI language for the Web developed at the University of Maryland [2].
Kimber and Heintz described in detail their use of UML as the semantic and structural constraint language for XML instances. Its advantage is that the model serves double duty: it is an ideal way to communicate with users and implementors, and it integrates the document model into the overall system design. UML is not only a great GUI for a document model; its packaging mechanism also allows document types to be modularized. There is the further advantage of readily available tools and standard design methods.
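As a rough illustration of the mapping (our sketch, not the authors' notation), a UML class Section carrying a title attribute and an ordered composition of Para objects might be realized in a DTD as:

    <!-- One possible serialization of the UML class Section -->
    <!ELEMENT section (title, para+)>
    <!ELEMENT title   (#PCDATA)>
    <!ELEMENT para    (#PCDATA)>

The element declarations are only one possible rendering of the class; the same UML model could be serialized under different names, or under a different grammar entirely, which is exactly why the UML model rather than the DTD is treated as the home of the abstraction.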
Topic Maps, ISO/IEC 13250 (see our report on Topic Maps from XML Europe earlier this year), remain a hot topic on the GCA conference circuit, with several papers describing Topic Map implementations.
Topic Maps separate semantic structure from data, so that a single map can apply to multiple resources, and multiple maps can be layered over a single resource. Maps can be contextually scoped and can, but need not, be associated with existing taxonomies through the 'facet' facility.
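In outline, a map might look something like this (the element and attribute names are only loosely modeled on the ISO 13250 interchange syntax, and the resources are invented):

    <topicmap>
      <!-- The topic lives in the map, not in the documents it describes -->
      <topic id="white-whale">
        <topname><basename>Moby-Dick</basename></topname>
        <occurs type="commentary" href="http://example.org/essays/whale.html"/>
        <occurs type="full-text"  href="http://example.org/texts/moby-dick.xml"/>
      </topic>
    </topicmap>

The map points into the essay and the novel without modifying either; a second map, scoped for a different readership, could be layered over the same resources, and a facet could tie the topic to an existing taxonomy.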
A paper by Helka Folch and Benoit Habert [7] explained how textual data analysis software operates on a corpus of 8 million words, including book extracts, press releases, meeting minutes, and transcripts. The documents are first tagged down to the paragraph level, then analyzed for topics that text mining software can use for navigation. The application creates classes of information "in opposition to the rest, not in terms of an absolute criteria," and hence can discover topics not previously classified. The authors then use Topic Maps to semantically tag the documents, with labels chosen by humans from a "bag of words" supplied by the analysis.
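Schematically (our illustration, not the authors' actual markup, and with an invented label), a paragraph that the clustering assigns to a class ends up referenced by a topic whose label a human has picked from the suggested bag of words:

    <!-- A paragraph-level tagged source document -->
    <doc src="minutes-1998-04.xml">
      <p id="p17">... discussion of the privatisation timetable ...</p>
    </doc>

    <!-- A topic labeled by a human, pointing at the discovered paragraph -->
    <topic id="privatisation">
      <topname><basename>privatisation</basename></topname>
      <occurs type="discussed-in" href="minutes-1998-04.xml#p17"/>
    </topic>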
Nikita Ogievetsky described how to build a dynamic web site with Topic Maps and XSLT, where the Topic Map functions as the site map [8]. In his implementation, every topic becomes a page; TM occurrences supply metadata, text, objects, and images; and occurrence role types determine the XSLT rendering of referenced resources. Topic names become titles and links, and properties are used for natural language generation. The TM associations build the site map through recursive references and subject reuse. Sites built on Topic Maps in this way can be merged.
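A minimal sketch of the pattern (ours, not Ogievetsky's stylesheet, and reusing the illustrative topic markup above) shows how a base name becomes a page title and how occurrence role types drive rendering:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- One page per topic: the base name supplies title and heading -->
      <xsl:template match="topic">
        <html>
          <head><title><xsl:value-of select="topname/basename"/></title></head>
          <body>
            <h1><xsl:value-of select="topname/basename"/></h1>
            <xsl:apply-templates select="occurs"/>
          </body>
        </html>
      </xsl:template>

      <!-- Occurrence role types select the rendering: images are embedded... -->
      <xsl:template match="occurs[@type='image']">
        <img src="{@href}"/>
      </xsl:template>

      <!-- ...while other occurrences become labeled links -->
      <xsl:template match="occurs">
        <p><a href="{@href}"><xsl:value-of select="@type"/></a></p>
      </xsl:template>

    </xsl:stylesheet>

In practice each topic would be transformed into its own result document; the sketch only shows how the names and occurrence role types shape the HTML output.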
Hans Holger Rath laid out a proposal for Topic Map templates, type hierarchies, association properties, inference rules, and consistency constraints that, in aggregate, create a schema for Topic Maps. [9] The proposal would provide a further link to existing semantics and hooks for text retrieval.