Extreme Markup 2004

September 15, 2004

James Mason

For the fifth year running, the annual family reunion for connoisseurs of structured documents, Extreme Markup Languages, gathered in Montréal. For the sessions that lasted Aug. 2-6, Tommie Usdin (Mulberry Technologies) chaired, assisted by Debbie Lapeyre (Mulberry Technologies), Steve Newcomb (Coolheads Consulting), Michael Sperberg-McQueen (W3C), and myself.

What brings us back to Montréal again? It wasn't just the hotel (which was strange but offered free wireless connectivity). What we like most at Extreme is the opportunity for networking, controversy, and intellectual challenges. From Usdin's opening keynote, "Don't Pull Up the Ladder Behind You," to Sperberg-McQueen's "Runways, Product Differentiation, Snap-Together Joints, Airplane Glue, and Switches that Really Switch," the latest edition of his eagerly awaited annual wrap-up, the focus was once again on what makes markup work and how we can stretch its limits.

Markup theory and the reexamination of the basic functions of markup languages are always features of these conferences. Mathematical approaches to markup languages appeared in several papers: Anne Brüggemann-Klein (Technische Universität München) and Derick Wood (Hong Kong University of Science and Technology), in "Balanced Context-Free Grammars, Hedge Grammars and Pushdown Caterpillar Automata," looked at the close relationship between Dyck strings and hedges, hedge grammars being familiar to many XML users because of their use as the basis for RELAX NG (ISO/IEC 19757-2). Stephan Kepser (University of Tübingen) offered "A Simple Proof for the Turing-Completeness of XSLT and XQuery." Continuing a theme that he had pursued in past years -- "What we may have lost in the transition from SGML to XML" -- Simon St. Laurent examined "General Parsed Entities: Unfinished Business." Fabio Vitali and his group at the University of Bologna revisited the subject of DTD++, a mechanism for extending the traditional schema language to include most of the power of W3C XML Schema.

One particular theme related to markup theory that has run through several conferences over the years has been markup overlap and the treatment of multiple hierarchies. This year it emerged as a special track. Andreas Witt (Bielefeld University) discussed "New Aspects of an Old Solution," an approach that employed synchronized files in XML and Prolog. In "Tabling the Overlap Discussion," Patrick Durusau (Society of Biblical Literature) and Matthew O'Donnell proposed a solution based on data structures derived from relational databases. Steve DeRose (represented at the last minute by Tommie Usdin), in his paper "Markup Overlap: A Review and a Horse," demonstrated a stealthy way of handling overlapping markup with "Trojan milestones," element tags that can be used either as empty tags or conventional ones. Wendell Piez, back by popular demand, presented "Half-Steps toward LMNL," a continuation of a paper from 2002, now showing how the CLIX mechanism from DeRose's paper could be used to implement some of the goals of LMNL without resorting to stand-off markup.

As has been the case at many recent conferences, Topic Maps and RDF attracted enough papers for a virtual track. This year's Topic Map papers tended toward application studies. The paper showing the most generalized application was Steve Pepper's "Seamless Knowledge: Spontaneous Knowledge Federation using Topic Maps," a study of how diverse topic-map-based public portals can be unified, despite widely divergent ontologies and schemas. Tom Insalaco and I presented "Navigating the Production Maze: The Topic-Mapped Enterprise," in which we showed that a topic map can help track components through complex manufacturing processes. Terrence Brady (LexisNexis) traced his attempts to model complex software systems with various tools before deciding on "Representing Software System Information in a Topic Map." Thomas Passin (Mitretek Systems) had, as he frequently does, a novel approach to information management, in this case generating RDF through on-the-fly clustering. Michael Sperberg-McQueen and Eric Miller, both of the W3C, stirred up some discussion with their presentation "On Mapping from Colloquial XML to RDF Using XSLT."

Among the most extreme of the papers in this track was one by Brian Thompson, Graham Moore, Bijan Parsia, and Bradley R. Bebee, on "Scalable Document-Centric Addressing of Semantic Stores Using the XPointer Framework and the REST Architectural Style." (Note: A series of articles covering the ideas presented in this paper will appear this fall.)

The TEI (Text Encoding Initiative) is an area of interest to many participants in Extreme. This year, Lou Burnard and Sebastian Rahtz (Oxford University) presented a description of how the TEI is simultaneously moving from DTDs to RELAX NG and beginning to use literate programming techniques to document its schemas: "Relaxing with Son of ODD, or What the TEI Did Next." Syd Bauman and Julia Flanders (Brown University) continued the TEI theme with "ODD Customizations." (ODD stands for "One Document Does it all," which might be called the primary theme of literate programming: in this case, one document contains both the schema and its documentation.) A paper that united a TEI base with some of the themes heard earlier in the conference, notably automating document interpretation/metadata generation and markup overlays, was "Interpretation beyond Markup" by David Dubin (University of Illinois) and David Birnbaum (University of Pittsburgh), which presented techniques for analysis of Russian poetic forms.

Because much of the conference was in double tracks, I was able to hear only about half the papers. However, as a member of the conference committee, I had reviewed all of them. Like the other members of the committee, I was favorably impressed by the overall quality of the submissions this year, even the ones for which we could not make room in the program. One trend that we noticed this year is that Extreme is becoming more widely recognized in the academic-computing community. We welcome the attention from both students and faculty as a sign that the conference includes an even wider range of participants.

Current plans are for the conference to return to Montréal again next August.

The preliminary proceedings for Extreme Markup Languages 2004 (along with proceedings from previous years) are online at


-- Jim Mason

Y-12 National Security Complex