
Developers' Day at XML 2000

December 5, 2000

Edd Dumbill

On the Monday of XML 2000, many developers attended the XML Developers' Special Interest Day. Chaired by Jon Bosak, this track was composed of "late breaking" XML developments -- speakers submitted proposals only six weeks before the start of the conference. This enabled some bleeding-edge work to be shown in the session, which was generally of very high quality.

Common XML

Simon St. Laurent explains Common XML

The session was opened by Simon St. Laurent, who presented Common XML, the first work product of the SML-DEV mailing list. The SML work grew out of a desire to rid XML of unnecessary and complicated constructions. While the more severe MinML hasn't found much favor with the wider community, the Common XML guidelines have received a warmer welcome.

Common XML is basically a "best practice" guide for ensuring maximum interoperability and longevity from your XML documents. Such guidelines include, for example, advice on the use of namespaces -- avoiding "tricks" like using the same prefix twice for different URIs within the same document. St. Laurent emphasised how important guidelines on namespaces in particular were, relating how at an earlier conference this year he'd had only 20 people turn up for what he expected would be a popular presentation -- Cross-Browser XML -- but 200 people for a presentation on the use of namespaces.
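The namespace "trick" mentioned above, one prefix bound to different URIs within a single document, can be detected mechanically. The following is a minimal sketch using only Python's standard library; the function name and the example.com URIs are illustrative assumptions, not part of the Common XML guidelines themselves.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def prefixes_bound_to_multiple_uris(xml_text):
    """Return {prefix: sorted URIs} for every prefix declared with more than one URI."""
    bindings = defaultdict(set)
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events report each namespace declaration as a (prefix, uri) pair.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        bindings[prefix].add(uri)
    return {p: sorted(uris) for p, uris in bindings.items() if len(uris) > 1}


# Hypothetical document that rebinds the prefix "a" on a nested element.
DOC = """<order xmlns:a="http://example.com/billing">
  <a:total>10</a:total>
  <item xmlns:a="http://example.com/shipping"><a:weight>2</a:weight></item>
</order>"""

CLEAN_DOC = """<order xmlns:a="http://example.com/billing"><a:total>10</a:total></order>"""

print(prefixes_bound_to_multiple_uris(DOC))
```

A document that passes this check may still break other recommendations; the sketch covers only the prefix-reuse case described here.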

Jon Bosak, who chaired Developers' Day

One very interesting part of St. Laurent's presentation was in fact the comments made by Jon Bosak at the end. Bosak, who instigated XML 1.0, revealed some interesting details about the development of the specification itself. He said that he shared St. Laurent's intuition that things could be simpler, and indeed the XML 1.0 Working Group shared this too. Unfortunately they got to a point where they couldn't agree on what else to omit from SGML.

Bosak observed that the result of several years of SGML best practice, the emergence of a de facto SGML subset named by Eliot Kimber as "Monastic SGML", led to the development of XML. He cast Common XML as in the same mold, "Monastic XML." Bosak also endorsed the recommended practices on namespaces, observing that for most of their life in the WG, namespaces had in fact been processing instructions and therefore lacked most of the complicated consequences of scoping they have now. He also noted that external entities were originally left out of XML, but were retained because of their use by editing tools -- the intent was, though, that they would always be resolved before document interchange.

RELAX

One of the high points of the Developers' Day was that several presentations gave an insight into the implementation of XML processing tools, rather than focusing merely on their specification or usage. Murata Makoto gave one such talk on the computation models used by verifiers for his RELAX XML schema language (see Learning to RELAX).

Murata Makoto encouraging the audience to RELAX

Murata-san first gave a demonstration of some of the tools available for RELAX, including a converter to and from DTDs, and RELAXER, which generates Java classes from RELAX schemas. Such classes, he explained, gave the programmer a "friendly" interface to documents, rather than using a general purpose DOM interface.

One of RELAX's distinguishing features is its basis in the computer science theory of tree (or hedge) automata. This enabled Murata-san to demonstrate the algorithms for verifying an XML instance. If a graph of the instance can be successfully labelled, according to the rules of the content model (represented by a finite state machine), then the instance is valid.

There are several alternative ways of approaching the labelling, either top-down or bottom-up, and optionally using backtracking. The audience heard the advantages and disadvantages of each -- whether they needed SAX or DOM, for instance. Murata-san reported that the best approach was indeed to utilize top-down and bottom-up simultaneously. This is the approach taken by the Java RELAX verifier.
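To make the labelling idea concrete, here is a minimal sketch of the bottom-up strategy only. It is not the RELAX verifier's algorithm, which per the talk combines top-down and bottom-up passes. The rule table, the label names, and the function names are hypothetical. Each rule gives a label to an element with a given tag if the labels of its children are accepted by a small finite state machine; text and attributes are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: label -> (tag, (start state, accepting states, transitions)).
# Transitions map (state, child label) to the next state.
RULES = {
    # A "doc" must contain one Title followed by zero or more Para children.
    "Doc": ("doc", (0, {1}, {(0, "Title"): 1, (1, "Para"): 1})),
    # "title" and "para" must have no element children.
    "Title": ("title", (0, {0}, {})),
    "Para": ("para", (0, {0}, {})),
}


def accepts(machine, child_label_sets):
    """Run the state machine over the children, tracking every reachable state."""
    start, accepting, delta = machine
    states = {start}
    for labels in child_label_sets:
        states = {delta[(s, l)] for s in states for l in labels if (s, l) in delta}
        if not states:
            return False
    return bool(states & accepting)


def labels_for(element, rules):
    """Bottom-up step: label the children first, then find labels that fit this element."""
    child_sets = [labels_for(child, rules) for child in element]
    return {
        label
        for label, (tag, machine) in rules.items()
        if tag == element.tag and accepts(machine, child_sets)
    }


def is_valid(xml_text, rules, root_labels):
    """The instance is valid if the root can be given one of the permitted labels."""
    return bool(labels_for(ET.fromstring(xml_text), rules) & root_labels)


print(is_valid("<doc><title/><para/><para/></doc>", RULES, {"Doc"}))
```

Because each element is labelled only after all of its children, this strategy needs the whole subtree in hand, which is one reason the choice of strategy bears on whether a verifier can work from SAX events or needs a DOM.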

There was a good deal of interest in RELAX from the audience. Although Murata-san in no way aggressively competes with the W3C XML Schema effort (indeed he said that developers might like to use RELAX as a stepping stone to XML Schema -- as RELAX uses XML Schema datatypes it provides a route to using "DTDs with datatypes"), supporters of XML Schema are clearly discomfited by RELAX's emergence. RELAX continues to be a charmingly simple yet unexpectedly powerful force in the schema world.

Other Sessions

David Cleary presented the use of extension features in W3C XML Schema, assuring the audience that "even though XML Schema looks as if it has everything in there, there are actually things we had to say 'no' to." He demonstrated how non-native attributes in schemas enabled them to be tied to implementations, e.g. by annotating with correspondences to Java classes or SQL columns.
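As a rough illustration of the idea, the sketch below reads non-native attributes from a small schema. The `map` namespace and its `javaClass` and `sqlColumn` attributes are invented for this example and were not shown in the talk. The schema uses the namespace URI of the final XML Schema Recommendation, which differs from the draft URI in use at the time of the conference.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
MAP = "http://example.com/mapping"  # hypothetical extension namespace

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:map="http://example.com/mapping">
  <xs:element name="customer" type="xs:string"
              map:javaClass="com.example.Customer"
              map:sqlColumn="CUSTOMER.NAME"/>
  <xs:element name="note" type="xs:string"/>
</xs:schema>"""


def foreign_attributes(schema_text):
    """Return {element name: {qualified attribute: value}} for non-schema attributes."""
    root = ET.fromstring(schema_text)
    result = {}
    for element in root.iter("{%s}element" % XS):
        # ElementTree spells namespaced attribute names as "{uri}local".
        extras = {
            name: value
            for name, value in element.attrib.items()
            if name.startswith("{") and not name.startswith("{%s}" % XS)
        }
        if extras:
            result[element.get("name")] = extras
    return result


print(foreign_attributes(SCHEMA))
```

A binding tool could use such a mapping to decide which class or column each declared element corresponds to, while a schema processor that does not know the extension namespace can simply ignore the extra attributes.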

Steph Tryphonas of TellMe gave an entertaining demonstration of VoiceXML technology, hooking a phone up to the PA system. Showing the transition that TellMe's VoiceXML platform could make between VoiceXML-browsing and actual voice calls, he baffled a Chicago taxi firm by trying to order a taxi to Washington D.C. TellMe has an online development environment at studio.tellme.com, which enables developers to write VoiceXML applications and then test them by dialing a 1-800 number.

Tom Jenkins provided a valuable insight into an XML-enabled distributed application he had developed for the US government. He had migrated from CORBA to XML with great success, reporting that from approaching XML "cold" it had taken him only two weeks to get his system up and running. He noted too that the migration from CORBA to XML for communication opened up more possibilities of functionality for his system, including such features as offline operation.

Alex Ceponkus of Bowstreet explained the new open source project Bowstreet has released, jUDDI (pron. "Judy"). jUDDI is a Java implementation of the new UDDI specification for business service registries.

In a fascinating talk, Dongwook Shin of the National Library of Medicine explained the inner workings of XML query engines. He presented the different ways indices could be generated for XML corpora, and their relative merits. Shin's query engine can be found at futurexpert.com.

Truly on the bleeding edge of development, Dave Carlson of Ontogenics presented work he was doing on modeling XML schemas with UML. By taking the XML serialization of UML, XMI, as an intermediate format, Carlson was able to generate XML Schemas straight from UML by the application of an XSLT stylesheet. He showed off a web application which demonstrated this functionality. He has put a lot of effort into reverse-engineering UML diagrams from specifications such as UDDI, and he observed that if they were modeled diagrammatically in the first place it would aid understanding a great deal. His tool will shortly be available from XMLModeling.com.

The concluding two sessions of the day were mind-bending. Sam Hunting and Chris Cassidy of EComXML showed how Topic Maps could be used to map an e-business market, explaining that they solved such problems as the initial discovery of a business with which you wished to trade -- at a level higher than current "discovery" technologies like UDDI.

Walter Perry then explained a system he has created among diverse and disparate financial trading systems. Perry takes the opposite view from most e-business advocates in that he starts from the assumption that homogeneity of either messages or process is impossible. His viewpoints were controversial, but definitely presented food for thought on coping with electronically linking diverse entities.