Wishful Thinking

January 5, 2000

Edd Dumbill

XML-Deviant: Wishful Thinking

Welcome to the first installment of XML-Deviant, a weekly report from the XML developer mailing lists. As many developers simply don't have time to participate fully in the mailing lists, we will bring you weekly highlights from the cutting edge of XML. Our primary focus will be on the XML-DEV list, but we'll also report on the other XML discussion forums.

Holiday Wishes for XML

Reiterating a sentiment expressed in his closing keynote at XML'99, Peter Murray-Rust wrote that if he had one wish as departing moderator of XML-DEV, it would be to have a project to develop an open source editor for XML. He continued:

There is, anyway, a shortage of editors at present, and those that do exist are (not unreasonably) usually tied to a single author's point of view (e.g. streamed text, hierarchical content, etc.). As far as I know, none of them are easily extensible at API level, and those that do have APIs will differ enormously from each other.

The lack of an API for an editor effectively makes it impossible for people to develop a modular approach.

Murray-Rust went on to explain that application developers in domain-specific areas (such as his own, chemistry) did not want to be concerned with the generalities of editing, but instead wanted to concentrate on their domains. Hence the need for a modular editor that could be utilized in such situations.

Taking up the theme of wishful thinking, Simon St. Laurent opened a follow-up discussion, asking members of the XML-DEV list what their holiday wishes for XML were. He himself expressed an interest in seeing a workgroup XML-aware data store to manage and search his data:

I'd love to use WebDAV or some other open protocol to get information in and out of the repository, with support for things like versioning, fragment addressing through XPath and XPointer, easy replication and backup, and cross-platform capabilities.

Don Park, XML-DEV's resident deconstructionist, wished for "atomic" and "molecular" XML standards. He explained that an atomic standard defines a single "power word" in one page, giving "xmlns" and "table" as examples. Molecular standards define "power phrases": small schemas with a few elements in them. An example would be an "address" molecule, containing the small number of elements that constitute an address.

These atomic and molecular specifications would enable more understandable and reusable XML specifications, Park envisioned:

These "micro-standards" will allow us to create more coherent XML document standards as well as XML software that can "learn" to handle new standards by plugging in new power words or phrases.

Tidying up HTML

Also arising from the holiday wishes theme was Francis Norton's desire for "tools to bring real-life HTML into XML, so it could be manipulated with DOM and SAX."

John Cowan was able to make this wish come true by pointing out Dave Raggett's HTML Tidy utility for cleaning up HTML, which has an "as XML" option. Simon St. Laurent pitched in again, saying that a version of Tidy that generated SAX events or DOM trees would be even more useful than the current document-to-document converter. Not to be outdone, John Cowan granted this wish too, pointing out a Java version of Tidy that provides a mini-DOM.
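To give a feel for what "HTML in, SAX events out" might look like, here is a minimal sketch in Python. It is not Tidy and does not follow Tidy's repair rules: it leans on the standard library's lenient html.parser, keeps a stack of open tags, and closes whatever the author forgot. The class name HtmlToSax, the helper html_to_xml, and the short VOID list are all hypothetical names chosen for illustration.

```python
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# A few HTML elements that never have content (illustrative, not exhaustive).
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}


class HtmlToSax(HTMLParser):
    """Forward sloppy HTML to a SAX ContentHandler as balanced events."""

    def __init__(self, handler):
        super().__init__(convert_charrefs=True)
        self.handler = handler
        self.open_tags = []  # stack of elements started but not yet ended

    def handle_starttag(self, tag, attrs):
        # Valueless attributes such as "disabled" get their own name as value.
        values = {name: (value if value is not None else name)
                  for name, value in attrs}
        self.handler.startElement(tag, AttributesImpl(values))
        if tag in VOID:
            self.handler.endElement(tag)
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: ignore it
        # Close everything opened since the matching start tag.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.handler.endElement(open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.open_tags:  # drop text outside the root element
            self.handler.characters(data)

    def close(self):
        super().close()
        while self.open_tags:  # end of input: close what is still open
            self.handler.endElement(self.open_tags.pop())


def html_to_xml(html_text):
    """Serialize the SAX events produced from html_text as an XML string."""
    out = io.StringIO()
    generator = XMLGenerator(out, encoding="utf-8")
    generator.startDocument()
    parser = HtmlToSax(generator)
    parser.feed(html_text)
    parser.close()
    generator.endDocument()
    return out.getvalue()


SAMPLE = "<html><body><p>Hello <b>world<br><p>Second & last</body></html>"

if __name__ == "__main__":
    print(html_to_xml(SAMPLE))
```

Because the events go to an ordinary SAX handler, the XMLGenerator could be swapped for any other ContentHandler. The nesting this sketch produces is well-formed but naive (the second paragraph ends up inside the unclosed b element), which is exactly the kind of decision a real tool like Tidy makes more carefully.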

XMLIO 0.5 Announced

Paul Miller announced his XMLIO XML processing library. He describes XMLIO as "a lightweight library for generating and processing SML/XML streams in C or C++ applications where application data structures or configuration settings are stored as XML syntax."

The most interesting feature of XMLIO is its combination of "push" and "pull" processing. In push processing, a parser pushes data at the application (as with SAX), whereas in pull processing, the application requests data from the parser. In Paul Miller's words:

The XML input processor (XML_Input and XML::Input) is a hybrid push/pull model where specific element handlers are specified for each level of a nested object hierarchy. When a desired element is encountered, a callback handler is called (like with a push model), but it is up to the handler to pull what it needs (either data or more subelements). If nothing is requested, the entire element is skipped. Because element processing is performed on the stack and unused elements are thrown away immediately, there is no memory overhead during processing, making it extremely efficient.
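The difference between the two models can be sketched with Python's standard library. This is not XMLIO's API: xml.sax stands in for a push parser and xml.dom.pulldom for a pull-style one, and the sample document and the function names push_width and pull_window are invented for illustration. Note also that pulldom builds a small DOM for the element it expands, so it does not share the "no memory overhead" property claimed for XMLIO.

```python
import xml.sax
from xml.dom import pulldom

# Hypothetical configuration document used only for this sketch.
DOC = """<config>
  <window><width>640</width><height>480</height></window>
  <history><entry>a.xml</entry><entry>b.xml</entry></history>
</config>"""


class WidthHandler(xml.sax.ContentHandler):
    """Push model: the parser calls us for every event, wanted or not."""

    def __init__(self):
        super().__init__()
        self.in_width = False
        self.chunks = []

    def startElement(self, name, attrs):
        if name == "width":
            self.in_width = True

    def characters(self, content):
        if self.in_width:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "width":
            self.in_width = False


def push_width(text):
    """Read the window width by letting the SAX parser drive."""
    handler = WidthHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return int("".join(handler.chunks))


def element_text(element):
    """Concatenate the text nodes directly inside a DOM element."""
    return "".join(node.data for node in element.childNodes
                   if node.nodeType == node.TEXT_NODE)


def pull_window(text):
    """Pull model: we drive the loop and ask only for what we want."""
    events = pulldom.parseString(text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "window":
            events.expandNode(node)  # pull in just this element's subtree
            return {child.tagName: int(element_text(child))
                    for child in node.childNodes
                    if child.nodeType == child.ELEMENT_NODE}
    return {}


if __name__ == "__main__":
    print(push_width(DOC))
    print(pull_window(DOC))
```

In the pull version, the history element is never requested because the function returns as soon as it has what it needs. That is loosely analogous to the hybrid behavior described in the quotation, where an element is skipped if its handler asks for nothing.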

XML Schema Clarification

Andrew Layman from the W3C's XML Schema Working Group shed light on the new xsi:schemaLocation attribute used for linking document instances to their schemas:

After extensive debate, the XML Schemas WG decided that the xsi:schemaLocation attribute serves as a hint, not a mandatory directive. That is, the processor of an instance is welcome to look at the URI referenced by the value of xsi:schemaLocation, but is not required to.

Roger Costello found this disconcerting, and wrote:

I read these statements as saying that there is no standard way for specifying in an XML document what XML Schema it conforms to—every XML Parser will have its own way of doing things. Really???

The rest of the thread offers further clarification, and is recommended reading for anyone following the development of this important specification.
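What "a hint, not a mandatory directive" could mean for a processor is sketched below in Python. Several things here are assumptions: the namespace URI and the "namespace location" pair syntax are taken from the later XML Schema Recommendation rather than from the draft under discussion, the example.org names are placeholders, and schema_hints and choose_schema are hypothetical helpers, not part of any real validator.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI as published in the final XML Schema Recommendation.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance document; the example.org URIs are placeholders.
INSTANCE = """<recipe xmlns="http://example.org/recipe"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://example.org/recipe
                            http://example.org/recipe.xsd">
  <title>Trifle</title>
</recipe>"""


def schema_hints(xml_text):
    """Return {namespace: hinted schema location} from the root element."""
    root = ET.fromstring(xml_text)
    tokens = root.get("{%s}schemaLocation" % XSI_NS, "").split()
    # The attribute value is a whitespace-separated list of pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))


def choose_schema(xml_text, local_catalog):
    """Prefer the processor's own catalog; fall back to the document's hint."""
    return {namespace: local_catalog.get(namespace, hinted)
            for namespace, hinted in schema_hints(xml_text).items()}


if __name__ == "__main__":
    print(schema_hints(INSTANCE))
    print(choose_schema(
        INSTANCE, {"http://example.org/recipe": "/schemas/recipe.xsd"}))
```

A processor written this way is free to honor the hint, override it from a local catalog, or ignore it entirely. That freedom is the source of the concern Costello raises: two processors may legitimately validate the same instance against different schema documents.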

Transfer of XML-DEV to OASIS

As mentioned previously, the XML-DEV list is in transition from being hosted at London's Imperial College to being hosted by OASIS. However, a few delays are being experienced, as reported by outgoing list maintainer Henry Rzepa. No new date has been given for the transfer so far; the move was originally to have happened in late December.

Recipes for XML

Winner of XML-Deviant's inventive acronym competition this week is the DESSERT DTD. Announced by Jim Saiya of FormatData, the Document Encoding and Structuring Specification for Electronic Recipe Transfer DTD enables "rich representation of recipe documents." Saiya writes that the DTD is intended to cater to a broad range of appetites:

It is simple enough for the posting of basic recipes by amateurs, and rich enough to satisfy the needs of traditional print publishers. The separation of content and style is stressed in the design of DESSERT—there are no appearance-specific tags, yet there are many "hooks" to act as stylesheet language selectors.

The authors are looking for review and comments from the XML community on this DTD. This should provide some food for thought now that the holiday feasting season is over....