An XML Apprenticeship

February 2, 2000

Leigh Dodds

Table of Contents

Seeking the Sacred Grove

Questioning the Oracle

Back to Earth: SAX2

I'm pleased to announce that this week Leigh Dodds takes over the reins of the XML-Deviant column. Leigh works with XML in electronic document delivery, maintains the Eclectic XML-DEV weblog, and is an active participant in the XML developer community.


Recent scrutiny of the W3C and its specifications showed no signs of abating this week. Once again, XML-DEV has provided a forum for some searching questions.

Seeking the Sacred Grove

Last week's XML-Deviant highlighted the complaint from the SGML camp about the neglect of the HyTime standard (ISO 10744). One concept from HyTime touted as having application in XML is that of "groves." Debate continued this week on why groves—a seemingly powerful model—have been largely ignored. Lars Marius Garshol suggested several reasons for our failure to understand groves:

  • lack of suitable presentational material
  • the fact that it comes from a different community with a different philosophy and terminology
  • the extreme suspicion with which many W3C (and other) people view anything coming out of ISO processes
  • the fact that groves and the DOM have different purposes

Several contributors pointed out the lack of a clear body of documentation for groves. The current "definitive" introduction to the subject is "Addressing the Enterprise: Why the Web needs Groves" by Paul Prescod. The consensus appears to be that groves are an essentially simple idea, yet their accessibility is hindered by their largely abstract expression in the current documentation. Norman Gray comments:

The flip-side of the Simplicity is that the standard is somewhat abstract (ahem!) and, like all grand projects, has its own universe of concepts. Also, I found it rather difficult to work out from the standard just what problem 10744 was trying to solve.

What problem was 10744 trying to solve? Len Bullard suggested that a historical perspective might lead to a better understanding of the specification, and at least provide a means of evaluating its usefulness. He posed a set of questions intended to unlock the key points of the design rationale behind groves:

"If we can understand the decisions made during the decade-long design of ISO 10744, we can better understand the concepts because we will understand the problems it tries to solve before we try to understand how it solves them. It took ten years to make HyTime obscure. The W3C has beaten that speed record with the XML specs."

Is history repeating itself here? If HyTime requires a "Yoda session" with a "Markup Master" to be fully understood, what hope do the many XML Apprentices have of absorbing its content with only the specification for guidance? The concern is that XML may be headed down a similar road to complexity. W3C specifications have been criticized for obscurity, and those familiar with them can tell you about the subtle pitfalls of trying to combine them. The recent Slashdot discussion on the merits of XML, while colorful in places, provides a useful outside perspective on the debate. Nils Klarlund had cause to comment in a similar vein:

In my experience, undergraduate students must sometimes struggle quite a bit just to understand trees and their classical algorithms. The challenge is to convince the outside world ... that XML is an utterly practical interface to ideas that seemed too abstract in school.

Klarlund also suggested that the XML specification ought to be a "propaganda" document, introducing key concepts in as accessible a manner as possible.

Questioning the Oracle

Groves aside, Steven Newcomb again raised the question of the W3C's competence. This time the complaint was the lack of an overall technical vision, and the problems inherent in dividing work packages among independent technical committees.

The W3C process appears to be based on the naive belief that independent design assignments for all the various aspects of Web-based information interchange can be made to a plethora of independent technical committees, and, in the end, everything can be made to work together somehow.

Newcomb is very disturbed by the current W3C leadership, which he says "systematically ignored" the various "dazzlingly brilliant insights of James Clark" from HyTime. He also claims that the W3C is wasteful of members' contributions. He urged that the W3C should be made to answer several questions regarding how, in practice, the unification of specifications such as XLink, XML Schema, and XHTML should be achieved.

Tim Bray responded on the topic of the W3C coordination efforts, saying:

The W3C is well-supplied (some would say "infested") with Co-ordination Groups which exist precisely to detect cases where different working groups need to do extra work to ensure consistency and interoperability between their output. A tremendous amount of time and effort goes into this work.

Bray continued by noting that while the W3C processes may not be perfect, there isn't an obvious right way to tackle what he terms "the largest experiment in information processing ever attempted":

... at this point in history it seems obvious that of the 3 models of work we have before us (ISO, IETF, W3C), none can be said to have been shown to be either triumphant or bankrupt, based on results. The most likely conclusion is that they excel in different problem domains.

As Len Bullard commented, in many cases the same Masters are at work in each organization (the sterling work of James Clark within the W3C being a case in point). Attempting to address the issue of W3C vs. ISO vs. IETF on purely political grounds therefore seems rather inappropriate.

What is clear is that the path from XML Apprentice to Master is a difficult one; additional signposts are needed along the way, as well as maps to already discovered lands.

Back to Earth: SAX2

Setting aside loftier debates, there was concrete progress this week: a beta release of SAX2/Java is now available. Like its predecessor, this version is the product of considerable discussion on XML-DEV, and it includes a number of new features, as announced by David Megginson:

  1. Namespace support.
  2. Configurability and extensibility through features and properties.
  3. A new interface and base class for SAX filters.
  4. Adapters for using SAX1 parsers with SAX2 and vice-versa.
  5. Way too much JavaDoc documentation.
  6. Public domain (even less restrictive than Open Source).
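
To make the first three items on that list a little more concrete, here is a minimal Java sketch. It is not taken from Megginson's announcement. It assumes the org.xml.sax interfaces as they later shipped with the JDK, and it obtains its parser through JAXP's SAXParserFactory purely for convenience; with the beta distribution you would supply your own SAX2 driver. The class Sax2Sketch, the NamespaceFilter filter, the elementNames helper, and the urn:example:* namespace URIs are all hypothetical names invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class Sax2Sketch {

    // Hypothetical filter built on the SAX2 filter base class: it forwards
    // element events only when the element belongs to one namespace URI.
    static class NamespaceFilter extends XMLFilterImpl {
        private final String wantedUri;

        NamespaceFilter(XMLReader parent, String wantedUri) {
            super(parent);
            this.wantedUri = wantedUri;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            if (wantedUri.equals(uri)) {
                super.startElement(uri, localName, qName, atts);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (wantedUri.equals(uri)) {
                super.endElement(uri, localName, qName);
            }
        }
    }

    // Parses the XML string and returns "{uri}localName" for every element
    // that survives the filter, in document order.
    static List<String> elementNames(String xml, String wantedUri) throws Exception {
        // Assumption: JAXP is used here only to get hold of an XMLReader.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // Configurability through features: ask for namespace processing.
        reader.setFeature("http://xml.org/sax/features/namespaces", true);

        List<String> seen = new ArrayList<>();
        NamespaceFilter filter = new NamespaceFilter(reader, wantedUri);
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                seen.add("{" + uri + "}" + localName);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc xmlns='urn:example:a' xmlns:b='urn:example:b'>"
                   + "<item/><b:note/><item/></doc>";
        // prints [{urn:example:a}doc, {urn:example:a}item, {urn:example:a}item]
        System.out.println(elementNames(xml, "urn:example:a"));
    }
}
```

The filter sits between the parser and the application's handler, so the handler never sees the b:note element. Because start and end events for an element carry the same namespace URI, filtering both on the URI keeps the forwarded event stream balanced.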

Although SAX/C++ is still some way off, it is likely to be based on SAX2 rather than on the original SAX API. The first proposal for additions to SAX2 is already on the table. The development and beta release of SAX2 are certainly good news for all XML developers, Masters and Apprentices alike.