Bad Language

January 26, 2000

Edd Dumbill

Table of Contents

Language Barriers
Reinventing the Wheel?
XML-DEV as Scholarly Journal
Pigs and XML

The last week has seen the residents of XML-DEV in an agitated frame of mind, with plenty of argument in and around matters XML. Several somewhat peripheral debates also took place, especially over web browsers and the recent Geoworks/WAP intellectual property issue.

Several list members were keen to move the discussion back to matters more "on-topic" for the XML developer's mailing list, yet the political motif couldn't quite be shaken this week.

XML Information Set Language "Impenetrable"

Nils Klarlund brought the last-call W3C Working Draft "XML Information Set" (which describes the information available in a well-formed XML document) to the attention of XML-DEV. In his post he observed that although this WD hadn't caused much discussion in the group, he saw a major cause for worry:

The draft solves the problem: what is the mathematical object that an XML document represents? The answer is a "tree" defined according to some rather natural rules.

Although the document is cleanly written, it introduces some very cumbersome language, which will almost guarantee that future XML specifications will become unnecessarily hard to read (just look at the current XML Schema document, part 1).

He went on to urge the XML community to encourage the WD authors to use language that "mortals would understand," such as "nodes" and "trees." Klarlund included the comments he had sent to the W3C Infoset Working Group, requesting they "Please put stakes through verbiage like 'XML element information item.'" He also saw no need for two different tree models of XML: the one provided by the DOM and the one provided by XML information sets. Of which, more later.
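For readers who want to see the "nodes and trees" vocabulary in action, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample document and the outline helper are invented for illustration and do not come from the Infoset draft or from Klarlund's post; the sketch only shows that a well-formed document can be walked as a plain tree of nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this illustration.
DOC = "<order id='7'><item>pen</item><item>ink</item></order>"

def outline(node, depth=0):
    """Return one indented line per element: tag, attributes, text."""
    text = (node.text or "").strip()
    lines = ["  " * depth + f"{node.tag} {node.attrib} {text!r}"]
    for child in node:  # iterating an element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(DOC)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each printed line corresponds to one node, indented by its depth in the tree. What the Infoset draft calls an "element information item" corresponds, roughly, to one such node.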

Michael Champion picked up the language issue and reported on a discussion about the quality of writing in W3C specs, in which he took part at a recent W3C Working Group meeting. As with last week's difficulties with the XML 1.0 errata, the issue of W3C resources seems to have a bearing on this matter.

That particular group generally agreed that the specs need to be written more clearly, and the plan (I don't know if anything came of it) was to lobby our W3C Advisory Committee representatives to push the W3C to devote more of its admittedly scarce resources to hiring staff technical writers, or getting members to contribute the time of technical writers as well as XML specialists to the activities of the working groups they participate on.

Thomas Passin suggested two ways of improving the standard of the specifications. The first of these was peer review, although Passin admitted that this wasn't an ideal situation:

Trouble with peer review is, who could you get, especially without pay, who isn't already involved? Trouble with discussion groups is, a lot of the people responding are either not knowledgeable enough or don't read the material closely enough. Still, where there is widespread misunderstanding, the material probably needs rewriting.

He also mentioned the usefulness of independent implementations, adding "There's nothing like trying to build to a spec to uncover its lack of clarity." Tim Bray, co-editor of the XML 1.0 spec, was able to supplement Passin's comments with some of his W3C experiences. Bray pointed out that contributors to a review could in fact be "too knowledgeable," and could focus more on the issues than on the language of the specification. He also reiterated the W3C's commitment to reference implementations, as demonstrated by its recent adoption of the Candidate Recommendation stage for specifications.

Returning to the issue of the Infoset Working Draft, there seems to be a connection between the observation that few have commented on the specification and the suggestion that it may be difficult to understand: who has actually read this specification? This is one overriding incentive for the W3C to be as clear and concise as possible. Such lucidity will make a great contribution to a specification's breadth of acceptance, review, and implementation by the developer community, which has little time to penetrate complex language.

W3C Doomed to Re-Invent, Badly?

All of which leads us on to this week's thunderstorm. As mentioned above, Nils Klarlund is worried by the existence of multiple models of XML, as opposed to a "universal and simple model of trees." In a further post he details what he sees as the major conflicts between the models in the DOM2, XPath, Infoset, XML 1.0, and XML Schema specifications.

This is a big mess.... I'll outline a modest simplification that affects several of the (draft) recommendations with the result that there is one model and one terminology.... My simplification is certainly not the only way to go about these fundamental problems, but I hope that they'll show that they are solvable.
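To make the "multiple models" complaint concrete, here is a small sketch in which two tree models disagree about the same document. It uses two Python standard-library parsers as stand-ins: xml.dom.minidom for a DOM-style tree and xml.etree.ElementTree for a simpler element tree. ElementTree is not one of the specifications Klarlund compares, and the sample document is invented, so this illustrates the kind of mismatch at issue rather than reproducing his analysis.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document with whitespace between the tags.
DOC = "<a>\n  <b>text</b>\n</a>"

# DOM-style model: whitespace between elements becomes child text nodes.
dom_root = xml.dom.minidom.parseString(DOC).documentElement
dom_children = [n.nodeName for n in dom_root.childNodes]

# ElementTree model: only elements are children; the whitespace is
# stored in the parent's .text and the child's .tail attributes.
et_root = ET.fromstring(DOC)
et_children = [child.tag for child in et_root]

print(dom_children)
print(et_children)
```

The same three lines of XML yield three children under one model and one child under the other. A program written against one tree cannot assume the shape of the other.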

Klarlund's post unleashed a furious response from Steven Newcomb (which unfortunately missed the official web archives of XML-DEV, but can be found here on eGroups.com). Newcomb's key point is that the confusion Klarlund pointed out is a problem already solved in ISO/IEC 10744:1997 (the HyTime standard, of which Newcomb was a co-editor: see also HyTime.org).

...excellent people like Nils Klarlund still don't know about it [ISO 10744], or, if they do, they choose to ignore it. Nils's analysis of this situation fails to even mention groves and property sets, much less compare their carefully balanced elegance with the ongoing W3C design chaos that he so deplores.

Voicing thoughts familiar to those among XML developers with an extensive SGML history, Newcomb bemoans the stigma that concepts such as architectural forms have attracted. He also complains about the failure to properly implement such concepts even when they have been adopted.

Meanwhile the W3C insists on reinventing everything, badly, in a state of profound confusion about what they're doing, with predictable results. When ISO 10744 concepts are adopted by the W3C, as has happened several times now, the underlying concepts that keep all the concepts working well together and in balance with each other are not adopted [with] predictable results.

Newcomb continues, pointing to the structure of the W3C as a root cause of the "chaotic technical situation." Reading the full post is the best way to follow his argument.

Setting aside the dispute over the structure and control of the W3C, one ought to ask why HyTime has not been more popularly received and adopted. On this subject, Michael Champion believes that the ISO 10744 standard may be obstructively difficult to understand:

I can say with some confidence that "they" don't get it. I don't get it. The DOM WG has wrestled with the ISO stuff and took out of it all that we thought we could use. The SML-DEV people wrestled with the ISO stuff from their fresh start at all this and don't get it either. A lot of people who are not stupid or lazy or subject to anyone's irrational phobias have tried and failed to understand what you see in the ISO specs that would lead you to believe that they have solved in a useful way the problems we wrestle with.

Champion ends with a challenge to produce a "clarified, unified, readable exposition of 'the ISO stuff'," which he says would "be given fair consideration by a lot of people." That is a point applicable to both HyTime advocates and the W3C: if a specification cannot be penetrated by intelligent implementors in a reasonable amount of time, it bodes poorly for its future adoption.

XML-DEV as a Scholarly Journal

Warming to the theme of XML-DEV as a forum for peer review, Peter Murray-Rust (the founder of XML-DEV) pointed out the similarities between XML-DEV and scholarly journals in the Science/Technical/Medical (STM) area:

Apart from the vicious circle of publishing to gain funding, STM publications have the following roles:

  • communication to the community
  • establishing priority of ideas or expressions
  • opening one's work to peer review
  • building a sense of community
  • formally depositing re-usable material (data, code, etc.) in the public domain
  • acting as a historical record

In all those areas, XML-DEV functions well (and often better) than traditional methods.

Murray-Rust continues, observing that many academics now choose not to publish in commercial journals, and that he sees XML-DEV as an instance of a "new publication type."

I think we have an opportunity here. Is there a role for (say) XML-DEV whitepapers? SAX, XSchema, DDML, and SML could fall into this category. They don't necessarily have to be "successful"—in that they get adopted—but they have to be seen as competent and innovative.
... Are there ways forward worth developing as part of the move to OASIS?

Thomas Passin gave the idea a cautious welcome, adding that such papers would need to be short and well-defined, otherwise it would be difficult to get them finished. Didier Martin, no stranger to posting inventive white-paper-like messages to XML-DEV, was more enthusiastic and went straight ahead offering a first contribution. His paper is on the use of XLink to create tables of contents:

Here is my modest first contribution to the XML-DEV editorial content. Maybe OASIS can also keep a copy of the XML-DEV editorial content.
http://www.netfolder.com/xlink/TOC_pattern.htm
I will update the document based on comments.

It remains to be seen if OASIS will pick up the challenge to host such documents. However, this idea, together with last week's rumblings about an "Alternative W3C," strengthens the point that XML-DEV is home to much talent, not all of which is finding a home and expression through the W3C and other existing vendor-centric bodies.

Making a Silk Purse...

After the week's many heated deliberations, let's turn to a lighter note contributed by Ken North, considering applications of XML in business:

Software Development magazine (February 2000 issue) is running an interesting agribusiness case study—using XML to publish university agricultural research.

"XML and Pig Poop: Agribusiness Online" by Rick Wayne, Univ. of Wisconsin.

"How's that for real-world XML?" asks North. How, indeed? Now and then we do well to remember the end-users of the specifications generated by our toil.