Last Word on Last Call - The Specification's Problems
July 5, 2000
The over-arching issues appear to be ones of design complexity and obscurity of expression, with some related issues of coordination among W3C WGs. At the same time, everyone understands not only the import of the spec, but the enormous pressure to produce a stable Recommendation with dispatch. Reading between the lines of so many comments, the message is that schema is much too important to get wrong. Commenters want to see the work completed; they are "most anxious to see this fly." Everybody appreciates the difficulty of holding down the feature set when so many competing and overlapping interests are at stake. Yet there is a strong desire to see the WG observe the medical ethic of "first, do no harm," meaning that it should prune all controversial aspects of Schema 1.0 that are not essential to the most elementary requirements (to many, this would look like data types and XML syntax).
The most extensive description of the complexity issue is found in the W3C last-call comment archive itself, but several aspects surface repeatedly there and in other comments. Among the issues raised are locally scoped namespaces, where multiple implementations are possible, and equivalence classes, which commenters see as adding no functionality beyond inheritance, and certainly none that merits the complexity.
Philip Wadler of Bell Labs touches on several areas of potential simplification in his comments, which he prefaces in this manner:
The current Schema proposal is complex. Programmers have shown a remarkable ability to put up with complexity, but we do not yet know whether the XML community will be so forgiving. We would like to suggest that it is possible to greatly simplify XML Schema, while not unduly limiting its power. Indeed, some of the suggestions below would both simplify Schema and extend its power at the same time.
There seem to be some issues with coordination between the various WGs in the XML Activity, coordination with Query being primary. Wadler notes in his comment that the schema draft appears to be anticipating what Query will do, when it is not at all clear that Query will take the indicated path. He recommends dropping the issues from the draft until the efforts can be coordinated.
If the schema WG feels that it cannot on its own delay the release of the CR, the coordination issues may overtake it later rather than sooner.
Density of Expression
Martin Duerst of W3C says simply, in formal comments, "The verbal complexity of the XML Schema specs, in particular part 1, is extremely high." This, I’m afraid, is an understatement. This draft spec is a sitting duck for those who would take pot shots at the art of stringing together words into meaning.
From section 2.2, the abstract data model:
[Definition:] Several kinds of component have a target namespace, which is either absent or a namespace URI, also as defined by [XML-Namespaces]. The target namespace serves to identify the namespace within which the association between the component and its name exists. In the case of declarations, this in turn determines the namespace URI of, for example, the element information items it may validate.
There are no big words or fancy concepts here, but I could read it 30 times and still not know what it is trying to say. This is what I think it means:
Components may have a target namespace (TN). Like any namespace, a target namespace
- associates a component with a name.
- may or may not have a URI.
- can validate declared items.
I probably missed a great deal in translation, but what I’m left with is a clear statement that a target namespace is a namespace. Either the essential piece that defines target namespace is buried deeper than I could dig, or there is no essential piece here.
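One way to see what the definition may be getting at: under XML Namespaces, an element's name is really a pair of a namespace URI (possibly absent) and a local name, and a declaration would only match elements whose pair agrees with its own. The sketch below uses Python's standard `xml.etree.ElementTree` module to show that pair for two otherwise identical documents. It is an illustration of namespace-qualified names only, not of schema validation, and the namespace URI and element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and element names, chosen only for illustration.
ORDERS_NS = "http://example.com/orders"

# The same markup, once inside a namespace and once with no namespace at all.
in_namespace = ET.fromstring(f'<order xmlns="{ORDERS_NS}"><item/></order>')
no_namespace = ET.fromstring("<order><item/></order>")


def split_name(tag):
    """Split ElementTree's '{uri}local' tag into (namespace URI or None, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


print(split_name(in_namespace.tag))
print(split_name(no_namespace.tag))
```

The two `order` elements share a local name but are different names as far as a namespace-aware processor is concerned. On this reading, a target namespace simply states which of the two a given declaration is talking about.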
I stress the density of the writing because I know from painful, first-hand experience how hard it is to write about the domain that is closest to you. Anyone who has heard Henry Thompson speak knows that he communicates clearly, cleanly, and concisely. If the schema spec can be rewritten to eliminate the unnecessary fuzz, we may just find out that it is not such a tall order as it appears from within the clouds of formless prose.
Here is where the developer community seems to be thinking this could and should go from the last call point onward:
Plausible, and not unthinkable, outcome:
- issue CR pretty much as current, but possibly/hopefully taking out some of the complexity
- get feedback from CR that will confirm:
  - it solves some problems
  - it would solve more problems if rewritten
  - would be better yet if narrower in scope, simpler and re-written
- problems fixed before Rec is issued, meaning major changes after CR
- release CR data type spec first, with XML syntax for direct translation of DTD functionality, which rapidly becomes a Recommendation
- take time to look at all options for extending schema, including use of Extensibility’s schema extensions and cooperation with RELAX
- plow ahead with CR
- don’t receive intensive implementation and/or don’t heed it
- continue to Rec and then find that implementers (vs. programmers and computer scientists) can’t read it or implement it
- Rec causes more problems than it fixes: deep and widespread disagreement on implementation strategies that leads to fragmentation of the schema languages supported.
Deemed less likely, but highly desirable outcome:
There is one way forward with support from schema detractors and supporters alike: issue the datatypes spec, and a DTD for schemas in XML syntax that duplicates DTD functionality. Roger Costello of MITRE Corporation stated the case this way in his public comments:
Simplify the schema by making it open and moving the more complex features to a non-binding portion of the schema spec. The resulting simplified version of the XML Schema spec can then gradually evolve to incorporate the more complex features (if the market dictates).
Right now, W3C schemas and the schema draft enjoy universal support over any other schema proposal. In fact, it is a mark of the universal respect in which the editors and co-chairs are held that virtually no one with whom I spoke indicated any desire to replace the process with another, or to debunk the assumption that the W3C will produce the authoritative specification.
While no one we spoke with has made a public commitment to implementing Murata Makoto’s RELAX in place of W3C Schema, the relative simplicity of design and expression in the announcement made three months ago was a wake-up call for all involved. Not only is the RELAX spec brief, but it is perceived as powerful, and is on track for ISO standardization. The RELAX Core was published in May 2000 as JIS/TR X 0029:2000, and Murata expects a formal submission to ISO SC34 on behalf of JIS this September. Meanwhile, he is cutting back even further on the RELAX core, dropping, to his regret, some "interesting features," so that the base spec will be stable at the time of submission.
While there are no public statements from non-Japanese companies on RELAX adoption, there is credible talk that it is being looked at very seriously in a number of major technology houses in parallel with schema.
I asked several implementers about the impact of a multiple-schema-language world, and the consensus seemed to be that it would add a level of abstraction and complexity to implementation, but would present no insurmountable barrier. A single schema language world is preferable, as it will drive industry growth. Without a clear, unified direction on schemas, there is some fear that implementers will take a wait-and-see position and slow down their work.
Sometimes heat and pressure make a substance harder, and sometimes they make it mushy. The schema draft has been produced within a crucible, and there is some sentiment that it is sacrificing the very simplicity that gave XML legs.
The most powerful argument for simplification is that the penalty for over-simplification is much less severe than the penalty for getting it wrong. If a feature such as inheritance is left unspecified, users will implement their own inheritance models, after their own analysis and partner negotiation. This might not be such a bad thing, because it would permit experimentation and incremental design, and hopefully avoid the situation where a solution is canonized and subsequently ignored. The marketplace, often brutal and shortsighted, will have the last say here. What everyone does agree on is that a schema language free from individual vendor influence is the only non-negotiable requirement for the continued development of XML.