Spotlight on Schemas

February 23, 2000

Leigh Dodds

Table of Contents

Your Schema or Mine?
Wielding Occam's Razor
Checking the Foundations
XTech 2000

The XML Schema specification, the W3C's upcoming replacement for DTDs, has become a recurrent topic on XML-DEV. Numerous questions about its features have been explored, and some concerns raised about the readability of the Working Draft. With the prospect of a new draft appearing shortly, this week's XML-Deviant takes a look at some of the main issues under debate.

Your Schema or Mine?

A browse through the W3C Technical Reports and Publications uncovers several proposals defining schema extensions for XML. Many more exist outside the W3C process, including Document Structure Definition and Schematron. Tool support for these alternative proposals varies, and is often limited to tools from the vendor(s) contributing to the proposal. However, the official W3C schema effort itself is not yet complete either, and currently has little in the way of tool support to back it up.

Observing this fragmentation, and noting that projects are currently being founded on these "unofficial" schema variants, Roger Costello wondered:

  • Has XML Schemas already missed the window of opportunity to be the schema language to supplant DTDs?
  • Is the world going to be fragmented with a bunch of different schema language dialects?
  • Is fragmentation a bad thing or a good thing?
  • Perhaps the XML Schema WG needs to do some marketing?

The questions prompted a lengthy discussion. Henry Thompson, co-editor of the W3C XML Schema specification, stated that "marketing" of XML Schemas would not occur until there was a stable product to market. This standpoint indicates that the Working Group's current focus is on driving the specification through to Recommendation status; concerns over its adoption will be addressed after that. Feedback from vendors was encouraging. For example, Andrew Layman noted that Microsoft supports Schemas as an upgrade path for XDR. Edd Dumbill observed:

I don't perceive that there will be much difficulty in persuading vendors to implement XML Schemas, and the experience being gathered now with SOX & XDR provides useful guidance for the W3C schema effort itself.

Bill la Forge, the author of Quick, a Java/XML Schema binding tool, saw diversity as essential:

...I think it appropriate that we have diverse schema. We don't know what we are doing yet. I think you will see more schema before the shakeout occurs. And we will likely end up with at least two, one being a superset of the other.

Right now, I think the important thing is to stay focused on the markup language and remember that schema are part of the implementation. And there should be room for more than one implementation.

Wielding Occam's Razor

Simplicity (or lack of it) in a specification can depend on your viewpoint: user or implementor. Is it easy to use? Is it easy to implement? These are two questions that can have very different answers. The current Schema draft doesn't look too good from either side. Dave Hollander, mentioning that a Schema subset might be useful, received support from a number of contributors. Bill la Forge commented:

Such a subset could have tremendous utility, both for simplified implementations and as a means of learning the larger schema.

I think it should be possible, though I have no idea of the effort required, to define a standard subset roughly equivalent to DTDs. But I suspect such a standard would have great value.

Paul Grosso went further by suggesting that a first cut of XML Schemas should have been based on

"...DTD capabilities in XML instance syntax plus some basic data types." Then not only would it have been available long ago, but vendors and users could benefit from the experience, and Schema 2.0 could be developed in light of that experience.

A subset of XML Schemas might provide a helpful migration path for users who wish to move from DTDs to Schemas. Addressing implementation complexity, Curt Arnold proposed a "validation subset":

It is obvious that there are definitely a lot of features that make authoring easy (such as imports, includes, type inheritance, equivClasses) that should be "compiled" or "preprocessed" out before so that every validation attempt doesn't have to repeat the effort of fetching every imported schema, etc.

I've suggested an open-source effort to create XSLT transforms that perform this translation from the authoring schema to the validation subset and have gotten a few volunteers.
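The preprocessing idea can be pictured with a small sketch. Arnold's proposal was for XSLT transforms; the version below uses Python purely for illustration, and the element and attribute names (schema, include, href, element, type) as well as the file names are hypothetical and not taken from any Schema draft. It assumes that an include simply pulls in the top-level declarations of another schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory "authoring" schemas. The vocabulary here is
# invented for illustration and does not follow any XML Schema draft.
SCHEMAS = {
    "main.xsd": '<schema><include href="types.xsd"/><element name="order"/></schema>',
    "types.xsd": '<schema><type name="money"/><include href="base.xsd"/></schema>',
    "base.xsd": '<schema><type name="id"/></schema>',
}


def flatten(name, sources, seen=None):
    """Return a schema element in which every <include> has been replaced
    by the (recursively flattened) declarations of the included schema."""
    seen = set() if seen is None else seen
    seen.add(name)
    root = ET.fromstring(sources[name])
    flat = ET.Element(root.tag, root.attrib)
    for child in root:
        if child.tag == "include":
            target = child.get("href")
            if target not in seen:  # skip repeated or circular includes
                flat.extend(list(flatten(target, sources, seen)))
        else:
            flat.append(child)
    return flat


flat = flatten("main.xsd", SCHEMAS)
names = [(child.tag, child.get("name")) for child in flat]
print(names)
```

A validator handed the flattened result never needs to fetch another document, which is the repeated cost the "validation subset" is meant to remove.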

These simplification attempts are not mutually exclusive, and could go some way to reducing the potential complexity of XML Schemas. The wide array of features that the XML Schema spec has to offer makes it difficult to implement or understand in its current form.

Checking the Foundations

Stefan Haustein, responding to a challenge to provide an example of why XML Schema syntax is too complex, highlighted a lack of understanding of key design decisions. While a full analysis of the technical aspects is outside the scope of XML-Deviant, the main issue is this: without suitable illustration of the benefits of some features, it is hard to understand why they are provided or even required.

The impending release of a new draft of the specification has limited the input from the Schema Working Group into the discussion. Henry Thompson cited time constraints as a limiting factor on his own further involvement in the debate:

As one of the editors, I've been trying to respond to all queries for elucidation about design issues or interpretation, and to engage in debate about perceived weaknesses or gaps in coverage.

I personally will no longer be able to do this and do my job as editor during the coming weeks as we try to get a final draft ready for internal W3C Last Call review and publication of a Candidate Recommendation.

Michael Champion pointed out how objections and issues could be raised within the W3C process:

Wait until the next draft (which I have heard has taken pains in the direction of clarity and simplicity); if it is too complex, then make clear, specific recommendations as to how to simplify it to this list, the Schema public comment list, your company's W3C participants (if applicable), your partners/suppliers who are W3C participants, etc.

Champion hinted that SML-DEV would look at a way to produce a subset of Schemas if the final Recommendation was deemed too complex. The SML-DEV mailing list is for developers seeking a simplification of XML and its associated tools and specifications.

Matthew Gertner was not content to wait for the next draft, and expressed concern about talk of simplification before the specification has even reached Recommendation. He urged a member of the Schema Working Group to make a public statement regarding the current issues:

What we have seen is that people on this list who are struggling to implement XML Schema (and I would add my voice strongly to this group) are seeking an explanation of why certain design decisions were made. It's hard to have any discussion about whether these decisions should be reversed without this information. We can all hold tight until the next spec release, I'm sure, before making concrete suggestions for changes. But I don't see any reason why someone in the WG can't make a public statement about the practical implications of the issues that have been raised.

This debate is obviously far from over. Significant time will have to be allotted for implementation of the specification and proper exploration of its features. There may be problems hidden beneath the surface. Nils Klarlund, frustrated by the wording of the specification, advocated waiting for the next Working Draft before exploring further:

The writing is not necessarily the main problem. It may just obscure an underlying substantial problem: mind-boggling technical complexity that can't be explained away. I'm just guessing here, since I've not quite been able to penetrate the language barrier yet. And the promise of a more readable version makes me think I'm just wasting my time trying to digest the current one.

If the next Working Draft lives up to its promise of clarity, then a significant first step will have been made towards its wider acceptance.

XTech 2000

The XTech 2000 conference opens in San Jose on February 27th. The conference will provide another forum for developers to air their opinions: "Town Hall" meetings. These events will place a number of W3C members and other community luminaries in the hot seat before an audience of developers. The XML Schema meeting, on February 29th, ought to be particularly lively!

Next week XML-Deviant will come from XTech 2000.