W3C XML Schema Needs You
March 27, 2002
The W3C XML Schema (XSD) specifications have drawn fire again recently, with a number of concerns being aired about an apparent lack of interoperability between implementations. Jonathan Robie, a member of the Schema Working Group, has issued a rallying cry for developers to unite and help push for interoperability.
A Call to Arms
There was a resurgence of the "XML Schema is too complex" debate on XML-DEV last week. While this is an oft-debated topic, the issues have a slightly different slant this time around, with claims that XSD is so complex that it is proving extremely difficult to implement.
Paul Spencer reported on the difficulty he has had working with XML Schema, which apparently struck a chord with other XML-DEV members.
...I keep coming across schemas that validate in one tool, but not in another. Sometimes the schema is valid but wrongly described as invalid, other times it is invalid but the errors are not detected. In one case, I was asked to look at a schema, and a tool detected an error, but missed the identical error elsewhere in the same file. I have yet to find any tool that correctly identifies all errors without also indicating false errors. And this is on relatively simple schemas. ...
Although no hard figures have been circulated, the general opinion is that compatibility between implementations is far from satisfactory. Conformance to the specifications is a separate issue. Anecdotal evidence suggests that, of the independent implementations, Xerces-J and the Microsoft XML tools demonstrate the greatest conformance.
Spencer's concerns prompted Jonathan Robie to issue a call to arms to developers, encouraging them to help themselves by sharing information on which implementations offer the best support and by raising bugs and conformance problems with the tool vendors. In short, Robie suggested that users "be loud - and speak directly to the people who are responsible for solving the problem".
In a later posting Robie also commented that it is in everyone's interest to get bugs and issues resolved, and that people who don't speak up can hardly complain that progress isn't being made:
If they don't have the time or inclination, I can't help them, but I also don't think they have any right to complain, since they aren't doing anything to help. Instead, they should be grateful to the people who *have* had the time and inclination to put work into the standards and products we use every day.
This may not be as harsh as it first sounds. After all, it's effectively the retort used by open source developers when users complain that bugs are not being resolved quickly enough, although there is perhaps a difference in the perceived amount of involvement that a user has in the two environments.
Where specification problems were concerned, Robie advised users to send comments to the www-xml-schema-comments list so that they reach the attention of the Working Group, noting that Working Group members are not required to monitor other lists. Comments made elsewhere are unlikely to elicit a response. Patience is probably advisable in any case, as limited W3C resources mean that responses might not be as quick as everyone would like.
Robie also highlighted the W3C Schema Test Collection. He encouraged developers to submit test cases of their own (the Test Collection page notes that the Working Group only intends to create "a small part of the collection") and to put pressure on implementors to publicly release their test results.
A Stick in the Eye
While Robie's suggestions are well made, many members of the XML-DEV list responded that it's a lack of clarity in the specifications that's the root cause of the problems. Adam Turoff suggested that XSD is "about as user-friendly as a stick in the eye". Mike Kay claimed that the specifications are too formal. Even Jonathan Robie had to admit that "XML Schema Part 1 is a real bear to read. I have a hard time reading it myself..."
Eric van der Vlist said that the specifications sit in an uncomfortable position between prose and pseudo-code, which doesn't make them any easier to digest. Indeed, van der Vlist observed that even the experts qualify their pronouncements because of this inherent fuzziness:
With W3C XML Schema we enter in[to] the world of fuzzy specifications. The best experts become humble and [preface] all their statements with "my understanding is" or "if I am not mistaken" and we see an almost religious shift to discussing interpretations of the spec rather than the spec itself!
Of course, the Schema Working Group has attempted to address this issue by producing the XML Schema Primer in response to early feedback on the readability of the specifications. The RDF Working Group has also adopted this approach, having recently published an RDF Primer, a welcome move in itself.
Posting to the schema comments list, Noah Mendelsohn asserted that the Working Group has been very concerned about the Schema specification, and that it has "struggled mightily and iterated many times to make it more accessible". Addressing the negative comments, Mendelsohn observed that it's easier to level criticism than it is to produce constructive improvements:
I mostly wanted to point out that it's easy to ask for a simpler presentation, much harder to show how to make it both accessible and rigorous. I assure you that being precise and rigorous is exceptionally important in a spec such as this. Given the complexity of the language, I think we've at least covered most details explicitly.
A few constructive suggestions were circulated during the discussion, some more radical than others. Rob Griffin suggested producing a list of standard error messages for validators, which ought to help achieve some level of consistency across implementations, as well as clarifying the circumstances in which each error should arise. Andrew Watt recommended the addition of a use case document that would provide an additional means of tackling the specifications. Watt pointed to the XML Query documents as a good exemplar.
Rick Jelliffe's suggestion to modularize XML Schema was the most radical. Jelliffe suggested that, instead of a rewrite, the schema specifications should be split into eight small sections which
...would allow greater modularity, let readers and implementers concentrate and advertise conformance on different parts, and fit in with ISO DSDL, for users who, say, want to use RELAX NG with XML Schemas primitive datatypes.
Jelliffe also commented that, rather than criticizing XML Schema, the important first question should be which schema language or combination of languages is best suited to a particular application domain. Jelliffe offered a prediction that document-oriented systems will likely settle on DSDL, while database-oriented applications will find XML Schemas most suitable.
Whatever scenario comes to pass, the conformance of schema validators and a high degree of compatibility between implementations are important goals. It would be unfortunate if XML documents end up being labeled as "Best validated with..."