Schemas by Example

March 28, 2001

Leigh Dodds

After a short break, the XML-Deviant returned to find that the W3C XML Schema specification has finally reached Proposed Recommendation and that work on innovative alternative schema languages continues.

Proposed Recommendation

The finish line is now in sight for the members of the W3C XML Schemas Working Group. The XML Schema specifications are an important step closer to completion with their promotion to Proposed Recommendation status. All that remains now is for Tim Berners-Lee, as Director of the W3C, to approve the specifications before they become full Recommendations.

The road has been long and hard, and it's had a number of difficult sections along the way. Rick Jelliffe, in a recent posting to XML-DEV, noted that the latest drafts have been greatly revised based on feedback, and he suggested the areas in which they are likely to see most significant use.

For clarity, there are now more examples, better overview and navigation material, fluff has been cut, the Primer provides a very nice introduction for the laity...and there [are] even a couple of pictures now.

I believe XML Schemas will be particularly useful for electronic commerce and business; especially for connecting to databases, automatically generating interfaces between processes, for optimized queries, and for XHTML.

Jelliffe, author of Schematron, also painted a picture of the future of XML schema efforts in which a number of alternative approaches would peacefully co-exist.

It is quite likely that there will be an XML Schemas 1.1. (Procedurally, it would not be decided to work on this until after the REC comes out and the Schema WG gets re-chartered, so no one can actually promise it.) Topics that one might expect to be examined for XML Schemas 1.1 include modularization, subsets, internationalization, date/time, keys, and cross-pollination from the 2000/2001 crop of schema languages (RELAX, TREX, Schematron, etc.). I am hopeful.

... there is nothing stopping anyone or any group making their own schema languages, and that innovation and competition (and marketplace "confusion") are not the enemies of XML Schemas (alternative schema languages provide new ideas for XML Schema's evolution; alternative schema languages allow people for whom XML Schemas is not appropriate to get their jobs done and therefore would decrease discontent; alternative schema languages, if orthogonal, are layers that can be used with XML Schemas, reducing the need for it to grow to cover more and more cases.) Open systems thrive by allowing organic plurality.

Following the promotion of W3C XML Schema, the W3C has also published the first official draft of a formal definition for XML Schema. Jonathan Robie announced its publication to XML-DEV, along with a description of its goals.

This formalization is a formal, declarative system for describing and naming XML Schema information, specifying XML instance type information, and validating instances against schemas. The goals of the formalization are to:

  • Provide a semantic framework for software systems that use the W3C XML Schema specification, such as the W3C XML Query Algebra.
  • Specify names for all components of an XML Schema, so that they can be uniquely identified by URIs. Such unique identifiers may be useful to XML Query, RDF, and topic maps, among others.
  • Formally define validation at a declarative level.
  • Define the mapping from the current XML Schema syntax onto the structures described here, as well as the mapping between the XML Schema component model and our component model.

While a formal definition may not be directly interesting to many developers, its production is significant in that it may help efforts to optimize XML-based database applications, XML Query in particular. Developers seeking more immediate value from W3C XML Schema would do well to take a look at the best practices work conducted and documented by Roger Costello on XML-DEV. The document, now available as a freely downloadable book, contains more than 60 pages of schema design tips.

Fans of RDDL can feel particularly pleased as W3C XML Schema is the first W3C specification to provide a RDDL resource directory at its namespace URI.

Examplotron

Eric van der Vlist has been helping to realize Rick Jelliffe's vision of a plurality of schema languages by publishing Examplotron, a schema language without any elements.

Beating Hook, Rick Jelliffe's single element schema language has been quite a challenge, but I am happy to announce examplotron a schema language without any element[s].

Although examplotron does include an attribute, this attribute is optional and you can build quite a number of schemas without using it and I think it fair to say that examplotron is the most natural and easy to learn XML schema language defined up to now

Examplotron's innovation lies in its "schema by example" approach to schema generation. Rather than define a dedicated schema language with which a document can be described, Examplotron uses sample instance documents, annotated with several attributes that carry schema-specific information such as element occurrence and assertions about element and attribute content.

Like Schematron before it, Examplotron is implemented using XSLT. An Examplotron instance document can be converted into a validating stylesheet by applying a simple transformation.
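To make the "schema by example" idea concrete, here is a minimal sketch in Python rather than XSLT. It is not Examplotron's implementation: the `validate` function and the sample document are hypothetical, and the sketch assumes a deliberately strict rule (same elements, same order, same attribute names) with no support for occurrence or assertion annotations.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document that plays the role of the schema.
EXAMPLE = '<foo><bar>Hello</bar><baz id="1"/></foo>'


def validate(example: ET.Element, instance: ET.Element, path: str = "") -> list[str]:
    """Compare an instance against the structure of an example document.

    Assumption for this sketch: every element and attribute in the example
    must appear exactly once, in the same order, in the instance.
    Text content is not compared.
    """
    if instance.tag != example.tag:
        return [f"{path or '/'}: expected <{example.tag}>, found <{instance.tag}>"]

    here = f"{path}/{example.tag}"
    errors = []

    # Attribute names must match the example; values are free.
    for name in sorted(example.attrib.keys() - instance.attrib.keys()):
        errors.append(f"{here}: missing attribute '{name}'")
    for name in sorted(instance.attrib.keys() - example.attrib.keys()):
        errors.append(f"{here}: unexpected attribute '{name}'")

    ex_children, in_children = list(example), list(instance)
    if len(ex_children) != len(in_children):
        errors.append(
            f"{here}: expected {len(ex_children)} child elements, "
            f"found {len(in_children)}"
        )
    # Recurse pairwise over the children that both documents have.
    for ex_child, in_child in zip(ex_children, in_children):
        errors.extend(validate(ex_child, in_child, here))
    return errors


if __name__ == "__main__":
    schema = ET.fromstring(EXAMPLE)
    candidate = ET.fromstring('<foo><bar>Hi</bar><qux id="2"/></foo>')
    for problem in validate(schema, candidate):
        print(problem)
```

The point of the sketch is only that the sample document itself carries the structural rules; nothing beyond the example has to be written to obtain a basic validator.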

Examplotron has generated interest among the XML-DEV community. For example, Murata Makoto has noted that, with the addition of one operator, Examplotron can be made as expressive as RELAX and TREX.

Namespace Versioning

Eric van der Vlist also made a useful proposal regarding the versioning of Namespace URIs, having wrestled with version changes during his development of Examplotron.

Versioning of Namespace URIs is a topic that XML-DEV has struggled with on a number of occasions, and one which it has recently revisited. Practical proposals have generally been lacking, however, and in most cases the discussion has devolved into a theoretical debate about the difference between a URI and the entity that it references. Van der Vlist's suggestion can be of immediate practical value to developers, and he began with an illustration of the problem.

It appears to be a common (and useful) usage to use namespace URIs to carry **some** level of information about the release of a vocabulary to the applications.

On the other hand, changing a namespace URI means that older applications will not recognize the vocabulary at all, breaking any possibility of compatibility.

For instance, if XSLT transformation has defined a template to match {http://examplotron.org/0.0.0.1.a}:foo, this template will not match {http://examplotron.org/0.0.0.1.b}:foo and would need to be duplicated to take both versions into account.
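The same effect can be seen outside XSLT. The sketch below uses Python's standard `xml.etree.ElementTree` with the two namespace URIs from the quote; the `root` wrapper element and the `count_foo` helper are made up for illustration. A lookup written against the first URI simply does not see an element in the second.

```python
import xml.etree.ElementTree as ET

NS_A = "http://examplotron.org/0.0.0.1.a"
NS_B = "http://examplotron.org/0.0.0.1.b"

# Two documents that differ only in the namespace URI of <foo>.
doc_a = ET.fromstring(f'<root><foo xmlns="{NS_A}"/></root>')
doc_b = ET.fromstring(f'<root><foo xmlns="{NS_B}"/></root>')


def count_foo(doc: ET.Element, ns: str) -> int:
    """Count child <foo> elements in the given namespace (hypothetical helper)."""
    # ElementTree spells a qualified name as {namespace-uri}local-name.
    return len(doc.findall(f"{{{ns}}}foo"))


if __name__ == "__main__":
    print(count_foo(doc_a, NS_A))  # the lookup written for version "a" finds its own <foo>
    print(count_foo(doc_b, NS_A))  # but not the <foo> from version "b"
```

To a namespace-aware processor the two `foo` elements are unrelated names, which is why the template in the quoted example would have to be duplicated.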

The proposal revolves around maintaining URLs for each major and minor release of a vocabulary, e.g. http://examplotron.org/0/, http://examplotron.org/1/1. Simple URL rewriting rules can then be used to redirect a user or application to the appropriate current URL. Van der Vlist includes some useful examples in his proposal, and he's implemented the scheme for Examplotron.
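The redirection step might look something like the following Python sketch. It is an illustration under stated assumptions, not van der Vlist's scheme: the release list, the three-part version layout, and the `resolve` function are all hypothetical, and a real deployment would express the same mapping as rewriting rules in the web server.

```python
from urllib.parse import urlparse

BASE = "http://examplotron.org"

# Hypothetical release history as (major, minor, patch) tuples.
# These are NOT the real Examplotron releases.
RELEASES = [(0, 1, 0), (0, 2, 0), (1, 0, 0), (1, 1, 0), (1, 1, 2)]


def resolve(url: str) -> str:
    """Map a major or major/minor URL to the newest matching release URL."""
    # "/1/1" -> (1, 1); "/0/" -> (0,)
    prefix = tuple(int(part) for part in urlparse(url).path.split("/") if part)
    matches = [r for r in RELEASES if r[: len(prefix)] == prefix]
    if not matches:
        raise LookupError(f"no release matches {url}")
    newest = max(matches)  # tuples compare component by component
    return BASE + "/" + "/".join(str(n) for n in newest) + "/"


if __name__ == "__main__":
    print(resolve("http://examplotron.org/0/"))
    print(resolve("http://examplotron.org/1/1"))
```

Under these assumptions an application that only cares about the major release keeps using the short URL, while the server decides which concrete release that currently means.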

Developers producing documents using multiple namespaces may benefit from taking this approach, which is related to how the W3C manages its namespace URIs and URLs between versions of specifications. It benefits from a simplicity that parallels RDDL and the two could make a powerful combination.