Schemas in the Wild

September 27, 2000

Leigh Dodds

This week the XML-Deviant revisits the topic of schemas. As the W3C XML Schema Working Group makes its final strides toward releasing XML Schemas as a Candidate Recommendation, the XML community is exploring best practice in schema design.

Evolving Schemas

In one of his occasional postings to XML-DEV, Robin Cover recently observed that XML Schemas form the basis for two large-scale projects. The first is DIG35, a metadata standard for describing digital images; the second is UDDI, or "Universal Description, Discovery, and Integration," a registry for online business services. Cover noted that

these are not lightweight or trivial XML applications.

Rick Jelliffe, observing that the specification editors have been evolving XML Schema in line with public feedback, thought it natural that schemas should see increasing use, but argued that a hard question still has to be answered:

I don't think it is surprising that the people who have financed XML Schema's development should be keen to use it ( seems to have many of the same key players as are in the W3C). I don't think anyone is claiming that the recent XML Schema drafts are useless. But that does not really speak about whether XML Schemas attempts "too much": that is an unresolvable question because XML has expanded to be more than just the simple data-interchange format. Now it is a Nutty Professor II-style family of technologies that are being used to model/process the databases as well as the reports.

The time left to consider whether XML Schemas attempts too much is running out. The Schema Working Group have updated all of the relevant documents this week to incorporate changes to the syntax. In his announcement message, Henry Thompson stated that while the Working Group were not quite ready to enter the Candidate Recommendation stage

... we are aware that a number of projects using XML Schema are nearing maturity, and we did not think it right to delay releasing information about the syntax changes any longer than necessary, hence this "pre-CR" release of working drafts.

The announcement was quickly followed by the release of an updated version of XSV, the schema validator, incorporating the changes.

It is not clear whether there will be a further Last Call phase or a move directly to Candidate Recommendation status. In either case, time is running out to provide useful feedback to the Working Group. Rick Jelliffe urged developers to comment now:

Just as with the last round, I encourage XML-DEV-ers to look critically and thoroughly at the upcoming specs: do they meet your needs? are they incomplete in any significant way that might allow vendors to trap you to their products? Can you do simple things simply? Can you use the elements and attributes and values in the ways you think are natural (is it methodology-free enough)? When I extract some data from a schema-ed document, can I do enough with it? What features in particular are definitely "too much"? Are the datatypes complete (should money be a built-in datatype, for example)?

Capturing Best Practice

XML-DEV has been relatively quiet of late but now seems to be stirring to life. Roger Costello, initiator of many schema discussions, published a lengthy note discussing schema "best practice."

I would like to see if we can collectively come up with a set of "best practices" in designing XML Schemas. I realize that the specifics of designing a schema are heavily dependent upon the task at hand. However, I firmly believe that there are guidelines that can be employed in creating a schema, and those guidelines hold true irrespective of the specific task. It is this set of guidelines that I am hoping we can shed some light upon.

In his opening discussion of the topic, Costello highlights six key areas that would benefit from closer examination:

  1. Element versus Type Reuse
  2. Local versus Global
  3. elementFormDefault -- to qualify or not to qualify
  4. Evolvability or versioning
  5. One namespace versus many namespaces (import versus include)
  6. Capturing semantics of elements and types

This is the kind of community activity which XML-DEV performs very well. Costello has followed the format of previous collaborations like SAX: highlight a set of issues for developers to discuss, then collate the resulting feedback into a useful resource.

So far, discussion has covered two of Costello's points: local versus global declarations of elements and types, and whether the namespaces used in schemas should be explicit or hidden in instance documents. The lengthy discussions cannot be properly covered here, but Costello's evolving summary of best practice in schema design represents them well. The interested schema developer will also benefit from reading the relevant threads in the XML-DEV archives.
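In Costello's terms, the "explicit or hidden" choice is governed largely by the elementFormDefault setting from his list. The following is a minimal sketch, not drawn from the discussion itself, of what each setting means for an instance document. It does no schema validation; it only parses two hand-written instances with Python's standard ElementTree and prints the expanded names of the child elements. The target namespace `urn:example:library` and the `child_names` helper are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:library"  # hypothetical target namespace for illustration

# Instance shape expected when local elements are unqualified
# (elementFormDefault="unqualified"): only the global root element is in
# the namespace, so the namespace is "hidden" from the child elements.
unqualified = f'<lib:book xmlns:lib="{NS}"><title>XML</title></lib:book>'

# Instance shape expected when local elements are qualified
# (elementFormDefault="qualified"): every element is in the namespace,
# here via a default namespace declaration, so it is "explicit".
qualified = f'<book xmlns="{NS}"><title>XML</title></book>'

def child_names(doc):
    """Return the expanded names ('{uri}local' or 'local') of the root's children."""
    return [child.tag for child in ET.fromstring(doc)]

print(child_names(unqualified))  # ['title']
print(child_names(qualified))    # ['{urn:example:library}title']
```

The two `title` elements look alike on the page but have different names once namespaces are taken into account, which is why the hide-versus-expose decision affects every application that reads the documents.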

Costello has captured a number of issues of importance to the schema designer, a timely initiative judging by the growing use of XML Schemas. DTDs have long been associated with a body of best practice, and schema designers will find this kind of resource invaluable.

Schemas on the Loose

Unfortunately, useful progress isn't happening on all fronts. In some areas, the arguments appear to go round in circles. The Deviant has reported before on the debates about URIs, Namespaces and Schemas, and this topic has surfaced again recently. After announcing his Schematron schema for SOAP 1.1, Rick Jelliffe inadvertently rekindled the "does a Namespace URI point to a Schema" debate.

The XML Namespaces specification doesn't mandate that the namespace URI points to a schema; however, it doesn't prohibit it either. This leaves room for choice: you may use a namespace URI that points to an XML Schema, and you can implement your application so that it retrieves its schema from that location. However, unmanaged choice is the bane of interoperability. What happens when you use my namespace or my application, which does not expect the URI to resolve to a schema? Namespaces would end up a confused mess.
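A namespace-aware parser treats the namespace URI purely as an identifier: names are compared by the URI string, and nothing is fetched from it. Any application that wants to retrieve a schema from that location must add that behaviour itself, which is exactly where the unmanaged choice creeps in. The minimal Python sketch below, using the standard ElementTree parser, illustrates this with the SOAP 1.1 envelope namespace; the `split_name` helper is hypothetical, written only for this example.

```python
import xml.etree.ElementTree as ET

# The SOAP 1.1 envelope namespace URI, used here only as an example identifier.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

def split_name(tag):
    """Split an ElementTree tag of the form '{uri}local' into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag

# Two documents that use different prefixes for the same namespace URI.
doc_a = f'<s:Envelope xmlns:s="{SOAP_ENV}"><s:Body/></s:Envelope>'
doc_b = f'<soap:Envelope xmlns:soap="{SOAP_ENV}"><soap:Body/></soap:Envelope>'

# Parsing is purely local: the parser never dereferences the URI.
root_a = ET.fromstring(doc_a)
root_b = ET.fromstring(doc_b)

# The expanded names match because the URI strings match; the prefix is irrelevant.
print(split_name(root_a.tag))  # ('http://schemas.xmlsoap.org/soap/envelope/', 'Envelope')
print(root_a.tag == root_b.tag)  # True
```

Whether anything useful lives at that URI, whether a schema, a directory of resources, or nothing at all, is invisible at this level.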

As Rick Jelliffe commented, it would be far better if a Namespace URI pointed at

a directory of related resources discoverable by some conventions. Namespace=schema blocks the use of the namespace URI for more systematic and extensible purposes.

David Orchard agreed with Jelliffe.

Eventually, there will be a packaging specification that deals with all the relevant information for a document -- schemas, xslt, xinclude targets, xlinks, xlink targets, gifs...Then there can be a mechanism for retrieving the related documents. But it's very much not a namespace issue.

However, it seems that the long-fabled packaging specification is never going to happen. In response to Orchard's comments, David Cleary announced that the W3C packaging work had been killed.

At the XML Plenary meeting yesterday, the XML Activity announced that a packaging WG has been killed. There currently are no plans in the W3C to do this work.

This is a disappointment. Packaging has often been cited as the best way to manage the multiplicity of resources which can now be associated with an XML instance. We can only assume that this is not seen as a pressing issue either at the W3C or by its member companies.