XML.com: XML From the Inside Out

Extensibility, XML Vocabularies, and XML Schema
by David Orchard

Determinism

This article has devoted considerable space to deterministic content models, so the W3C XML Schema determinism rules are worth describing in more detail. These rules are not universal: W3C XML Schema shares them with XML DTDs, but other XML schema languages such as RelaxNG do not use them and so do not force the contortions one goes through with W3C XML Schema. Both XML DTDs and W3C XML Schema require content models to be deterministic. From the XML 1.0 specification:

“For example, the content model ((b, c) | (b, d)) is non-deterministic, because given an initial b the XML processor cannot know which b in the model is being matched without looking ahead to see which element follows the b.”
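The XML 1.0 example can be checked mechanically in a minimal Python sketch. It models only a choice between sequences of required elements, each written as a tuple of element names; for that narrow case the content model is deterministic exactly when no two alternatives begin with the same element. It is an illustration under that assumption, not a general determinism checker.

```python
from collections import Counter


def ambiguous_first_elements(alternatives):
    """Return element names that can start more than one alternative.

    Each alternative is a tuple of required element names, e.g. ("b", "c")
    for (b, c). Optional particles and nested groups are not modelled.
    """
    counts = Counter(alt[0] for alt in alternatives if alt)
    return {name for name, n in counts.items() if n > 1}


# ((b, c) | (b, d)): after reading <b>, the processor cannot tell which b matched.
print(ambiguous_first_elements([("b", "c"), ("b", "d")]))  # {'b'}

# Factoring out the common prefix gives the deterministic (b, (c | d)),
# whose remaining choice is simply (c | d).
print(ambiguous_first_elements([("c",), ("d",)]))  # set()
```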

When wildcards use ##any, the determinism rule means that some schemas we might like to express are not allowed:

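  • Wildcards with ##any whose minOccurs does not equal maxOccurs are not allowed before an element declaration, because an instance of the element would be valid for either the ##any wildcard or the element declaration. A ##other wildcard could be used instead, provided the element is in the target namespace.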

  • The element before a wildcard with ##any must have a maxOccurs equal to its minOccurs. If they differed, say minOccurs="1" and maxOccurs="2", then the optional occurrence could match either the element definition or the ##any wildcard. As a result of this rule, the element cannot be optional: its minOccurs must be greater than zero.

  • Derived types that add element definitions after a wildcard with ##any must be avoided. Type extension appends the derived type's content after the base type's content, so if a derived type adds an element definition after the wildcard, an instance of the added element could match either the wildcard or the derived element definition.

11. Be Deterministic rule: Use of wildcards MUST be deterministic. Location of wildcards, namespace of wildcard extensions, minOccurs and maxOccurs values are constrained, and type restriction is controlled.
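For the simple case of a flat xs:sequence, the restrictions listed above can be checked mechanically. The Python sketch below is a simplified approximation of the Unique Particle Attribution constraint in XML Schema Part 1 [14], not the algorithm that specification defines: it ignores nested groups, xs:choice and substitution groups, and, following the rules above, treats a particle whose minOccurs equals its maxOccurs as unambiguous. The Particle class, the target namespace URI and the element names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

UNBOUNDED = None  # stands in for maxOccurs="unbounded"


@dataclass(frozen=True)
class Particle:
    kind: str                 # "element" or "any"
    name: str = ""            # local name, for element particles
    ns: Optional[str] = None  # element namespace, or "##any" / "##other" for wildcards
    min_occurs: int = 1
    max_occurs: Optional[int] = 1


def overlaps(p, q, target_ns):
    """Could a single element instance be matched by both particles?"""
    if p.kind == "element" and q.kind == "element":
        return (p.ns, p.name) == (q.ns, q.name)
    if p.kind == "any" and q.kind == "any":
        return True  # ##any and ##other wildcards always intersect
    wildcard, element = (p, q) if p.kind == "any" else (q, p)
    if wildcard.ns == "##any":
        return True
    # ##other: qualified names outside the target namespace
    return element.ns is not None and element.ns != target_ns


def upa_violations(particles, target_ns):
    """Return index pairs (i, j) of particles that compete for the same element.

    Once particle i has matched min_occurs times, the next element may be
    another occurrence of i (if min < max) or may match any later particle
    reachable by skipping particles with min_occurs=0.
    """
    problems = []
    for i, p in enumerate(particles):
        if p.min_occurs == p.max_occurs:
            continue  # fixed count: counting tells the parser when to move on
        for j in range(i + 1, len(particles)):
            if overlaps(p, particles[j], target_ns):
                problems.append((i, j))
            if particles[j].min_occurs > 0:
                break  # a required particle ends the reachable set
    return problems


TNS = "urn:example:name"  # hypothetical target namespace
first = Particle("element", "first", TNS)
last = Particle("element", "last", TNS)
middle = Particle("element", "middle", TNS, min_occurs=0)
anything = Particle("any", ns="##any", min_occurs=0, max_occurs=UNBOUNDED)
other = Particle("any", ns="##other", min_occurs=0, max_occurs=UNBOUNDED)

print(upa_violations([first, last, anything], TNS))    # []: required elements, then ##any
print(upa_violations([first, middle, anything], TNS))  # [(1, 2)]: optional element before ##any
print(upa_violations([first, anything, last], TNS))    # [(1, 2)]: repeating ##any before an element
print(upa_violations([first, other, last], TNS))       # []: ##other cannot match last
# Type extension appends the derived type's particles after the base content.
print(upa_violations([first, last, anything, middle], TNS))  # [(2, 3)]
# Making middle required is deterministic, but no longer backwards compatible.
print(upa_violations([first, Particle("element", "middle", TNS), anything], TNS))  # []
```

The last two lines reproduce the dilemma described below: an optional new element before the ##any extensibility point violates determinism, while a required one passes the check but breaks older instances.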

As shown earlier, a common design pattern is to provide an extensibility point (a wildcard, not an element) that allows any namespace at the end of a type. This is typically done with <xs:any namespace="##any">.

Determinism makes this unworkable as a complete solution in many cases. First, the extensibility point can only occur after required elements in the original schema, which limits the scope of extensibility in the original schema. Second, a backwards-compatible change requires the added element to be optional, which means minOccurs="0". Determinism prevents us from placing an element with minOccurs="0" before an extensibility point of ##any. Thus, when adding an element at an extensibility point, the author can either make the element optional and lose the extensibility point, or make the element required and lose backwards compatibility.

Why Is This Hard?

We’ve shown that using XML and W3C XML Schema to achieve loose coupling via compatible changes that fully utilize, yet do not require, new schema definitions is hard. W3C XML Schema documents allowing extensibility and versioning are more cumbersome and at the same time less expressive than one might like. The structural limitations introduced by W3C XML Schema's handling of extensibility are a consequence of W3C XML Schema's design and are not an inherent limitation of schema-based structures.

With respect to W3C XML Schema, it would be useful to be able to add elements in arbitrary places, such as before other elements, but the determinism constraint precludes this. A less restrictive kind of deterministic model could be employed, such as the “greedy” algorithm defined in the URI specification [5].

This would allow optional elements before wildcards and remove the need for the Extension type we introduced. It still does not allow wildcards before elements, because the wildcard would match those elements instead. Nor does it allow wildcards and extension of the type to coexist. A “priority” wildcard model, in which an element that could be matched by either a wildcard or an element declaration is matched by the element declaration whenever possible, would allow wildcards both before and after element declarations. However, this model does not address the typical multi-namespace approach to schema design.
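The difference between the greedy and priority models can be seen in a toy matcher. In this Python sketch (an assumption-laden illustration, not taken from any specification), children are plain element names, a particle with name None stands for an ##any-style wildcard, and "greedy" is read as: each particle consumes as many children as it can, in order, with no backtracking. The priority option makes a wildcard yield as soon as the next child is declared by a later element particle.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Particle:
    name: Optional[str]            # element name, or None for an ##any-style wildcard
    min_occurs: int = 1
    max_occurs: Optional[int] = 1  # None stands for maxOccurs="unbounded"

    def accepts(self, child: str) -> bool:
        return self.name is None or self.name == child


def match(particles: List[Particle], children: List[str], priority: bool = False):
    """Assign each child element to a particle index; return None if invalid."""
    assignment: List[int] = []
    pos = 0
    for idx, p in enumerate(particles):
        declared_later = {q.name for q in particles[idx + 1:] if q.name is not None}
        count = 0
        while pos < len(children) and p.accepts(children[pos]):
            if p.max_occurs is not None and count == p.max_occurs:
                break  # particle is full; move on
            if priority and p.name is None and children[pos] in declared_later:
                break  # priority model: a later element declaration wins
            assignment.append(idx)
            pos += 1
            count += 1
        if count < p.min_occurs:
            return None
    return assignment if pos == len(children) else None


opt_b = Particle("b", min_occurs=0)
req_b = Particle("b")
any_star = Particle(None, min_occurs=0, max_occurs=None)

# Optional element before a wildcard: greedy matching resolves it.
print(match([opt_b, any_star], ["b", "x"]))                 # [0, 1]
# Wildcard before an element: the greedy wildcard swallows <b>.
print(match([any_star, req_b], ["x", "b"]))                 # None
# Priority model: the element declaration wins over the wildcard.
print(match([any_star, req_b], ["x", "b"], priority=True))  # [0, 1]
```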

A wildcard that only allowed elements that have not been defined (effectively, elements in other namespaces plus anything not defined in the target namespace) may be a more useful model. Such changes would also allow cleaner mixing of inheritance and wildcards. But the author would still have to sprinkle wildcards throughout their types.
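The acceptance rule for such a wildcard can be written as a short predicate. In this hypothetical Python sketch, an element is identified by its namespace URI (None if unqualified, treated here as "another namespace") and local name, and the set of declared local names in the target namespace is assumed to be known; the namespace URIs and names are made up for illustration.

```python
from typing import Optional, Set


def undeclared_wildcard_accepts(ns: Optional[str], local: str,
                                target_ns: str, declared: Set[str]) -> bool:
    """Hypothetical wildcard that admits only elements the schema does not define."""
    if ns != target_ns:
        return True  # other namespaces (and unqualified names) stay open for extension
    return local not in declared  # undefined names in the target namespace are admitted


TNS = "urn:example:name"      # hypothetical target namespace
DECLARED = {"first", "last"}  # hypothetical element declarations

print(undeclared_wildcard_accepts(TNS, "middle", TNS, DECLARED))             # True
print(undeclared_wildcard_accepts(TNS, "last", TNS, DECLARED))               # False
print(undeclared_wildcard_accepts("urn:example:ext", "last", TNS, DECLARED))  # True
```

Because this wildcard can never match a declared element such as last, it could sit before or after that declaration without the two particles competing for the same instance.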

A type-level any element combined with the aforementioned wildcard changes is needed. One potential solution is for the sequence declaration to have an attribute specifying that extensions are allowed in any position, with commensurate attributes specifying the allowed namespaces, elements, and validation rules. Finally, an extension mechanism that allowed the wildcard to be replaced by an updated content model would make the compatible and incompatible schemas modular.

The problem with even this last approach is that it is sometimes necessary to apply the same schema strictly in one part of a system and loosely in another. A long-standing rule for the Internet is the Robustness Principle, articulated in the Internet Protocol specification [4] as “In general, an implementation must be conservative in its sending behavior, and liberal in its receiving behavior”.

In schema validation terms, a producer can apply a schema strictly while a consumer applies it in a relaxed way. The degree of strictness is then not an attribute of the schema but of how it is used. One approach that appears to solve these problems is to define a form of schema validation that permits an open content model, to be used when schemas are versioned.

We call this model “validation by projection”: it works by ignoring, rather than rejecting, component names that appear in a message but are not explicitly defined by the schema. This is possible using partial validation in XML Schema. A two-pass validation model can do this: the first pass finds the “extra” content, which is then removed from the components to be validated, and the second pass validates what remains.
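The two-pass model can be sketched with Python's xml.etree.ElementTree. The "schema" here is a deliberately toy, hypothetical dictionary that maps each element name to the child elements it requires, in order; real W3C XML Schema content models are much richer. Pass one collects undeclared children, pass two validates a copy with that content removed. A strict consumer calls validate_strict directly, and a relaxed consumer calls validate_by_projection.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical toy schema: element name -> required child elements, in order.
SCHEMA = {"name": ["first", "last"], "first": [], "last": []}


def validate_strict(elem, schema):
    """Strict validation: every element is declared and its children match exactly."""
    expected = schema.get(elem.tag)
    if expected is None or [child.tag for child in elem] != expected:
        return False
    return all(validate_strict(child, schema) for child in elem)


def find_extras(elem, schema):
    """Pass 1: collect (parent, child) pairs for children the schema does not declare."""
    extras = []
    allowed = set(schema.get(elem.tag, []))
    for child in elem:
        if child.tag in allowed:
            extras.extend(find_extras(child, schema))
        else:
            extras.append((elem, child))
    return extras


def validate_by_projection(elem, schema):
    """Pass 2: validate strictly after removing undeclared content from a copy."""
    projected = copy.deepcopy(elem)
    for parent, child in find_extras(projected, schema):
        parent.remove(child)
    return validate_strict(projected, schema)


v2 = ET.fromstring("<name><first>A</first><middle>B</middle><last>C</last></name>")
print(validate_strict(v2, SCHEMA))         # False: strict validation rejects <middle>
print(validate_by_projection(v2, SCHEMA))  # True: projection ignores <middle>

incomplete = ET.fromstring("<name><first>A</first></name>")
print(validate_by_projection(incomplete, SCHEMA))  # False: required <last> is missing
```

Projection only relaxes the handling of unknown names; missing required content is still an error, which keeps the consumer "liberal" without making it accept anything.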

A final comment on XML Schema extensibility is that there is still an unmet need to define schemas that validate known extensions while retaining extensibility. An author will want to create a schema based on an extensible schema that mixes other known schemas into particular wildcards while retaining the wildcards' extensibility. We encounter this difficulty in areas such as describing SOAP header blocks. Composing a schema from many schemas is a difficult yet pressing topic.

Leaving the topic of wildcard extensibility, the use of type extension over the Web might be more palatable if the instance document could express a base type for consumers that do not understand the extension type, as in xsi:basetype="". The consumer could then fall back to the base type if it did not understand the extension type.
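A consumer's fallback logic for this proposal might look like the Python sketch below. Only xsi:type exists in W3C XML Schema; xsi:basetype is the hypothetical attribute proposed in the text. Type names are compared as plain strings, with no QName resolution, and the type names themselves are made up.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Set

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def effective_type(elem: ET.Element, understood: Set[str]) -> Optional[str]:
    """Pick the type a consumer should process elem as (xsi:basetype is hypothetical)."""
    declared = elem.get(f"{{{XSI}}}type")
    if declared in understood:
        return declared  # the consumer knows the extension type
    fallback = elem.get(f"{{{XSI}}}basetype")
    if fallback in understood:
        return fallback  # fall back to the base type
    return None  # neither type is understood


doc = ET.fromstring(
    '<address xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:type="USAddressV2" xsi:basetype="Address"/>'
)
print(effective_type(doc, {"Address"}))                 # Address
print(effective_type(doc, {"Address", "USAddressV2"}))  # USAddressV2
```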

Another area for architectural improvement is that XML (or even XML Schema) could have provided a Must Understand model. As things stand, each vocabulary that provides a Must Understand model reinvents the mU wheel. XML could have provided an xml:mustUnderstand attribute and model that each language could use. Tim Berners-Lee articulated the need for this in XML in his February 2000 design note on mandatory extensions [19], but neither XML 1.0 nor XML 1.1 included this model.
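A shared Must Understand model would let every vocabulary reuse the same consumer logic. This Python sketch assumes the hypothetical xml:mustUnderstand attribute described above (no XML specification defines it) and mirrors the SOAP rule: an unknown child marked as mandatory causes a fault, and any other unknown child is ignored. The element names are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

# Hypothetical attribute: neither XML 1.0 nor XML 1.1 defines xml:mustUnderstand.
XML_NS = "http://www.w3.org/XML/1998/namespace"
MUST_UNDERSTAND = f"{{{XML_NS}}}mustUnderstand"


class MustUnderstandFault(Exception):
    """Raised when an unknown extension is marked as mandatory."""


def process(elem: ET.Element, understood: Set[str]) -> List[str]:
    handled = []
    for child in elem:
        if child.tag in understood:
            handled.append(child.tag)
        elif child.get(MUST_UNDERSTAND) in ("1", "true"):
            raise MustUnderstandFault(child.tag)  # mandatory extension we cannot process
        # Any other unknown child is optional: Must Ignore it.
    return handled


optional_ext = ET.fromstring("<order><item/><gift-wrap/></order>")
print(process(optional_ext, {"item"}))  # ['item']

mandatory_ext = ET.fromstring('<order><item/><payment-plan xml:mustUnderstand="1"/></order>')
try:
    process(mandatory_ext, {"item"})
except MustUnderstandFault as fault:
    print("fault:", fault)  # fault: payment-plan
```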

Finally, there is ambiguity in compliance testing for W3C XML Schema implementations. The W3C XML Schema test collection [15] does not test some of the more common cases that have been precluded here. For example, the wildcard tests cover a different style, which is xs:any inside a complex type. These do not cover some of the non-deterministic cases, typically achieved by combining minOccurs/maxOccurs variations with ##any, or combining inheritance with ##any. Potentially as a result, some implementations do not correctly test for non-determinism, which may yield non-interoperable documents.

One common concern is about implementation support for these features and combinations. These samples have been tried in many different schema parsers and toolkits, such as XML Beans, SQC, JAX-RPC. While it’s impossible to know whether all implementations support these rules, there seems to be good support for what was tested. The author is certainly interested in hearing about toolkits that don’t support these rules.

Other Technologies

The W3C XML Schema Working Group has heard and taken to heart many of these concerns. It plans to remedy some of these issues in XML Schema 1.1 [21], and is currently looking at a “weak wildcard” model, which solves some but not all of the problems. At the time of writing, there is no public Working Draft of XML Schema 1.1 with improved extensibility or versioning.

A simple analysis of compatible extensibility and versioning using RDF and OWL is available [21]. In general, RDF and OWL offer superior mechanisms for extensibility and versioning. RDF and OWL explicitly allow extension components to be added to components. Further, the RDF and OWL model builds in the notion of “Must Ignore Unknowns”, as an RDF/OWL processor will absorb the extra components but do nothing with them. An extension author can require that consumers understand the extension by changing the type using a type extension mechanism.

RelaxNG is another schema language. It explicitly allows extension components to be added to other components, because it does not have the determinism constraint.

Conclusion

This article started many months ago. At roughly the same time as a previous version was published, the W3C TAG decided that versioning and extensibility were important enough to web architecture to work on a finding [22] and to include material in the Web Architecture document [23]. While this article provided a starting point for the TAG material, that material will cover a broader scope and progress in a more interactive and iterative fashion than any article can. Readers can follow an ongoing version of this article [1] and the TAG material for a continuing treatment of extensibility and versioning.

This article describes a number of questions, decisions and rules for using XML, W3C XML Schema, and XML namespaces in language construction and extension. The main goal of the rules is to allow language designers to know their options for language design, and ideally make backwards- and forwards-compatible changes to their languages to achieve loose coupling between systems.

References

  1. Extending and Versioning XML Languages, by Dave Orchard, ongoing location, http://www.pacificspirit.com/Authoring/Compatibility/ExtendingAndVersioningXMLLanguages.html
  2. Free Online Dictionary of Computing, http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?forward+compatible
  3. Flexible XML Processing Profile, http://www.upnp.org/download/draft-goland-fxpp-01.txt
  4. IETF RFC 791, http://www.ietf.org/rfc/rfc791.txt
  5. IETF RFC 2396, http://www.ietf.org/rfc/rfc2396.txt
  6. IETF RFC 2518, http://www.ietf.org/rfc/rfc2518.txt
  7. IETF RFC 2616, http://www.ietf.org/rfc/rfc2616.txt
  8. SOAP 1.1, http://www.w3.org/TR/SOAP/
  9. WSDL 1.1, http://www.w3.org/TR/wsdl.html
  10. WS-Policy Framework, http://ifr.sap.com/ws-policy/ws-policy.pdf
  11. W3C Note, Web Architecture: Extensible Languages, http://www.w3.org/TR/1998/NOTE-webarch-extlang-19980210
  12. W3C XML 1.0, http://www.w3.org/TR/REC-xml
  13. W3C XML Namespaces, http://www.w3.org/TR/REC-xml-names/
  14. W3C XML Schema Part 1, http://www.w3.org/TR/xmlschema-1/
  15. W3C XML Schema Working Group’s Test collection for Any, http://www.w3.org/XML/2001/05/xmlschema-test-collection/result-ms-wildcards.htm
  16. XML.com W3C XML Schema design Patterns by Dare Obasanjo, http://www.xml.com/pub/a/2002/07/03/schema_design.html
  17. XML.Com Versioning XML by Dave Orchard, http://www.xml.com/pub/a/2003/12/03/versioning.html
  18. MSDN Designing Extensible, Versionable XML Formats by Dare Obasanjo, http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnexxml/html/xml07212004.asp
  19. Tim Berners-Lee’s writings on evolution, extensibility and Must Understand:

  20. http://lists.w3.org/Archives/Public/w3c-dist-auth/1997AprJun/0190.html
  21. Dave Orchard’s writings on extensibility and Versioning:

  22. W3C TAG Finding on extensibility and versioning, http://www.w3.org/2001/tag/doc/versioning
  23. W3C TAG Web Architecture document section on extensibility and versioning, http://www.w3.org/TR/webarch/#ext-version

Acknowledgements

The author thanks the many reviewers that have contributed to the article, particularly David Bau, William Cox, Edd Dumbill, Chris Ferris, Yaron Goland, Hal Lockhart, Mark Nottingham, Jeffrey Schlimmer, Cliff Schmidt, and Norman Walsh.


