Versioning XML Vocabularies
by David Orchard
Versioning
When a new version of a language is required and it is backwards compatible with the older language, the author must decide which namespace name to use for names in the new language. There are two choices: create a new namespace name or reuse the existing one. We argue that reusing the namespace name is more efficient, and we explore the problems with the first option (a new namespace name) later in this section. The namespace reuse rule is:
8. Re-use namespace names Rule: If a backwards compatible change can be made to a specification, then the old namespace name SHOULD be used in conjunction with XML's extensibility model.
An important conclusion is that a new namespace name is only required when an incompatible change is made.
9. New namespaces to break Rule: A new namespace name is used when backwards compatibility is not permitted; that is, software MUST break if it does not understand the new language components.
Backwards-incompatible changes typically occur in one of two ways: a required information item is added, or the semantics of an existing information item are changed.
The Re-use namespace names rule requires that the earlier Must Ignore and Any Namespace rules be followed. If these rules are not followed, a language designer is precluded from making compatible changes while reusing the namespace name.
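A minimal Python sketch of the consumer side of rules 8 and 9, assuming a hypothetical name vocabulary in the namespace http://example.org/name with first and last children. The v1 reader ignores unknown elements (Must Ignore), so a v2 document that adds an optional middle element in the same namespace still works, while a document whose root is in a new namespace is rejected as an incompatible version.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name for version 1 of a "name" vocabulary.
V1_NS = "http://example.org/name"
KNOWN = {f"{{{V1_NS}}}first", f"{{{V1_NS}}}last"}


def read_name(xml_text: str) -> dict:
    """A v1 consumer: accepts compatible changes, rejects new namespaces."""
    root = ET.fromstring(xml_text)
    # A root in a different namespace signals an incompatible (breaking) version.
    if root.tag != f"{{{V1_NS}}}name":
        raise ValueError(f"unsupported vocabulary: {root.tag}")
    result = {}
    for child in root:
        if child.tag in KNOWN:
            result[child.tag.split("}")[1]] = child.text
        # Must Ignore: anything else (a v2 addition in the same namespace,
        # or a third-party extension in another namespace) is skipped.
    return result
```

Because the v2 producer kept the namespace name, nothing in the v1 reader had to change; had it moved to a new namespace, every v1 reader would break, which is what rule 9 intends for incompatible changes.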
We've argued that reusing namespace names for compatible extensions is good practice. The counter position is that the namespace owner could put compatible changes in a new namespace by providing extensibility points that allow other namespaces, such as <xs:any namespace="##other">. This technique suffers from the problem that an extension in a different namespace means that the combined schema cannot be fully validated. Specifically, there is no way to create a new schema that constrains the wildcard. For example, imagine that ns1 contains foo and bar. Using only W3C XML Schema constructs, it is not possible to take the SOAP schema (an example of a schema with a wildcard) and require that the ns1:foo element must be a child of the header element while ns1:bar must not be. Indeed, the need for this functionality spawned some of the WSDL functionality. The new namespace name approach also results in specifications and namespaces that are inappropriately factored, as related constructs end up in separate namespaces. Further, reuse of the same namespace has better tooling support. Many applications use a single schema to create the equivalent programming constructs, and these tools often work best when the "generated" constructs come from a single namespace. Reusing the namespace name allows at least the namespace author to make changes to the namespace and perform validation of the extensions.
Default processing model override
Given adoption of the Must Ignore rule, it is often the case that the creator of an extension wants to require that the receiver understand the extension, overriding the Must Ignore rule.
10. Provide mustUnderstand Rule: Container languages SHOULD provide a "mustUnderstand" model for dealing with optionality of extensions that override a default Must Ignore Rule.
This rule and the Must Ignore rule work together to provide a stable and flexible processing model for extensions. Arguably the simplest and most flexible override technique is a mustUnderstand flag that indicates whether the item must be understood. The SOAP [7], WSDL [8], and WS-Policy [10] attributes and values for specifying that an item must be understood are, respectively, soap:mustUnderstand="1", wsdl:required="1", and wsp:Usage="wsp:Required". SOAP is probably the most common case of a container that provides a mustUnderstand model. Its default value is 0, which is effectively the Must Ignore rule.
A mustUnderstand flag allows the sender to insert extensions into the container and use the mustUnderstand attribute to override the Must Ignore rule. This lets senders extend messages without changing the namespace of the extension element's parent, retaining backwards compatibility. The receiver must, of course, be extended to handle new extensions, but there is now a loose coupling between the language's processing model and the extension's processing model.
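The following Python sketch shows how a receiver might combine the two rules for SOAP 1.1 header blocks: header blocks it understands are processed, unknown blocks are skipped under Must Ignore, and unknown blocks carrying soap:mustUnderstand="1" cause a fault. The envelope namespace is SOAP 1.1's; the understood set and the example.org header namespaces are hypothetical, and SOAP actor targeting is ignored for brevity.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
MU_ATTR = f"{{{SOAP_ENV}}}mustUnderstand"

# Hypothetical set of header blocks this receiver has been extended to handle.
UNDERSTOOD = {"{http://example.org/security}Token"}


def process_headers(envelope_xml: str) -> list:
    """Return the header blocks processed; raise on a mustUnderstand fault."""
    env = ET.fromstring(envelope_xml)
    header = env.find(f"{{{SOAP_ENV}}}Header")
    processed = []
    if header is None:
        return processed
    for block in header:
        if block.tag in UNDERSTOOD:
            processed.append(block.tag)
        elif block.get(MU_ATTR, "0") == "1":
            # The sender overrode Must Ignore: fault instead of skipping.
            raise RuntimeError(f"MustUnderstand fault: {block.tag}")
        # Otherwise the default (mustUnderstand="0") applies: Must Ignore.
    return processed
```

The sender decides per header block whether ignoring is acceptable, so the container's processing model never has to know the extension's semantics.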
Other techniques are possible, such as providing an element that indicates which extension namespaces must be understood.
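As a sketch of that alternative, assume a hypothetical container vocabulary whose root may carry a mustUnderstandNamespaces child listing namespace names separated by whitespace (the element name and namespace are illustrative, not taken from any specification). A receiver could then check the whole list up front:

```python
import xml.etree.ElementTree as ET

# Hypothetical container vocabulary; the element names are illustrative only.
CONTAINER_NS = "http://example.org/container"
UNDERSTAND_TAG = f"{{{CONTAINER_NS}}}mustUnderstandNamespaces"


def check_required_namespaces(doc_xml: str, understood: set) -> None:
    """Raise if the document lists a must-understand namespace we don't know."""
    root = ET.fromstring(doc_xml)
    marker = root.find(UNDERSTAND_TAG)
    if marker is None or not marker.text:
        return  # nothing declared: plain Must Ignore processing applies
    required = set(marker.text.split())
    missing = required - understood
    if missing:
        raise RuntimeError(f"extension namespaces not understood: {sorted(missing)}")
```

Compared with a per-element flag, one declaration covers every element in a listed namespace, at the cost of coarser granularity.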
In some cases a language does not provide a mustUnderstand mechanism. In the absence of a mustUnderstand model, there is no way to force receivers to reject a message if they don't understand the extension namespace.
Determinism
XML DTDs and W3C XML Schema have a rule that requires schemas to have deterministic content models. From the XML 1.0 specification:
"For example, the content model ((b, c) | (b, d)) is non-deterministic, because given an initial b the XML processor cannot know which b in the model is being matched without looking ahead to see which element follows the b."
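The XML 1.0 example can be checked mechanically. This Python sketch computes, for each choice in a content model, the set of element names that can begin each branch; if two branches share a leading name, one element of lookahead cannot pick the branch. It assumes every particle occurs exactly once (no ?, *, or +), which keeps the first-set computation trivial; it is not a full implementation of the XML 1.0 or W3C XML Schema determinism rules.

```python
from itertools import combinations

# Content models as nested tuples, e.g. ("choice", ("seq", "b", "c"), ("seq", "b", "d")).
# Simplifying assumption: every particle occurs exactly once, so none can be skipped.


def first(model) -> set:
    """Element names that can start a match of `model`."""
    if isinstance(model, str):
        return {model}
    kind, *parts = model
    if kind == "seq":
        return first(parts[0])  # the first particle cannot be skipped
    return set().union(*(first(p) for p in parts))  # "choice"


def is_deterministic(model) -> bool:
    """True if, at every choice, one element of lookahead picks the branch."""
    if isinstance(model, str):
        return True
    kind, *parts = model
    if kind == "choice":
        for a, b in combinations(parts, 2):
            if first(a) & first(b):
                return False  # two branches start with the same element
    return all(is_deterministic(p) for p in parts)
```

Factoring out the common b, as (b, (c | d)), accepts the same documents and passes the check.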
The use of ##any means there are some schemas that
we might like to express, but that aren't allowed.
- Wildcards with ##any, where minOccurs does not equal maxOccurs, are not allowed before an element declaration, because an instance of the element would be valid for either the ##any or the element. ##other could be used instead.
- The element before a wildcard with ##any must have a maxOccurs equal to its minOccurs. If these were different, say minOccurs="1" and maxOccurs="2", then the optional occurrences could match either the element definition or the ##any. As a result of this rule, minOccurs must be greater than zero.
- Derived types that add element definitions after a wildcard with ##any must be avoided. If a derived type adds an element definition after the wildcard, an instance of the added element could match either the wildcard or the derived element definition.
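To make the three cases concrete, this Python sketch walks an instance one element at a time without lookahead and reports the first element that more than one particle could claim, which is the situation W3C XML Schema's Unique Particle Attribution constraint forbids. Particles are hypothetical (name, minOccurs, maxOccurs) triples, and "tns:" stands in for the target namespace, so ##other matches any name without that prefix. A schema processor checks this statically over the schema; this sketch only tests particular instances.

```python
# Hypothetical particle model: (name, min_occurs, max_occurs), where name is an
# element name such as "tns:b", "##any", or "##other"; None means unbounded.
TNS = "tns"


def _fits(name, item):
    if name == "##any":
        return True
    if name == "##other":
        return not item.startswith(TNS + ":")
    return name == item


def _closure(particles, states):
    """Add the states reachable by finishing particles whose minOccurs is met."""
    todo, seen = list(states), set(states)
    while todo:
        p, used = todo.pop()
        if p < len(particles) and used >= particles[p][1]:
            nxt = (p + 1, 0)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def find_conflict(particles, instance):
    """Return (position, particle indices) for the first item that more than
    one particle could claim without lookahead, or None if there is none."""
    states = _closure(particles, {(0, 0)})
    for pos, item in enumerate(instance):
        claims = {}  # particle index -> states after it consumes `item`
        for p, used in states:
            if p == len(particles):
                continue
            name, _, max_occurs = particles[p]
            if _fits(name, item) and (max_occurs is None or used < max_occurs):
                claims.setdefault(p, set()).add((p, used + 1))
        if len(claims) > 1:
            return pos, sorted(claims)
        if not claims:
            raise ValueError(f"{item!r} at position {pos} is not allowed here")
        states = _closure(particles, next(iter(claims.values())))
    return None
```

For example, find_conflict([("##any", 0, None), ("tns:b", 1, 1)], ["tns:b"]) reports position 0 claimed by particles 0 and 1 (the first bullet), while the same model with ##other in place of ##any yields no conflict.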
11. Be Deterministic Rule: Use of wildcards MUST be deterministic. The location of wildcards, the namespace of wildcard extensions, and minOccurs and maxOccurs values are constrained, and type restriction is controlled.
As shown earlier, a common design pattern is to provide an extensibility point (a wildcard, not an element) allowing any namespace at the end of a type. This is typically done with <xs:any namespace="##any">.
Determinism makes this unworkable as a complete solution in many cases. First, the extensibility point can only occur after required elements in the original schema, limiting the scope of extensibility in the original schema. Second, backwards compatible changes require that the added element be optional, which means minOccurs="0". Determinism prevents us from placing a minOccurs="0" element before an extensibility point of ##any. Thus, when adding an element at an extensibility point, the author can make the element optional and lose the extensibility point, or make the element required and lose backwards compatibility.
Why is this hard?
We've shown that it is hard to use XML and W3C XML Schema to achieve loose coupling through compatible changes that take full advantage of new schema definitions without requiring them. Following these extensibility rules leads to W3C XML Schema documents that are both more cumbersome and less expressive than one might like. The structural limitations introduced by W3C XML Schema's handling of extensibility are a consequence of its design, not an inherent limitation of schema-based structures.
With respect to W3C XML Schema, it would be useful to be able to add elements in arbitrary places, such as before other elements, but the determinism constraint prevents this. A less restrictive kind of deterministic model could be employed, such as the "greedy" algorithm defined in the URI specification [4]. This would allow optional elements before wildcards and remove the need for the Extension type we introduced. It still does not allow wildcards before elements, as the wildcard would match the elements instead. Further, it still does not allow wildcards and extension of the type to coexist. A "priority" wildcard model, in which an item that could be matched by either a wildcard or an element declaration is matched by the element declaration if possible, would allow wildcards both before and after element declarations. A wildcard that only allows elements that have not been defined (effectively other namespaces plus anything not defined in the target namespace) is another useful model. These changes would also allow cleaner mixing of inheritance and wildcards. But that still means that authors have to sprinkle wildcards throughout their types. A type-level any element, combined with the aforementioned wildcard changes, is needed. One potential solution is for the sequence declaration to have an attribute specifying that extensions are allowed in any place, with commensurate attributes specifying namespaces, elements, and validation rules.
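The "priority" model can be sketched in the same style. In this Python sketch, assuming hypothetical (name, minOccurs, maxOccurs) particles, when both a wildcard and an element declaration could take the next element, the element declaration wins; otherwise the earliest candidate wins. It illustrates the idea described above and is not an algorithm defined by W3C XML Schema or any other specification.

```python
# Hypothetical particle model: (name, min_occurs, max_occurs), where name is an
# element name such as "tns:b" or the wildcard "##any"; None means unbounded.


def _closure(particles, states):
    """Add the states reachable by finishing particles whose minOccurs is met."""
    todo, seen = list(states), set(states)
    while todo:
        p, used = todo.pop()
        if p < len(particles) and used >= particles[p][1]:
            nxt = (p + 1, 0)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def priority_match(particles, instance):
    """Assign each item to a particle, preferring element declarations over
    wildcards. Returns the list of particle indices, or None if invalid."""
    states = _closure(particles, {(0, 0)})
    assignment = []
    for item in instance:
        claims = {}  # particle index -> states after it consumes `item`
        for p, used in states:
            if p == len(particles):
                continue
            name, _, max_occurs = particles[p]
            if (name == "##any" or name == item) and (max_occurs is None or used < max_occurs):
                claims.setdefault(p, set()).add((p, used + 1))
        if not claims:
            return None
        # Priority rule: element declarations beat wildcards; then earliest wins.
        chosen = min(claims, key=lambda p: (particles[p][0] == "##any", p))
        assignment.append(chosen)
        states = _closure(particles, claims[chosen])
    # Complete only if every particle's minOccurs can be satisfied from here.
    return assignment if any(p == len(particles) for p, _ in states) else None
```

With particles [("##any", 0, None), ("tns:b", 1, 1)], the instance ["ext:x", "tns:b"] is assigned [0, 1], so the wildcard can now sit before an element declaration. The trade-off is that matching never backtracks: ["tns:b", "tns:b"] is rejected, even though a backtracking matcher could assign the first tns:b to the wildcard.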
The problem with this last approach is that a given schema sometimes needs to be applied strictly in some parts of a system and in a relaxed fashion in others. A long-standing rule for the Internet is the Robustness Principle, articulated in the Internet Protocol specification [3] as "In general, an implementation must be conservative in its sending behavior, and liberal in its receiving behavior". In schema validation terms, a sender can apply a schema strictly while a receiver applies it in a relaxed way. In this case, the degree of strictness is not an attribute of the schema, but of how it is used. A solution that appears to solve these problems is to define a form of schema validation that permits an open content model, used when schemas are versioned. We call this model "validation by projection"; it works by ignoring, rather than rejecting, component names that appear in a message but are not explicitly defined by the schema. We plan to explore this relaxed validation model in the future.
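A minimal sketch of strict versus projection validation, assuming a deliberately simplified hypothetical "schema" that only lists a type's child elements and whether each is required (order and cardinality are not checked). The same schema is applied strictly, as a conservative sender would, or by projection, as a liberal receiver would:

```python
import xml.etree.ElementTree as ET

# Hypothetical "schema": the child elements a type defines, and whether each is required.
NS = "{http://example.org/name}"
NAME_SCHEMA = {NS + "first": True, NS + "last": True}


def validate(xml_text: str, schema: dict, strict: bool) -> bool:
    """Strict: undefined children are errors. Projection: they are ignored."""
    root = ET.fromstring(xml_text)
    seen = set()
    for child in root:
        if child.tag in schema:
            seen.add(child.tag)
        elif strict:
            return False  # conservative: reject anything the schema doesn't define
        # By projection, undefined names are simply dropped from consideration.
    required = {tag for tag, is_required in schema.items() if is_required}
    return required <= seen
```

A document carrying an extra middle element fails strict validation but passes by projection; a document missing a required element fails both, so projection relaxes only what the schema does not define.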
A final comment on W3C XML Schema extensibility is that there is still an unmet need to define schemas that validate known extensions while retaining extensibility. An author will want to create a schema based on an extensible schema, mixing in other known schemas at particular wildcards while retaining the wildcards' extensibility. We encounter this difficulty in areas like describing SOAP header blocks. Composing a schema from many schemas is a difficult yet pressing problem.
Leaving the topic of wildcard extensibility, the use of type extension over the Web might be more palatable if the instance document could express a base type for use when the receiver does not understand the extension type, as in a hypothetical xsi:basetype="" attribute. The receiver could then fall back to the base type if it did not understand the extension.
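A receiver-side sketch of the proposed fallback, in Python. xsi:type is real; xsi:basetype is only the hypothetical attribute suggested above and is not defined by W3C XML Schema. The known-type set is also hypothetical, and type names are compared as raw QName strings rather than resolved against in-scope namespace declarations, which a real implementation would need to do.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSI_TYPE = f"{{{XSI}}}type"
XSI_BASETYPE = f"{{{XSI}}}basetype"  # proposed in the text; NOT part of XML Schema

KNOWN_TYPES = {"a:Address"}  # hypothetical types this receiver understands


def effective_type(elem):
    """Pick the type to process `elem` as, falling back to the declared base type.
    Returns None when no xsi:type is present (use the element's declared type)."""
    declared = elem.get(XSI_TYPE)
    if declared is None or declared in KNOWN_TYPES:
        return declared
    fallback = elem.get(XSI_BASETYPE)
    if fallback in KNOWN_TYPES:
        return fallback
    raise ValueError(f"unknown type {declared!r} and no usable base type")
```

The sender names both the derived type and the base type, so a receiver that only knows the base type degrades gracefully instead of rejecting the document.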
Another area for architectural improvement is that XML, or even W3C XML Schema, could have provided a mustUnderstand model. As things stand, each vocabulary that provides a mustUnderstand model reinvents the mU wheel. XML could have provided an xml:mustUnderstand attribute and model that each language could use. Tim Berners-Lee articulated the need for this in XML in his design note on mandatory extensions in February 2000 [18], but neither XML 1.0 nor 1.1 included this model.
Finally, there is ambiguity in compliance testing for W3C XML Schema implementations. The W3C XML Schema test collection [16] does not test some of the more common cases that have been precluded here. For example, the wildcard tests cover a different style: xs:any inside a complex type. They do not cover some of the non-deterministic cases, typically produced by combining minOccurs/maxOccurs variations with ##any or by combining inheritance with ##any. As a result, some implementations do not correctly check for non-determinism, which may yield non-interoperable documents.
One common concern is about implementation support for these features and combinations. These samples have been tried in many different schema parsers and toolkits, such as XML Beans, SQC, and JAX-RPC. While it's impossible to know whether all implementations support these rules, there seems to be good support for what was tested. The author is certainly interested in hearing about toolkits that don't support these rules.
Conclusion
The W3C TAG decided that the topic of versioning and extensibility is important enough to web architecture to work on a finding [20] and to include material in the Web Architecture document [21]. While this article provided a starting point for the TAG material, that material will cover a broader scope and progress in a more interactive and iterative fashion than an article can. Readers can follow the TAG material for an ongoing treatment of extensibility and versioning.
This article describes a number of rules for using XML, W3C XML Schema, and XML Namespaces in language construction and extension. The main goal of the set of rules is to allow language designers to make backwards- and forwards-compatible changes to their languages in order to achieve loose coupling between systems.
To a certain degree, the technique described herein is a
combination of the ##any and ##other
designs with well-known rules to produce a design that achieves the
goals of compatible extensibility and versioning with validation
using W3C XML Schema. The namespace name owner can add backwards-
and forwards-compatible changes into the extensibility element
while retaining the ability to validate all components, and other
authors can add their changes at the ##other wildcard
location.
References
- [1] Free Online Dictionary of Computing
- [2] Flexible XML Processing Profile
- [3] IETF RFC 791
- [4] IETF RFC 2396
- [5] IETF RFC 2518
- [6] IETF RFC 2616
- [7] SOAP 1.1
- [8] WSDL 1.1
- [9] WS-Callback
- [10] WS-Policy Framework
- [11] Xfront's Schema Best Practices
- [12] W3C Note, Web Architecture: Extensible Languages
- [13] W3C XML 1.0
- [14] W3C XML Namespaces
- [15] W3C XML Schema Part 1
- [16] W3C XML Schema Working Group's Test collection for Any
- [17] XML.com: W3C XML Schema Design Patterns, by Dare Obasanjo
- [18] Tim Berners-Lee's writings on evolution, extensibility and mustUnderstand
- [19] http://lists.w3.org/Archives/Public/w3c-dist-auth/1997AprJun/0190.html
- [20] W3C TAG Finding on extensibility and versioning
- [21] W3C TAG Web Architecture document section on extensibility and versioning
Acknowledgments
The author thanks the many reviewers that have contributed to the article, particularly David Bau, William Cox, Edd Dumbill, Chris Ferris, Yaron Goland, Hal Lockhart, Mark Nottingham, Jeffrey Schlimmer, Cliff Schmidt, and Norman Walsh. This article borrows, with permission of the authors, examples and some text from WS-Callback [9].