XML.com: XML From the Inside Out


Extensibility, XML Vocabularies, and XML Schema

October 27, 2004


XML is designed for the creation of languages based upon self-describing markup. The inevitable evolution of these languages -- by adding, deleting, and changing parts -- is called versioning. To make versioning work in practice is one of the most difficult problems in computing, with a long history of failed attempts. Arguably, the Web rose dramatically in popularity because evolution and versioning were built into HTML and HTTP headers. Both languages provide explicit extensibility points and rules for understanding extensions that enabled the decentralized extension and versioning of the languages. 

XML namespaces provide an ideal mechanism for identifying versions of languages, and all XML schema languages, such as W3C XML Schema, provide for controlled extensibility.

This article describes techniques to achieve more effective loose coupling between systems by providing a means for backwards- and forwards-compatible changes to occur when systems evolve. These techniques are designed to allow compatible changes with or without schema propagation. A number of questions, design patterns, and rules are introduced to enable versioning in XML vocabularies, making use of XML namespaces and XML Schema constructs. This includes rules for working with languages that provide an extensible container model, notably SOAP.

The collective set of guidance is called the “Must Ignore” pattern of extensibility. Strangely, the “Must Ignore” pattern for HTML tags and HTTP headers, which significantly helped the Web’s adoption, has not been widely adopted by XML practitioners. This article aims to rectify that situation within the constraints of current Schema validation environments. This article is permanently available [1] and is an update of the XML.com article on Versioning XML Languages [17].

Defining Compatibility

FOLDOC [2] provides definitions of backwards and forwards compatibility. This article will reprise those definitions and focus on the exchange of document instances. The terms consumer and producer are used in relation to document- and message-oriented exchanges. Web services readers can translate this article’s use of producer to sender, consumer to receiver, and instance to message.

Backwards compatibility means that a newer version of a consumer can be rolled out in a way that does not break existing producers.   A producer can send an older version of a message to a consumer that understands the new version and still have the message successfully processed.  Forwards compatibility means that a newer version of a producer can be rolled out in a way that does not break existing consumers. Of course the older consumer will not implement any new behavior, but a producer can send a newer version of an instance and still have the instance successfully processed.

In other words, backwards compatibility means that existing documents can be used by updated consumers, and forwards compatibility means that newer documents can be used by existing consumers. Another way of thinking of this relates to message exchanges, with producers on the left and consumers on the right. Backwards compatibility is where the right side (consumer) is updated, and forwards compatibility is where the left side (producer) is updated, as shown in Figure 1.

Figure 1 – Evolution of Producers and/or Consumers

Some typical backwards- and forwards-compatible changes:

  • Adding optional components (element(s) and/or attribute(s))
  • Adding optional content to a component’s content model (such as adding an enumeration)
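
As a minimal sketch of why adding an optional element is compatible in both directions, the Python below parses a hypothetical name vocabulary. The namespace URI, the element names (first, last, middle), the sample data, and both consumer functions are assumptions made for illustration; they are not taken from any published schema.

```python
import xml.etree.ElementTree as ET

NS = {"n": "http://www.example.org/name/1"}  # hypothetical namespace

V1_DOC = """<name xmlns="http://www.example.org/name/1">
  <first>Jane</first><last>Doe</last>
</name>"""

# V2 adds an optional <middle> element in the same namespace.
V2_DOC = """<name xmlns="http://www.example.org/name/1">
  <first>Jane</first><last>Doe</last><middle>Q</middle>
</name>"""


def consume_v1(xml_text):
    """V1 consumer: reads only the components it knows about."""
    root = ET.fromstring(xml_text)
    return (root.findtext("n:first", namespaces=NS),
            root.findtext("n:last", namespaces=NS))


def consume_v2(xml_text):
    """V2 consumer: also reads <middle>, treating it as optional."""
    root = ET.fromstring(xml_text)
    return (root.findtext("n:first", namespaces=NS),
            root.findtext("n:middle", default="", namespaces=NS),
            root.findtext("n:last", namespaces=NS))
```

Calling consume_v2(V1_DOC) exercises backwards compatibility (an updated consumer, an older instance), while consume_v1(V2_DOC) exercises forwards compatibility (an existing consumer, a newer instance). Neither call fails, because the new component is optional and the older consumer never looks for it.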

Some typically incompatible changes:

  • Changing the meaning or semantics of existing components
  • Adding required components
  • Removing required components
  • Restricting a component’s content model (such as changing a choice to a sequence)

The costs associated with introducing changes that are not backward- or forward-compatible are often very high, typically requiring deployed software to be updated to accommodate the newer version, or the deployment, management and related costs of running multiple instances. 

Compatibility is defined for the producer and consumer of an individual instance.  However, most web service specifications provide definitions of inputs and outputs.  In these definitions of compatibility, a web service that updates its output message schema is considered a newer producer.  This simply reverses the producer/consumer terminology of input instances when applying compatibility definitions to output instances.  If a web service updates the schema of the output message, then it is “sending” a newer version of the message, hence it is considered a “producer.”

Language Questions

Having defined compatibility, the choices facing a language designer can be described. 

Can third parties extend (version) the language? It is rarely desirable to prevent third parties from extending languages on their own, but it does happen. An example may be a tightly constrained security environment where distributed authoring is considered a “bug” rather than a feature.

Can third parties extend the language in a compatible way?  If so, a substitution mechanism, such as simply ignoring unknown extensions, is required for forwards compatibility.  

Can third parties extend (version) the language in an incompatible way? If so, then incompatible changes can be made as an override of the substitution mechanism (such as a must understand model or extension), or incompatibility can even be the default. For example, the WS-Security committee wanted third parties to only provide incompatible extensions. Unlike most languages, a security language has unique requirements where the consequences of ignored data can be severe. They accomplished this by specifying that all extensions are required to be understood and there is no substitution mechanism.

Can the designer extend the language in a compatible way?  As with third-party compatible extensions, a substitution mechanism for the designer’s extensions is required for forwards compatibility.

A question that does not need to be asked is: “Can the designer extend (version) the language in an incompatible way?” They can always do this by using new namespace names, element names, or version numbers.

Is the vocabulary a standalone language or an extension of another vocabulary? A part of this question is whether the language depends on another language. This determines which, if any, facilities are provided for the language and what must be provided. For example, SOAP headers can use the soap:mustUnderstand attribute and processing model even though the contents of the SOAP headers are independent languages from SOAP.
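
The following sketch illustrates how a consumer might apply the soap:mustUnderstand processing model to header blocks from independent languages. It assumes the SOAP 1.1 envelope namespace and the value "1" for the attribute. The routing, audit, and txn header vocabularies, their namespace URIs, and the check_headers function are hypothetical, and a real SOAP node would return a MustUnderstand fault rather than a list.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
KNOWN_HEADER_NS = {"http://www.example.org/routing"}   # hypothetical: headers this consumer implements

ENVELOPE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <r:route xmlns:r="http://www.example.org/routing">node-a</r:route>
    <a:audit xmlns:a="http://www.example.org/audit">on</a:audit>
    <t:txn xmlns:t="http://www.example.org/txn" soap:mustUnderstand="1">42</t:txn>
  </soap:Header>
  <soap:Body/>
</soap:Envelope>"""


def namespace_of(elem):
    """Return the namespace URI from ElementTree's '{uri}local' tag form."""
    return elem.tag[1:].split("}", 1)[0] if elem.tag.startswith("{") else ""


def check_headers(envelope_text):
    """Sort header blocks into processed, ignored, and not-understood lists."""
    root = ET.fromstring(envelope_text)
    header = root.find(f"{{{SOAP_NS}}}Header")
    processed, ignored, faults = [], [], []
    for block in (list(header) if header is not None else []):
        if namespace_of(block) in KNOWN_HEADER_NS:
            processed.append(block.tag)
        elif block.get(f"{{{SOAP_NS}}}mustUnderstand") == "1":
            faults.append(block.tag)   # unknown but mandatory: cannot be ignored
        else:
            ignored.append(block.tag)  # unknown and optional: safe to ignore
    return processed, ignored, faults
```

Here the audit header is silently ignored, while the txn header is reported as not understood. The header languages themselves define no flag of their own; they borrow the facility from the container language.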

What Schema language(s)? This guides the language design as some features, particularly extensibility, must be planned for in V1 and various features may be incompatible across different languages. For example, writing a V2 compatible Schema in XML Schema requires special design (shown later), which is not required in a schema language such as RELAX NG.

Should extensions or versions be expressible in the Schema language?  The ability to write a schema for extensions or versions is directly affected by the schema design and the compatibility desires.

Language Decisions

Upon answering these questions, there are some key decisions that a language developer makes, whether they are consciously made or not.

Schema language design choices or constraints.  If the language can be extended in a compatible way, then a few specific schema design choices must be followed.

Wildcards are used to provide extensibility in XML Schema. If revisions to the Schema are to support substitution, specific schema designs must be used in conjunction with the wildcard. The main choices are: provide wildcards, provide extension elements, or provide delimiter elements.  Extension and delimiter elements are described in the new components in existing or new namespaces section. If extension/delimiter elements are not provided, then a compatible V2 Schema cannot be written.

Substitution mechanism. Forwards compatibility can only be achieved by providing a substitution mechanism that maps Version 2 instances, or Version 1 instances with extensions, to V1 without knowledge of V2. A V1 consumer must be able to transform any instance, such as V1 + extensions, into a V1 instance in order to process it. The “Must Ignore unknown” rule is a simple substitution mechanism. This rule says that any extensions are “ignored”. Using it, a V1 + extensions document is transformed into a V1 document by ignoring the extensions. Other substitution mechanisms exist, such as the fallback model in XSLT.
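
The transformation from a V1 + extensions document to a V1 document can be sketched as follows. This is one possible reading of the rule, assuming that an unknown element is dropped together with its entire subtree and that unknown attributes are out of scope. The namespace URIs, element names, and the to_v1 and must_ignore functions are hypothetical.

```python
import xml.etree.ElementTree as ET

V1_NS = "http://www.example.org/name/1"  # hypothetical V1 namespace
# Qualified names of every element the V1 consumer understands.
V1_NAMES = {f"{{{V1_NS}}}{local}" for local in ("name", "first", "last")}

EXTENDED = """<name xmlns="http://www.example.org/name/1">
  <first>Jane</first>
  <last>Doe</last>
  <ext:nickname xmlns:ext="http://www.example.org/ext">JD</ext:nickname>
</name>"""


def must_ignore(elem):
    """Remove, in place, every descendant element that V1 does not define."""
    for child in list(elem):      # iterate over a copy: elem is mutated below
        if child.tag in V1_NAMES:
            must_ignore(child)
        else:
            elem.remove(child)    # drops the unknown element and its subtree
    return elem


def to_v1(xml_text):
    """Parse an instance and substitute a pure V1 tree for it."""
    root = ET.fromstring(xml_text)
    if root.tag not in V1_NAMES:
        raise ValueError("root element is not part of the V1 vocabulary")
    return must_ignore(root)
```

After to_v1(EXTENDED), the tree contains only the first and last elements, so the V1 consumer can process it as if the extension had never been sent.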

Component identification.  The identification of components into language versions or extensions has a variety of general mechanisms related to namespaces.  These are detailed in the Versioning section.

Identification of incompatible extensions.  The identification of versions is covered by language identification, but third parties cannot arbitrarily change versions or change namespaces.  They may need a mechanism to indicate that an extension is an incompatible change.  A couple of mechanisms are a “Must Understand” identifier (such as a flag or list of required namespaces) or requiring that extensions are in substitution groups.
