Designing Extensible, Versionable XML Formats
by Dare Obasanjo
Using XML Schema to Design a Versionable XML Format
Although W3C XML Schema (WXS) has a number of features for designing extensible XML vocabularies, there isn't a similar plethora of features for designing versionable XML vocabularies. There are, however, general approaches to providing a versioning policy for an XML vocabulary that are compatible with WXS. The following approaches provide mechanisms for describing XML formats using WXS in a way that enables evolution in a backward-compatible way.
1. New constructs in a new namespace: The most straightforward versioning mechanism is to specify that additions to the format should be in a different namespace from the core components of the format. To make this backward compatible, the XML format should have an extensibility model with default Must Ignore rules for items outside the namespaces the consumer understands, in combination with mustUnderstand constructs.
The following examples show version 1 of XML schemas that describe a collection of books and an XML document that conforms to the schema.
BOOKS-CORE.XSD
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.example.com/books-core">
<xs:attribute name="mustUnderstand" type="xs:boolean" />
</xs:schema>
BOOKS-V1.XSD
<xs:schema elementFormDefault="qualified"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.example.com/books/v1"
xmlns:b1="http://www.example.com/books/v1">
<xs:element name="books">
<xs:complexType>
<xs:sequence>
<xs:element name="book" type="b1:bookType"
maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="version" type="xs:string" />
</xs:complexType>
</xs:element>
<xs:complexType name="bookType">
<xs:sequence>
<xs:element name="title" type="xs:string" />
<xs:element name="author" type="xs:string" />
<xs:any namespace="##other" minOccurs="0"
maxOccurs="unbounded"
processContents="lax" />
</xs:sequence>
<xs:attribute name="publisher" type="xs:string" />
</xs:complexType>
</xs:schema>
BOOKS.XML
<books version="1.0"
xmlns="http://www.example.com/books/v1">
<book publisher="IDG books">
<title>XML Bible</title>
<author>Elliotte Rusty Harold</author>
</book>
<book publisher="Addison-Wesley">
<title>The Mythical Man Month</title>
<author>Frederick Brooks</author>
</book>
<book publisher="WROX">
<title>Professional XSLT 2nd Edition</title>
<author>Michael Kay</author>
<price xmlns="http://www.example.com/book/extensions">
24.99
</price>
</book>
</books>
The schema for the http://www.example.com/books/v1 namespace
describes the books element, which can contain one or
more book elements that have an author
and title element, as well as a publisher
attribute. The content model of the book element
allows for zero or more elements from any namespace besides the
target namespace of the schema to appear after the
author and title elements. The schema
for the http://www.example.com/books-core namespace contains a
mustUnderstand attribute that must be added on
extension elements or elements from a future version of the format.
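As a rough illustration of how the mustUnderstand attribute is meant to be used, the following version 1 document marks one extension element as mandatory and leaves another unmarked. The license element is a hypothetical name invented for this sketch, and the behavior described in the comments is the Must Ignore policy outlined above rather than something the schema itself enforces.

```xml
<books version="1.0"
       xmlns="http://www.example.com/books/v1"
       xmlns:bc="http://www.example.com/books-core"
       xmlns:p="http://www.example.com/book/extensions">
  <book publisher="WROX">
    <title>Professional XSLT 2nd Edition</title>
    <author>Michael Kay</author>
    <!-- No mustUnderstand: a consumer that does not recognize
         p:price may skip it under the Must Ignore rule. -->
    <p:price>24.99</p:price>
    <!-- Hypothetical extension element. mustUnderstand="true" asks a
         consumer that does not recognize it to reject the document
         instead of silently ignoring the element. -->
    <p:license bc:mustUnderstand="true">single-user</p:license>
  </book>
</books>
```

Both extension elements are matched by the ##other wildcard in bookType, so the schema accepts them either way. The decision to skip or reject is left to the consuming application.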
In the next version of the format, it is decided that an
isbn element should be added to the content model of
the book element. Since all consumers and producers
of the Example.com XML Book Format won't upgrade at the same time,
there will be times when a consumer that receives a version 2
document will not understand the isbn element. Since the
importance of the isbn element is dependent on the
application, it is decided that the isbn element can
carry a mustUnderstand attribute indicating whether the application
consuming the format must know how to process ISBNs. This
is the schema for version 2 of the format along with a sample
document:
BOOKS-V1.XSD
<xs:schema elementFormDefault="qualified"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.example.com/books/v1"
xmlns:b1="http://www.example.com/books/v1"
xmlns:b2="http://www.example.com/books/v2">
<xs:import namespace="http://www.example.com/books/v2"
schemaLocation="books-v2.xsd" />
<xs:element name="books">
<xs:complexType>
<xs:sequence>
<xs:element name="book" type="b1:bookType"
maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="version" type="xs:string" />
</xs:complexType>
</xs:element>
<xs:complexType name="bookType">
<xs:sequence>
<xs:element name="title" type="xs:string" />
<xs:element name="author" type="xs:string" />
<xs:element ref="b2:isbn" />
<xs:any namespace="##other" minOccurs="0"
maxOccurs="unbounded"
processContents="lax" />
</xs:sequence>
<xs:attribute name="publisher" type="xs:string" />
</xs:complexType>
</xs:schema>
BOOKS-V2.XSD
<xs:schema elementFormDefault="qualified"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.example.com/books/v2"
xmlns:core="http://www.example.com/books-core">
<xs:import namespace="http://www.example.com/books-core"
schemaLocation="books-core.xsd" />
<xs:element name="isbn">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute ref="core:mustUnderstand"
default="false"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
</xs:schema>
BOOKS.XML
<books version="2.0" xmlns="http://www.example.com/books/v1"
xmlns:p="http://www.example.com/book/extensions"
xmlns:v2="http://www.example.com/books/v2"
xmlns:bc="http://www.example.com/books-core">
<book publisher="HCI">
<title>A Child Called It</title>
<author>Dave Pelzer</author>
<v2:isbn bc:mustUnderstand="true">
1-55874-766-9
</v2:isbn>
<p:price>9.95</p:price>
</book>
</books>
The primary drawback of this approach is that core components of the format are not in the same namespace. This makes it tricky for applications or human readers of the format to differentiate between extensions and core aspects of the format that show up in a later version.
A secondary drawback is that although this approach is backward
compatible (v2 documents can be consumed by v1 clients), it is not
forward compatible. The v2 schema states that an
isbn is mandatory, which is not the case in v1. This means that a v1 document will be rejected by a v2 client. Switching the
isbn element to being optional doesn't work because
it makes the schema non-deterministic. It is non-deterministic
because when an isbn element is seen, the validator
cannot tell whether the sequence is over. The element may
be validated as the optional isbn element that
follows an author, or against the wildcard, which
allows any element in a namespace other than the target namespace
to appear.
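To make the conflict concrete, here is a sketch of what the bookType definition in books-v1.xsd would look like if the isbn reference were simply made optional. It is shown only to illustrate the ambiguity; a schema processor that checks for deterministic content models is expected to reject it.

```xml
<xs:schema elementFormDefault="qualified"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.example.com/books/v1"
    xmlns:b2="http://www.example.com/books/v2">
  <xs:import namespace="http://www.example.com/books/v2"
      schemaLocation="books-v2.xsd" />
  <!-- NON-DETERMINISTIC: shown only to illustrate the problem -->
  <xs:complexType name="bookType">
    <xs:sequence>
      <xs:element name="title" type="xs:string" />
      <xs:element name="author" type="xs:string" />
      <!-- b2:isbn lives in the v2 namespace, which is "other" than
           the target namespace of this schema ... -->
      <xs:element ref="b2:isbn" minOccurs="0" />
      <!-- ... so this wildcard can match a b2:isbn element as well -->
      <xs:any namespace="##other" minOccurs="0"
          maxOccurs="unbounded"
          processContents="lax" />
    </xs:sequence>
    <xs:attribute name="publisher" type="xs:string" />
  </xs:complexType>
</xs:schema>
```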
Both of these drawbacks are tackled by the approach described next.
2. Using version extensibility points: Conceptually, a data format is made versionable by providing a well-defined extensibility point where additions to the format are expected to appear. This functionality is provided in WXS using wildcards. However, in practice simply placing a wildcard at a particular point in a content model often leads to non-deterministic content models. The following example shows a non-deterministic schema that intuitively seems like it should work.
<xs:schema elementFormDefault="qualified"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.example.com/incorrect">
<!-- THIS TYPE IS NON-DETERMINISTIC -->
<xs:complexType name="bookType">
<xs:sequence>
<xs:element name="title" type="xs:string" />
<xs:element name="author" type="xs:string" />
<xs:element name="isbn" type="xs:string" minOccurs="0" />
<xs:any namespace="##any" minOccurs="0"
maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="publisher" type="xs:string" />
</xs:complexType>
</xs:schema>
As mentioned earlier, the problem with the above schema is that
when an isbn element is seen the validator cannot
tell whether the sequence is over. This is because the element may be
validated as the optional isbn element that follows
an author, or against the wildcard, which here also admits
elements from the target namespace.
This limitation is due to the
Unique Particle Attribution constraint of XML Schema.
To make the use of wildcards deterministic in such situations, you can provide delimiters or sentry elements around the wildcard that help the validator determine where the elements to validate against the wildcard begin and where they end.
The following examples show version 1 of XML schemas that describe a collection of books and an XML document that conforms to the schema.
BOOKS-CORE.XSD
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.example.com/books-core">
<xs:element name="delimiter">
<xs:complexType />
</xs:element>
<xs:element name="end">
<xs:complexType />
</xs:element>
<xs:attribute name="mustUnderstand" type="xs:boolean" />
</xs:schema>
BOOKS.XSD
<xs:schema elementFormDefault="qualified"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.example.com/books"
xmlns:b="http://www.example.com/books"
xmlns:bc="http://www.example.com/books-core">
<xs:import namespace="http://www.example.com/books-core"
schemaLocation="books-core.xsd" />
<xs:element name="books">
<xs:complexType>
<xs:sequence>
<xs:element name="book" type="b:bookType"
maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="version" type="xs:string" />
</xs:complexType>
</xs:element>
<xs:complexType name="bookType">
<xs:sequence>
<xs:element name="title" type="xs:string" />
<xs:element name="author" type="xs:string" />
<xs:element name="isbn" type="xs:string"
minOccurs="0" />
<xs:sequence minOccurs="0" maxOccurs="1">
<xs:sequence minOccurs="0" maxOccurs="unbounded">
<xs:element ref="bc:delimiter" />
<xs:any namespace="##targetNamespace ##local"
minOccurs="0" maxOccurs="unbounded"
processContents="lax" />
</xs:sequence>
<xs:element ref="bc:end" />
</xs:sequence>
<xs:group ref="b:extensionGroup" minOccurs="0" />
</xs:sequence>
<xs:attribute name="publisher" type="xs:string" />
</xs:complexType>
<xs:group name="extensionGroup">
<xs:sequence>
<xs:element name="extensions">
<xs:complexType>
<xs:sequence>
<xs:any namespace="##other" minOccurs="0"
maxOccurs="unbounded"
processContents="lax" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:group>
</xs:schema>
BOOKS.XML
<books version="1.0" xmlns="http://www.example.com/books">
<book publisher="IDG books">
<title>XML Bible</title>
<author>Elliotte Rusty Harold</author>
</book>
<book publisher="Addison-Wesley">
<title>The Mythical Man Month</title>
<author>Frederick Brooks</author>
<isbn>0-373-70708-8</isbn>
</book>
<book publisher="WROX">
<title>Professional XSLT 2nd Edition</title>
<author>Michael Kay</author>
<extensions>
<price xmlns="http://www.example.com/book/extensions">
24.99
</price>
</extensions>
</book>
</books>
The schema for the http://www.example.com/books namespace
describes the books element, which can contain one or
more book elements that in turn have required
title and author elements, an optional
isbn element, plus a publisher
attribute. Each book element also has an
extensibility point within which a delimiter element
followed by zero or more elements from the target namespace can
occur multiple times.
The end of the extensibility point is
bounded by an end element. The content model of the
book element also allows an optional extensions element
to appear after the end element; it can hold zero or more
elements from any namespace besides the target namespace of the
schema. The schema for the
http://www.example.com/books-core namespace contains a
mustUnderstand attribute, which must be added on
extension elements or elements from a future version of the format. The
delimiter and end elements are also
defined in this schema.
In the next version of the format, it is decided to add an
additional edition-number element to the content model of
the book element. Below is the schema for version 2
of the format, along with a sample document.
BOOKS.XSD
<xs:schema elementFormDefault="qualified"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.example.com/books"
xmlns:b="http://www.example.com/books"
xmlns:bc="http://www.example.com/books-core">
<xs:import namespace="http://www.example.com/books-core"
schemaLocation="books-core.xsd" />
<xs:element name="books">
<xs:complexType>
<xs:sequence>
<xs:element name="book" type="b:bookType"
maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="version" type="xs:string" />
</xs:complexType>
</xs:element>
<xs:complexType name="bookType">
<xs:sequence>
<xs:element name="title" type="xs:string" />
<xs:element name="author" type="xs:string" />
<xs:element name="isbn" type="xs:string"
minOccurs="0" />
<xs:sequence minOccurs="0" maxOccurs="1">
<xs:element ref="bc:delimiter" />
<xs:element name="edition-number"
type="xs:positiveInteger"
minOccurs="0" />
<xs:sequence minOccurs="0" maxOccurs="unbounded">
<xs:element ref="bc:delimiter" />
<xs:any namespace="##targetNamespace ##local"
minOccurs="0" maxOccurs="unbounded"
processContents="lax" />
</xs:sequence>
<xs:element ref="bc:end" />
</xs:sequence>
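<xs:group ref="b:extensionGroup" minOccurs="0" />
</xs:sequence>
<xs:attribute name="publisher" type="xs:string" />
</xs:complexType>
<xs:group name="extensionGroup">
<xs:sequence>
<xs:element name="extensions">
<xs:complexType>
<xs:sequence>
<xs:any namespace="##other" minOccurs="0"
maxOccurs="unbounded"
processContents="lax" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:group>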
</xs:schema>
BOOKS.XML
<books version="2.0" xmlns="http://www.example.com/books"
xmlns:p="http://www.example.com/book/extensions"
xmlns:bc="http://www.example.com/books-core">
<book publisher="HCI">
<title>A Child Called It</title>
<author>Dave Pelzer</author>
<isbn>1-55874-766-9</isbn>
<bc:delimiter />
<edition-number>1</edition-number>
<bc:end />
<extensions>
<p:price>9.95</p:price>
</extensions>
</book>
</books>
Unlike the New constructs in a new namespace approach,
this approach keeps all the core components of the format in a
single namespace and is forward compatible as well as backward
compatible. It should be noted that forward compatibility is
dependent on not adding any new required constructs in future versions. Another benefit of this
approach is that it obviates the need for having an explicit
mustUnderstand construct since you can simply specify
that if a consumer encounters any unknown element from the target
namespace of the format then it should
result in a fatal error.
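To illustrate how the extensibility point could be reused, the following sketch shows a document from a hypothetical version 3 of the format that places a new element after a second delimiter. The cover-type element and the version number are invented for this example. The sketch also assumes that the target-namespace wildcard is processed laxly (processContents="lax"), so that elements without a declaration are accepted.

```xml
<books version="3.0" xmlns="http://www.example.com/books"
       xmlns:bc="http://www.example.com/books-core">
  <book publisher="HCI">
    <title>A Child Called It</title>
    <author>Dave Pelzer</author>
    <isbn>1-55874-766-9</isbn>
    <!-- First delimiter: start of the version 2 additions -->
    <bc:delimiter />
    <edition-number>1</edition-number>
    <!-- Second delimiter: start of the hypothetical version 3
         additions. An older schema matches cover-type against its
         target-namespace wildcard. -->
    <bc:delimiter />
    <cover-type>paperback</cover-type>
    <!-- end closes the extensibility point -->
    <bc:end />
  </book>
</books>
```

Under that assumption, a version 1 schema would match both edition-number and cover-type against the wildcard, and a version 2 schema would do the same for cover-type. Whether an older consumer then processes or rejects the document depends on the policy it applies to unknown elements from the target namespace.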
The primary drawback of using version extensibility points is that it makes both schemas and XML instances more verbose, and therefore potentially more confusing.