XML.com: XML From the Inside Out

Designing Extensible, Versionable XML Formats
by Dare Obasanjo

Using XML Schema to Design a Versionable XML Format

Although W3C XML Schema (WXS) has a number of features for designing extensible XML vocabularies, there isn't a similar plethora of features for designing versionable XML vocabularies. There are, however, general approaches to providing a versioning policy for an XML vocabulary that are compatible with WXS. The following approaches provide mechanisms for describing XML formats using WXS in a way that enables evolution in a backward-compatible way.

1. New constructs in a new namespace: The most straightforward versioning mechanism is to specify that additions to the format should be in a different namespace from the core components of the format. To make this backward compatible, the XML format should have an extensibility model with default Must Ignore rules for items outside the namespaces the consumer understands, in combination with mustUnderstand constructs.

The following examples show version 1 of XML schemas that describe a collection of books and an XML document that conforms to the schema.


BOOKS-CORE.XSD
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.example.com/books-core">
  <xs:attribute name="mustUnderstand" type="xs:boolean" />
</xs:schema>

BOOKS-V1.XSD
<xs:schema elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com/books/v1" 
           xmlns:b1="http://www.example.com/books/v1"> 
  <xs:element name="books">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" type="b1:bookType" 
                    maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute name="version" type="xs:string" />
    </xs:complexType>
  </xs:element>
  <xs:complexType name="bookType">
    <xs:sequence>
      <xs:element name="title" type="xs:string" />
      <xs:element name="author" type="xs:string" />
      <xs:any namespace="##other" minOccurs="0"
              maxOccurs="unbounded"
              processContents="lax" />
    </xs:sequence>
    <xs:attribute name="publisher" type="xs:string" />
  </xs:complexType>
</xs:schema>

BOOKS.XML
<books version="1.0" 
       xmlns="http://www.example.com/books/v1">
 <book publisher="IDG books">
   <title>XML Bible</title>
   <author>Elliotte Rusty Harold</author>
 </book>
 <book publisher="Addison-Wesley">
   <title>The Mythical Man Month</title>
   <author>Frederick Brooks</author>
 </book>
 <book publisher="WROX">
   <title>Professional XSLT 2nd Edition</title>
   <author>Michael Kay</author>
   <price xmlns="http://www.example.com/book/extensions">
     24.99
   </price>
 </book>
</books>
 

The schema for the http://www.example.com/books/v1 namespace describes the books element, which can contain one or more book elements that have an author and title element, as well as a publisher attribute. The content model of the book element allows for zero or more elements from any namespace besides the target namespace of the schema to appear after the author and title elements. The schema for the http://www.example.com/books-core namespace contains a mustUnderstand attribute that must be added on extension elements or elements from a future version of the format.

In the next version of the format, it is decided that an isbn element should be added to the content model of the book element. Since all consumers and producers of the Example.com XML Book Format won't upgrade at the same time, there will be times when a consumer that only knows version 1 of the format receives a version 2 document and does not understand the isbn element. Since the importance of the isbn element depends on the application, it is decided that the isbn element can carry a mustUnderstand attribute indicating whether the application consuming the format must know how to process ISBNs. This is the schema for version 2 of the format along with a sample document:


BOOKS-V1.XSD
<xs:schema elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com/books/v1"
           xmlns:b1="http://www.example.com/books/v1"
           xmlns:b2="http://www.example.com/books/v2"> 
  <xs:import namespace="http://www.example.com/books/v2"
             schemaLocation="books-v2.xsd" />
  <xs:element name="books">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" type="b1:bookType"
                  maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute name="version" type="xs:string" />
    </xs:complexType>
  </xs:element>
  <xs:complexType name="bookType">
    <xs:sequence>
      <xs:element name="title" type="xs:string" />
      <xs:element name="author" type="xs:string" />
      <xs:element ref="b2:isbn"  /> 
      <xs:any namespace="##other" minOccurs="0"
              maxOccurs="unbounded"
              processContents="lax" />
    </xs:sequence>
    <xs:attribute name="publisher" type="xs:string" />
  </xs:complexType>
</xs:schema>

BOOKS-V2.XSD
<xs:schema elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com/books/v2" 
           xmlns:core="http://www.example.com/books-core">
  <xs:import namespace="http://www.example.com/books-core"
             schemaLocation="books-core.xsd" />
  <xs:element name="isbn">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute ref="core:mustUnderstand"
                        default="false"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>

BOOKS.XML
<books version="2.0" xmlns="http://www.example.com/books/v1"
       xmlns:p="http://www.example.com/book/extensions"
       xmlns:v2="http://www.example.com/books/v2"
       xmlns:bc="http://www.example.com/books-core">
  <book publisher="HCI">
    <title>A Child Called It</title>
    <author>Dave Pelzer</author>
    <v2:isbn bc:mustUnderstand="true">
      1-55874-766-9
    </v2:isbn>
    <p:price>9.95</p:price>
  </book>
</books>
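
How a version 1 consumer might treat the document above can be sketched in a few lines of Python. This is only an illustration of the Must Ignore and mustUnderstand rules, not part of the format itself: the read_book function, the UNDERSTOOD set and the choice to raise ValueError are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

V1_NS = "http://www.example.com/books/v1"
CORE_NS = "http://www.example.com/books-core"
MUST_UNDERSTAND = f"{{{CORE_NS}}}mustUnderstand"

# Namespaces this hypothetical v1 consumer knows how to process.
UNDERSTOOD = {V1_NS}

DOC = """\
<books version="2.0" xmlns="http://www.example.com/books/v1"
       xmlns:p="http://www.example.com/book/extensions"
       xmlns:v2="http://www.example.com/books/v2"
       xmlns:bc="http://www.example.com/books-core">
  <book publisher="HCI">
    <title>A Child Called It</title>
    <author>Dave Pelzer</author>
    <v2:isbn bc:mustUnderstand="true">1-55874-766-9</v2:isbn>
    <p:price>9.95</p:price>
  </book>
</books>
"""


def namespace_of(elem):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = elem.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def read_book(book):
    """Collect understood children; ignore or reject everything else."""
    known, ignored = {}, []
    for child in book:
        if namespace_of(child) in UNDERSTOOD:
            local_name = child.tag.split("}", 1)[1]
            known[local_name] = (child.text or "").strip()
        elif child.get(MUST_UNDERSTAND, "false").strip() in ("true", "1"):
            # The producer insists on this element and we cannot process it.
            raise ValueError(f"cannot process required element {child.tag}")
        else:
            ignored.append(child.tag)  # Must Ignore: skip it silently
    return known, ignored


book = ET.fromstring(DOC)[0]
try:
    read_book(book)
    outcome = "accepted"
except ValueError:
    outcome = "rejected"
print(outcome)
```

With mustUnderstand set to true on the isbn element the consumer rejects the book; had the producer left the attribute at its default of false, both the isbn and price elements would simply have been skipped.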

The primary drawback of this approach is that core components of the format are not in the same namespace. This makes it tricky for applications or human readers of the format to differentiate between extensions and core aspects of the format that show up in a later version.

A secondary drawback is that although this approach is backward compatible (v2 documents can be consumed by v1 clients), it is not forward compatible. The v2 schema states that an isbn is mandatory, which is not the case in v1. This means that a v1 document will be rejected by a v2 client. Switching the isbn element to being optional doesn't work because it makes the schema non-deterministic. It is non-deterministic because when an isbn element is seen, the validator cannot tell whether the sequence is over. The element may be validated as the optional isbn element that follows an author, or against the wildcard, which allows any element in a namespace other than the target namespace to appear.

Both of these drawbacks are tackled by the approach described next.

2. Using version extensibility points: Conceptually, a data format is made versionable by providing a well-defined extensibility point where additions to the format are expected to appear. This functionality is provided in WXS using wildcards. However, in practice simply placing a wildcard at a particular point in a content model often leads to non-deterministic content models. The following example shows a non-deterministic schema that intuitively seems like it should work.

<xs:schema elementFormDefault="qualified"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.example.com/incorrect">
<!-- THIS TYPE IS NON-DETERMINISTIC --> 
  <xs:complexType name="bookType">
    <xs:sequence>
      <xs:element name="title" type="xs:string" />
      <xs:element name="author" type="xs:string" />
      <xs:element name="isbn" type="xs:string" minOccurs="0" />
      <xs:any namespace="##any" minOccurs="0"
              maxOccurs="unbounded" />
    </xs:sequence>
    <xs:attribute name="publisher" type="xs:string" />
  </xs:complexType>
</xs:schema>

As mentioned earlier, the problem with the above schema is that when an isbn element is seen the validator cannot tell which particle it belongs to. The element may be validated as the optional isbn element that follows an author, or against the wildcard, which in this schema also admits elements from the target namespace. Such ambiguity is forbidden by the Unique Particle Attribution constraint of W3C XML Schema.
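
The ambiguity can be made concrete with a toy model in Python. This is not a schema validator and does not implement the full constraint, which is defined over the whole content model; it only shows that two adjacent optional particles both claim the same element. The Particle class and the candidates function are names invented for this sketch, and the wildcard is simplified to accept any namespaced element.

```python
from dataclasses import dataclass
from typing import Callable, List

TARGET_NS = "http://www.example.com/incorrect"


@dataclass
class Particle:
    label: str
    matches: Callable[[str, str], bool]  # (namespace, local name) -> bool


# The tail of bookType: an optional isbn element followed by an optional
# wildcard. Simplification: the wildcard accepts every namespaced element,
# the target namespace included.
tail: List[Particle] = [
    Particle("element isbn",
             lambda ns, name: (ns, name) == (TARGET_NS, "isbn")),
    Particle("wildcard",
             lambda ns, name: ns != ""),
]


def candidates(ns: str, name: str, particles: List[Particle]) -> List[str]:
    """List every particle that could claim the given element."""
    return [p.label for p in particles if p.matches(ns, name)]


# Both particles are optional, so an isbn arriving after author could be
# attributed to either of them.
print(candidates(TARGET_NS, "isbn", tail))
```

Any element for which more than one particle is returned is one the validator would have to guess about, which is exactly what the constraint rules out.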

To make the use of wildcards deterministic in such situations, you can place delimiter or sentry elements around the wildcard. These help the validator determine where the elements to validate against the wildcard begin and where they end.

The following examples show version 1 of XML schemas that describe a collection of books and an XML document that conforms to the schema.


BOOKS-CORE.XSD
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
       targetNamespace="http://www.example.com/books-core">
  <xs:element name="delimiter">
    <xs:complexType /> 
  </xs:element>
  <xs:element name="end">
    <xs:complexType />
  </xs:element>
  <xs:attribute name="mustUnderstand" type="xs:boolean" />
</xs:schema>

BOOKS.XSD
<xs:schema elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com/books"
           xmlns:b="http://www.example.com/books"
           xmlns:bc="http://www.example.com/books-core">
  <xs:import namespace="http://www.example.com/books-core"
             schemaLocation="books-core.xsd" />
  <xs:element name="books">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" type="b:bookType"
                    maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute name="version" type="xs:string" />
    </xs:complexType>
  </xs:element>
  <xs:complexType name="bookType">
    <xs:sequence>
      <xs:element name="title" type="xs:string" />
      <xs:element name="author" type="xs:string" />
      <xs:element name="isbn" type="xs:string"
                  minOccurs="0" />
      <xs:sequence minOccurs="0" maxOccurs="1">
        <xs:sequence minOccurs="0" maxOccurs="unbounded">
          <xs:element ref="bc:delimiter" />
          <!-- lax: elements added by later versions have no declaration here -->
          <xs:any namespace="##targetNamespace ##local"
                  minOccurs="0" maxOccurs="unbounded"
                  processContents="lax" />
        </xs:sequence>
        <xs:element ref="bc:end" />
      </xs:sequence>
      <xs:group ref="b:extensionGroup" minOccurs="0" />
    </xs:sequence>
    <xs:attribute name="publisher" type="xs:string" />
  </xs:complexType>
  <xs:group name="extensionGroup">
    <xs:sequence>
      <xs:element name="extensions">
        <xs:complexType>
          <xs:sequence>
            <xs:any namespace="##other" minOccurs="0"
                    maxOccurs="unbounded"
                    processContents="lax" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:group>
</xs:schema>

BOOKS.XML
<books version="1.0" xmlns="http://www.example.com/books">
  <book publisher="IDG books">
    <title>XML Bible</title>
    <author>Elliotte Rusty Harold</author>
  </book>
  <book publisher="Addison-Wesley">
    <title>The Mythical Man Month</title>
    <author>Frederick Brooks</author>
    <isbn>0-373-70708-8</isbn>
  </book>
  <book publisher="WROX">
    <title>Professional XSLT 2nd Edition</title>
    <author>Michael Kay</author>
    <extensions>
      <price xmlns="http://www.example.com/book/extensions">
        24.99
      </price>
    </extensions>
  </book>
</books>

The schema for the http://www.example.com/books namespace describes the books element, which can contain one or more book elements. Each book element has required title and author elements, an optional isbn element, and a publisher attribute. Each book element also has an extensibility point within which a delimiter element followed by zero or more elements from the target namespace can occur multiple times.

The end of the extensibility point is bounded by an end element. After the extensibility point, the content model of the book element also allows an optional extensions element, which can hold zero or more elements from any namespace besides the target namespace of the schema. The schema for the http://www.example.com/books-core namespace contains a mustUnderstand attribute, which must be added on extension elements or elements from a future version of the format. The delimiter and end elements are also defined in this schema.

In the next version of the format, it is decided to add an additional edition-number element to the content model of the book element. Below is the schema for version 2 of the format, along with a sample document.


BOOKS.XSD
<xs:schema elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com/books"
           xmlns:b="http://www.example.com/books"
           xmlns:bc="http://www.example.com/books-core">
  <xs:import namespace="http://www.example.com/books-core"
             schemaLocation="books-core.xsd" />
    <xs:element name="books">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="book" type="b:bookType" 
                      maxOccurs="unbounded" />
        </xs:sequence>
        <xs:attribute name="version" type="xs:string" />
      </xs:complexType>
    </xs:element>
    <xs:complexType name="bookType">
      <xs:sequence>
        <xs:element name="title" type="xs:string" />
        <xs:element name="author" type="xs:string" />
        <xs:element name="isbn" type="xs:string"
                    minOccurs="0" />
        <xs:sequence minOccurs="0" maxOccurs="1">
          <xs:element ref="bc:delimiter" /> 
          <xs:element name="edition-number"
                      type="xs:positiveInteger"
                       minOccurs="0" /> 
          <xs:sequence minOccurs="0" maxOccurs="unbounded"> 
            <xs:element ref="bc:delimiter" /> 
            <xs:any namespace="##targetNamespace ##local"
                    minOccurs="0" maxOccurs="unbounded"
                    processContents="lax" />
          </xs:sequence>
          <xs:element ref="bc:end" />
        </xs:sequence>
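        <xs:group ref="b:extensionGroup" minOccurs="0" />
      </xs:sequence>
      <xs:attribute name="publisher" type="xs:string" />
    </xs:complexType>
    <xs:group name="extensionGroup">
      <xs:sequence>
        <xs:element name="extensions">
          <xs:complexType>
            <xs:sequence>
              <xs:any namespace="##other" minOccurs="0"
                      maxOccurs="unbounded"
                      processContents="lax" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:group>
</xs:schema>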

BOOKS.XML
<books version="2.0" xmlns="http://www.example.com/books"
       xmlns:p="http://www.example.com/book/extensions"
       xmlns:bc="http://www.example.com/books-core">
  <book publisher="HCI">
    <title>A Child Called It</title>
    <author>Dave Pelzer</author>
    <isbn>1-55874-766-9</isbn>
    <bc:delimiter />
    <edition-number>1</edition-number>
    <bc:end />
    <extensions>
      <p:price>9.95</p:price>
    </extensions>
  </book>
</books>

Unlike the New constructs in a new namespace approach, this approach keeps all the core components of the format in a single namespace and is forward compatible as well as backward compatible. It should be noted that forward compatibility is dependent on not adding any new required constructs in future versions. Another benefit of this approach is that it obviates the need for having an explicit mustUnderstand construct since you can simply specify that if a consumer encounters any unknown element from the target namespace of the format then it should result in a fatal error.
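
The consumer-side rule described above can be sketched in Python as follows. It is a minimal illustration under stated assumptions: the read_book function, the KNOWN_V1 set and the UnknownCoreElement exception are hypothetical names, and the document is shaped like the version 2 sample with the books-core namespace bound to the bc prefix.

```python
import xml.etree.ElementTree as ET

BOOKS_NS = "http://www.example.com/books"
CORE_NS = "http://www.example.com/books-core"

# Target-namespace elements a hypothetical v1 consumer knows about.
KNOWN_V1 = {"title", "author", "isbn", "extensions"}
SENTRIES = {f"{{{CORE_NS}}}delimiter", f"{{{CORE_NS}}}end"}

DOC = """\
<books version="2.0" xmlns="http://www.example.com/books"
       xmlns:p="http://www.example.com/book/extensions"
       xmlns:bc="http://www.example.com/books-core">
  <book publisher="HCI">
    <title>A Child Called It</title>
    <author>Dave Pelzer</author>
    <isbn>1-55874-766-9</isbn>
    <bc:delimiter />
    <edition-number>1</edition-number>
    <bc:end />
    <extensions>
      <p:price>9.95</p:price>
    </extensions>
  </book>
</books>
"""


class UnknownCoreElement(Exception):
    """An element from the target namespace that the consumer does not know."""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        ns, _, local = tag[1:].partition("}")
        return ns, local
    return "", tag


def read_book(book, known=KNOWN_V1):
    fields = {}
    for child in book:
        if child.tag in SENTRIES:
            continue  # delimiter and end only mark the extensibility point
        ns, local = split_tag(child.tag)
        if ns != BOOKS_NS:
            continue  # foreign namespace: Must Ignore
        if local not in known:
            # No mustUnderstand flag needed: unknown core elements are fatal.
            raise UnknownCoreElement(local)
        if local != "extensions":
            fields[local] = (child.text or "").strip()
    return fields


book = ET.fromstring(DOC)[0]
try:
    read_book(book)
    raised = None
except UnknownCoreElement as exc:
    raised = str(exc)
print(raised)
```

A consumer that passes KNOWN_V1 | {"edition-number"} as the known set reads the same book without error, which is the behaviour a version 2 consumer would want.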

The primary drawback of using version extensibility points is that it makes both schemas and XML instances more verbose, and therefore potentially more confusing.