XML.com: XML From the Inside Out

W3C XML Schema Design Patterns: Dealing With Change

July 03, 2002

W3C XML Schema is a language for specifying the structure of, and constraints on, XML documents. As usage of W3C XML Schema has grown, certain usage patterns have become common; this article, the first in a series, tackles various aspects of creating and using W3C XML Schema. It focuses on techniques for building flexible schemas that accommodate change in the underlying data, the schema, or both, in a modular manner.

Designing schemas that support data evolution is beneficial when the structure of XML instances may change but the instances must still be validated against the original schema. For example, several entities may share XML documents whose format changes over time, but some of them may not receive the updated schemas. Alternatively, you may need to ensure that older versions of an XML document can still be validated by newer versions of the schema. Or, perhaps, multiple entities share XML documents that have a similar structure but significant domain-specific differences. The address.xsd example in the W3C XML Schema Primer describes such a situation: a generic address format that can be extended to encompass localized address formats.

Using Wildcards To Create Open Content Models

W3C XML Schema provides two wildcards, xs:any and xs:anyAttribute, which allow elements and attributes from specified namespaces to occur in a content model. Wildcards let schema authors make a content model extensible while retaining a degree of control over which elements and attributes may occur.

The most important attributes of a wildcard are namespace and processContents. The namespace attribute specifies the namespaces that the elements or attributes matched by the wildcard can come from. The possible values for the namespace attribute are described in the Namespace Attribute In Any table in the XML Schema Primer. The processContents attribute specifies whether, and how, the XML content matched by the wildcard should be validated. The possible values of the processContents attribute are described in the Wildcard Schema Component section of the W3C XML Schema recommendation.
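As a quick reference, the fragment below summarizes those values. Each line is an isolated illustration rather than part of a real content model (placing several of these overlapping wildcards in one sequence would make it non-deterministic), and the address namespace URI in the list example is the one used for addresses in the instance examples of this article. The same values apply to xs:anyAttribute.

```xml
<!-- Values of the namespace attribute (default: ##any) -->
<xs:any namespace="##any" />              <!-- any namespace, or no namespace -->
<xs:any namespace="##other" />            <!-- a namespace other than the target namespace; not unqualified elements -->
<xs:any namespace="##targetNamespace" />  <!-- only the target namespace -->
<xs:any namespace="##local" />            <!-- only unqualified elements (no namespace) -->
<!-- a space-separated list of namespace URIs, ##targetNamespace and/or ##local -->
<xs:any namespace="##targetNamespace urn:xmlns:25hoursaday-com:address" />

<!-- Values of the processContents attribute (default: strict) -->
<xs:any processContents="strict" />  <!-- a declaration must be available, and the content must be valid against it -->
<xs:any processContents="lax" />     <!-- validate only if a declaration can be found -->
<xs:any processContents="skip" />    <!-- no validation; the content need only be well-formed -->
```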

The following schema uses wildcards to let valid instances include elements and attributes beyond those explicitly listed in the Customer content model.


      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer"
          targetNamespace="urn:xmlns:25hoursaday-com:customer"
          elementFormDefault="qualified">
        <xs:element name="Customer">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="FirstName" type="xs:string" />
              <xs:element name="LastName" type="xs:string" />
              <xs:any namespace="##targetNamespace" processContents="strict"
                  minOccurs="0" maxOccurs="unbounded" />
              <xs:any namespace="##other" processContents="lax"
                  minOccurs="0" maxOccurs="unbounded" />
            </xs:sequence>
            <xs:attribute name="customerID" type="xs:integer" />
            <xs:anyAttribute namespace="##any" processContents="skip" />
          </xs:complexType>
        </xs:element>
        <xs:element name="PhoneNumber" type="xs:string" />
        <xs:element name="FrequentShopper" type="xs:boolean" />
      </xs:schema>
     

The schema describes a Customer element that contains FirstName and LastName elements in sequence and has a customerID attribute. Additionally, two wildcards (xs:any elements) specify that zero or more elements from the urn:xmlns:25hoursaday-com:customer namespace can appear after the customer's name elements, followed by zero or more elements from any other namespace. Because the first wildcard's processContents is strict, elements it matches must be declared globally in the schema, as PhoneNumber and FrequentShopper are. The attribute wildcard (xs:anyAttribute element) specifies that the Customer element can have attributes from any namespace. The wildcards give authors the leeway to tailor their XML documents to their specific needs, yet keep the content model rigid enough to satisfy a set of minimal constraints. The following documents are valid against this schema.


     <Customer customerID="12345"
         xmlns="urn:xmlns:25hoursaday-com:customer">
       <FirstName>Dare</FirstName>
       <LastName>Obasanjo</LastName>
     </Customer>

     EXAMPLE 1

     <cust:Customer customerID="12345" numPurchases="17"
         xmlns:cust="urn:xmlns:25hoursaday-com:customer">
       <cust:FirstName>Dare</cust:FirstName>
       <cust:LastName>Obasanjo</cust:LastName>
       <cust:PhoneNumber>425-555-1234</cust:PhoneNumber>
     </cust:Customer>

     EXAMPLE 2

     <cust:Customer customerID="12345" numPurchases="17"
         xmlns:cust="urn:xmlns:25hoursaday-com:customer"
         xmlns:addr="urn:xmlns:25hoursaday-com:address">
       <cust:FirstName>Dare</cust:FirstName>
       <cust:LastName>Obasanjo</cust:LastName>
       <cust:PhoneNumber>425-555-1234</cust:PhoneNumber>
       <addr:Address>2001 Beagle Drive</addr:Address>
       <addr:City>Redmond</addr:City>
       <addr:State>WA</addr:State>
       <addr:Zip>98052</addr:Zip>
     </cust:Customer>

     EXAMPLE 3
     

The third example is interesting because it combines elements from multiple vocabularies and allows users to validate the XML instance using different schemas, none of which complains about elements from namespaces it does not know about. Applications that only know how to process some parts of the document can validate those parts while ignoring the rest. If the format of the instance document changes and more customer information is added to later documents, those documents remain valid against the original schema, as well as any subsequent schemas, as long as the elements and attributes that were originally declared (in this case FirstName, LastName and customerID) are not removed from the content model and the new elements come from other namespaces. Because the ##targetNamespace wildcard is strict, a new element in the customer namespace validates only if the schema in use declares it globally.
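By contrast, the following hypothetical instance should be rejected by the Customer schema above. Nickname is an invented element in the customer namespace, so the only particle that can match it is the ##targetNamespace wildcard; because that wildcard's processContents is strict, the validator must find a global declaration for Nickname, and the schema does not provide one.

```xml
<!-- Hypothetical, invalid instance: cust:Nickname matches the strict
     ##targetNamespace wildcard, but no global declaration exists for it. -->
<cust:Customer customerID="12345"
    xmlns:cust="urn:xmlns:25hoursaday-com:customer">
  <cust:FirstName>Dare</cust:FirstName>
  <cust:LastName>Obasanjo</cust:LastName>
  <cust:Nickname>D</cust:Nickname>
</cust:Customer>
```

Declaring Nickname globally in the customer schema, as PhoneNumber is, would make this instance valid. Moving the element into another namespace would also work, since the ##other wildcard uses lax processing and accepts elements for which it has no declaration.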

There are some caveats to using the xs:any wildcard. The first is that xs:any makes it easy to create non-deterministic content models inadvertently, and these can be hard to spot in a schema. The following schema illustrates the problem.

     
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer"
          targetNamespace="urn:xmlns:25hoursaday-com:customer"
          elementFormDefault="qualified">
        <xs:element name="Customer">
          <xs:complexType>
            <xs:sequence>
              <xs:element ref="cust:FirstName" />
              <xs:element ref="cust:LastName" minOccurs="0" />
              <xs:any namespace="##targetNamespace"
                  processContents="strict" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="FirstName" type="xs:string" />
        <xs:element name="LastName" type="xs:string" />
        <xs:element name="PhoneNumber" type="xs:string" />
      </xs:schema>
     

This schema is non-deterministic because, when a LastName element follows the FirstName element, the validator cannot tell which particle to validate it against: the optional LastName declaration or the wildcard, which allows any element from the urn:xmlns:25hoursaday-com:customer namespace to appear. W3C XML Schema's Unique Particle Attribution constraint forbids this kind of ambiguity, so a conforming schema processor should reject this schema.
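One way to remove the ambiguity, sketched below as our own illustration rather than the only fix, is to make LastName required, so that the wildcard can only be reached once LastName has been matched. Another option is to keep LastName optional and change the wildcard's namespace to ##other, so that it can no longer match cust:LastName; which fix is appropriate depends on whether the optional last name or extensibility within the target namespace matters more.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:cust="urn:xmlns:25hoursaday-com:customer"
    targetNamespace="urn:xmlns:25hoursaday-com:customer"
    elementFormDefault="qualified">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="cust:FirstName" />
        <!-- LastName is now required (minOccurs defaults to 1), so after
             FirstName the only candidate for the next element is this
             declaration, and the wildcard never competes with it. -->
        <xs:element ref="cust:LastName" />
        <xs:any namespace="##targetNamespace" processContents="strict" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="FirstName" type="xs:string" />
  <xs:element name="LastName" type="xs:string" />
  <xs:element name="PhoneNumber" type="xs:string" />
</xs:schema>
```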

Another caveat with wildcards concerns the namespace attribute of xs:any and xs:anyAttribute. Take particular care with its "##other" value, which the Namespace Attribute In Any table in the XML Schema Primer describes as meaning "any well-formed XML that is not from the target namespace of the type being defined". That description is not entirely accurate: "##other" also excludes elements (and attributes) that have no namespace, so it really means any well-formed XML from a namespace other than the target namespace.
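For instance, the following hypothetical instance should be rejected by the first Customer schema in this article, even though that schema's ##other wildcard uses lax processing. Notes (an invented element) carries no namespace, so neither the ##targetNamespace wildcard nor the ##other wildcard matches it, and processContents never comes into play.

```xml
<cust:Customer customerID="12345"
    xmlns:cust="urn:xmlns:25hoursaday-com:customer">
  <cust:FirstName>Dare</cust:FirstName>
  <cust:LastName>Obasanjo</cust:LastName>
  <!-- No default namespace is declared, so Notes is unqualified.
       A ##other wildcard does not match elements with no namespace. -->
  <Notes>Prefers email</Notes>
</cust:Customer>
```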

Creating a wildcard that allows elements from any namespace except the target namespace, including elements with no namespace, requires an xs:choice, as in the following schema:



      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer"
          targetNamespace="urn:xmlns:25hoursaday-com:customer"
          elementFormDefault="qualified">
        <xs:element name="Customer">
          <xs:complexType>
            <xs:sequence>
              <xs:element ref="cust:FirstName" />
              <xs:element ref="cust:LastName" />
              <!-- allow any element except those from the target namespace -->
              <xs:choice minOccurs="0" maxOccurs="unbounded">
                <xs:any namespace="##other" processContents="strict" />
                <xs:any namespace="##local" processContents="strict" />
              </xs:choice>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="FirstName" type="xs:string" />
        <xs:element name="LastName" type="xs:string" />
      </xs:schema>
     

A choice is used because the "##other" value for the namespace attribute of a wildcard cannot be combined with other values (see the XML Representation Summary for the xs:any element information item in the W3C XML Schema recommendation).
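To make that restriction concrete, the fragment below contrasts the single-wildcard form one might be tempted to write with what the namespace attribute does allow. The address namespace is the one from the instance examples earlier in this article; the lax processing and occurrence bounds are our own assumptions for illustration.

```xml
<!-- Not allowed: ##other (like ##any) cannot appear in a list,
     so this declaration is a schema error; it is shown commented out. -->
<!-- <xs:any namespace="##other ##local" processContents="strict" /> -->

<!-- Allowed: a list may combine explicit namespace URIs with
     ##targetNamespace and ##local, but it admits only the namespaces it names. -->
<xs:any namespace="##local urn:xmlns:25hoursaday-com:address"
        processContents="lax" minOccurs="0" maxOccurs="unbounded" />
```

The list form is only an option when the set of permitted namespaces is known when the schema is written; to admit every namespace other than the target namespace, the xs:choice of ##other and ##local remains necessary.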
