XML.com: XML From the Inside Out

W3C XML Schema Design Patterns: Dealing With Change
by Dare Obasanjo

Gaining Flexibility from Substitution Groups and Abstract Elements

W3C XML Schema borrows a number of concepts from object-oriented programming, including abstract types, type substitution, and polymorphism. Abstract elements and substitution groups let schema authors create or use schemas that define generic base types, and then extend those types without modifying the original schema.

A substitution group contains elements that can appear interchangeably in an XML instance document, in a manner reminiscent of subtype polymorphism in OOP languages. Each member of a substitution group must have the same type as the head element of the group, or a type derived from it. An element declaration that is marked abstract cannot itself appear in an instance document; a member of its substitution group must appear in its place. The following schema defines an abstract element; it is followed by another schema that defines an element which may be substituted for the abstract element and whose type is derived from that of the abstract element.


      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer"
          targetNamespace="urn:xmlns:25hoursaday-com:customer"
          elementFormDefault="qualified">

        <xs:element name="Customers">
          <xs:complexType>
            <xs:sequence>
              <xs:element ref="cust:Customer" maxOccurs="unbounded" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>

        <xs:element name="Customer" type="cust:CustomerType" abstract="true" />

        <xs:complexType name="CustomerType">
          <xs:sequence>
            <xs:element ref="cust:FirstName" />
            <xs:element ref="cust:LastName" />
          </xs:sequence>
          <xs:attribute name="customerID" type="xs:integer" />
        </xs:complexType>

        <xs:element name="FirstName" type="xs:string" />
        <xs:element name="LastName" type="xs:string" />
        <xs:element name="PhoneNumber" type="xs:string" />

      </xs:schema>
     cust.xsd

      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer"
          xmlns:addr="urn:xmlns:25hoursaday-com:address"
          targetNamespace="urn:xmlns:25hoursaday-com:address"
          elementFormDefault="qualified">

        <xs:import namespace="urn:xmlns:25hoursaday-com:customer"
            schemaLocation="cust.xsd" />

        <xs:element name="MyCustomer" substitutionGroup="cust:Customer"
            type="addr:MyCustomerType" />

        <xs:complexType name="MyCustomerType">
          <xs:complexContent>
            <xs:extension base="cust:CustomerType">
              <xs:sequence>
                <xs:element ref="cust:PhoneNumber" />
                <xs:element ref="addr:Address" />
                <xs:element ref="addr:City" />
                <xs:element ref="addr:State" />
                <xs:element ref="addr:Zip" />
              </xs:sequence>
            </xs:extension>
          </xs:complexContent>
        </xs:complexType>

        <xs:element name="Address" type="xs:string" />
        <xs:element name="City" type="xs:string" />
        <xs:element name="State" type="xs:string" fixed="WA" />

        <xs:element name="Zip">
          <xs:simpleType>
            <xs:restriction base="xs:token">
              <xs:pattern value="[0-9]{5}(-[0-9]{4})?" />
            </xs:restriction>
          </xs:simpleType>
        </xs:element>

      </xs:schema>
     my_cust.xsd
     

The my_cust.xsd schema contains the addr:MyCustomer element declaration, which can appear in instance documents in place of cust:Customer elements. Thus the cust:Customers element can have addr:MyCustomer elements as children but not cust:Customer elements, since cust:Customer is abstract. The following XML instance document can be validated by the my_cust.xsd schema.



     <cust:Customers xmlns:cust="urn:xmlns:25hoursaday-com:customer" 
              xmlns:addr="urn:xmlns:25hoursaday-com:address">
      <addr:MyCustomer customerID="12345" >
       <cust:FirstName>Dare</cust:FirstName>
       <cust:LastName>Obasanjo</cust:LastName>
       <cust:PhoneNumber>425-555-1234</cust:PhoneNumber>
       <addr:Address>2001</addr:Address>
       <addr:City>Redmond</addr:City>
       <addr:State>WA</addr:State>
       <addr:Zip>98052</addr:Zip>
      </addr:MyCustomer>
      </cust:Customers>
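
For contrast, here is a sketch of an instance that should fail validation against the same schemas, because it uses the abstract cust:Customer element directly instead of a member of its substitution group. The customer data is made up for illustration.

```xml
<cust:Customers xmlns:cust="urn:xmlns:25hoursaday-com:customer">
  <!-- Invalid: cust:Customer is declared with abstract="true",
       so it cannot itself appear in an instance document -->
  <cust:Customer customerID="67890">
    <cust:FirstName>John</cust:FirstName>
    <cust:LastName>Smith</cust:LastName>
  </cust:Customer>
</cust:Customers>
```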

     

Note that substitution groups allow vocabularies to be mixed without the original schema author having to plan for it explicitly. The only requirement is that elements that are meant to participate in substitution groups must be globally declared. However, content models derived by restriction or extension are not as open as content models that use wildcards. Although this may seem like a disadvantage, it is not: it gives the schema author more control over the appearance and structure of additional content that may appear in valid XML instance documents.

Certain attributes on element declarations can be used to give schema authors more control over element substitutions in instance documents. The block attribute is used to specify whether elements whose types use a certain derivation method can substitute for the element in an instance document. The final attribute is used to specify whether elements whose types use a certain derivation method can declare themselves to be part of the target element's substitution group. More information on what these attributes mean is available in the element declaration section of the W3C XML Schema structures recommendation.
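
As a rough sketch of how these attributes might be applied to the cust:Customer declaration from cust.xsd, consider the two alternative declarations below. They are illustrative variations and are not part of the schemas shown above.

```xml
<!-- Alternative 1: block="extension" rejects, in instance documents,
     substitutes whose types derive from CustomerType by extension.
     addr:MyCustomer could still be declared, but it could not appear
     in place of cust:Customer. -->
<xs:element name="Customer" type="cust:CustomerType"
    abstract="true" block="extension" />

<!-- Alternative 2: final="extension" prevents elements whose types derive
     by extension from joining the substitution group at all, so the
     addr:MyCustomer declaration itself would be a schema error. -->
<xs:element name="Customer" type="cust:CustomerType"
    abstract="true" final="extension" />
```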

The default values of the block and final attributes for all element declarations in a schema can be specified via the blockDefault and finalDefault attributes of the root xs:schema element.
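
A minimal sketch of schema-wide defaults follows, assuming a schema author wants to block extension-based substitution everywhere unless a declaration says otherwise. The chosen values, the Contact element, and the simplified CustomerType are hypothetical and only serve the illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:cust="urn:xmlns:25hoursaday-com:customer"
    targetNamespace="urn:xmlns:25hoursaday-com:customer"
    elementFormDefault="qualified"
    blockDefault="extension"
    finalDefault="restriction">

  <!-- Picks up block="extension" and final="restriction" from the defaults -->
  <xs:element name="Customer" type="cust:CustomerType" />

  <!-- Hypothetical element that overrides the default:
       an empty block attribute blocks nothing -->
  <xs:element name="Contact" type="cust:CustomerType" block="" />

  <!-- Simplified type for the sketch; it also picks up the defaults -->
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string" />
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```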

Runtime Polymorphism via xsi:type and Abstract Types

Abstract types are complex type definitions that have true as the value of their abstract attribute. An element in an instance document cannot be validated against such a type directly; instead, the element must use a type derived from it, either by restriction or by extension. The xsi:type attribute can be placed on an element in an XML instance document to change its type, as long as the new type is derived from the type the element was declared with. It is not necessary to use abstract types in conjunction with xsi:type, but they are useful when creating a generic format for which most users are expected to create domain-specific extensions. The following schema declares an abstract type and an element that uses the abstract type as its type definition; it is followed by a schema that defines two types that derive from the abstract type.


      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer"
          targetNamespace="urn:xmlns:25hoursaday-com:customer"
          elementFormDefault="qualified">

        <xs:element name="Customers">
          <xs:complexType>
            <xs:sequence>
              <xs:element ref="cust:Customer" maxOccurs="unbounded" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>

        <xs:element name="Customer" type="cust:CustomerType" />

        <xs:complexType name="CustomerType" abstract="true">
          <xs:sequence>
            <xs:element ref="cust:FirstName" />
            <xs:element ref="cust:LastName" />
            <xs:element ref="cust:PhoneNumber" minOccurs="0" />
          </xs:sequence>
          <xs:attribute name="customerID" type="xs:integer" />
        </xs:complexType>

        <xs:element name="FirstName" type="xs:string" />
        <xs:element name="LastName" type="xs:string" />
        <xs:element name="PhoneNumber" type="xs:string" />

      </xs:schema>
     cust.xsd

      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer"
          targetNamespace="urn:xmlns:25hoursaday-com:customer"
          elementFormDefault="qualified">

        <xs:include schemaLocation="cust.xsd" />

        <xs:complexType name="MandatoryPhoneCustomerType">
          <xs:complexContent>
            <xs:restriction base="cust:CustomerType">
              <xs:sequence>
                <xs:element ref="cust:FirstName" />
                <xs:element ref="cust:LastName" />
                <xs:element ref="cust:PhoneNumber" minOccurs="1" />
              </xs:sequence>
            </xs:restriction>
          </xs:complexContent>
        </xs:complexType>

        <xs:complexType name="AddressableCustomerType">
          <xs:complexContent>
            <xs:extension base="cust:CustomerType">
              <xs:sequence>
                <xs:element ref="cust:Address" />
                <xs:element ref="cust:City" />
                <xs:element ref="cust:State" />
                <xs:element ref="cust:Zip" />
              </xs:sequence>
            </xs:extension>
          </xs:complexContent>
        </xs:complexType>

        <xs:element name="Address" type="xs:string" />
        <xs:element name="City" type="xs:string" />
        <xs:element name="State" type="xs:string" fixed="WA" />

        <xs:element name="Zip">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="\d{5}(-\d{4})?" />
            </xs:restriction>
          </xs:simpleType>
        </xs:element>

      </xs:schema>
     
     derived_cust.xsd
     

The Customer elements in the instance document below, which is validated by these schemas, use xsi:type to assert their types, even though they are declared as being of the abstract CustomerType in the original schema. Note that both restrictions and extensions of the base type can be the targets of the xsi:type attribute.


     <cust:Customers xmlns:cust="urn:xmlns:25hoursaday-com:customer" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
      <cust:Customer customerID="12345" 
	  xsi:type="cust:MandatoryPhoneCustomerType" >
       <cust:FirstName>Dare</cust:FirstName>
       <cust:LastName>Obasanjo</cust:LastName>
       <cust:PhoneNumber>425-555-1234</cust:PhoneNumber>
      </cust:Customer>
      <cust:Customer customerID="67890" 
	  xsi:type="cust:AddressableCustomerType" >
       <cust:FirstName>John</cust:FirstName>
       <cust:LastName>Smith</cust:LastName>
       <cust:Address>2001</cust:Address>
       <cust:City>Redmond</cust:City>
       <cust:State>WA</cust:State>
       <cust:Zip>98052</cust:Zip>
      </cust:Customer>
      </cust:Customers>
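
As a counterexample, here is a sketch of an instance that should be rejected by the same schemas: the element omits xsi:type, so it would have to be validated against the abstract CustomerType. The customer data is made up for illustration.

```xml
<cust:Customers xmlns:cust="urn:xmlns:25hoursaday-com:customer">
  <!-- Invalid: without xsi:type the element is governed by CustomerType,
       which is abstract and cannot be used directly -->
  <cust:Customer customerID="24680">
    <cust:FirstName>Jane</cust:FirstName>
    <cust:LastName>Doe</cust:LastName>
  </cust:Customer>
</cust:Customers>
```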
     

Type substitutability and polymorphism will be even more beneficial once type-aware XML processing becomes common, which should occur soon after XQuery 1.0 and XSLT 2.0 are standardized. To further extensibility, applications may combine both abstract types and abstract elements in a type hierarchy by creating abstract elements whose type definition is itself abstract.
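
A minimal sketch of that combination, reusing the names from cust.xsd, might look like the fragment below. The PhoneCustomer element and PhoneCustomerType are hypothetical names introduced only to show a concrete member of the hierarchy.

```xml
<!-- Abstract element: a member of its substitution group must appear instead -->
<xs:element name="Customer" type="cust:CustomerType" abstract="true" />

<!-- Abstract type: elements must use a type derived from it -->
<xs:complexType name="CustomerType" abstract="true">
  <xs:sequence>
    <xs:element ref="cust:FirstName" />
    <xs:element ref="cust:LastName" />
  </xs:sequence>
  <xs:attribute name="customerID" type="xs:integer" />
</xs:complexType>

<!-- Hypothetical concrete member: substitutes for cust:Customer
     and supplies a non-abstract derived type -->
<xs:element name="PhoneCustomer" type="cust:PhoneCustomerType"
    substitutionGroup="cust:Customer" />

<xs:complexType name="PhoneCustomerType">
  <xs:complexContent>
    <xs:extension base="cust:CustomerType">
      <xs:sequence>
        <xs:element ref="cust:PhoneNumber" />
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```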

Certain attributes on simple and complex type definitions can be used to give schema authors more control over the usage of types in schemas and instance documents. On a complex type definition, the block attribute specifies whether elements whose types use a certain derivation method can substitute for an element whose type is the target type in an instance document. The block attribute also performs a similar function with regard to xsi:type assertions. The final attribute is used to disallow type derivations using one or more specified derivation methods. More information on what these attributes mean on a type definition is available in the Simple Type Definitions and Complex Type Definitions sections of the W3C XML Schema structures recommendation. Also, the block attribute on an element declaration specifies whether types that use a particular derivation method are precluded from being used for xsi:type assertions.
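
As a sketch of these attributes on a type definition, the fragment below applies them to the abstract CustomerType from cust.xsd. It is an illustrative variation, not part of the schemas shown above, and the comments describe the expected effect on the types in derived_cust.xsd.

```xml
<!-- block="restriction": an element declared with CustomerType cannot be
     given a restriction-derived type through xsi:type, so
     xsi:type="cust:MandatoryPhoneCustomerType" would be rejected.
     final="extension": no type may derive from CustomerType by extension,
     so the AddressableCustomerType definition would be a schema error. -->
<xs:complexType name="CustomerType" abstract="true"
    block="restriction" final="extension">
  <xs:sequence>
    <xs:element ref="cust:FirstName" />
    <xs:element ref="cust:LastName" />
    <xs:element ref="cust:PhoneNumber" minOccurs="0" />
  </xs:sequence>
  <xs:attribute name="customerID" type="xs:integer" />
</xs:complexType>
```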

The default values of the block and final attributes for all simple and complex type definitions in a schema can be specified via the blockDefault and finalDefault attributes of the root xs:schema element.

Using xs:redefine to Update Type Definitions

W3C XML Schema provides a mechanism for updating a type definition in a process whereby the type effectively derives from itself. The xs:redefine element, which is used for redefinition, performs two tasks. The first is to act as an xs:include element by bringing in declarations and definitions from another schema document and making them available as part of the current target namespace. The included schema document must either have the same target namespace as the including schema or have no target namespace. The second is to allow types to be redefined in a manner similar to type derivation, with the new definition replacing the old one.

The following shows the included and including schemas, as well as a valid instance document for the schemas.


      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer"
          targetNamespace="urn:xmlns:25hoursaday-com:customer"
          elementFormDefault="qualified">

        <xs:element name="Customers">
          <xs:complexType>
            <xs:sequence>
              <xs:element ref="cust:Customer" maxOccurs="unbounded" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>

        <xs:element name="Customer" type="cust:CustomerType" />

        <xs:complexType name="CustomerType">
          <xs:sequence>
            <xs:element ref="cust:FirstName" />
            <xs:element ref="cust:LastName" />
          </xs:sequence>
          <xs:attribute name="customerID" type="xs:integer" />
        </xs:complexType>

        <xs:element name="FirstName" type="xs:string" />
        <xs:element name="LastName" type="xs:string" />
        <xs:element name="PhoneNumber" type="xs:string" />

      </xs:schema>
     cust.xsd


      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer"
          targetNamespace="urn:xmlns:25hoursaday-com:customer"
          elementFormDefault="qualified">

        <xs:redefine schemaLocation="cust.xsd">

          <xs:complexType name="CustomerType">
            <xs:complexContent>
              <xs:extension base="cust:CustomerType">
                <xs:sequence>
                  <xs:element ref="cust:PhoneNumber" />
                </xs:sequence>
              </xs:extension>
            </xs:complexContent>
          </xs:complexType>

        </xs:redefine>
      </xs:schema>
     redefined_cust.xsd
     
    <cust:Customers xmlns:cust="urn:xmlns:25hoursaday-com:customer" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
      <cust:Customer customerID="12345" >
       <cust:FirstName>Dare</cust:FirstName>
       <cust:LastName>Obasanjo</cust:LastName>
       <cust:PhoneNumber>425-555-1234</cust:PhoneNumber>
      </cust:Customer>
      <cust:Customer customerID="67890" >
       <cust:FirstName>John</cust:FirstName>
       <cust:LastName>Smith</cust:LastName>
       <cust:PhoneNumber>425-555-5555</cust:PhoneNumber>
      </cust:Customer>
     </cust:Customers>
     cust.xml

     

Type redefinition is pervasive because it affects not only elements in the including schema but also those in the included schema. All references to the original type in both schemas refer to the redefined type, and the original type definition is overshadowed. This introduces a certain degree of fragility, because redefined types can interact adversely with derived types and generate conflicts. A common conflict arises when a derived type uses extension to add an element or attribute to a type's content model, and a redefinition also adds a similarly named element or attribute to that content model. Such a conflict would have occurred if either of the schemas shown had contained a type derived from CustomerType by extension that added a PhoneNumber element of a different type than the one in the redefinition.
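
A sketch of such a conflict follows, assuming a hypothetical PreferredCustomerType were added to redefined_cust.xsd after the xs:redefine element. The type name and the choice of xs:integer are assumptions made for illustration.

```xml
<!-- Hypothetical derived type, declared in redefined_cust.xsd
     outside the xs:redefine element -->
<xs:complexType name="PreferredCustomerType">
  <xs:complexContent>
    <!-- The base resolves to the redefined CustomerType, whose content
         model already ends with cust:PhoneNumber of type xs:string -->
    <xs:extension base="cust:CustomerType">
      <xs:sequence>
        <!-- Local element with the same qualified name (elementFormDefault
             is "qualified") but a different type -->
        <xs:element name="PhoneNumber" type="xs:integer" />
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```

The resulting content model would contain two PhoneNumber elements with the same name and namespace but different types, which a conforming schema processor should report as a violation of the Element Declarations Consistent constraint.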

Acknowledgments

I'd like to thank Priya Lakshminarayanan, Mark Feblowitz, and Jeni Tennison for their help with this article.