W3C XML Schema Design Patterns: Dealing With Change
July 3, 2002
W3C XML Schema is one way to specify the structure of and constraints on XML documents. As its usage has grown, certain usage patterns have become common. This article, the first in a series that will tackle various aspects of creating and using W3C XML Schema, focuses on techniques for building schemas that are flexible and that allow for change in the underlying data, the schema, or both in a modular manner.
Designing schemas that support data evolution is beneficial in situations where the structure of XML instances may change but must still be validated against the original schema. For example, several entities may share XML documents whose format changes over time, but some entities may not receive updated schemas. Or you may need to ensure that older versions of an XML document can be validated by newer versions of the schema. Or, perhaps, multiple entities share XML documents that have a similar structure but in which there are significant domain-specific differences. The address.xsd example in the W3C XML Schema Primer describes a situation in which a generic address format exists that can be extended to encompass localized address formats.
Using Wildcards To Create Open Content Models
W3C XML Schema provides the wildcards xs:any and xs:anyAttribute, which can be used to allow elements and attributes from specified namespaces to occur in a content model. Wildcards allow schema authors to enable extensibility of the content model while maintaining a degree of control over the occurrence of elements and attributes.
The most important attributes for wildcards are namespace and processContents. The namespace attribute specifies the namespaces from which the elements or attributes matched by the wildcard can come. The possible values for the namespace attribute are described in the Namespace Attribute In Any table in the XML Schema Primer. The processContents attribute specifies if and how the XML content matched by the wildcard should be validated. The possible values of the processContents attribute are described in the Wildcard Schema Component section of the W3C XML Schema recommendation.
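As a quick orientation, the sketch below places the three processContents values side by side, each paired with a different namespace value. The Container element and the urn:example:wildcards namespace are hypothetical names chosen only for illustration; they do not come from the Primer or the recommendation.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:wildcards"
           elementFormDefault="qualified">
  <xs:element name="Container">
    <xs:complexType>
      <xs:sequence>
        <!-- strict: the matched element must have a declaration and be valid against it -->
        <xs:any namespace="##targetNamespace" processContents="strict" minOccurs="0" />
        <!-- lax: the matched element is validated only if a declaration can be found -->
        <xs:any namespace="##other" processContents="lax" minOccurs="0" />
        <!-- skip: the matched element only has to be well-formed -->
        <xs:any namespace="##local" processContents="skip" minOccurs="0" />
      </xs:sequence>
      <!-- attributes from any namespace, not validated -->
      <xs:anyAttribute namespace="##any" processContents="skip" />
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The three element wildcards deliberately cover disjoint sets of namespaces (the target namespace, other namespaces, and no namespace), so an element in the instance can only ever match one of them.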
The following schema uses wildcards to allow valid instances to add elements and attributes unspecified by the schema.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cust="urn:xmlns:25hoursaday-com:customer"
           targetNamespace="urn:xmlns:25hoursaday-com:customer"
           elementFormDefault="qualified">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="FirstName" type="xs:string" />
        <xs:element name="LastName" type="xs:string" />
        <xs:any namespace="##targetNamespace" processContents="strict"
                minOccurs="0" maxOccurs="unbounded" />
        <xs:any namespace="##other" processContents="lax"
                minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute name="customerID" type="xs:integer" />
      <xs:anyAttribute namespace="##any" processContents="skip" />
    </xs:complexType>
  </xs:element>
  <xs:element name="PhoneNumber" type="xs:string" />
  <xs:element name="FrequentShopper" type="xs:boolean" />
</xs:schema>
The schema describes a Customer element that contains a FirstName and a LastName element in sequence and has a customerID attribute. Additionally, two wildcards (xs:any elements) specify that zero or more elements from the urn:xmlns:25hoursaday-com:customer namespace can appear after the customer's name elements, followed by zero or more elements from any other namespace. The attribute wildcard (the xs:anyAttribute element) specifies that the Customer element can have attributes from any namespace. The wildcards give authors the leeway to tailor their XML documents to their specific needs, yet keep the content model rigid enough to satisfy a set of minimal constraints. The following documents are valid against this schema.
<Customer customerID="12345" xmlns="urn:xmlns:25hoursaday-com:customer">
  <FirstName>Dare</FirstName>
  <LastName>Obasanjo</LastName>
</Customer>
EXAMPLE 1

<cust:Customer customerID="12345" numPurchases="17"
               xmlns:cust="urn:xmlns:25hoursaday-com:customer">
  <cust:FirstName>Dare</cust:FirstName>
  <cust:LastName>Obasanjo</cust:LastName>
  <cust:PhoneNumber>425-555-1234</cust:PhoneNumber>
</cust:Customer>
EXAMPLE 2

<cust:Customer customerID="12345" numPurchases="17"
               xmlns:cust="urn:xmlns:25hoursaday-com:customer"
               xmlns:addr="urn:xmlns:25hoursaday-com:address">
  <cust:FirstName>Dare</cust:FirstName>
  <cust:LastName>Obasanjo</cust:LastName>
  <cust:PhoneNumber>425-555-1234</cust:PhoneNumber>
  <addr:Address>2001 Beagle Drive</addr:Address>
  <addr:City>Redmond</addr:City>
  <addr:State>WA</addr:State>
  <addr:Zip>98052</addr:Zip>
</cust:Customer>
EXAMPLE 3
The third example is interesting because it combines elements from multiple vocabularies and allows users to validate the XML instance using different schemas, none of which complains about elements from a namespace it does not know about. Applications that only know how to process certain parts of the document can validate the parts they know while ignoring the rest. If the format of the instance document changes and more customer information makes it into later documents, those documents are still valid against the original schema, as well as any subsequent schemas, as long as the elements and attributes that were originally declared (in this case FirstName, LastName, and customerID) are not removed from the content model.
There are some caveats with using the xs:any wildcard. The first is that xs:any makes it easier to create non-deterministic content models inadvertently, which may be tricky to find in the schema. The following schema illustrates this problem.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cust="urn:xmlns:25hoursaday-com:customer"
           targetNamespace="urn:xmlns:25hoursaday-com:customer"
           elementFormDefault="qualified">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="cust:FirstName" />
        <xs:element ref="cust:LastName" minOccurs="0" />
        <xs:any namespace="##targetNamespace" processContents="strict" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="FirstName" type="xs:string" />
  <xs:element name="LastName" type="xs:string" />
  <xs:element name="PhoneNumber" type="xs:string" />
</xs:schema>
This schema is non-deterministic because when a LastName element is seen after a FirstName, the validator cannot tell which particle it belongs to: the element may be validated either as the optional LastName element that follows a FirstName or against the wildcard, which allows any element from the urn:xmlns:25hoursaday-com:customer namespace to appear.
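One way to remove the ambiguity, sketched here as an assumption rather than as the only possible fix, is to make sure the wildcard can never match the optional element that precedes it. In the variant below the wildcard is limited to "##other", so a cust:LastName element can only be matched by the LastName particle. Note that this changes what the schema accepts: extension elements from the target namespace are no longer allowed.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cust="urn:xmlns:25hoursaday-com:customer"
           targetNamespace="urn:xmlns:25hoursaday-com:customer"
           elementFormDefault="qualified">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="cust:FirstName" />
        <xs:element ref="cust:LastName" minOccurs="0" />
        <!-- "##other" excludes the target namespace, so cust:LastName cannot match here -->
        <xs:any namespace="##other" processContents="lax" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="FirstName" type="xs:string" />
  <xs:element name="LastName" type="xs:string" />
</xs:schema>
```

If target-namespace extensions must remain possible, an alternative is to keep the "##targetNamespace" wildcard and make LastName mandatory, so that the first LastName after a FirstName always belongs to the element particle.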
Another caveat concerns the namespace attribute of xs:any and xs:anyAttribute. Be careful with the "##other" value, which the Namespace Attribute In Any table in the XML Schema Primer describes as meaning "any well-formed XML that is not from the target namespace of the type being defined". That description is not entirely accurate: "##other" really means any well-formed XML that is not from the target namespace of the type being defined and that is namespace-qualified, so elements with no namespace are excluded as well.
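To make the consequence concrete, here is a small instance checked against the first schema in this section, the one with a "##targetNamespace" wildcard followed by a "##other" wildcard. The Nickname element is a hypothetical addition used only for illustration.

```xml
<cust:Customer customerID="12345"
               xmlns:cust="urn:xmlns:25hoursaday-com:customer">
  <cust:FirstName>Dare</cust:FirstName>
  <cust:LastName>Obasanjo</cust:LastName>
  <!-- No prefix and no default namespace in scope: this element is in no namespace.
       It matches neither the "##targetNamespace" nor the "##other" wildcard. -->
  <Nickname>D</Nickname>
</cust:Customer>
```

A validator should reject this document even though the schema appears, at first glance, to accept elements from anywhere.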
Creating a wildcard that allows elements from any namespace except the target namespace, including elements with no namespace, involves using an xs:choice, as in the following schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cust="urn:xmlns:25hoursaday-com:customer"
           targetNamespace="urn:xmlns:25hoursaday-com:customer"
           elementFormDefault="qualified">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="cust:FirstName" />
        <xs:element ref="cust:LastName" />
        <!-- allow any element except those from target namespace -->
        <xs:choice minOccurs="0" maxOccurs="unbounded">
          <xs:any namespace="##other" processContents="strict" />
          <xs:any namespace="##local" processContents="strict" />
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="FirstName" type="xs:string" />
  <xs:element name="LastName" type="xs:string" />
</xs:schema>
A choice is used because the "##other" value for the namespace attribute of a wildcard cannot be combined with other values (see the XML Representation Summary for the xs:any Element Information Item).
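By contrast, the remaining values ("##targetNamespace", "##local", and explicit namespace URIs) can be combined in a single whitespace-separated list. The following sketch shows one wildcard that accepts elements from the target namespace, from no namespace, or from the address namespace used elsewhere in this article; the choice of namespaces is an illustrative assumption.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cust="urn:xmlns:25hoursaday-com:customer"
           targetNamespace="urn:xmlns:25hoursaday-com:customer"
           elementFormDefault="qualified">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="cust:FirstName" />
        <xs:element ref="cust:LastName" />
        <!-- a list of values is allowed as long as it contains neither ##any nor ##other -->
        <xs:any namespace="##targetNamespace ##local urn:xmlns:25hoursaday-com:address"
                processContents="lax" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="FirstName" type="xs:string" />
  <xs:element name="LastName" type="xs:string" />
</xs:schema>
```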
Gaining Flexibility from Substitution Groups and Abstract Elements
W3C XML Schema borrows a number of concepts from object-oriented programming, including the notions of abstract types, type substitution, and polymorphism. Abstract elements and substitution groups allow schema authors to create or use schemas that define generic base types, and to extend these types without affecting the original schema.
A substitution group contains elements that can appear interchangeably in an XML instance document, in a manner reminiscent of subtype polymorphism in OOP languages. Elements in a substitution group must be of the same type or have types that are members of the same type hierarchy. An element declaration that is marked abstract indicates that a member of its substitution group must appear in its place in the instance document. The following schema defines an abstract element; it is followed by another schema that defines an element which may be substituted for the abstract element and whose type is derived from that of the abstract element.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cust="urn:xmlns:25hoursaday-com:customer"
           targetNamespace="urn:xmlns:25hoursaday-com:customer"
           elementFormDefault="qualified">
  <xs:element name="Customers">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="cust:Customer" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Customer" type="cust:CustomerType" abstract="true" />
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element ref="cust:FirstName" />
      <xs:element ref="cust:LastName" />
    </xs:sequence>
    <xs:attribute name="customerID" type="xs:integer" />
  </xs:complexType>
  <xs:element name="FirstName" type="xs:string" />
  <xs:element name="LastName" type="xs:string" />
  <xs:element name="PhoneNumber" type="xs:string" />
</xs:schema>
cust.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cust="urn:xmlns:25hoursaday-com:customer"
           xmlns:addr="urn:xmlns:25hoursaday-com:address"
           targetNamespace="urn:xmlns:25hoursaday-com:address"
           elementFormDefault="qualified">
  <xs:import namespace="urn:xmlns:25hoursaday-com:customer"
             schemaLocation="cust.xsd" />
  <xs:element name="MyCustomer" substitutionGroup="cust:Customer"
              type="addr:MyCustomerType" />
  <xs:complexType name="MyCustomerType">
    <xs:complexContent>
      <xs:extension base="cust:CustomerType">
        <xs:sequence>
          <xs:element ref="cust:PhoneNumber" />
          <xs:element ref="addr:Address" />
          <xs:element ref="addr:City" />
          <xs:element ref="addr:State" />
          <xs:element ref="addr:Zip" />
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="Address" type="xs:string" />
  <xs:element name="City" type="xs:string" />
  <xs:element name="State" type="xs:string" fixed="WA" />
  <xs:element name="Zip">
    <xs:simpleType>
      <xs:restriction base="xs:token">
        <xs:pattern value="[0-9]{5}(-[0-9]{4})?" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
my_cust.xsd
The my_cust.xsd schema contains an addr:MyCustomer element declaration, which can appear in instance documents in place of cust:Customer elements. Thus the cust:Customers element can have addr:MyCustomer elements as children but not cust:Customer elements, since they are abstract. The following XML instance document can be validated by the my_cust.xsd schema.
<cust:Customers xmlns:cust="urn:xmlns:25hoursaday-com:customer"
                xmlns:addr="urn:xmlns:25hoursaday-com:address">
  <addr:MyCustomer customerID="12345">
    <cust:FirstName>Dare</cust:FirstName>
    <cust:LastName>Obasanjo</cust:LastName>
    <cust:PhoneNumber>425-555-1234</cust:PhoneNumber>
    <addr:Address>2001</addr:Address>
    <addr:City>Redmond</addr:City>
    <addr:State>WA</addr:State>
    <addr:Zip>98052</addr:Zip>
  </addr:MyCustomer>
</cust:Customers>
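For contrast, the counter-example below uses the abstract cust:Customer element directly. It is a sketch meant to show the effect of abstract="true" in cust.xsd; a validator should reject it even though its content matches CustomerType.

```xml
<cust:Customers xmlns:cust="urn:xmlns:25hoursaday-com:customer">
  <!-- Invalid: cust:Customer is declared abstract, so only members of its
       substitution group (such as addr:MyCustomer) may appear here -->
  <cust:Customer customerID="12345">
    <cust:FirstName>Dare</cust:FirstName>
    <cust:LastName>Obasanjo</cust:LastName>
  </cust:Customer>
</cust:Customers>
```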
Note that substitution groups allow vocabularies to be mixed without the original schema author having to plan for it explicitly. The only consideration a schema author should observe is that elements that should be able to participate in substitution groups must be globally declared. However, content models derived by restriction or extension are not as open as content models that use wildcards. Although this may seem like a disadvantage, it is not: it gives the schema author more control over the appearance and structure of additional content that may appear in valid XML instance documents.
Certain attributes on element declarations can be used to give schema authors more control over element substitutions in instance documents. The block attribute is used to specify whether elements whose types use a certain derivation method can substitute for the element in an instance document. The final attribute is used to specify whether elements whose types use a certain derivation method can declare themselves to be part of the target element's substitution group. More information on what these attributes mean is available in the element declaration section of the W3C XML Schema structures recommendation.
The default values of the block and final attributes for all element declarations in a schema can be specified via the blockDefault and finalDefault attributes of the root xs:schema element.
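The sketch below shows how these attributes might look in practice. The Supplier element is hypothetical, and the particular values were chosen only to illustrate the mechanism, not as a recommendation.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cust="urn:xmlns:25hoursaday-com:customer"
           targetNamespace="urn:xmlns:25hoursaday-com:customer"
           elementFormDefault="qualified"
           blockDefault="restriction"
           finalDefault="restriction">
  <!-- No block or final attribute: the schema-wide defaults apply -->
  <xs:element name="Customer" type="cust:CustomerType" />
  <!-- Explicit values override the defaults:
       block="substitution" keeps substitution group members from appearing in its place;
       final="#all" keeps elements whose types are derived by extension or restriction
       from joining its substitution group -->
  <xs:element name="Supplier" type="cust:CustomerType"
              block="substitution" final="#all" />
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="FirstName" type="xs:string" />
      <xs:element name="LastName" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```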
Runtime Polymorphism via xsi:type and Abstract Types
Abstract types are complex type definitions that have true as the value of their abstract attribute, which indicates that elements in an instance document cannot be of that type but must instead use another type derived from it, either by restriction or by extension. The xsi:type attribute can be placed on an element in an XML instance document to change its type, as long as the new type is in the same type hierarchy as the original type of the element. Abstract types need not be used in conjunction with xsi:type, but they provide some benefit when a generic format is being created for which most users will create domain-specific extensions. The following schema declares an abstract type and an element that uses the abstract type as its type definition; it is followed by a schema that defines two types that derive from the abstract type.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cust="urn:xmlns:25hoursaday-com:customer"
           targetNamespace="urn:xmlns:25hoursaday-com:customer"
           elementFormDefault="qualified">
  <xs:element name="Customers">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="cust:Customer" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Customer" type="cust:CustomerType" />
  <xs:complexType name="CustomerType" abstract="true">
    <xs:sequence>
      <xs:element ref="cust:FirstName" />
      <xs:element ref="cust:LastName" />
      <xs:element ref="cust:PhoneNumber" minOccurs="0" />
    </xs:sequence>
    <xs:attribute name="customerID" type="xs:integer" />
  </xs:complexType>
  <xs:element name="FirstName" type="xs:string" />
  <xs:element name="LastName" type="xs:string" />
  <xs:element name="PhoneNumber" type="xs:string" />
</xs:schema>
cust.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cust="urn:xmlns:25hoursaday-com:customer"
           targetNamespace="urn:xmlns:25hoursaday-com:customer"
           elementFormDefault="qualified">
  <xs:include schemaLocation="cust.xsd" />
  <xs:complexType name="MandatoryPhoneCustomerType">
    <xs:complexContent>
      <xs:restriction base="cust:CustomerType">
        <xs:sequence>
          <xs:element ref="cust:FirstName" />
          <xs:element ref="cust:LastName" />
          <xs:element ref="cust:PhoneNumber" minOccurs="1" />
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="AddressableCustomerType">
    <xs:complexContent>
      <xs:extension base="cust:CustomerType">
        <xs:sequence>
          <xs:element ref="cust:Address" />
          <xs:element ref="cust:City" />
          <xs:element ref="cust:State" />
          <xs:element ref="cust:Zip" />
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="Address" type="xs:string" />
  <xs:element name="City" type="xs:string" />
  <xs:element name="State" type="xs:string" fixed="WA" />
  <xs:element name="Zip">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="\d{5}(-\d{4})?" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
derived_cust.xsd
The Customer elements in the following instance document, which is valid against these schemas, use xsi:type to assert their types, even though they are declared as being of the abstract CustomerType in the original schema. Note that both restrictions and extensions of the base type can be the targets of the xsi:type attribute.
<cust:Customers xmlns:cust="urn:xmlns:25hoursaday-com:customer"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <cust:Customer customerID="12345" xsi:type="cust:MandatoryPhoneCustomerType">
    <cust:FirstName>Dare</cust:FirstName>
    <cust:LastName>Obasanjo</cust:LastName>
    <cust:PhoneNumber>425-555-1234</cust:PhoneNumber>
  </cust:Customer>
  <cust:Customer customerID="67890" xsi:type="cust:AddressableCustomerType">
    <cust:FirstName>John</cust:FirstName>
    <cust:LastName>Smith</cust:LastName>
    <cust:Address>2001</cust:Address>
    <cust:City>Redmond</cust:City>
    <cust:State>WA</cust:State>
    <cust:Zip>98052</cust:Zip>
  </cust:Customer>
</cust:Customers>
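As a counter-example, the sketch below omits xsi:type. Because the declared type of cust:Customer is the abstract CustomerType, a validator should reject the element even though its content matches the abstract type's content model.

```xml
<cust:Customers xmlns:cust="urn:xmlns:25hoursaday-com:customer">
  <!-- Invalid: without xsi:type the element is governed by the abstract CustomerType -->
  <cust:Customer customerID="12345">
    <cust:FirstName>Dare</cust:FirstName>
    <cust:LastName>Obasanjo</cust:LastName>
  </cust:Customer>
</cust:Customers>
```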
Type substitutability and polymorphism will be even more beneficial once type-aware XML processing becomes common, which should occur soon after XQuery 1.0 and XSLT 2.0 are standardized. To further extensibility, applications may combine both abstract types and abstract elements in a type hierarchy by creating abstract elements whose type definition is itself abstract.
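A minimal sketch of that combination, reusing the customer vocabulary from this article, might look as follows. With both declarations abstract, an instance has to use a member of the Customer substitution group, and that member has to end up with a concrete type derived from CustomerType.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cust="urn:xmlns:25hoursaday-com:customer"
           targetNamespace="urn:xmlns:25hoursaday-com:customer"
           elementFormDefault="qualified">
  <!-- abstract element: cannot appear in instances itself -->
  <xs:element name="Customer" type="cust:CustomerType" abstract="true" />
  <!-- abstract type: no element in an instance can be of this type -->
  <xs:complexType name="CustomerType" abstract="true">
    <xs:sequence>
      <xs:element ref="cust:FirstName" />
      <xs:element ref="cust:LastName" />
    </xs:sequence>
    <xs:attribute name="customerID" type="xs:integer" />
  </xs:complexType>
  <xs:element name="FirstName" type="xs:string" />
  <xs:element name="LastName" type="xs:string" />
</xs:schema>
```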
Certain attributes on simple and complex type definitions can be used to give schema authors more control over the usage of types in schemas and instance documents. The block attribute is used to specify whether elements whose types use a certain derivation method can substitute for an element whose type is the target type in an instance document. The block attribute also performs a similar function with regard to xsi:type assertions. The final attribute is used to disallow type derivations that use one or more specified derivation methods. More information on what these attributes mean on a type definition is available in the Simple Type Definitions and Complex Type Definition sections of the W3C XML Schema structures recommendation. In addition, the block attribute on an element declaration specifies whether types that use a particular derivation method are precluded from being used for xsi:type assertions.
The default values of the block and final attributes for all simple and complex type definitions in a schema can be specified via the blockDefault and finalDefault attributes of the root xs:schema element.
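The following sketch illustrates block and final on a complex type definition. The attribute values and the local City element are assumptions picked for illustration; the type names follow those used earlier in this section.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cust="urn:xmlns:25hoursaday-com:customer"
           targetNamespace="urn:xmlns:25hoursaday-com:customer"
           elementFormDefault="qualified">
  <xs:element name="Customer" type="cust:CustomerType" />
  <!-- block="extension": types derived by extension cannot stand in for
       CustomerType in an instance, for example via xsi:type.
       final="restriction": no type may be derived from CustomerType by restriction. -->
  <xs:complexType name="CustomerType" block="extension" final="restriction">
    <xs:sequence>
      <xs:element name="FirstName" type="xs:string" />
      <xs:element name="LastName" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
  <!-- Still legal to define, because final only rules out restriction;
       it just cannot be asserted with xsi:type on a cust:Customer element -->
  <xs:complexType name="AddressableCustomerType">
    <xs:complexContent>
      <xs:extension base="cust:CustomerType">
        <xs:sequence>
          <xs:element name="City" type="xs:string" />
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```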
Using xs:redefine to Update Type Definitions
W3C XML Schema provides a mechanism for updating a type definition in a process whereby the type effectively derives from itself. The xs:redefine element, used for redefinition, performs two tasks. The first is to act as an xs:include element by bringing in declarations and definitions from another schema document and making them available as part of the current target namespace. The included declarations and types must come from a schema that either has the same target namespace or has no target namespace. The second is to redefine types in a manner similar to type derivation, with the new definition replacing the old one.
The following shows the included and including schemas, as well as a valid instance document for the schemas.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cust="urn:xmlns:25hoursaday-com:customer"
           targetNamespace="urn:xmlns:25hoursaday-com:customer"
           elementFormDefault="qualified">
  <xs:element name="Customers">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="cust:Customer" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Customer" type="cust:CustomerType" />
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element ref="cust:FirstName" />
      <xs:element ref="cust:LastName" />
    </xs:sequence>
    <xs:attribute name="customerID" type="xs:integer" />
  </xs:complexType>
  <xs:element name="FirstName" type="xs:string" />
  <xs:element name="LastName" type="xs:string" />
  <xs:element name="PhoneNumber" type="xs:string" />
</xs:schema>
cust.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cust="urn:xmlns:25hoursaday-com:customer"
           targetNamespace="urn:xmlns:25hoursaday-com:customer"
           elementFormDefault="qualified">
  <xs:redefine schemaLocation="cust.xsd">
    <xs:complexType name="CustomerType">
      <xs:complexContent>
        <xs:extension base="cust:CustomerType">
          <xs:sequence>
            <xs:element ref="cust:PhoneNumber" />
          </xs:sequence>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:redefine>
</xs:schema>
redefined_cust.xsd

<cust:Customers xmlns:cust="urn:xmlns:25hoursaday-com:customer"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <cust:Customer customerID="12345">
    <cust:FirstName>Dare</cust:FirstName>
    <cust:LastName>Obasanjo</cust:LastName>
    <cust:PhoneNumber>425-555-1234</cust:PhoneNumber>
  </cust:Customer>
  <cust:Customer customerID="67890">
    <cust:FirstName>John</cust:FirstName>
    <cust:LastName>Smith</cust:LastName>
    <cust:PhoneNumber>425-555-5555</cust:PhoneNumber>
  </cust:Customer>
</cust:Customers>
cust.xml
Type redefinition is pervasive because it affects not only elements in the including schema but also those in the included schema. Thus all references to the original type in both schemas refer to the redefined type, while the original type definition is overshadowed. This causes a certain degree of fragility because redefined types can interact adversely with derived types and generate conflicts. A common conflict occurs when a derived type uses extension to add an element or attribute to a type's content model, and a redefinition also adds a similarly named element or attribute to the content model. Such a conflict would have occurred if either of the schemas shown had a type derived from CustomerType via extension which added a PhoneNumber element of a different type than that in the redefinition.
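To make the conflict concrete, the sketch below adds a derived type to redefined_cust.xsd. The ContactableCustomerType name and its xs:integer PhoneNumber are hypothetical. After redefinition, the derived type's content model would contain two PhoneNumber elements with the same qualified name but different types, which a schema processor would be expected to report as an error.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cust="urn:xmlns:25hoursaday-com:customer"
           targetNamespace="urn:xmlns:25hoursaday-com:customer"
           elementFormDefault="qualified">
  <xs:redefine schemaLocation="cust.xsd">
    <xs:complexType name="CustomerType">
      <xs:complexContent>
        <xs:extension base="cust:CustomerType">
          <xs:sequence>
            <!-- the redefinition adds the global cust:PhoneNumber (xs:string) -->
            <xs:element ref="cust:PhoneNumber" />
          </xs:sequence>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:redefine>
  <!-- Hypothetical derived type: it extends the redefined CustomerType and adds
       a local PhoneNumber of a different type, producing the conflict -->
  <xs:complexType name="ContactableCustomerType">
    <xs:complexContent>
      <xs:extension base="cust:CustomerType">
        <xs:sequence>
          <xs:element name="PhoneNumber" type="xs:integer" />
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```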
Acknowledgments
I'd like to thank Priya Lakshminarayanan, Mark Feblowitz, and Jeni Tennison for their help with this article.