XML Schema Design Patterns: Is Complex Type Derivation Unnecessary?
W3C XML Schema (WXS) possesses a number of features that mimic object-oriented concepts, including type derivation and polymorphism. However, real-world experience has shown that these features tend to complicate schemas, may have subtle interactions that lead to tricky problems, and can often be replaced by other features of WXS. In this article I explore both derivation by restriction and derivation by extension of complex types, showing the pros and cons of each technique as well as alternative ways of achieving the same results.
The WXS recommendation is just one of many XML schema languages; others include DTDs, RELAX NG, and XML-Data Reduced (XDR). An XML schema describes the structure of an XML document by specifying which elements may occur in it, the order in which they may occur, and constraints on certain aspects of those elements. As XML and XML schema languages have become more widespread, two primary usage scenarios have developed around XML schemas: validating that a document conforms to the contract agreed upon by its producers and consumers, and using a schema to create strongly typed XML, in which the contents of a document are annotated with the types declared in the schema.
In presenting the pros and cons of complex type derivation, this article focuses on its effects on these two uses of XML schemas.
Restriction of complex types involves creating a derived complex type whose content model is a subset of its base type's content model. This means that an instance of the derived type should also be a valid instance of the base complex type. Examples of acceptable restrictions to declarations in the content model include narrowing occurrence constraints (for instance, changing minOccurs="1" and maxOccurs="unbounded" to minOccurs="2" and maxOccurs="4"), changing the nillable property of an element declaration from true to false, and replacing an element's type with a type derived from it by restriction (for instance, xs:integer in the base type to xs:positiveInteger in the derived type).
Derivation by restriction is primarily useful in combination with
abstract elements or types. One can create an abstract type that contains
all the characteristics of a number of related content models, then
restrict it to create each of the target content models. This approach is
highlighted in a post
to XML-DEV by Roger Costello, where a PublicationType is
restricted to a MagazineType.
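A minimal sketch of the abstract-type pattern follows. Only the type names PublicationType and MagazineType come from the post mentioned above; the element names and content models are hypothetical, chosen to illustrate several acceptable restrictions at once: an optional element is removed, an optional element becomes required, and xs:integer is narrowed to xs:positiveInteger.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- abstract base type: the characteristics shared by every kind of
       publication (element names here are hypothetical) -->
  <xs:complexType name="PublicationType" abstract="true">
    <xs:sequence>
      <xs:element name="title" type="xs:string" />
      <xs:element name="author" type="xs:string" minOccurs="0" maxOccurs="unbounded" />
      <xs:element name="issue" type="xs:integer" minOccurs="0" />
    </xs:sequence>
  </xs:complexType>
  <!-- concrete type obtained by restriction: the optional author element
       is dropped, and issue becomes required and narrowed to positiveInteger -->
  <xs:complexType name="MagazineType">
    <xs:complexContent>
      <xs:restriction base="PublicationType">
        <xs:sequence>
          <xs:element name="title" type="xs:string" />
          <xs:element name="issue" type="xs:positiveInteger" />
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <!-- an element may be declared with the abstract type; each instance
       must then name a concrete derived type via xsi:type -->
  <xs:element name="publication" type="PublicationType" />
</xs:schema>
```

Because PublicationType is abstract, a publication element without an xsi:type attribute is invalid; an instance has to say xsi:type="MagazineType" (or name another concrete restriction) to be accepted.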
The following schema, taken from one of my previous articles, uses
derivation by restriction to restrict a complex type which describes a
subscriber to the XML-DEV mailing list to a type that describes me. Any
element that conforms to the DareObasanjo type can also be
validated as an instance of the XML-Deviant type.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<!-- base type -->
<xs:complexType name="XML-Deviant">
<xs:sequence>
<xs:element name="numPosts" type="xs:integer" minOccurs="0" maxOccurs="1" />
<xs:element name="signature" type="xs:string" nillable="true" />
<xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="1" />
</xs:sequence>
<xs:attribute name="firstSubscribed" type="xs:date" use="optional" />
<xs:attribute name="mailReader" type="xs:string"/>
</xs:complexType>
<!-- derived type -->
<xs:complexType name="DareObasanjo">
<xs:complexContent>
<xs:restriction base="XML-Deviant">
<xs:sequence>
<xs:element name="numPosts" type="xs:integer" minOccurs="1" />
<xs:element name="signature" type="xs:string" nillable="false" />
<xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="0" />
</xs:sequence>
<xs:attribute name="firstSubscribed" type="xs:date" use="required" />
<xs:attribute name="mailReader" type="xs:string" fixed="Microsoft Outlook" />
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:schema>
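The following instance is a sketch that assumes a hypothetical declaration such as &lt;xs:element name="xml-deviant" type="XML-Deviant" /&gt;; the numPosts value is made up for illustration. The element satisfies the DareObasanjo content model (numPosts present, a non-nil signature, no email, both attributes supplied), and since each of those choices is also permitted by XML-Deviant, the same element validates against either type.

```xml
<!-- Checked against DareObasanjo because of xsi:type. Removing the
     xsi:type attribute leaves an element that is still valid against
     the declared XML-Deviant type. -->
<xml-deviant xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:type="DareObasanjo"
             firstSubscribed="1999-05-31" mailReader="Microsoft Outlook">
  <numPosts>42</numPosts>
  <signature>XML is about data not objects, that is the zen of XML.</signature>
</xml-deviant>
```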
When a complex type is derived by restriction from another complex type, the base type's content model must be duplicated in the derived type and then refined.
In a previous article in the XML Design Pattern series, entitled "Avoiding Complexity", I explained why restriction of complex types should be used very carefully, with the following admonition:
The rules for derivation by restriction of complex types are described in Section 3.4.6 and Section 3.9.6 of the WXS recommendation. Most bugs in implementations cluster around this feature, and it is quite common to see implementers express exasperation when discussing the various nuances of derivation by restriction in complex types. Further, this kind of derivation does not neatly map to concepts in either object oriented programming or relational database theory, which are the primary producers and consumers of XML data.
For the contract-validation class of users, derivation by restriction
provides little, if any, benefit over defining content models without
using derivation. The following schema is equivalent to the one in the
previous section if all you're interested in is ensuring that an
XML-Deviant or DareObasanjo element conforms
to the specified content model.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="XML-Deviant">
<xs:sequence>
<xs:element name="numPosts" type="xs:integer" minOccurs="0" maxOccurs="1" />
<xs:element name="signature" type="xs:string" nillable="true" />
<xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="1" />
</xs:sequence>
<xs:attribute name="firstSubscribed" type="xs:date" use="optional" />
<xs:attribute name="mailReader" type="xs:string"/>
</xs:complexType>
<xs:complexType name="DareObasanjo">
<xs:sequence>
<xs:element name="numPosts" type="xs:integer" minOccurs="1" />
<xs:element name="signature" type="xs:string" nillable="false" />
<xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="0" />
</xs:sequence>
<xs:attribute name="firstSubscribed" type="xs:date" use="required" />
<xs:attribute name="mailReader" type="xs:string" fixed="Microsoft Outlook" />
</xs:complexType>
</xs:schema>
It should be noted that this schema does not enforce the relationship
between the XML-Deviant and DareObasanjo
types. For cases where the subtype relationship must be maintained, the
alternative is not satisfactory.
For usage scenarios where a schema is used to create strongly typed
XML, derivation by restriction is problematic. The ability to restrict away
optional elements and attributes does not exist in the relational model or
in traditional concepts of type derivation from OOP languages. The example
from the previous section where the email element is optional
in the base type, but cannot appear in the derived type, is incompatible
with the notion of derivation in an object oriented sense, while also
being similarly hard to model using tables in a relational
database. Similarly, changing the nillability of a type through derivation
is not a capability that maps to relational or OOP models. On the other
hand, the example that doesn't use derivation by restriction can more
straightforwardly be modeled as classes in an OOP language or as
relational tables. This is important because it reduces the impedance
mismatch which occurs when attempting to map the contents of an XML
document into a relational database or convert an XML document into an
instance of an OOP class.
Although certain aspects of derivation by restriction do not map well, it's possible to enforce these constraints directly, for example by always throwing an exception when code attempts to access a property or field of a derived type that has been restricted away. However, not only is such direct enforcement of WXS constraints unnatural to developers who traditionally use OOP languages, it is also unlikely that such conventions would be uniform across all implementations of WXS mapping tools.
Extension of complex types involves creating a derived complex type whose content model is a superset of its base type's content model. Complex type extension involves adding extra attributes or elements to the content model of a base type in the derived type. Elements added via extension are treated as if they were appended to the content model of the base type in sequence. This technique is useful for extracting the common aspects of a set of complex types and then reusing these commonalities via extending the base type definition.
The following schema uses derivation by extension to extend a
complex type which describes a subscriber to the XML-DEV mailing list to
a type that describes me. An instance of the DareObasanjo
type is not necessarily a valid instance of the XML-Deviant
type.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<!-- base type -->
<xs:complexType name="XML-Deviant">
<xs:sequence>
<xs:element name="numPosts" type="xs:integer" minOccurs="0" maxOccurs="1" />
<xs:element name="email" type="xs:string" />
</xs:sequence>
<xs:attribute name="firstSubscribed" type="xs:date" use="optional" />
<xs:attribute name="lastPostDate" type="xs:date" use="optional" />
</xs:complexType>
<!-- derived type -->
<xs:complexType name="DareObasanjo">
<xs:complexContent>
<xs:extension base="XML-Deviant">
<xs:sequence>
<xs:element name="signature" type="xs:string" />
</xs:sequence>
<xs:attribute name="mailReader" type="xs:string" fixed="Microsoft Outlook" />
</xs:extension>
</xs:complexContent>
</xs:complexType>
</xs:schema>
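The rule that extension appends content can be made concrete by writing out the content model that the DareObasanjo type above ends up with. The fragment below is a hand-flattened sketch, not something you would normally write yourself; the name DareObasanjoFlattened is hypothetical. It accepts the same elements and attributes as the extension, but it has no derivation relationship to XML-Deviant.

```xml
<!-- Hypothetical hand-flattened equivalent of the DareObasanjo extension -->
<xs:complexType name="DareObasanjoFlattened">
  <xs:sequence>
    <!-- particles inherited from XML-Deviant come first -->
    <xs:sequence>
      <xs:element name="numPosts" type="xs:integer" minOccurs="0" maxOccurs="1" />
      <xs:element name="email" type="xs:string" />
    </xs:sequence>
    <!-- particles added by the extension are appended after them -->
    <xs:sequence>
      <xs:element name="signature" type="xs:string" />
    </xs:sequence>
  </xs:sequence>
  <!-- attributes: those of the base type plus the one added by the extension -->
  <xs:attribute name="firstSubscribed" type="xs:date" use="optional" />
  <xs:attribute name="lastPostDate" type="xs:date" use="optional" />
  <xs:attribute name="mailReader" type="xs:string" fixed="Microsoft Outlook" />
</xs:complexType>
```

One consequence of this design is that an extension cannot place a new element between numPosts and email, or turn the inherited sequence into a choice; new content can only follow the inherited content.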
For users who want to use an XML schema to validate that an XML document conforms to its contract, derivation by extension seems to be an excellent way to componentize and reuse aspects of a schema. Although this seems true at first glance, interactions with other features of WXS, such as substitution groups and xsi:type, make the use of derivation by extension problematic. For instance, consider the following element declaration:
<xs:element name="xml-deviant" type="XML-Deviant" />
which declares an xml-deviant element whose type is the
XML-Deviant complex type from the schema in the previous
section. Both of the following XML elements are valid against the
xml-deviant element declaration:
<xml-deviant firstSubscribed="1999-05-31" >
<email>johndoe@example.com</email>
</xml-deviant>
<xml-deviant xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:type="DareObasanjo" firstSubscribed="1999-05-31"
mailReader="Microsoft Outlook">
<email>dareo@online.microsoft.com</email>
<signature>XML is about data not objects, that is the zen of XML.</signature>
</xml-deviant>
Although the element declaration explicitly states that the type of
the xml-deviant element is the XML-Deviant
complex type, it is possible for an instance to override the declaration
in the schema using the xsi:type attribute as long as the
new type is a subtype of the original type. This means that, by default,
even though an element is successfully validated, it does not
necessarily conform to the content model the consumer believes it's
being validated against. A similar problem is faced when the target
element declaration is designated as the head of a substitution group.
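As a sketch of the substitution group case, the top-level declarations below would sit in the extension-based schema alongside the xml-deviant declaration shown above. The member element dare and the container element subscribers are hypothetical names. Because DareObasanjo is derived from XML-Deviant, the member declaration is legal, and a dare element, with its extra signature content, may then appear wherever the schema expects xml-deviant, without any xsi:type attribute in the instance.

```xml
<!-- head of the substitution group: the same declaration as above -->
<xs:element name="xml-deviant" type="XML-Deviant" />

<!-- hypothetical member: its type must be derived from the head's type -->
<xs:element name="dare" type="DareObasanjo" substitutionGroup="xml-deviant" />

<!-- hypothetical container: this reference to xml-deviant also accepts dare -->
<xs:element name="subscribers">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="xml-deviant" maxOccurs="unbounded" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
```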
There are two ways to get around this potential problem with
derivation by extension. The first involves blocking substitution or
type derivation by placing the block or final
attribute on the element declaration or the complex type
declaration. Similarly the blockDefault or
finalDefault attribute can be placed on the
xs:schema element to specify which kind of substitutions or
derivations are disallowed in the schema. The second option involves
using named model groups (xs:group) and attribute groups to
modularize one's schema as opposed to using derivation by
extension. Below is the schema from the previous section rewritten using
named model groups:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="XML-Deviant">
<xs:group ref="XMLDeviantGrp" />
<xs:attributeGroup ref="XMLDeviantAttrGrp" />
</xs:complexType>
<xs:complexType name="DareObasanjo">
<xs:sequence>
<xs:group ref="XMLDeviantGrp" />
<xs:element name="signature" type="xs:string" />
</xs:sequence>
<xs:attributeGroup ref="XMLDeviantAttrGrp" />
<xs:attribute name="mailReader" type="xs:string" fixed="Microsoft Outlook" />
</xs:complexType>
<xs:group name="XMLDeviantGrp">
<xs:sequence>
<xs:element name="numPosts" type="xs:integer" minOccurs="0" maxOccurs="1" />
<xs:element name="email" type="xs:string" />
</xs:sequence>
</xs:group>
<xs:attributeGroup name="XMLDeviantAttrGrp">
<xs:attribute name="firstSubscribed" type="xs:date" use="optional" />
<xs:attribute name="lastPostDate" type="xs:date" use="optional" />
</xs:attributeGroup>
</xs:schema>
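The first option, blocking, requires no restructuring of the schema. The declarations below are a sketch against the extension-based schema from the previous section. With block="#all" on the element declaration, the xsi:type instance shown earlier would be rejected, and so would any substitution group member standing in for xml-deviant. Blocking on the type instead affects every element declared with that type.

```xml
<!-- block="#all" is shorthand for "extension restriction substitution":
     instances may not use xsi:type to switch to a derived type, and no
     substitution group member may replace this element -->
<xs:element name="xml-deviant" type="XML-Deviant" block="#all" />

<!-- Alternatively, block on the type: wherever XML-Deviant is the declared
     type, xsi:type may not name a type derived from it by extension.
     Deriving DareObasanjo is still allowed; final="extension" would
     forbid that derivation altogether. -->
<xs:complexType name="XML-Deviant" block="extension">
  <xs:sequence>
    <xs:element name="numPosts" type="xs:integer" minOccurs="0" maxOccurs="1" />
    <xs:element name="email" type="xs:string" />
  </xs:sequence>
  <xs:attribute name="firstSubscribed" type="xs:date" use="optional" />
  <xs:attribute name="lastPostDate" type="xs:date" use="optional" />
</xs:complexType>
```

The choice is one of scope: block on an element declaration protects that declaration only, block on a complex type protects every element of that type, and blockDefault on xs:schema supplies the default for every declaration in the schema document that lacks its own block attribute.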
For usage scenarios that revolve around strongly typed XML, derivation by extension poses a different but related set of problems. In situations where an XML schema is used as the basis for mapping between XML and the object-oriented or relational models, derivation by extension does not prove to be problematic. However, when such strongly typed XML is processed with schema-aware languages such as XQuery or XSLT 2.0, certain problems arise. XQuery is designed to support static typing, meaning that type-related errors are expected to be detected at compile time instead of at execution time. The following query is problematic given the previous examples:
for $x in //xml-deviant
return $x/signature
On the one hand, the above expression should lead to a static error
because the xml-deviant element is declared as having
XML-Deviant as its type, which does not have a
signature element. On the other hand, since a subtype of
XML-Deviant exists which has a signature
element in the content model and hence could be the target of an
xsi:type directive, this shouldn't be a static
error. Both positions are valid, and regardless of which one XQuery has
chosen, there will be people who expect the opposite. Developers with a
background in XPath may expect it to work, while developers who are
familiar with statically typed languages would recognize it as being
equivalent to the following, and thus an error:
foreach (XmlDeviant b in list) {
    yield return b.signature; // static type error: XmlDeviant has no signature member
}
To prevent this problem and others related to it, it is best to avoid derivation by extension if the XML document will be processed by an XML Schema-aware language like XQuery.
Based on the current technological landscape, the complex type
derivation features of WXS may add more problems than they solve in the
two most common schema use cases. For validation scenarios, derivation
by restriction is of marginal value, while derivation by extension is a
good way to create modularity as well as encourage reuse. Care must
however be taken to consider the ramifications of the various type
substitutability features of WXS (xsi:type and substitution
groups) when using derivation by extension in scenarios revolving around
document validation.
Currently, processing and storage of strongly typed XML data are primarily the province of conventional OOP languages and relational databases, respectively. This means that certain features of WXS, such as derivation by restriction (and, to a lesser extent, derivation by extension), cause an impedance mismatch between the type system used to describe strongly typed XML and the mechanisms used for processing and storing that XML. Eventually, when technologies like XQuery become widespread for processing typed XML and support for XML and W3C XML Schema is integrated into mainstream database products, this impedance mismatch will no longer be important. Until then, complex type derivation should be carefully evaluated before being used in situations where W3C XML Schema is primarily a mechanism for creating type-annotated XML infosets.
I'd like to thank Don Box, Chris Lovett and Erik Meijer for their ideas and feedback while writing this article.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.