XML.com: XML From the Inside Out

W3C XML Schema Design Patterns: Avoiding Complexity
by Dare Obasanjo

Why You Should Very Carefully Use Restriction of Complex Types

Restriction of complex types involves creating a derived complex type whose content model accepts a subset of what the base type's content model accepts.

The parts of the WXS spec that describe derivation by restriction in complex types (Section 3.4.6 and Section 3.9.6) are generally considered to be its most complex parts. Most bugs in implementations cluster around this feature, and it is quite common to see implementers express exasperation when discussing the various nuances of derivation by restriction in complex types. Further, this kind of derivation does not map neatly to concepts in either object-oriented programs or relational databases, which are the primary producers and consumers of XML data. This is the exact opposite of the situation with derivation by extension of complex types, which corresponds closely to inheritance in object-oriented languages.

Another challenge in using derivation by restriction of complex types arises from the way in which restrictions are declared: when a complex type is derived by restriction from another complex type, its content model must be duplicated and refined. Because the content model is copied into every derived type, possibly down a long derivation chain, any modification to an ancestor type must be manually propagated down the derivation tree. Furthermore, such duplication cannot cross namespace boundaries: deriving ns2:SlowCar from ns1:Car may not work if ns2:SlowCar has a child element, ns2:MaxSpeed, because that element cannot be treated as a valid restriction of ns1:Car's child element ns1:MaxSpeed.
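A minimal sketch of the duplication problem, reusing the Car and SlowCar names from the paragraph above. The namespace URI, the Make element, and the 55 speed limit are hypothetical illustrations. SlowCar has to restate Car's entire sequence even though it tightens only MaxSpeed.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ns1="http://www.example.com/ns1"
           targetNamespace="http://www.example.com/ns1"
           elementFormDefault="qualified">

  <!-- Hypothetical base type -->
  <xs:complexType name="Car">
    <xs:sequence>
      <xs:element name="Make" type="xs:string"/>
      <xs:element name="MaxSpeed" type="xs:positiveInteger"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="SlowCar">
    <xs:complexContent>
      <xs:restriction base="ns1:Car">
        <!-- The full sequence is repeated, including the unchanged Make -->
        <xs:sequence>
          <xs:element name="Make" type="xs:string"/>
          <!-- Only this declaration is actually tightened -->
          <xs:element name="MaxSpeed">
            <xs:simpleType>
              <xs:restriction base="xs:positiveInteger">
                <xs:maxInclusive value="55"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:element>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

</xs:schema>
```

If Car later gained a required Model element, this SlowCar would no longer be a valid restriction until its copy of the sequence was updated by hand. Likewise, if SlowCar were moved into a second target namespace with qualified local elements, its local elements would become ns2:Make and ns2:MaxSpeed, which no longer correspond to ns1:Make and ns1:MaxSpeed.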

The following schema uses derivation by restriction to restrict a complex type, which describes a subscriber to the XML-DEV mailing list, to a type that describes me. Any element that conforms to the DareObasanjo type can also be validated as an instance of the XML-Deviant type.


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

 <!-- base type: a generic XML-DEV subscriber -->
 <xs:complexType name="XML-Deviant">
  <xs:sequence>
   <xs:element name="numPosts" type="xs:integer" minOccurs="0" maxOccurs="1" />
   <xs:element name="signature" type="xs:string" nillable="true" />
  </xs:sequence>
  <xs:attribute name="firstSubscribed" type="xs:date" use="optional" />
  <xs:attribute name="mailReader" type="xs:string" />
 </xs:complexType>

 <!-- derived type: every constraint is equal to or tighter than the base -->
 <xs:complexType name="DareObasanjo">
  <xs:complexContent>
   <xs:restriction base="XML-Deviant">
    <!-- the element content model is restated in full, not inherited -->
    <xs:sequence>
     <!-- numPosts becomes required (minOccurs 0 to 1) -->
     <xs:element name="numPosts" type="xs:integer" minOccurs="1" />
     <!-- signature may no longer be nil -->
     <xs:element name="signature" type="xs:string" nillable="false" />
    </xs:sequence>
    <!-- an optional attribute becomes required -->
    <xs:attribute name="firstSubscribed" type="xs:date" use="required" />
    <!-- an unconstrained attribute gets a fixed value -->
    <xs:attribute name="mailReader" type="xs:string" fixed="Microsoft Outlook" />
   </xs:restriction>
  </xs:complexContent>
 </xs:complexType>

</xs:schema>
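For contrast, here is a sketch of a derived type that a conforming processor should reject, because its declarations loosen the base instead of tightening it. The name Lurker is hypothetical, and the base type is a trimmed copy of XML-Deviant with its attributes omitted.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Trimmed copy of the base type (attributes omitted) -->
  <xs:complexType name="XML-Deviant">
    <xs:sequence>
      <xs:element name="numPosts" type="xs:integer" minOccurs="0" maxOccurs="1"/>
      <xs:element name="signature" type="xs:string" nillable="true"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Expected to be rejected: NOT a valid restriction of XML-Deviant -->
  <xs:complexType name="Lurker">
    <xs:complexContent>
      <xs:restriction base="XML-Deviant">
        <xs:sequence>
          <!-- xs:decimal is a base type of xs:integer, so this widens the value space -->
          <xs:element name="numPosts" type="xs:decimal" minOccurs="0"/>
          <!-- minOccurs="0" widens the base's implicit minOccurs="1" -->
          <xs:element name="signature" type="xs:string" nillable="true" minOccurs="0"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

</xs:schema>
```

The direction matters in subtle ways: DareObasanjo may turn nillable="true" into nillable="false", but a derived type could not do the reverse.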

Derivation by restriction of complex types is a multifaceted feature that is useful when secondary types need to conform to a generic primary type but also add constraints of their own that go beyond those of the primary type. However, its extreme complexity means it should be used only by those who have a firm grasp of WXS.

Why You Should Carefully Use Abstract Types

Borrowing a concept from OOP languages like C# and Java, both element declarations and complex type definitions can be made abstract. An abstract element declaration cannot be used to validate an element in an XML instance document; it can only appear in content models via substitution, with members of its substitution group appearing in its place. An abstract complex type definition similarly cannot be used to validate an element in an XML instance document, but it can be used as the abstract parent of an element's derived type, or in cases where the element's type is overridden in the instance using xsi:type.

Abstract complex types and element declarations are useful for creating generic base types which contain information common to a set of types (such as Shape vs. Circle or Square), yet the definition is not deemed "complete" unless further derivation (extension or restriction) has been applied. While this feature is not complicated to use, some implications of its use are subtle and complex. Abstract types should be used with care.
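A minimal sketch of the Shape/Circle/Square case, assuming a hypothetical shapes namespace and hypothetical color, radius, and side attributes. Both the Shape type and the shape head element are abstract; Circle and Square extend Shape, and their elements join shape's substitution group.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:s="http://www.example.com/shapes"
           targetNamespace="http://www.example.com/shapes"
           elementFormDefault="qualified">

  <!-- Abstract base type: no element may be validated directly against Shape -->
  <xs:complexType name="Shape" abstract="true">
    <xs:attribute name="color" type="xs:string" use="optional"/>
  </xs:complexType>

  <xs:complexType name="Circle">
    <xs:complexContent>
      <xs:extension base="s:Shape">
        <xs:attribute name="radius" type="xs:decimal" use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="Square">
    <xs:complexContent>
      <xs:extension base="s:Shape">
        <xs:attribute name="side" type="xs:decimal" use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Abstract head element: only members of its substitution group may appear -->
  <xs:element name="shape" type="s:Shape" abstract="true"/>
  <xs:element name="circle" type="s:Circle" substitutionGroup="s:shape"/>
  <xs:element name="square" type="s:Square" substitutionGroup="s:shape"/>

  <xs:element name="drawing">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="s:shape" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Under this schema, an s:drawing containing s:circle and s:square elements would validate, while a literal s:shape element would be rejected. One of the subtler implications: if an element were declared with type s:Shape without itself being abstract, every occurrence of it in an instance would need an xsi:type attribute naming a non-abstract derived type such as s:Circle.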

Do Use Wildcards to Provide Well-Defined Points of Extensibility

WXS provides the wildcards xs:any and xs:anyAttribute, which can be used to allow elements and attributes from specified namespaces to occur in a content model. Wildcards allow schema authors to make a content model extensible while maintaining a degree of control over which elements and attributes may occur. A good discussion of the benefits of using wildcards is available in an XML.com article, "W3C XML Schema Design Patterns: Dealing With Change".

Cautious schema authors, concerned with the problems posed by type derivation, may choose to block attempts at type derivation using the final attribute on complex type definitions and element declarations (similar to sealed in C# and final in Java). They may then choose to allow extensibility at specific parts of the content model by using wildcards. This gives schema authors more control over the content models they define and may reduce some of the problems with various aspects of complex type derivation (specifically derivation by extension).
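A minimal sketch of this approach, assuming a hypothetical purchase-order vocabulary (the namespace URI and the Item, item, sku, and quantity names are illustrative only). The type is sealed with final="#all", and extensibility is confined to the end of the sequence and to attributes, in both cases limited to namespaces other than the target namespace.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:p="http://www.example.com/purchase"
           targetNamespace="http://www.example.com/purchase"
           elementFormDefault="qualified">

  <!-- final="#all" blocks both extension and restriction of this type -->
  <xs:complexType name="Item" final="#all">
    <xs:sequence>
      <xs:element name="sku" type="xs:string"/>
      <xs:element name="quantity" type="xs:positiveInteger"/>
      <!-- Extension point: any number of elements from other namespaces,
           validated only if a declaration for them is available -->
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <!-- Attributes from other namespaces are also permitted -->
    <xs:anyAttribute namespace="##other" processContents="lax"/>
  </xs:complexType>

  <xs:element name="item" type="p:Item"/>

</xs:schema>
```

Placing the wildcard last and restricting it to ##other keeps it from competing with sku and quantity for the same elements.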

Used improperly, wildcards can introduce non-determinism that violates the Unique Particle Attribution (UPA) rule. The following schema causes such a problem.


<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
 targetNamespace="http://www.example.com/fruit/"
 elementFormDefault="qualified">

<xs:complexType name="myKitchen">
        <xs:choice maxOccurs="unbounded">
              <xs:any processContents="skip" />
              <xs:element name="apple" type="xs:string"/>
              <xs:element name="cherry" type="xs:string"/>            
        </xs:choice>
</xs:complexType>

</xs:schema>

The content model of the myKitchen type allows one or more apple, cherry, or other elements, in any order. However, when an apple element is encountered, a validator cannot tell whether it should be validated against the wildcard or against the apple element declaration. This ambiguity violates the UPA rule, so conforming processors reject the schema.
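One way to remove the ambiguity, sketched below, is to narrow the wildcard with namespace="##other" so that it can never match an element in the fruit target namespace. This is one possible fix, not the only one.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com/fruit/"
           elementFormDefault="qualified">

  <xs:complexType name="myKitchen">
    <xs:choice maxOccurs="unbounded">
      <!-- ##other: the wildcard matches only elements from namespaces other than
           the target namespace, so an apple element can only match its declaration -->
      <xs:any namespace="##other" processContents="skip"/>
      <xs:element name="apple" type="xs:string"/>
      <xs:element name="cherry" type="xs:string"/>
    </xs:choice>
  </xs:complexType>

</xs:schema>
```

The fix has a cost: elements in the fruit namespace other than apple and cherry, and elements in no namespace, are no longer accepted. That is exactly the kind of trade-off involved in choosing wildcard settings.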

There are subtle but potentially profound ramifications to the selection of both the namespace attribute and the processContents attribute. Overly restrictive values can impede extensibility; overly loose values can open the schema up to abuse. Controlling the supported namespaces for a wildcard can also be bewildering, especially when the set of allowable namespaces is subject to change.
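For reference, here is a sketch of how the two attributes trade off against each other, using three hypothetical types in the fruit namespace; the partner namespace URI is also hypothetical. When processContents is omitted, it defaults to strict.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com/fruit/"
           elementFormDefault="qualified">

  <!-- Loosest: any element from any namespace, and its contents are never checked -->
  <xs:complexType name="openBasket">
    <xs:sequence>
      <xs:any namespace="##any" processContents="skip"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Middle ground: foreign elements only, validated when a declaration is found -->
  <xs:complexType name="extensibleBasket">
    <xs:sequence>
      <xs:element name="apple" type="xs:string"/>
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Strictest: only elements from one listed namespace, each of which must be declared -->
  <xs:complexType name="partnerBasket">
    <xs:sequence>
      <xs:any namespace="http://www.example.com/partner" processContents="strict"
              maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```

The explicit namespace list in partnerBasket illustrates the maintenance problem mentioned above: if the set of allowable namespaces changes, the schema itself has to change.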

Do Not Use Group or Type Redefinition

Redefinition is a feature of WXS that allows you to change the meaning of an included type or group definition. Using xs:redefine, schema authors can include type or group definitions from other schema documents and alter them in a pervasive manner. Redefinition is pervasive because it affects not only the definitions in the including schema but also those in the included schema. Thus all references to the original type or group in both schemas refer to the redefined type, while the original definition is overshadowed. This leads to the problems pointed out in "W3C XML Schema Design Patterns: Dealing With Change":

This causes a certain degree of fragility because redefined types can adversely interact with derived types and generate conflicts. A common conflict is when a derived type uses extension to add an element or attribute to a type's content model, and a redefinition also adds a similarly named element or attribute to the content model.
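A minimal sketch of xs:redefine, assuming two hypothetical schema documents, base.xsd and main.xsd, shown together in one listing. Once main.xsd is processed, every reference to Address, including any inside base.xsd itself, would see the added postcode element.

```xml
<!-- base.xsd (hypothetical): the original definition -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Address">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

<!-- main.xsd (hypothetical): redefines Address everywhere it is used -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:redefine schemaLocation="base.xsd">
    <!-- A redefined type must derive from the original, which it names as its own base -->
    <xs:complexType name="Address">
      <xs:complexContent>
        <xs:extension base="Address">
          <xs:sequence>
            <xs:element name="postcode" type="xs:string"/>
          </xs:sequence>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:redefine>
</xs:schema>
```

If base.xsd also contained a type that extended Address with its own postcode element, that derived type would now pick up a second postcode, which is the kind of conflict the quotation describes.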

A major problem with type redefinition is that, unlike type derivation, it cannot be prevented by using the block or final attributes. Any schema can therefore have its types redefined in a pervasive manner that completely alters their semantics. It is advisable to avoid this feature because of the conflicts it can cause.

Many schema authors attempt to use type redefinition to increase the value space of an enumeration, but this does not work: a redefined simple type must be a restriction of the original, so it can only narrow the set of allowed values. The only way to increase the number of values accepted by an enumeration used as a base type is to create a union. However, the additional values are accepted only where the union type is used, not where the original base type is used. Also note that chained redefinitions (redefining a redefine) can be problematic, resulting in unexpected definition clashes.
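A sketch of the union approach; the Fruit, TropicalFruit, and AnyFruit type names and their values are hypothetical. An element declared with type AnyFruit accepts all four values, while one declared with type Fruit still accepts only apple and cherry.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Original enumeration -->
  <xs:simpleType name="Fruit">
    <xs:restriction base="xs:string">
      <xs:enumeration value="apple"/>
      <xs:enumeration value="cherry"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- The additional values -->
  <xs:simpleType name="TropicalFruit">
    <xs:restriction base="xs:string">
      <xs:enumeration value="mango"/>
      <xs:enumeration value="papaya"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Accepts apple, cherry, mango, and papaya; Fruit itself is unchanged -->
  <xs:simpleType name="AnyFruit">
    <xs:union memberTypes="Fruit TropicalFruit"/>
  </xs:simpleType>

</xs:schema>
```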

Conclusion

The WXS recommendation is a complex specification because it attempts to solve complex problems. Schema authors can reduce that burden by relying on its simpler features. They should also ensure that their schemas validate in multiple schema processors. Schemas are an important facilitator of interoperability, and it is foolish to depend on the nuances of a specific implementation and inadvertently give up that interoperability.

Acknowledgments

I'd like to thank Priya Lakshminarayanan and Mark Feblowitz for their help with this article.