
W3C XML Schema (WXS) possesses a number of features that mimic object-oriented concepts, including type derivation and polymorphism. However, real-world experience has shown that these features tend to complicate schemas, may have subtle interactions that lead to tricky problems, and can often be replaced by other features of WXS. In this article I explore both derivation by restriction and derivation by extension of complex types, showing the pros and cons of each technique as well as alternative ways of achieving the same results.

Why Validate XML Documents?

W3C XML Schema is just one of many XML schema languages; others include DTD, RELAX NG, and XML Data-Reduced. An XML schema describes the structure of an XML document by specifying the valid elements that can occur in the document, the order in which they can occur, and constraints on certain aspects of these elements. As usage of XML and XML schema languages has become more widespread, two primary usage scenarios have developed around XML document validation and XML schemas.

  1. Describing and enforcing the contract between producers and consumers of XML documents: An XML schema ordinarily serves as a means for consumers and producers of XML to understand the structure of the document being consumed or produced. Schemas are a fairly terse and machine-readable way to describe what constitutes a valid XML document according to a particular XML vocabulary. Thus a schema can be thought of as a contract between the producer and consumer of an XML document. Typically the consumer ensures that the XML document being received from the producer conforms to the contract by validating the received document against the schema.

    This description covers a wide array of XML usage scenarios from business entities exchanging XML documents to applications that utilize XML configuration files.
  2. Creating the basis for processing and storing typed data represented as XML documents: As XML became popular as a way to represent rigidly structured, strongly typed data, such as the content of a relational database or programming language objects, the ability to describe the datatypes within an XML document became important. This led to Microsoft's XML Data and XML Data-Reduced schema languages, which ultimately led to WXS. These schema languages are used to convert an input XML infoset into a type annotated infoset (TAI) where element and attribute information items are annotated with a type name.

    WXS describes the creation of a type annotated infoset as a consequence of document validation against a schema. During validation against a WXS schema, an input XML infoset is converted into a post schema validation infoset (PSVI), which among other things contains type annotations. However, practical experience has shown that one does not need to perform full document validation to create type annotated infosets. In general, many applications that use XML schemas to create strongly typed XML, such as XML<->object mapping technologies, do not perform full document validation, since a number of WXS features do not map to concepts in the target domain.

In presenting the pros and cons of complex type derivation this article will focus on its effects on these uses of XML schema.

A Look at Derivation by Restriction of Complex Types

Restriction of complex types involves creating a derived complex type whose content model is a subset of its base type's content model. This means that an instance of the derived type should also be a valid instance of the base complex type. Examples of acceptable restrictions to declarations in the content model include:

  • Changing an optional attribute to being required
  • Changing the occurrence range of an element so it is a subset of the original occurrence range (e.g. from minOccurs="1" and maxOccurs="unbounded" to minOccurs="2" and maxOccurs="4")
  • Changing the nillability of an element from true to false
  • Changing the type of an element or attribute to a subtype (e.g. going from xs:integer in the base type to xs:positiveInteger in the derived type)
  • Changing an element or attribute to having a fixed value

Derivation by restriction is primarily useful in combination with abstract elements or types. One can create an abstract type that contains all the characteristics of a number of related content models, then restrict it to create each of the target content models. This approach is highlighted in a post to XML-DEV by Roger Costello, where a PublicationType is restricted to a MagazineType.
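
As a rough sketch of this pattern, an abstract PublicationType might allow any number of authors, while a MagazineType restricts the Author element away. The element names and datatypes below are my own assumptions for illustration and are not taken from the XML-DEV post.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

 <!-- abstract base type: it cannot be used directly to validate an element -->
 <xs:complexType name="PublicationType" abstract="true">
  <xs:sequence>
   <xs:element name="Title" type="xs:string" />
   <xs:element name="Author" type="xs:string" minOccurs="0" maxOccurs="unbounded" />
   <xs:element name="Date" type="xs:gYear" />
  </xs:sequence>
 </xs:complexType>

 <!-- concrete type derived by restriction: the optional Author is left out -->
 <xs:complexType name="MagazineType">
  <xs:complexContent>
   <xs:restriction base="PublicationType">
    <xs:sequence>
     <xs:element name="Title" type="xs:string" />
     <xs:element name="Date" type="xs:gYear" />
    </xs:sequence>
   </xs:restriction>
  </xs:complexContent>
 </xs:complexType>

</xs:schema>
```

Because PublicationType is abstract, an instance has to use a concrete derived type such as MagazineType.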

The following schema, taken from one of my previous articles, uses derivation by restriction to restrict a complex type that describes a subscriber to the XML-DEV mailing list to a type that describes me. Any element that conforms to the DareObasanjo type can also be validated as an instance of the XML-Deviant type.


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

 <!-- base type -->
 <xs:complexType name="XML-Deviant">
  <xs:sequence>
   <xs:element name="numPosts" type="xs:integer" minOccurs="0" maxOccurs="1" /> 
   <xs:element name="signature" type="xs:string" nillable="true" />
   <xs:element name="email" type="xs:string"  minOccurs="0" maxOccurs="1" />
  </xs:sequence>
  <xs:attribute name="firstSubscribed" type="xs:date" use="optional" />
  <xs:attribute name="mailReader" type="xs:string"/>
 </xs:complexType>

 <!-- derived type --> 
  <xs:complexType name="DareObasanjo">
   <xs:complexContent>
   <xs:restriction base="XML-Deviant">
   <xs:sequence>
    <xs:element name="numPosts" type="xs:integer" minOccurs="1" /> 
    <xs:element name="signature" type="xs:string" nillable="false" />
    <xs:element name="email" type="xs:string"  minOccurs="0" maxOccurs="0" />
   </xs:sequence>
   <xs:attribute name="firstSubscribed" type="xs:date" use="required" />
   <xs:attribute name="mailReader" type="xs:string" fixed="Microsoft Outlook" />
   </xs:restriction>
   </xs:complexContent>
  </xs:complexType> 

</xs:schema>

When a given complex type is to be derived by restriction from another complex type, its content model must be duplicated and refined.
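
To make the subset relationship concrete, here is a sketch of two instances. It assumes a hypothetical element declaration named subscriber, which does not appear in the schema above. The first instance should be valid whether subscriber is declared with the XML-Deviant type or with the DareObasanjo type. The second uses an email element and a nil signature, so it should be valid only against the base type.

```xml
<!-- acceptable for both XML-Deviant and DareObasanjo -->
<subscriber firstSubscribed="1999-05-31" mailReader="Microsoft Outlook">
 <numPosts>42</numPosts>
 <signature>XML is about data not objects, that is the zen of XML.</signature>
</subscriber>

<!-- acceptable for XML-Deviant only: numPosts and firstSubscribed are
     missing, signature is nil, and email is present -->
<subscriber xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <signature xsi:nil="true" />
 <email>johndoe@example.com</email>
</subscriber>
```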

The Problems with Derivation by Restriction of Complex Types

In a previous article in the XML Design Pattern series entitled "Avoiding Complexity", I explained why you should use restriction of complex types very carefully, with the following admonition:

The rules for derivation by restriction of complex types are described in Section 3.4.6 and Section 3.9.6 of the WXS recommendation. Most bugs in implementations cluster around this feature, and it is quite common to see implementers express exasperation when discussing the various nuances of derivation by restriction in complex types. Further, this kind of derivation does not neatly map to concepts in either object oriented programming or relational database theory, which are the primary producers and consumers of XML data.

For the contract-validation class of users, derivation by restriction provides little, if any, benefit over defining content models without using derivation. The following schema is equivalent to the one in the previous section if all you're interested in is ensuring that an XML-Deviant or DareObasanjo element conforms to the specified content model.


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

 <xs:complexType name="XML-Deviant">
  <xs:sequence>
   <xs:element name="numPosts" type="xs:integer" minOccurs="0" maxOccurs="1" /> 
   <xs:element name="signature" type="xs:string" nillable="true" />
   <xs:element name="email" type="xs:string"  minOccurs="0" maxOccurs="1" />
  </xs:sequence>
  <xs:attribute name="firstSubscribed" type="xs:date" use="optional" />
  <xs:attribute name="mailReader" type="xs:string"/>
 </xs:complexType>

  <xs:complexType name="DareObasanjo">
   <xs:sequence>
    <xs:element name="numPosts" type="xs:integer" minOccurs="1" /> 
    <xs:element name="signature" type="xs:string" nillable="false" />
    <xs:element name="email" type="xs:string"  minOccurs="0" maxOccurs="0" />
   </xs:sequence>
   <xs:attribute name="firstSubscribed" type="xs:date" use="required" />
   <xs:attribute name="mailReader" type="xs:string" fixed="Microsoft Outlook" />
  </xs:complexType> 

</xs:schema>

It should be noted that this schema does not enforce the relationship between the XML-Deviant and DareObasanjo types. For cases where the subtype relationship must be maintained, this alternative is not satisfactory.

For usage scenarios where a schema is used to create strongly typed XML, derivation by restriction is problematic. The ability to restrict optional elements and attributes does not exist in the relational model or in traditional concepts of type derivation from OOP languages. The example from the previous section, where the email element is optional in the base type but cannot appear in the derived type, is incompatible with the notion of derivation in an object-oriented sense, and is similarly hard to model using tables in a relational database. Likewise, changing the nillability of a type through derivation is not a capability that maps to the relational or OOP models. On the other hand, the example that doesn't use derivation by restriction can more straightforwardly be modeled as classes in an OOP language or as relational tables. This is important because it reduces the impedance mismatch that occurs when attempting to map the contents of an XML document into a relational database or convert an XML document into an instance of an OOP class.

Although certain aspects of derivation by restriction do not map well, it's possible to enforce these constraints directly by, for example, always throwing an exception when attempting to access a property or field in a derived type that has been restricted away. However, not only is such direct enforcement of WXS constraints unnatural to developers who traditionally use OOP languages, it is also unlikely that such conventions would be uniform across all implementations of WXS mapping tools.

A Look at Derivation by Extension of Complex Types

Extension of complex types involves creating a derived complex type whose content model is a superset of its base type's content model. Complex type extension involves adding extra attributes or elements to the content model of a base type in the derived type. Elements added via extension are treated as if they were appended to the content model of the base type in sequence. This technique is useful for extracting the common aspects of a set of complex types and then reusing these commonalities via extending the base type definition.

The following schema uses derivation by extension to extend a complex type which describes a subscriber to the XML-DEV mailing list to a type that describes me. An instance of the DareObasanjo type is not necessarily a valid instance of the XML-Deviant type.


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

 <!-- base type -->
 <xs:complexType name="XML-Deviant">
  <xs:sequence>
   <xs:element name="numPosts" type="xs:integer" minOccurs="0" maxOccurs="1" /> 
   <xs:element name="email" type="xs:string"  />
  </xs:sequence>
  <xs:attribute name="firstSubscribed" type="xs:date" use="optional" />
  <xs:attribute name="lastPostDate" type="xs:date" use="optional" />
 </xs:complexType>

 <!-- derived type --> 
  <xs:complexType name="DareObasanjo">
   <xs:complexContent>
   <xs:extension base="XML-Deviant">
   <xs:sequence>
    <xs:element name="signature" type="xs:string"  />
   </xs:sequence>
   <xs:attribute name="mailReader" type="xs:string" fixed="Microsoft Outlook" />
   </xs:extension>
   </xs:complexContent>
  </xs:complexType> 

</xs:schema>
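
One way to picture the effect of the extension: for validation purposes, the derived type above should accept the same content as the following standalone type, in which the base type's sequence is followed by the sequence added in the extension. The name DareObasanjoFlattened is hypothetical and used only for this sketch. Unlike the derived type, it has no subtype relationship to XML-Deviant.

```xml
<xs:complexType name="DareObasanjoFlattened">
 <xs:sequence>
  <!-- content model inherited from XML-Deviant -->
  <xs:sequence>
   <xs:element name="numPosts" type="xs:integer" minOccurs="0" maxOccurs="1" />
   <xs:element name="email" type="xs:string" />
  </xs:sequence>
  <!-- content model appended by the extension -->
  <xs:sequence>
   <xs:element name="signature" type="xs:string" />
  </xs:sequence>
 </xs:sequence>
 <!-- attributes from the base type, then the one added by the extension -->
 <xs:attribute name="firstSubscribed" type="xs:date" use="optional" />
 <xs:attribute name="lastPostDate" type="xs:date" use="optional" />
 <xs:attribute name="mailReader" type="xs:string" fixed="Microsoft Outlook" />
</xs:complexType>
```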

The Problems with Derivation by Extension of Complex Types

For users who want to use an XML schema to validate that an XML document conforms to its contract, derivation by extension seems to be an excellent way to componentize and reuse aspects of a schema. Although this seems true at first glance, interactions with other features of WXS, such as substitution groups and xsi:type, make the usage of derivation by extension problematic. For instance, consider the following element declaration:


  <xs:element name="xml-deviant" type="XML-Deviant" />
  

This declares an xml-deviant element whose type is the XML-Deviant complex type from the schema in the previous section. Both of the following XML elements are valid against the xml-deviant element declaration:


  <xml-deviant firstSubscribed="1999-05-31" >
   <email>johndoe@example.com</email>
  </xml-deviant>

  <xml-deviant xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:type="DareObasanjo" firstSubscribed="1999-05-31"
               mailReader="Microsoft Outlook">
   <email>dareo@online.microsoft.com</email>
   <signature>XML is about data not objects, that is the zen of XML.</signature>
  </xml-deviant>  
  

Although the element declaration explicitly states that the type of the xml-deviant element is the XML-Deviant complex type, it is possible for an instance to override the declaration in the schema using the xsi:type attribute, as long as the new type is a subtype of the original type. This means that, by default, even though an element is successfully validated, it does not necessarily conform to the content model the consumer believes it's being validated against. A similar problem is faced when the target element declaration is designated as the head of a substitution group.
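
The substitution group case can be sketched as follows, assuming the XML-Deviant and DareObasanjo types from the extension example. The dare-obasanjo and subscribers element names are hypothetical and introduced only for illustration.

```xml
<!-- head of the substitution group -->
<xs:element name="xml-deviant" type="XML-Deviant" />

<!-- member of the group; its type must be the same as, or derived from,
     the type of the head element -->
<xs:element name="dare-obasanjo" type="DareObasanjo"
            substitutionGroup="xml-deviant" />

<!-- a content model that refers only to the head element -->
<xs:element name="subscribers">
 <xs:complexType>
  <xs:sequence>
   <xs:element ref="xml-deviant" maxOccurs="unbounded" />
  </xs:sequence>
 </xs:complexType>
</xs:element>
```

With declarations like these, a subscribers element may contain dare-obasanjo children wherever xml-deviant is expected, even though the content model names only xml-deviant.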

There are two ways to get around this potential problem with derivation by extension. The first involves blocking substitution or type derivation by placing the block or final attribute on the element declaration or the complex type declaration. Similarly, the blockDefault or finalDefault attribute can be placed on the xs:schema element to specify which kinds of substitutions or derivations are disallowed in the schema. The second option involves using named model groups (xs:group) and attribute groups to modularize one's schema as opposed to using derivation by extension. Below is the schema from the previous section rewritten using named model groups:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

 <xs:complexType name="XML-Deviant">
  <xs:group ref="XMLDeviantGrp" />
  <xs:attributeGroup ref="XMLDeviantAttrGrp" />
 </xs:complexType>

  <xs:complexType name="DareObasanjo">  
   <xs:sequence>
    <xs:group ref="XMLDeviantGrp" />
    <xs:element name="signature" type="xs:string"  />
   </xs:sequence>
   <xs:attributeGroup ref="XMLDeviantAttrGrp" />
   <xs:attribute name="mailReader" type="xs:string" fixed="Microsoft Outlook" />   
  </xs:complexType> 

  <xs:group name="XMLDeviantGrp">
   <xs:sequence> 
    <xs:element name="numPosts" type="xs:integer" minOccurs="0" maxOccurs="1" />  
    <xs:element name="email" type="xs:string"  minOccurs="0" maxOccurs="1" /> 
   </xs:sequence> 
  </xs:group>

  <xs:attributeGroup name="XMLDeviantAttrGrp">
   <xs:attribute name="firstSubscribed" type="xs:date" use="optional" />
   <xs:attribute name="lastPostDate" type="xs:date" use="optional" />
  </xs:attributeGroup>

</xs:schema>  
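
For comparison, the first option (blocking) can be sketched as below. This takes the extension-based schema from the previous section and adds a block attribute to a global xml-deviant element declaration. It shows only one of the permitted combinations of values.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

 <!-- block="extension" disallows xsi:type overrides that name a type
      derived by extension, such as DareObasanjo; adding "substitution"
      to the list would also rule out substitution group members -->
 <xs:element name="xml-deviant" type="XML-Deviant" block="extension" />

 <xs:complexType name="XML-Deviant">
  <xs:sequence>
   <xs:element name="numPosts" type="xs:integer" minOccurs="0" maxOccurs="1" />
   <xs:element name="email" type="xs:string" />
  </xs:sequence>
  <xs:attribute name="firstSubscribed" type="xs:date" use="optional" />
  <xs:attribute name="lastPostDate" type="xs:date" use="optional" />
 </xs:complexType>

 <xs:complexType name="DareObasanjo">
  <xs:complexContent>
   <xs:extension base="XML-Deviant">
    <xs:sequence>
     <xs:element name="signature" type="xs:string" />
    </xs:sequence>
    <xs:attribute name="mailReader" type="xs:string" fixed="Microsoft Outlook" />
   </xs:extension>
  </xs:complexContent>
 </xs:complexType>

</xs:schema>
```

Under this declaration, an xml-deviant instance carrying xsi:type="DareObasanjo" should be rejected by a conforming validator. Placing blockDefault="#all" on the xs:schema element would apply a blanket policy to every declaration instead of just this one.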

For usage scenarios that revolve around strongly typed XML, derivation by extension poses a different but related set of problems. In situations where an XML schema is used as a basis to map between XML and the object-oriented or relational models, derivation by extension does not prove too problematic. However, when processing such strongly typed XML with schema-aware programming languages such as XQuery or XSLT 2.0, certain problems arise. XQuery is a statically typed language, meaning that it is expected to detect type-related errors at compile time instead of at execution time. The following query is problematic given the previous examples:


   for $x in //xml-deviant 
    return $x/signature
  

On the one hand, the above expression should lead to a static error because the xml-deviant element is declared as having XML-Deviant as its type, which does not have a signature element. On the other hand, since a subtype of XML-Deviant exists that has a signature element in its content model, and hence could be the target of an xsi:type directive, this shouldn't be a static error. Both positions are valid, and regardless of which one XQuery has chosen there will be people who expect the opposite. Developers with a background in XPath may expect it to work, while developers who are familiar with statically typed languages would recognize it as being equivalent to the following, and thus an error:


      foreach (xmldeviant b in list) {
                yield return b.signature; // static type error: xmldeviant has no signature member
      }

To prevent this problem and others related to it, it is best to avoid derivation by extension if the XML document will be processed by an XML Schema-aware processing language like XQuery.

Conclusion

Based on the current technological landscape, the complex type derivation features of WXS may add more problems than they solve in the two most common schema use cases. For validation scenarios, derivation by restriction is of marginal value, while derivation by extension is a good way to create modularity as well as encourage reuse. Care must, however, be taken to consider the ramifications of the various type substitutability features of WXS (xsi:type and substitution groups) when using derivation by extension in scenarios revolving around document validation.

Currently, processing and storage of strongly typed XML data is primarily the province of conventional OOP languages and relational databases respectively. This means that certain features of WXS, such as derivation by restriction (and to a lesser extent derivation by extension), cause an impedance mismatch between the type system used to describe strongly typed XML and the mechanisms used for processing and storing said XML. Eventually, when technologies like XQuery become widespread for processing typed XML and support for XML and W3C XML Schema is integrated into mainstream database products, this impedance mismatch will not be important. Until then, complex type derivation should be carefully evaluated before being used in situations where W3C XML Schema is primarily being used as a mechanism to create type annotated XML infosets.

Acknowledgments

I'd like to thank Don Box, Chris Lovett and Erik Meijer for their ideas and feedback while writing this article.




  • Just RELAX!
    2003-10-30 15:17:45 Tom Gaven


    Dare,


    Just RELAX!


    This article points out some of the major flaws with complex type derivation and WXS.


    Another important point to mention is the sheer complexity of the WXS syntax. How are you going to extend a schema if you can't comprehend the original?


    Here is an equivalent RNC (Relax NG Compact Syntax) schema for the WXS model group sample presented in the article:


    XMLDeviantGrp = ( firstSubscribed?, lastPostDate?, numPosts?, email? )
    numPosts = element numPosts { xsd:integer }
    email = element email { text }
    firstSubscribed = attribute firstSubscribed { xsd:date}
    lastPostDate = attribute lastPostDate{ xsd:date }


    DareObasanjo = ( XMLDeviantGrp, signature, mailReader )
    signature = element signature { text }
    mailReader = attribute mailReader { "Microsoft Outlook" }


    This schema is a third the size of the WXS equivalent, and is much easier to understand.


    This syntax is so compact that you could create 3 derivations of it in the same time it took you to comprehend the original WXS equivalent.


    No need to be concerned about complexType inheritance, xsi:type, substitution groups, block, final, blockDefault, finalDefault, and sections 3.4.6 and 3.9.6 of the WXS spec!!!


    Just RELAX!!


    Tom


  • Restriction is good
    2003-10-30 16:00:32 Robert Leif

    Since restrictions provide strong typing, they are part of good software engineering practice and are used in the production of mission critical software. This may seem odd to those who love their debuggers. However, it is possible to use strong typing and assertions to minimize errors by finding these errors at compile time. This has been well demonstrated with Ada 95 and recently with its derivative SPARK. This will be discussed at ACM SIGAda 2003 (www.sigada.org).
    Robert C. Leif, Ph.D. Meeting Chair

  • The ASN.1 Version of the above
    2003-11-01 15:17:12 Bob Wyman

    As long as Tom Gaven has given the RELAX NG version of your WXS, it might be useful to look at the ASN.1 version. It is not as compact as RELAX NG, but is certainly more compact than the WXS version and, in my opinion, much easier to understand. It encodes the same XML as the WXS or RELAX NG versions.


    Xml-deviant DEFINITIONS AUTOMATIC TAGS ::=
    BEGIN


    IMPORTS
    Date
    FROM XSD;


    XML-Deviant ::= [NAME AS "XML-Deviant"] SEQUENCE {
    firstSubscribed [ATTRIBUTE] Date OPTIONAL,
    mailReader [ATTRIBUTE] XSD.String OPTIONAL,
    numPosts INTEGER OPTIONAL,
    signature CHOICE {
    nil BOOLEAN (TRUE) ,
    signature XSD.String} ,
    email XSD.String OPTIONAL}


    XML-Deviant-derivations ::= [USE-TYPE] CHOICE {
    xML-Deviant XML-Deviant,
    dareObasanjo DareObasanjo }


    DareObasanjo ::= SEQUENCE {
    firstSubscribed Date,
    mailReader XSD.String("Microsoft Outlook") OPTIONAL,
    numPosts INTEGER,
    signature XSD.String,
    email XSD.String}
    END




  • Polymorphism is not a problem
    2003-11-05 12:20:03 Will Provost

    I don't see why the scenario you describe as a problem for complex-type extension is a shortcoming of WXS. This is just polymorphism, as used throughout OO systems. It's important to be aware of the pitfalls (even that seems too strong a word), but this seems to be a feature of WXS, not a bug or limitation.


    From your <signature>, I can guess that we see this differently, especially the OO aspect! But note that this WXS feature is used to good effect in many Web-service implementations, enabling (in a struggling, poorly-understood form, I'll admit) polymorphism over the SOAP wire. Also consider binding technology such as JAXB, which maps extension to extension -- and polymorphism to polymorphism.