XML.com: XML From the Inside Out

W3C XML Schema Made Simple

June 06, 2001

Overview

It's easy to learn and use W3C XML Schema once you know how to avoid the pitfalls. You should at least learn the following things.

  • Do use element declarations, attribute groups, model groups, and simple types.
  • Do use XML namespaces as much as possible. Learn the correct way to use them.
  • Do not try to be a master of XML Schema. It would take months.
  • Do not use complex types, attribute declarations, or notations.
  • Do not use local declarations.
  • Do not use substitution groups.
  • Do not use a schema without the targetNamespace attribute (a.k.a. a chameleon schema).

You won't lose anything by following these guidelines.

Too long to remember? Then try the one-line version:

Consider W3C XML Schema as DTD + datatype + namespace

The rest of this article justifies these recommendations. At times it gets a bit hairy, so if you're willing to take my word for it, you can stop reading now.

Motivation for this Article

Several similar documents on XML Schema are already available. I discovered, however, that they're written by brilliant people who always drive things to the limit. They simply can't stop inventing cool tricks that even working group members can't imagine, and XML Schema is their new favorite toy.

This document is for those who want to use W3C XML Schema for business, and for those who are at a loss as to how to use it. The goal is to provide a set of solid guidelines about what you should and shouldn't do.

Why You Should Avoid Complex Types

If you don't know what a complex type is, then don't let it trouble you. Whatever small gain this functionality offers is vastly outweighed by its complexity. Furthermore, you won't lose anything by not using complex types: if a schema can be written by using complex types, then you can always write it without complex types. To be precise, you can always write it without understanding complex types, but unfortunately you have to type <complexType> elements.

Just consider a <complexType> as something you have to write as the sole child of the <element> element. That is, you write element declarations as follows:

<xs:element name="head">
  <xs:complexType>   <!-- consider this as a placeholder -->
    
    <!-- define content model by using model groups. -->
    ...
    <!-- then refer to attribute groups -->
    <xs:attributeGroup ref="head.attributes" />
    
  </xs:complexType>
</xs:element>

Why spend your precious time learning something you don't need? Convinced? Then there is no need to read more.

In short, a complex type is a model group, plus inheritance, minus ease of use. A complex type and a model group are similar in the sense that they are used to define content models. A complex type lacks ease of use because you can't use it from other complex types or model groups. On the other hand, model groups can be used without such restriction.
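To make the contrast concrete, here is a minimal sketch of a model group being reused, both from another model group and from two element declarations. All of the names (inline.content, block.content, section, appendix, and so on) are hypothetical and chosen only for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com"
           targetNamespace="http://example.com">

  <!-- hypothetical global elements used by the groups below -->
  <xs:element name="title" type="xs:string"/>
  <xs:element name="em" type="xs:string"/>
  <xs:element name="code" type="xs:string"/>

  <!-- a reusable piece of content model -->
  <xs:group name="inline.content">
    <xs:choice>
      <xs:element ref="em"/>
      <xs:element ref="code"/>
    </xs:choice>
  </xs:group>

  <!-- a model group can refer to another model group -->
  <xs:group name="block.content">
    <xs:sequence>
      <xs:element ref="title"/>
      <xs:group ref="inline.content" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:group>

  <!-- two element declarations share the same model group -->
  <xs:element name="section">
    <xs:complexType>
      <xs:group ref="block.content"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="appendix">
    <xs:complexType>
      <xs:group ref="block.content"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Here <complexType> appears only as the placeholder child of <element>; all of the reuse happens through <xs:group ref="..."/>.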

Inheritance

Inheritance is a complex type's only advantage, but you really don't want to use it. There are two kinds of inheritance: extension and restriction.

Extension allows you to append additional elements after the content model of the base type. The following model group reproduces the semantics of the extension, showing that you don't need a complex type to do this.

<xs:group name="extendedType">
  <xs:sequence>
    <xs:group ref="baseType"/>
    
    <!-- append things that you want -->
    ....
  </xs:sequence>
</xs:group>
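For comparison, the complex-type form that the model group above stands in for would look roughly like the following. This is only a sketch: it assumes the fragment sits inside an <xs:schema> in which unprefixed names resolve to the schema's own namespace, and the title and note elements are hypothetical global declarations.

```xml
<!-- base type: hypothetical content model with a single element -->
<xs:complexType name="baseType">
  <xs:sequence>
    <xs:element ref="title"/>
  </xs:sequence>
</xs:complexType>

<!-- derived type: the base content model followed by the appended part -->
<xs:complexType name="extendedType">
  <xs:complexContent>
    <xs:extension base="baseType">
      <xs:sequence>
        <!-- elements appended after the base content model -->
        <xs:element ref="note"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```

Both forms describe the base content followed by the appended content; the model group version simply states that sequence directly.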

Restriction allows you to restrict the content model of the base type. But even if you use this functionality, you still have to write the whole content model of the new type. Basically you type the same thing whether you use a complex type or a model group.

What do you get by using restriction? Error checking. That's it. Validators are supposed to report an error if the new content model is not actually a restriction of the base one. Unfortunately, this is hardly an advantage.

First, strictly enforcing this check is a difficult job for validators. You can look at the part of the spec that defines this constraint. The entire section 3.9.6 is devoted to specifying what is allowed and what is not. There's a strong temptation for developers to skip the enforcement of this constraint because most people won't notice that the check is skipped. At the time of this writing, no validators are known to strictly enforce this constraint.

It's unlikely that your validator is even capable of fully enforcing this constraint, which removes the only advantage of restriction.

Second, even if you write the restriction correctly, you may get an error from your validator. Consider the following example:

Base type:
<xs:all>
  <xs:element name="a" />
  <xs:element name="b" />
  <xs:element name="c" minOccurs="0" />
</xs:all>

New type derived by restriction:
<xs:all>
  <xs:element name="b" />
  <xs:element name="a" />
</xs:all>

The latter looks like a proper restriction of the former. In fact, every instance that is accepted by the new type is also accepted by the base type. But W3C XML Schema prohibits this derivation: it violates "Schema Component Constraint: Particle Derivation OK (All:All,Sequence:Sequence -- Recurse)", which requires the particles of the new type to appear in the same order as the corresponding particles of the base type. This is just the tip of the iceberg. If you are interested in this issue, consult the last page of MSL.
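For contrast, here is a sketch of how the same derivation might be written so that it satisfies the Recurse constraint, which asks for an order-preserving mapping from the new particles to the base particles, with any unmapped base particle being optional. The type names baseType and newType are hypothetical, the fragment is assumed to sit inside an <xs:schema> where unprefixed names resolve to the schema's own namespace, and whether a given validator agrees is a separate question.

```xml
<xs:complexType name="baseType">
  <xs:all>
    <xs:element name="a"/>
    <xs:element name="b"/>
    <xs:element name="c" minOccurs="0"/>
  </xs:all>
</xs:complexType>

<xs:complexType name="newType">
  <xs:complexContent>
    <xs:restriction base="baseType">
      <!-- same order as the base: a, then b; the optional c is dropped -->
      <xs:all>
        <xs:element name="a"/>
        <xs:element name="b"/>
      </xs:all>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```

Since <xs:all> accepts its children in any order, the two ways of writing the new type accept exactly the same instances. Only the order of the declarations in the schema differs, which is what makes the rule so easy to trip over.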

None of these problems occur if you use model groups instead of complex types. When it comes to derivation by restriction, a general understanding isn't enough; you need a very detailed understanding of how it works.

Why You Should Avoid Attribute Declarations

To be precise, what you should avoid is global attribute declarations, not local attribute declarations. The following is an example of a global attribute declaration.

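<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns="http://example.com"
      targetNamespace="http://example.com">
  <!-- the default namespace lets ref="foo" resolve to the target namespace -->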
  <!-- attribute whose name is foo -->
  <xs:attribute name="foo" type="xs:float" />
  
  <xs:element name="root">
    <xs:complexType>
      <xs:attribute ref="foo" />
    </xs:complexType>
  </xs:element>
</xs:schema>

This schema does not accept the following instance.

<root xmlns="http://example.com" foo="5.12"/>

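A global attribute declaration always belongs to the target namespace, so the attribute has to be namespace-qualified in the instance. The schema therefore accepts the following instance instead, which likely isn't what you want: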

<root xmlns="http://example.com"
       ns:foo="5.12" xmlns:ns="http://example.com" />

Attribute groups do not have this problem. So instead of using an attribute declaration, you should use an attribute group.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns="http://example.com"
      targetNamespace="http://example.com">
  <xs:attributeGroup name="root.attributes">
    <!-- attribute whose name is foo -->
    <xs:attribute name="foo" type="xs:float" />
  </xs:attributeGroup>
  
  <xs:element name="root">
    <xs:complexType>
      <!-- content model -->
      ...
      <xs:attributeGroup ref="root.attributes" />
    </xs:complexType>
  </xs:element>
</xs:schema>

An attribute group can refer to other attribute groups. In this way, you can write common attributes in one attribute group, then refer to it from others.
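As a minimal sketch of that pattern, the schema below factors two shared attributes into one attribute group and pulls it into another. The group name common.attributes and the id and class attributes are hypothetical, added only for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com"
           targetNamespace="http://example.com">

  <!-- attributes shared by many elements (hypothetical) -->
  <xs:attributeGroup name="common.attributes">
    <xs:attribute name="id" type="xs:ID"/>
    <xs:attribute name="class" type="xs:NMTOKENS"/>
  </xs:attributeGroup>

  <!-- root.attributes = the common attributes plus foo -->
  <xs:attributeGroup name="root.attributes">
    <xs:attributeGroup ref="common.attributes"/>
    <xs:attribute name="foo" type="xs:float"/>
  </xs:attributeGroup>

  <xs:element name="root">
    <xs:complexType>
      <xs:attributeGroup ref="root.attributes"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because the attributes are declared locally inside the groups, they stay unqualified, so an instance such as <root xmlns="http://example.com" id="r1" foo="5.12"/> is what this schema is meant to accept.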
