Using W3C XML Schema

October 17, 2001

Eric van der Vlist

The W3C XML Schema Definition Language is an XML language for describing and constraining the content of XML documents. W3C XML Schema is a W3C Recommendation.

This article is an introduction to using W3C XML Schemas, and also includes a comprehensive reference to the Schema datatypes and structures.

(Editor's note: this tutorial has been updated since its first publication in 2000, to reflect the finalization of W3C XML Schema as a Recommendation.)

Introducing Our First Schema

Let's start by having a look at this simple document which describes a book:

<?xml version="1.0" encoding="UTF-8"?>
<book isbn="0836217462">
 <title>
  Being a Dog Is a Full-Time Job
 </title>
 <author>Charles M. Schulz</author>
 <character>
  <name>Snoopy</name>
  <friend-of>Peppermint Patty</friend-of>
  <since>1950-10-04</since>
  <qualification>
    extroverted beagle
  </qualification>
 </character>
 <character>
  <name>Peppermint Patty</name>
  <since>1966-08-22</since>
  <qualification>bold, brash and tomboyish</qualification>
 </character>
</book>

Get a copy of library1.xml for reference.

To write a schema for this document, we could simply follow its structure and define each element as we find it. To start, we open an xs:schema element:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
     xmlns:xs="http://www.w3.org/2001/XMLSchema">
.../...
</xs:schema>

Table of Contents

Introducing Our First Schema
Slicing the Schema
Defining Named Types
Groups, Compositors and Derivation
Content Types
Constraints
Building Usable and Reusable Schemas
Namespaces
W3C XML Schema and Instance Documents
W3C XML Schema Datatypes Reference
W3C XML Schema Structures Reference

The schema element opens our schema. It can also hold the definition of the target namespace and several default options, some of which we will see in the following sections.

To match the start tag for the book element, we define an element named book. This element has attributes and non-text children, so we declare it with a complexType (the other kind of type, simpleType, is reserved for datatypes holding only values, with no element or attribute sub-nodes). The list of children of the book element is described by a sequence element:

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
     .../...
    </xs:sequence>
    .../...
  </xs:complexType>
</xs:element>

The sequence is a "compositor" that defines an ordered sequence of sub-elements. We will see the two other compositors, choice and all, in the following sections.

Now we can define the title and author elements as simple types: they have no attributes or non-text children and can be described directly within a degenerate (empty) xs:element element. The type (xs:string) is prefixed by the namespace prefix associated with XML Schema, indicating a predefined XML Schema datatype:

   <xs:element name="title" type="xs:string"/>
   <xs:element name="author" type="xs:string"/>

Now, we must deal with the character element, a complex type. Note how its cardinality is defined:

<xs:element name="character" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      .../...
    </xs:sequence>
  </xs:complexType>
</xs:element>

Unlike other schema definition languages, W3C XML Schema lets us define the cardinality of an element (i.e. the number of its possible occurrences) with some precision. We can specify both minOccurs (the minimum number of occurrences) and maxOccurs (the maximum number of occurrences). Here maxOccurs is set to unbounded, which means that there can be as many occurrences of the character element as the author wishes. Both attributes have a default value of one.
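
As a small illustration of bounded cardinality, the following sketch shows three local declarations as they might appear side by side inside an xs:sequence. The subtitle and reviewer elements and the limit of five are hypothetical, chosen only for this example, and are not part of the book document above:

<!-- exactly one occurrence: minOccurs and maxOccurs both default to 1 -->
<xs:element name="title" type="xs:string"/>
<!-- optional (hypothetical element): zero or one occurrence -->
<xs:element name="subtitle" type="xs:string" minOccurs="0"/>
<!-- hypothetical element: between one and five occurrences -->
<xs:element name="reviewer" type="xs:string" minOccurs="1" maxOccurs="5"/>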

We then specify the list of children of the character element in the same way:

<xs:element name="name" type="xs:string"/>
<xs:element name="friend-of" type="xs:string" 
             minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="since" type="xs:date"/>
<xs:element name="qualification" type="xs:string"/>

And we terminate its description by closing the sequence, complexType and element elements.

We can now declare the attributes of the document element, which must always come last. There appears to be no special reason for this, but the W3C XML Schema Working Group considered it simpler to impose a relative order on the definitions of elements and attributes within a complex type, and more natural to define the attributes after the elements.

  <xs:attribute name="isbn" type="xs:string"/>

And close all the remaining elements.

That's it! This first design, sometimes known as "Russian Doll Design", tightly follows the structure of our example document.

One of the key features of such a design is to define each element and attribute within its context and to allow multiple occurrences of the same element name to carry different definitions.
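
To make this last point concrete, here is a minimal sketch in which two local elements share the name "name" but carry different definitions. It departs from the example document on purpose: giving author a name child, and typing the character name as xs:token, are assumptions made only for this illustration.

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="author">
        <xs:complexType>
          <xs:sequence>
            <!-- hypothetical: within author, "name" is a plain string -->
            <xs:element name="name" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <!-- hypothetical: within character, "name" is a token -->
            <xs:element name="name" type="xs:token"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Because each name element is declared locally inside a different complex type, the two definitions do not conflict.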

Complete listing of this first example:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="friend-of" type="xs:string" minOccurs="0"
                          maxOccurs="unbounded"/>
              <xs:element name="since" type="xs:date"/>
              <xs:element name="qualification" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="isbn" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

Download this schema: library1.xsd

The next section explores how to subdivide schema designs to make them more readable and maintainable.

Slicing the Schema

While the previous design method is very simple, it can lead to deeply nested definitions, making the schema hard to read and difficult to maintain when documents are complex. It also has the drawback of being very different from a DTD structure, an obstacle for human or machine agents wishing to transform DTDs into XML Schemas, or even just to use the same design guides for both technologies.

The second design is based on a flat catalog of all the elements available in the instance document and, for each of them, lists of child elements and attributes. This effect is achieved through using references to element and attribute definitions that need to be within the scope of the referencer, leading to a flat design:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- definition of simple type elements -->
  <xs:element name="title" type="xs:string"/>
  <xs:element name="author" type="xs:string"/>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="friend-of" type="xs:string"/>
  <xs:element name="since" type="xs:date"/>
  <xs:element name="qualification" type="xs:string"/>
  <!-- definition of attributes -->
  <xs:attribute name="isbn" type="xs:string"/>
  <!-- definition of complex type elements -->
  <xs:element name="character">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="friend-of" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="since"/>
        <xs:element ref="qualification"/>
        <!-- the simple type elements are referenced using
        the "ref" attribute                       -->
        <!-- the definition of the cardinality is done
        when the elements are referenced         -->
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="title"/>
        <xs:element ref="author"/>
        <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute ref="isbn"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

Download this schema: library2.xsd

Using a reference to an element or an attribute is somewhat comparable to cloning an object. The element or attribute is defined first, and it can be duplicated at another place in the document structure by the reference mechanism, in the same way an object can be cloned. The two elements (or attributes) are then two instances of the same class.

The next section shows how we can define such classes, called "types," that enable us to re-use element definitions.

Defining Named Types

We have seen that we can define elements and attributes as we need them (Russian doll design), or create them first and reference them (flat catalog). W3C XML Schema gives us a third mechanism, which is to define data types (either simple types that will be used for PCDATA elements or attributes or complex types that will be used only for elements) and to use these types to define our attributes and elements.

This is achieved by giving a name to the simpleType and complexType elements, and locating them outside of the definition of elements or attributes. We will also take the opportunity to show how we can derive a datatype from another one by defining a restriction over the values of this datatype.

For instance, to define a datatype named nameType, which is a string with a maximum of 32 characters, we will write:

<xs:simpleType name="nameType">
  <xs:restriction base="xs:string">
    <xs:maxLength value="32"/>
  </xs:restriction>
</xs:simpleType>

The simpleType element holds the name of the new datatype. The restriction element expresses the fact that the datatype is derived from the string datatype of the W3C XML Schema namespace (attribute base) by applying a restriction, i.e. by limiting the number of possible values. The maxLength element, called a facet, says that this restriction is a condition on the maximum length, which is set to 32 characters.

Another powerful facet is the pattern element, which defines a regular expression that must be matched. For instance, if we do not care about the "-" signs, we can define an ISBN datatype as 10 digits thus:

<xs:simpleType name="isbnType">
  <xs:restriction base="xs:string">
    <xs:pattern value="[0-9]{10}"/>
  </xs:restriction>
</xs:simpleType>
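
As a variation, some ISBN-10 values end with the check character X rather than a digit. A pattern accepting them could be sketched as follows; the type name isbn10Type is hypothetical and is not used in the library schemas of this article:

<!-- hypothetical variant of isbnType -->
<xs:simpleType name="isbn10Type">
  <xs:restriction base="xs:string">
    <!-- nine digits followed by a digit or an upper-case X -->
    <xs:pattern value="[0-9]{9}[0-9X]"/>
  </xs:restriction>
</xs:simpleType>

A pattern facet must match the whole value, so no start or end anchors are needed.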

Facets, and the two other ways to derive a datatype (list and union), are covered in the next sections.

Complex types are defined as we've seen before, but given a name.

Full listing:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- definition of simple types -->
  <xs:simpleType name="nameType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="32"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="sinceType">
    <xs:restriction base="xs:date"/>
  </xs:simpleType>
  <xs:simpleType name="descType">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
  <xs:simpleType name="isbnType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{10}"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- definition of complex types -->
  <xs:complexType name="characterType">
    <xs:sequence>
      <xs:element name="name" type="nameType"/>
      <xs:element name="friend-of" type="nameType" minOccurs="0"
                  maxOccurs="unbounded"/>
      <xs:element name="since" type="sinceType"/>
      <xs:element name="qualification" type="descType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="bookType">
    <xs:sequence>
      <xs:element name="title" type="nameType"/>
      <xs:element name="author" type="nameType"/>
      <xs:element name="character" type="characterType" minOccurs="0"
                  maxOccurs="unbounded"/>
      <!-- the definition of the "character" element is
        using the "characterType" complex type    -->
    </xs:sequence>
    <xs:attribute name="isbn" type="isbnType" use="required"/>
  </xs:complexType>
  <!-- Reference to "bookType" to define the
     "book" element -->
  <xs:element name="book" type="bookType"/>
</xs:schema>

Download this schema: library3.xsd

The next page shows how grouping, compositors and derivation can be used to further promote re-use and structure in schemas.

Groups, Compositors and Derivation

Groups

W3C XML Schema also allows the definition of groups of elements and attributes.

<!-- definition of an element group -->

  <xs:group name="mainBookElements">
    <xs:sequence>
      <xs:element name="title" type="nameType"/>
      <xs:element name="author" type="nameType"/>
    </xs:sequence>
  </xs:group>

  <!-- definition of an attribute group -->
  <xs:attributeGroup name="bookAttributes">
    <xs:attribute name="isbn" type="isbnType" use="required"/>
    <xs:attribute name="available" type="xs:string"/>
  </xs:attributeGroup>

These groups can be used in the definition of complex types, as shown below.

  <xs:complexType name="bookType">
    <xs:sequence>
      <xs:group ref="mainBookElements"/>
      <xs:element name="character" type="characterType" 
           minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attributeGroup ref="bookAttributes"/>
  </xs:complexType>

These groups are not datatypes but containers holding a set of elements or attributes that can be used to describe complex types.

Compositors

So far, we have seen the xs:sequence compositor, which defines ordered groups of elements (in fact, it defines an ordered group of particles, which can also be groups or other compositors). W3C XML Schema supports two additional compositors that can be mixed to allow various combinations. Each of these compositors can have minOccurs and maxOccurs attributes to define their cardinality.

The xs:choice compositor describes a choice between several possible elements or groups of elements. Compositors can appear within groups, complex types or other compositors; the following group will accept either a single name element or a sequence of a firstName, an optional middleName and a lastName:

  <xs:group name="nameTypes">
    <xs:choice>
      <xs:element name="name" type="xs:string"/>
      <xs:sequence>
        <xs:element name="firstName" type="xs:string"/>
        <xs:element name="middleName" type="xs:string" minOccurs="0"/>
        <xs:element name="lastName" type="xs:string"/>
      </xs:sequence>
    </xs:choice>
  </xs:group>

The xs:all compositor defines an unordered set of elements. The following complex type definition allows its contained elements to appear in any order:

  <xs:complexType name="bookType">
    <xs:all>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="xs:string"/>
      <xs:element name="character" type="characterType" minOccurs="0"/>
    </xs:all>
    <xs:attribute name="isbn" type="isbnType" use="required"/>
  </xs:complexType>

In order to avoid combinations that could become ambiguous or too complex to be solved by W3C XML Schema tools, a set of restrictions applies to the xs:all compositor:

  • it can appear only as the sole child at the top of a content model
  • its children can be only xs:element definitions or references, and none of them can have a cardinality greater than one.

Derivation of simple types

Simple datatypes are defined by derivation of other datatypes, either predefined and identified by the W3C XML Schema namespace or defined elsewhere in your schema.

We have already seen examples of simple types derived by restriction (using xs:restriction elements). The different kinds of restrictions that can be applied to a datatype are called facets. Beyond the xs:pattern (using a regular expression syntax) and xs:maxLength facets shown already, many facets allow constraints on the length of a value, an enumeration of the possible values, the minimal and maximal values, its precision and scale, etc.
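
Two of these facets can be sketched as follows. The type names availabilityType and sinceDateType, the enumerated values and the date bound are all hypothetical, invented for this illustration:

<!-- enumeration: only the listed values are accepted (hypothetical values) -->
<xs:simpleType name="availabilityType">
  <xs:restriction base="xs:string">
    <xs:enumeration value="in-stock"/>
    <xs:enumeration value="out-of-print"/>
  </xs:restriction>
</xs:simpleType>
<!-- minInclusive: dates earlier than the bound are rejected (hypothetical bound) -->
<xs:simpleType name="sinceDateType">
  <xs:restriction base="xs:date">
    <xs:minInclusive value="1950-01-01"/>
  </xs:restriction>
</xs:simpleType>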

Two other derivation methods are available, which define white-space-separated lists and unions of datatypes. The following definition uses xs:union to extend the definition of our type for isbn to accept the values TBD and NA:

  <xs:simpleType name="isbnType">
    <xs:union>
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:pattern value="[0-9]{10}"/>
        </xs:restriction>
      </xs:simpleType>
      <xs:simpleType>
        <xs:restriction base="xs:NMTOKEN">
          <xs:enumeration value="TBD"/>
          <xs:enumeration value="NA"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>

The union has been applied to the two embedded simple types to allow values from both datatypes: in addition to 10-digit strings, our new datatype now accepts the values of an enumeration with two possible values (TBD and NA).

The following example type (isbnTypes) uses xs:list to define a whitespace-separated list of ISBN values. It also derives a type (isbnTypes10) using xs:restriction that accepts between 1 and 10 values, separated by white space:

  <xs:simpleType name="isbnTypes">
    <xs:list itemType="isbnType"/>
  </xs:simpleType>
  <xs:simpleType name="isbnTypes10">
    <xs:restriction base="isbnTypes">
      <xs:minLength value="1"/>
      <xs:maxLength value="10"/>
    </xs:restriction>
  </xs:simpleType>
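
To see what such a list looks like in a document, suppose a hypothetical isbns element (not part of the example document) is declared with this type, and that isbnType is the union defined above:

<!-- hypothetical declaration using the list type -->
<xs:element name="isbns" type="isbnTypes10"/>

An instance such as the following would then be accepted, since it holds three items that each match isbnType:

<isbns>0836217462 TBD NA</isbns>

For list types, the length facets count the items in the list rather than the characters of the value.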

Content Types

In the first part of this article, we examined the default content type behavior, modeled after data-oriented documents, where complex type elements are element and attribute only and simple type elements are character data without attributes.

The W3C XML Schema Definition Language also supports defining empty content elements and simple content (those that contain only character data) with attributes.

Empty content elements are defined using a regular xs:complexType construct and deliberately declaring no child elements. The following construct defines an empty book element accepting an isbn attribute:

  <xs:element name="book">
    <xs:complexType>
      <xs:attribute name="isbn" type="isbnType"/>
    </xs:complexType>
  </xs:element>

Simple content elements, i.e. character data elements with attributes, can be derived from simple types using xs:simpleContent. The book element defined above can thus be extended to accept a text value by:

  <xs:element name="book">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="isbn" type="isbnType"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

Note the location of the attribute definition, showing that the extension is done through the addition of the attribute. This definition will accept the following XML element:

<book isbn="0836217462">
	Funny book by Charles M. Schulz.
	Its title (Being a Dog Is a Full-Time Job) says it all !
</book>

W3C XML Schema supports mixed content through the mixed attribute in the xs:complexType elements. Consider

<xs:element name="book">
 <xs:complexType mixed="true">
  <xs:all>
   <xs:element name="title" type="xs:string"/>
   <xs:element name="author" type="xs:string"/>
  </xs:all>
  <xs:attribute name="isbn" type="xs:string"/>
 </xs:complexType>
</xs:element>

which will validate an XML element such as:

<book isbn="0836217462">
	Funny book by <author>Charles M. Schulz</author>.
	Its title (<title>Being a Dog Is a Full-Time Job</title>) says it all !
</book>

Unlike DTDs, W3C XML Schema mixed content doesn't modify the constraints on the sub-elements, which can be expressed in the same way as for element-only content models. While this is a significant improvement over XML 1.0 DTDs, note that the values of the character data, and its location relative to the child elements, cannot be constrained.

Constraints

Unique

W3C XML Schema provides several flexible XPath-based features for describing uniqueness constraints and corresponding reference constraints. The first of these, a simple uniqueness declaration, is declared with the xs:unique element. The following declaration, placed under the declaration of our book element, indicates that the character name must be unique:

    <xs:unique name="charName">
      <xs:selector xpath="character"/>
      <xs:field xpath="name"/>
    </xs:unique>

The location of the xs:unique element in the schema gives the context node in which the constraint holds. By inserting xs:unique under our book element, we specify that the character name has to be unique within the context of this book only.

The two XPaths defined in the uniqueness constraint are evaluated relative to different nodes. The first path, defined by the xs:selector element, is evaluated relative to the context node. Its purpose is to identify the elements subject to the uniqueness constraint; the nodes to which the selector points must be element nodes.

The second path, specified in the xs:field element, is evaluated relative to the element identified by the xs:selector, and can be an element or an attribute node. This is the node whose value will be checked for uniqueness. Combinations of values can be specified by adding other xs:field elements within xs:unique.
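
The two sketches below illustrate a combination of fields and an attribute field. The constraint names are hypothetical, and the second sketch assumes a hypothetical library element containing several book elements, which the example document does not have:

<!-- declared under book: the pair (name, since) must be unique
     among the selected character elements -->
<xs:unique name="charNameSince">
  <xs:selector xpath="character"/>
  <xs:field xpath="name"/>
  <xs:field xpath="since"/>
</xs:unique>

<!-- declared under a hypothetical library element: the isbn attribute
     must be unique among its book children -->
<xs:unique name="bookIsbn">
  <xs:selector xpath="book"/>
  <xs:field xpath="@isbn"/>
</xs:unique>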

Keys

The second construct, xs:key, is similar to xs:unique except that the value has to be non null (note that xs:unique and xs:key can both be referenced). To use the character name as a key, we can just replace the xs:unique by xs:key:

<xs:key name="charName">
  <xs:selector xpath="character"/>
  <xs:field xpath="name"/>
</xs:key>

Keyref

The third construct, xs:keyref, allows us to define a reference to an xs:key or an xs:unique. To show its usage, we will introduce the friend-of element, to be used against characters:

<character>
  <name>Snoopy</name>
  <friend-of>Peppermint Patty</friend-of>
  <since>1950-10-04</since>
  <qualification>
    extroverted beagle
  </qualification>
</character>

To indicate that friend-of needs to refer to a character from this same book, we will write, at the same level as we defined our key constraint, the following:

<xs:keyref name="charNameRef" refer="charName">
  <xs:selector xpath="character"/>
  <xs:field xpath="friend-of"/>
</xs:keyref>

These capabilities are almost independent of the other features in a schema. They are disconnected from the definition of the datatypes. The only point anchoring them to the schema is the place where they are defined, which establishes the scope of the uniqueness constraints.
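
To show where these declarations sit, here is a sketch that places the key and keyref from this section in the Russian doll schema from the start of the article. It is one possible arrangement, not a listing from the downloadable schemas:

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="xs:string"/>
      <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="friend-of" type="xs:string"
                        minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="since" type="xs:date"/>
            <xs:element name="qualification" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <xs:attribute name="isbn" type="xs:string"/>
  </xs:complexType>
  <!-- identity constraints come after the type, inside the element
       declaration; book is therefore their context node -->
  <xs:key name="charName">
    <xs:selector xpath="character"/>
    <xs:field xpath="name"/>
  </xs:key>
  <xs:keyref name="charNameRef" refer="charName">
    <xs:selector xpath="character"/>
    <xs:field xpath="friend-of"/>
  </xs:keyref>
</xs:element>

Each xs:field may select at most one node per selected element, so a character with several friend-of children would not validate against this keyref. A variant that selects the friend-of elements themselves (selector xpath "character/friend-of" with field xpath ".") avoids that limitation.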

Building Usable -- and Reusable -- Schemas

Perhaps the first step in writing reusable schemas is to document them. W3C XML Schema provides an alternative to XML comments (for humans) and processing instructions (for machines) that might be easier to handle for supporting tools.

Human readable documentation can be defined by xs:documentation elements, while information targeted to applications should be included in xs:appinfo elements. Both elements need to be included in an xs:annotation element and accept optional xml:lang and source attributes and any content type. The source attribute is a URI reference that can be used to indicate the purpose of the comment documentation or application information.

The xs:annotation elements can be added at the beginning of most schema constructions, as shown in the example below. The appinfo section demonstrates how custom namespaces and schemes might allow the binding of an element to a Java class from within the schema.

<xs:element name="book">
  <xs:annotation>
    <xs:documentation xml:lang="en">
      Top level element.
    </xs:documentation>
    <xs:documentation xml:lang="fr">
      Element racine.
    </xs:documentation>
    <xs:appinfo source="http://example.com/foo/">
      <bind xmlns="http://example.com/bar/">
        <class name="Book"/>
      </bind>
    </xs:appinfo>
  </xs:annotation>
  .../...
</xs:element>

Composing schemas from multiple files

For those who want to define a schema using several XML documents -- either to split a large schema, or to use libraries of schema snippets -- W3C XML Schema provides two mechanisms for including external schemas.

The first one, xs:include, is similar to a copy and paste of the definitions of the included schema: it is a plain inclusion and, as such, does not let you override the definitions of the included schema. It can be used this way:

<xs:include schemaLocation="character.xsd"/>
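
In context, an including schema might look like the sketch below. It assumes that character.xsd defines a complex type named characterType, as in the named-types listing, and that it has either no target namespace or the same one as the including schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- xs:include comes before the declarations of the including schema -->
  <xs:include schemaLocation="character.xsd"/>
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <!-- characterType is assumed to be defined in character.xsd -->
        <xs:element name="character" type="characterType"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>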

The second inclusion mechanism, xs:redefine, is similar to xs:include, except that it lets you redefine declarations from the included schema.

<xs:redefine schemaLocation="character12.xsd">
  <xs:simpleType name="nameType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="40"/>
    </xs:restriction>
  </xs:simpleType>
</xs:redefine>

Note that the declarations that are redefined must be placed in the xs:redefine element.

We've already seen many features that can be used together with xs:include and xs:redefine to create libraries of schemas. We've seen how we can reference previously defined elements, how we can define datatypes by derivation and use them, how we can define and use groups of attributes. We've also seen the parallel between elements and objects, and datatypes and classes. There are other features borrowed from object oriented designs that can be used to create reusable schemas.

Abstract types

The first of these features derived from object oriented design is the substitution group. Unlike the features we've seen so far, a substitution group is not defined explicitly through a W3C XML Schema element, but through referencing a common element (called the head) using a substitutionGroup attribute.

The head element doesn't hold any specific declaration but must be global. All the elements within a substitution group need to have a type that is either the same type as the head element or can be derived from it. Then they can all be used in place of the head element. In the following example, the element surname can be used anywhere an element name has been defined.

<xs:element name="name" type="xs:string"/>
<xs:element name="surname" type="xs:string" 
	substitutionGroup="name" />

Now, we can also define a generic name-elt element, head of a substitution group, that shouldn't be used directly, but in one of its derived forms. This is done through declaring the element as abstract, analogous to abstract classes in object oriented languages. The following example defines name-elt as an abstract element that should be replaced either by name or surname everywhere it is referenced.

<xs:element name="name-elt" type="xs:string" abstract="true"/>
<xs:element name="name" type="xs:string" 
	substitutionGroup="name-elt"/>
<xs:element name="surname" type="xs:string" 
	substitutionGroup="name-elt"/>
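
A content model can then refer to the abstract head, and instance documents use one of its substitutes. In this sketch the character declaration is simplified to a single child for illustration:

<!-- simplified, hypothetical content model referring to the abstract head -->
<xs:element name="character">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name-elt"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Both of the following instances would be accepted, while a character containing a literal name-elt element would be rejected because name-elt is abstract:

<character><name>Snoopy</name></character>
<character><surname>Snoopy</surname></character>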

Final types

We could, on the other hand, wish to control derivation performed on a datatype. W3C XML Schema supports this through the final attribute in an xs:complexType, xs:simpleType or xs:element element. This attribute can take the values restriction, extension and #all to block derivation by restriction, extension or any derivation. The following snippet would, for instance, forbid any derivation of the characterType complex type.

<xs:complexType name="characterType" final="#all">
  <xs:sequence>
    <xs:element name="name" type="nameType"/>
    <xs:element name="since" type="sinceType"/>
    <xs:element name="qualification" type="descType"/>
  </xs:sequence>
</xs:complexType>

In addition to final, a more fine-grained mechanism is provided to control the derivation of simple types; it operates on each facet. Here, the attribute is called fixed, and when its value is set to true, the facet cannot be further modified (but other facets can still be added or modified). The following example prevents the maximum length of our nameType simple type from being redefined:

  <xs:simpleType name="nameType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="32" fixed="true"/>
    </xs:restriction>
  </xs:simpleType>
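
A type derived from this nameType can still constrain other facets, as in the sketch below. The type name nonEmptyNameType is hypothetical:

<!-- hypothetical type derived from the nameType defined above -->
<xs:simpleType name="nonEmptyNameType">
  <xs:restriction base="nameType">
    <!-- allowed: minLength is a different facet -->
    <xs:minLength value="1"/>
    <!-- not allowed, because maxLength is fixed in nameType:
         <xs:maxLength value="16"/> -->
  </xs:restriction>
</xs:simpleType>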

Namespaces

Namespace support in W3C XML Schema is flexible yet straightforward. It not only allows the use of any prefix in instance documents (unlike DTDs) but also lets you open your schemas to accept unknown elements and attributes from known or unknown namespaces.

Each W3C XML Schema document is bound to a specific namespace through the targetNamespace attribute, or to the absence of namespace through the lack of such an attribute. We need at least one schema document per namespace we want to define (elements and attributes without namespaces can be defined in any schema, though).

Until now we have omitted the targetNamespace attribute, which means that we were working without namespaces. To get into namespaces, let's first imagine that our example belongs to a single namespace:

<book isbn="0836217462" xmlns="http://example.org/ns/books/">
 .../...
</book>

The least intrusive way to adapt our schema is to add some more attributes to our xs:schema element.

<xs:schema targetNamespace="http://example.org/ns/books/" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    xmlns:bk="http://example.org/ns/books/" 
    elementFormDefault="qualified"
    attributeFormDefault="unqualified">
    .../...
</xs:schema>

The namespace declarations play an important role. The first one (xmlns:xs="http://www.w3.org/2001/XMLSchema") says not only that we've chosen to use the prefix xs to identify the elements that will be W3C XML Schema instructions, but also that we will prefix the W3C XML Schema predefined datatypes with xs as we have done all over the examples thus far. Understand that we could have chosen any prefix instead of xs. We could even make http://www.w3.org/2001/XMLSchema our default namespace and in this case, we wouldn't have prefixed the W3C XML Schema elements nor its datatypes.

Since we are working with the http://example.org/ns/books/ namespace, we define it (with a bk prefix). This means that we will now prefix the references to "objects" (datatypes, elements, attributes, ...) belonging to this namespace with bk:. Again, we could have chosen any prefix to identify this namespace or even have made it our default namespace (note that the XPath expressions used in xs:unique, xs:key and xs:keyref do not use a default namespace, though).

The targetNamespace attribute lets you define, independently of the namespace declarations, which namespace is described in this schema. If you need to reference objects belonging to this namespace, which is usually the case except when using a pure "Russian doll" design, you need to provide a namespace declaration in addition to the targetNamespace.
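To make the role of the bk prefix concrete, here is a minimal sketch of a complete schema in which global elements are declared in the target namespace and then referenced with bk:. The reduced content model (only title, author and isbn, all typed as xs:string) is an assumption made to keep the example short; it is not the full schema built in the earlier sections:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://example.org/ns/books/"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:bk="http://example.org/ns/books/"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified">
  <!-- Global declarations: they belong to the target namespace -->
  <xs:element name="title" type="xs:string"/>
  <xs:element name="author" type="xs:string"/>
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <!-- References must use the bk: prefix to reach the
             declarations above -->
        <xs:element ref="bk:title"/>
        <xs:element ref="bk:author"/>
      </xs:sequence>
      <xs:attribute name="isbn" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

Without the xmlns:bk declaration, ref="bk:title" could not be resolved, even though targetNamespace already names the same URI.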

The final two attributes (elementFormDefault and attributeFormDefault) are a facility provided by W3C XML Schema to control, within a single schema, whether attributes and elements are considered by default to be qualified (in a namespace). This differentiation between qualified and unqualified can be indicated by specifying the default values, as above, but also when defining the elements and attributes, by adding a form attribute of value qualified or unqualified.

It is important to note that only local elements and attributes can be specified as unqualified. All globally defined elements and attributes must always be qualified.
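The per-declaration form attribute can be sketched as follows. This fragment is assumed to sit inside the xs:schema element shown above (target namespace http://example.org/ns/books/), and the choice of which declarations to override is purely illustrative:

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <!-- Local element, overridden to be unqualified:
           it appears in instances without a namespace -->
      <xs:element name="title" type="xs:string"
          form="unqualified"/>
    </xs:sequence>
    <!-- Local attribute, overridden to be qualified:
         it must carry a prefix in instances -->
    <xs:attribute name="isbn" type="xs:string"
        form="qualified"/>
  </xs:complexType>
</xs:element>

An instance matching this fragment would look like the following. The book element is global and therefore qualified, and the attribute needs an explicit prefix because a default namespace declaration never applies to attributes:

<bk:book bk:isbn="0836217462"
    xmlns:bk="http://example.org/ns/books/">
  <title>Being a Dog Is a Full-Time Job</title>
</bk:book>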

Importing definitions from external namespaces

W3C XML Schema, not unlike XSLT and XPath, uses namespace prefixes within the value of some attributes to identify the namespace of datatypes, elements, attributes, etc. For instance, we've used this feature throughout our examples to identify the W3C XML Schema predefined datatypes. This mechanism can be extended to import definitions from any other namespace and so reuse them in our schemas.

Reusing definitions from other namespaces is a three-step process. This process is required even for the XML 1.0 namespace, in order to declare attributes such as xml:lang. First, the namespace must be declared as usual.

<xs:schema targetNamespace="http://example.org/ns/books/" 
  xmlns:xml="http://www.w3.org/XML/1998/namespace" 
  xmlns:bk="http://example.org/ns/books/" 
  xmlns:xs="http://www.w3.org/2001/XMLSchema" 
  elementFormDefault="qualified"
  attributeFormDefault="qualified">
  .../...
</xs:schema>

Then W3C XML Schema needs to be informed of the location at which it can find the schema corresponding to the namespace. This is done using an xs:import element.

<xs:import namespace="http://www.w3.org/XML/1998/namespace"
  schemaLocation="myxml.xsd"/>

W3C XML Schema now knows that it should attempt to find any reference belonging to the XML namespace in a schema located at myxml.xsd. We can now use the external definition.

<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute ref="xml:lang"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
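The content of myxml.xsd is not shown here. As a rough sketch, and assuming that xs:language is an acceptable type for the attribute, such a file could be as small as this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical content for myxml.xsd: a schema whose target
     namespace is the XML namespace and which declares lang as a
     global attribute. -->
<xs:schema targetNamespace="http://www.w3.org/XML/1998/namespace"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:attribute name="lang" type="xs:language"/>
</xs:schema>

With this in place, an instance document can write the title as follows (the xml prefix never needs to be declared in an instance):

<title xml:lang="en">Being a Dog Is a Full-Time Job</title>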

You may wonder why we have chosen to reference the xml:lang attribute from the XML namespace, rather than creating an attribute with a type xml:lang. We've done so because there is an important difference between referencing an attribute (or an element) and referencing a datatype when namespaces are concerned:

  • Referencing an element or an attribute imports the whole thing with its name and namespace,
  • Referencing a datatype imports only its definition, leaving you with the task of giving a name to the element and attribute you're defining and using the target namespace (or no namespace if your attribute or element is unqualified).
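The difference can be sketched with two alternative attribute declarations. The second one uses the predefined xs:language datatype as a stand-in for "a type borrowed from elsewhere"; that choice is an assumption made for illustration:

<!-- Alternative 1: reference. The attribute keeps its own name
     and namespace, so instances write xml:lang="en". -->
<xs:attribute ref="xml:lang"/>

<!-- Alternative 2: datatype only. We pick the name ourselves and
     the attribute belongs to our schema, so instances write
     lang="en" (or bk:lang="en" if the attribute is qualified). -->
<xs:attribute name="lang" type="xs:language"/>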

Including unknown elements

To finish this section about namespaces, we need to see how, as promised in our introduction, we can open our schema to unknown elements, attributes and namespaces. This is done using xs:any and xs:anyAttribute, which allow any elements or any attributes, respectively, to be included.

For instance, if we want to extend the definition of our description type to any XHTML tag, we could declare:

<xs:complexType name="descType" mixed="true">
  <xs:sequence>
    <xs:any namespace="http://www.w3.org/1999/xhtml"
        processContents="skip" minOccurs="0"
        maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

The xs:anyAttribute gives the same functionality for attributes.

The type descType now has mixed content and accepts an unbounded number of elements from the http://www.w3.org/1999/xhtml namespace. The processContents attribute is set to skip, telling a W3C XML Schema processor that no validation of these elements should be attempted. The other permissible values are strict, which asks the processor to validate these elements, and lax, which asks it to validate them when possible. The namespace attribute accepts a whitespace-separated list of URIs, in which the special values ##local (non-qualified elements) and ##targetNamespace (the target namespace) can be included. Alternatively, ##other (any namespace other than the target) or ##any (any namespace) can replace the list. It is not possible to express "any namespace except those from a given list".
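As a sketch of the attribute wildcard, the same type can be opened to foreign attributes as well. The choice of ##other and lax here is an assumption made to illustrate the remaining options, not a recommendation:

<xs:complexType name="descType" mixed="true">
  <xs:sequence>
    <xs:any namespace="http://www.w3.org/1999/xhtml"
        processContents="skip" minOccurs="0"
        maxOccurs="unbounded"/>
  </xs:sequence>
  <!-- Accept any attribute from a namespace other than the target
       namespace, validating it only when a declaration for it is
       available -->
  <xs:anyAttribute namespace="##other" processContents="lax"/>
</xs:complexType>

In a complex type, xs:anyAttribute comes after the content model and after any xs:attribute declarations.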

W3C XML Schema and Instance Documents

We've now covered most of the features of W3C XML Schema, but we still need to take a look at some extensions that you can use within your instance documents. To set these features apart, they are defined in a separate namespace, http://www.w3.org/2001/XMLSchema-instance, usually associated with the prefix xsi.

The xsi:noNamespaceSchemaLocation and xsi:schemaLocation attributes allow you to tie a document to its W3C XML Schema. This link is not mandatory, and other indications can be given at validation time, but it does help W3C XML Schema-aware tools to locate a schema.

Depending on whether you are using namespaces, the link will be either:

<book isbn="0836217462"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:noNamespaceSchemaLocation="file:library.xsd">

Or, as below (noting the syntax with a URI for the namespace and the URI of the schema, separated by whitespace in the same attribute):

<book isbn="0836217462" 
	xmlns="http://example.org/ns/books/"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation= 
	    "http://example.org/ns/books/ file:library.xsd">

The other use of xsi attributes is to provide information about how an element corresponds to a schema. These attributes are xsi:type, which lets you define the simple or complex type of an element, and xsi:nil, which lets you specify a nil (null) value for an element (which has to be declared as nillable in the schema using a nillable="true" attribute). You don't need to declare these attributes in your W3C XML Schema to be able to use them in an instance document.
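As a minimal sketch, assume that the schema declares the since and qualification elements as follows (the nillable flag and the xs:string type are assumptions made for this illustration):

<xs:element name="since" type="xs:date" nillable="true"/>
<xs:element name="qualification" type="xs:string"/>

An instance fragment could then use both attributes. Since the value of xsi:type is a qualified name, the xs prefix must also be declared in the instance document:

<character
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <name>Snoopy</name>
  <!-- Nil value: the element must be empty -->
  <since xsi:nil="true"/>
  <!-- xs:token is derived from xs:string, so it can replace the
       declared type for this occurrence -->
  <qualification xsi:type="xs:token">extroverted beagle</qualification>
</character>

The type named by xsi:type has to be derived from the type declared in the schema, which is why xs:token is acceptable here while an unrelated type such as xs:date would not be.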