XML.com: XML From the Inside Out

Using W3C XML Schema

October 17, 2001

The W3C XML Schema Definition Language is an XML language for describing and constraining the content of XML documents. W3C XML Schema is a W3C Recommendation.

This article is an introduction to using W3C XML Schemas, and also includes a comprehensive reference to the Schema datatypes and structures.

(Editor's note: this tutorial has been updated since its first publication in 2000, to reflect the finalization of W3C XML Schema as a Recommendation.)

Introducing Our First Schema

Let's start by having a look at this simple document which describes a book:

<?xml version="1.0" encoding="UTF-8"?>
<book isbn="0836217462">
 <title>
  Being a Dog Is a Full-Time Job
 </title>
 <author>Charles M. Schulz</author>
 <character>
  <name>Snoopy</name>
  <friend-of>Peppermint Patty</friend-of>
  <since>1950-10-04</since>
  <qualification>
    extroverted beagle
  </qualification>
 </character>
 <character>
  <name>Peppermint Patty</name>
  <since>1966-08-22</since>
  <qualification>bold, brash and tomboyish</qualification>
 </character>
</book>

Get a copy of library1.xml for reference.

To write a schema for this document, we could simply follow its structure and define each element as we find it. To start, we open an xs:schema element:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
     xmlns:xs="http://www.w3.org/2001/XMLSchema">
.../...
</xs:schema>

Table of Contents

Introducing Our First Schema
Slicing the Schema
Defining Named Types
Groups, Compositors and Derivation
Content Types
Constraints
Building Usable and Reusable Schemas
Namespaces
W3C XML Schema and Instance Documents
W3C XML Schema Datatypes Reference
W3C XML Schema Structures Reference

The schema element opens our schema. It can also hold the definition of the target namespace and several default options, some of which we will see in the following sections.

To match the start tag of the book element, we define an element named book. This element has attributes and non-text children, so we treat it as a complexType (the other kind of datatype, simpleType, is reserved for datatypes holding only values, with no element or attribute sub-nodes). The list of children of the book element is described by a sequence element:

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
     .../...
    </xs:sequence>
    .../...
  </xs:complexType>
</xs:element>

The sequence element is a "compositor" that defines an ordered sequence of sub-elements. We will see the two other compositors, choice and all, in the following sections.

Now we can define the title and author elements as simple types: they have no attributes or non-text children, so each can be described directly within an empty xs:element element. The type (xs:string) carries the namespace prefix associated with XML Schema, indicating a predefined XML Schema datatype:

   <xs:element name="title" type="xs:string"/>
   <xs:element name="author" type="xs:string"/>
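
Because these two declarations sit inside the xs:sequence of book, their order is significant. As a rough illustration, here is a hypothetical variant of our instance document (not part of library1.xml) in which author comes before title. Assuming the content model built in this section, a validating processor would be expected to reject it:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical invalid instance: the children of book appear
     in a different order than the xs:sequence declares them. -->
<book isbn="0836217462">
 <author>Charles M. Schulz</author>
 <title>Being a Dog Is a Full-Time Job</title>
</book>
```

Swapping the two children back restores the declared order.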

Now, we must deal with the character element, a complex type. Note how its cardinality is defined:

<xs:element name="character" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      .../...
    </xs:sequence>
  </xs:complexType>
</xs:element>

Unlike some other schema definition languages, W3C XML Schema lets us define the cardinality of an element (i.e. the number of its possible occurrences) with some precision. We can specify both minOccurs (the minimum number of occurrences) and maxOccurs (the maximum number of occurrences). Here maxOccurs is set to unbounded, which means that there can be as many occurrences of the character element as the author wishes. Both attributes default to 1.
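
As a small sketch of how the two attributes combine, the schema below uses a hypothetical vocabulary (entry, subtitle, editor, keyword) that is not part of our book example; only the minOccurs and maxOccurs settings matter here:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical "entry" element, for illustration only. -->
  <xs:element name="entry">
    <xs:complexType>
      <xs:sequence>
        <!-- Exactly once: minOccurs and maxOccurs both default to 1. -->
        <xs:element name="subtitle" type="xs:string"/>
        <!-- Optional: zero or one occurrence. -->
        <xs:element name="editor" type="xs:string" minOccurs="0"/>
        <!-- At least one (the default) and at most five occurrences. -->
        <xs:element name="keyword" type="xs:string" maxOccurs="5"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Note that these attributes belong on declarations placed inside a compositor; they are not allowed on an element declared directly under xs:schema.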

We then specify the list of its children in the same way:

<xs:element name="name" type="xs:string"/>
<xs:element name="friend-of" type="xs:string" 
             minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="since" type="xs:date"/>
<xs:element name="qualification" type="xs:string"/>
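
The xs:date type given to since is what sets it apart from the plain strings around it: its value must follow the YYYY-MM-DD lexical form (optionally followed by a time zone). As a rough illustration, the second since value below is hypothetical and is not what library1.xml contains; a validating processor would be expected to accept the first character and reject the second:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<book isbn="0836217462">
 <title>Being a Dog Is a Full-Time Job</title>
 <author>Charles M. Schulz</author>
 <character>
  <name>Snoopy</name>
  <!-- Accepted: matches the xs:date lexical form YYYY-MM-DD. -->
  <since>1950-10-04</since>
  <qualification>extroverted beagle</qualification>
 </character>
 <character>
  <name>Peppermint Patty</name>
  <!-- Hypothetical value, expected to be rejected: not an xs:date. -->
  <since>August 22, 1966</since>
  <qualification>bold, brash and tomboyish</qualification>
 </character>
</book>
```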

We finish its description by closing the sequence, complexType and element elements of character, followed by the sequence of the book element.

We can now declare the attributes of the document element, which must always come last. There appears to be no special reason for this ordering; the W3C XML Schema Working Group considered it simpler to impose a relative order on the element and attribute definitions within a complex type, and more natural to define the attributes after the elements.

  <xs:attribute name="isbn" type="xs:string"/>

And close all the remaining elements.
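
As declared above, isbn is optional, because the use attribute of xs:attribute defaults to optional. If a book without an ISBN should be rejected, one possible variation is to add use="required". This is an assumption about the intended constraint, not something library1.xsd does, and the character element is left out to keep the sketch short:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
      </xs:sequence>
      <!-- Attributes are declared after the compositor.
           use="required" is a variation, not in library1.xsd. -->
      <xs:attribute name="isbn" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```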

That's it! This first design, sometimes known as "Russian Doll Design", tightly follows the structure of our example document.

One of the key features of such a design is that each element and attribute is defined within its context, which allows multiple occurrences of the same element name to carry different definitions.
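
To illustrate, here is a minimal sketch using a hypothetical library vocabulary (library, librarian, first and last are invented for this example). The element name called name is declared twice, once as a plain string and once with its own children, and each declaration only applies within its parent:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical "library" vocabulary, for illustration only. -->
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <!-- Within library, name is a plain string. -->
        <xs:element name="name" type="xs:string"/>
        <xs:element name="librarian">
          <xs:complexType>
            <xs:sequence>
              <!-- Same element name, different local definition:
                   within librarian, name has two children. -->
              <xs:element name="name">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="first" type="xs:string"/>
                    <xs:element name="last" type="xs:string"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```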

Complete listing of this first example:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="friend-of" type="xs:string" minOccurs="0"
                          maxOccurs="unbounded"/>
              <xs:element name="since" type="xs:date"/>
              <xs:element name="qualification" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="isbn" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

Download this schema: library1.xsd
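
One common way to point a validating processor at this schema from the instance document is the xsi:noNamespaceSchemaLocation hint, which applies here because our schema declares no target namespace. The sketch below assumes library1.xsd sits next to the document (the relative location is an assumption), and processors are free to ignore the hint. The character elements are omitted for brevity, which the schema allows since they have minOccurs="0":

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The schema location is a hint; the path is an assumption. -->
<book isbn="0836217462"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="library1.xsd">
 <title>Being a Dog Is a Full-Time Job</title>
 <author>Charles M. Schulz</author>
</book>
```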

The next section explores how to subdivide schema designs to make them more readable and maintainable.
