Using W3C XML Schema - Part 2
December 13, 2000
Advanced W3C XML Schema
This is the second part of our comprehensive tutorial and reference on W3C XML Schemas. If you have not already read the first installment of Using XML Schemas, we advise you to do so before reading this article.
Table of Contents
In the first part of this series we examined the default content type behavior, modeled after data-oriented documents, where complex type elements are element and attribute only, and simple type elements are character data without attributes.
The W3C XML Schema Definition Language also supports the definition of empty content elements, and simple content elements (those that contain only character data) with attributes.
Empty content elements are defined using a regular xsd:complexType construct and by purposefully omitting the definition of any child element. The following construct defines an empty book element accepting an isbn attribute:
<xsd:element name="book"> <xsd:complexType> <xsd:attribute name="isbn" type="isbnType"/> </xsd:complexType> </xsd:element>
Simple content elements, i.e. character data elements with attributes, can be derived from simple types using xsd:simpleContent. The book element defined above can thus be extended to accept a text value using:
<xsd:element name="book"> <xsd:complexType> <xsd:simpleContent> <xsd:extension base="xsd:string"> <xsd:attribute name="isbn" type="isbnType"/> </xsd:extension> </xsd:simpleContent> </xsd:complexType> </xsd:element>
Note the location of the attribute definition, showing that the extension is achieved through the addition of the attribute. This definition will accept the following XML element:
<book isbn="0836217462"> Funny book by Charles M. Schulz. Its title (Being a Dog Is a Full-Time Job) says it all ! </book>
W3C XML Schema supports mixed content through the mixed attribute of the xsd:complexType element. Consider
<xsd:element name="book"> <xsd:complexType mixed="true"> <xsd:all> <xsd:element name="title" type="xsd:string"/> <xsd:element name="author" type="xsd:string"/> </xsd:all> <xsd:attribute name="isbn" type="xsd:string"/> </xsd:complexType> </xsd:element>
which will validate an XML element such as
<book isbn="0836217462"> Funny book by <author>Charles M. Schulz</author>. Its title (<title>Being a Dog Is a Full-Time Job</title>) says it all ! </book>
Unlike DTDs, W3C XML Schema mixed content doesn't modify the constraints on the sub-elements, which can be expressed in the same way as for element-only content models. While this is a significant improvement over XML 1.0 DTDs, note that the values of the character data, and its location relative to the child elements, cannot be constrained.
W3C XML Schema provides several flexible XPath-based features for describing uniqueness constraints and the corresponding reference constraints. The first of these, a simple uniqueness declaration, is made with the xsd:unique element. The following declaration, within the context of our book document, indicates that the character name must be unique.
<xsd:unique name="charNameMustBeUnique"> <xsd:selector xpath="character"/> <xsd:field xpath="name"/> </xsd:unique>
The location of the xsd:unique element in the schema gives the context node in which the constraint holds. By inserting xsd:unique under our book element, we specify that the character name has to be unique within the context of a single book only.
The two XPaths defined in the uniqueness constraint are evaluated relative to the context node. The first of these paths is defined by the xsd:selector element. Its purpose is to identify the elements to which the uniqueness constraint applies; the nodes to which the selector points must be element nodes.
The second path, specified in the xsd:field element, is evaluated relative to the element identified by the xsd:selector and can point to an element or an attribute node. This is the node whose value will be checked for uniqueness. Uniqueness of a combination of several values can be specified by adding other xsd:field elements within xsd:unique.
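As a sketch of such a multi-field constraint, the following declaration would require the combination of name and since to be unique among the characters of a book. The constraint name is made up for illustration, and we assume that each character carries at most one since child element.

```xml
<!-- Hypothetical constraint: the (name, since) pair must be unique per book -->
<xsd:unique name="charNameAndSinceMustBeUnique">
  <!-- The elements to which the constraint applies -->
  <xsd:selector xpath="character"/>
  <!-- Each field contributes one value to the combined unique value -->
  <xsd:field xpath="name"/>
  <xsd:field xpath="since"/>
</xsd:unique>
```

With this declaration, two characters may share a name as long as their since values differ.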
The second constraint construct, xsd:key, is similar to xsd:unique, except that the value specified as unique can be used as a key. This means that it has to be non-null, and that it can be referenced. To use the character name as a key, we can replace the xsd:unique declaration with an xsd:key declaration:
<xsd:key name="charNameIsKey"> <xsd:selector xpath="character"/> <xsd:field xpath="name"/> </xsd:key>
The third construct, xsd:keyref, allows us to define a reference to a key. To show its usage, we introduce the friend-of element, to be used within characters:
<character> <name>Snoopy</name> <friend-of>Peppermint Patty</friend-of> <since>1950-10-04</since> <qualification> extroverted beagle </qualification> </character>
To indicate that
friend-of needs to refer to a character from the same book,
we write, at the same level as we defined our key constraint, the following:
<xsd:keyref name="friendOfIsCharRef" refer="charNameIsKey"> <xsd:selector xpath="character"/> <xsd:field xpath="friend-of"/> </xsd:keyref>
These capabilities are nearly independent of the other features in a schema. They are disconnected from the definition of the datatypes. The only point anchoring them to the schema is the place where they are defined, which establishes the scope of the uniqueness constraints.
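To make the placement concrete, here is a minimal sketch of a book element carrying both constraints. The content model is deliberately simplified and is an assumption rather than the full schema of this series; only the two constraint declarations are taken from the text above.

```xml
<xsd:element name="book">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="character" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="name" type="xsd:string"/>
            <!-- At most one friend-of, so that the keyref field selects a single node -->
            <xsd:element name="friend-of" type="xsd:string" minOccurs="0"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
    <xsd:attribute name="isbn" type="xsd:string"/>
  </xsd:complexType>
  <!-- The constraints follow the type definition; book is their context node -->
  <xsd:key name="charNameIsKey">
    <xsd:selector xpath="character"/>
    <xsd:field xpath="name"/>
  </xsd:key>
  <xsd:keyref name="friendOfIsCharRef" refer="charNameIsKey">
    <xsd:selector xpath="character"/>
    <xsd:field xpath="friend-of"/>
  </xsd:keyref>
</xsd:element>
```

Because both declarations sit under book, a friend-of value is checked only against the character names of the same book.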
Perhaps the first step in writing reusable schemas is to document them. W3C XML Schema provides an alternative to XML comments and processing instructions that might be easier to handle for supporting tools.
Human-readable documentation can be defined by xsd:documentation elements, while information targeted at applications should be included in xsd:appinfo elements. Both elements must be included in an xsd:annotation element. They accept an optional source attribute, and xsd:documentation also accepts xml:lang. The source attribute is a URI reference that can be used to indicate the purpose of the appinfo to the processing application.
xsd:annotation elements can be added at the beginning of most schema constructs, as shown in the example below. The appinfo section demonstrates how custom namespaces and schemes might allow the binding of an element to a Java class from within the schema.
<xsd:element name="book"> <xsd:annotation> <xsd:documentation xml:lang="en"> Top level element. </xsd:documentation> <xsd:documentation xml:lang="fr"> Element racine. </xsd:documentation> <xsd:appinfo source="http://example.com/foo/"> <bind xmlns="http://example.com/bar/"> <class name="Book"/> </bind> </xsd:appinfo> </xsd:annotation> ...
Composing schemas from multiple files
For those who want to define a schema using several XML documents -- either to split up a large schema or to use libraries of schema snippets -- W3C XML Schema provides two mechanisms for including external schemas.
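The first, xsd:include, is similar to a copy and paste of the definitions of the included schema: it's an inclusion, and as such it doesn't allow any overriding of the definitions of the included schema. It is used by placing an xsd:include element, whose schemaLocation attribute points to the schema document to include, at the top level of the including schema.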
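A minimal sketch of an inclusion might look like the following. The file name character.xsd and the assumption that it defines a characterType complex type are ours, chosen for illustration.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
  <!-- character.xsd is a hypothetical file, assumed to define characterType -->
  <xsd:include schemaLocation="character.xsd"/>
  <!-- Definitions from the included document are used as if written here -->
  <xsd:element name="character" type="characterType"/>
</xsd:schema>
```

The included document must describe the same target namespace as the including one, or no namespace at all.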
The second inclusion mechanism, xsd:redefine, is similar to xsd:include, except that it lets you redefine the declarations from the included schema:
<xsd:redefine schemaLocation="character12.xsd"> <xsd:simpleType name="nameType"> <xsd:restriction base="xsd:string"> <xsd:maxLength value="40"/> </xsd:restriction> </xsd:simpleType> </xsd:redefine>
Note that the declarations that are redefined must be placed inside the xsd:redefine element.
We've already seen many features that can be used together with xsd:include and xsd:redefine to create libraries of schemas. We've seen how we can reference previously defined elements; how we can define datatypes by derivation and use them; and how we can define and use groups of attributes. We've also seen the parallel between elements and objects, and between datatypes and classes. There are other features borrowed from object-oriented design that can be used to create reusable schemas.
The first feature derived from object-oriented design is the substitution group. Unlike the features we've seen so far, a substitution group isn't defined explicitly through a W3C XML Schema element but through referencing a common element (called the head), using a substitutionGroup attribute. The head element doesn't hold any specific declaration but must be global. All the elements within a substitution group need to have a type that is either the same type as the head element or derived from it. They can then all be used in place of the head element. In the following example, the element "surname" can be used anywhere an element "name" has been defined.
<xsd:element name="name" type="xsd:string"/> <xsd:element name="surname" type="xsd:string" substitutionGroup="name" />
Now we can also define a generic "name-elt" element, head of a substitution group, that can't be used directly but must be used in one of its derived forms. This is done by declaring the element as abstract, analogously to abstract classes in object-oriented languages. The following example defines name-elt as an abstract element that should be replaced by either name or surname everywhere it is referenced.
<xsd:element name="name-elt" type="xsd:string" abstract="true"/> <xsd:element name="name" type="xsd:string" substitutionGroup="name-elt" /> <xsd:element name="surname" type="xsd:string" substitutionGroup="name-elt" />
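To illustrate where the substitution takes effect, here is a sketch of a content model that references the abstract head. The character element and its one-child content model are assumptions made for the example.

```xml
<xsd:element name="character">
  <xsd:complexType>
    <xsd:sequence>
      <!-- A reference to the abstract head: a member of the group must appear here -->
      <xsd:element ref="name-elt"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
```

Under this definition, both <character><name>Snoopy</name></character> and <character><surname>Brown</surname></character> would be accepted, while a character containing a literal name-elt element would be rejected because the head is abstract.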
We could, on the other hand, wish to control derivation performed on a datatype. W3C XML Schema supports this through the final attribute on an xsd:complexType or xsd:element element. This attribute can take the values restriction, extension and #all to block derivation by restriction, by extension, or any derivation. The following snippet would, for instance, forbid any derivation of the characterType complex type.
<xsd:complexType name="characterType" final="#all">
The final attribute can operate only on elements and complex types. For simple types, W3C XML Schema provides a fine-grained mechanism that operates on each facet to control derivation. This attribute is called fixed, and when its value is set to true, the facet cannot be further modified (but other facets can still be added or modified). The following prevents the maximum length of our nameType simple type from being redefined:
<xsd:simpleType name="nameType"> <xsd:restriction base="xsd:string"> <xsd:maxLength value="32" fixed="true"/> </xsd:restriction> </xsd:simpleType>
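As a sketch of what remains possible, the following derivation adds a facet that nameType does not fix. The type name nonEmptyNameType is hypothetical.

```xml
<!-- Allowed: minLength is not fixed in nameType, so it can still be added -->
<xsd:simpleType name="nonEmptyNameType">
  <xsd:restriction base="nameType">
    <xsd:minLength value="1"/>
  </xsd:restriction>
</xsd:simpleType>
```

A derivation that tried to give maxLength a different value, such as 16, would be rejected, since that facet is fixed in the base type.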
Namespace support in W3C XML Schema is flexible yet straightforward. It not only allows the use of any prefix in instance documents (unlike DTDs), but also lets you open your schemas to accept unknown elements and attributes from known or unknown namespaces.
Each W3C XML Schema document is bound to a specific namespace through the
targetNamespace attribute or to the absence of namespace through the lack of
such an attribute. We need at least one schema document per namespace we want to define
(elements and attributes without namespaces can be defined in any schema, though).
Until now we have omitted the
targetNamespace attribute, which means that we
were working without namespaces. To get into namespaces, let's imagine that our example
belongs to a single namespace:
<book isbn="0836217462" xmlns="http://example.org/ns/books/">
The least intrusive way to adapt our schema is to add more attributes to our xsd:schema element:
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" xmlns="http://example.org/ns/books/" targetNamespace="http://example.org/ns/books/" elementFormDefault="qualified" attributeFormDefault="unqualified" >
The namespace declarations play an important role. The first (xmlns:xsd="http://www.w3.org/2000/10/XMLSchema") says not only that we've chosen to use the prefix xsd to identify the elements that will be W3C XML Schema instructions, but also that we will prefix the W3C XML Schema predefined datatypes with xsd, as we have done in all our examples thus far. Understand that we could have chosen any prefix instead of xsd. We could even make http://www.w3.org/2000/10/XMLSchema our default namespace. In this case, we would not have prefixed the W3C XML Schema elements.
Since we are working with the http://example.org/ns/books/ namespace, we define it as our default namespace. This means that we won't prefix the references to objects (datatypes, elements, attributes, etc.) belonging to this namespace. Again we could have chosen any prefix to identify this namespace.
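For comparison, here is a sketch of the alternative mentioned above, in which the W3C XML Schema namespace is the default and our own namespace gets a prefix. The prefix bk and the bookType definition are assumptions made for the example.

```xml
<schema xmlns="http://www.w3.org/2000/10/XMLSchema"
        xmlns:bk="http://example.org/ns/books/"
        targetNamespace="http://example.org/ns/books/"
        elementFormDefault="qualified">
  <!-- Schema elements and predefined datatypes are now unprefixed -->
  <element name="book" type="bk:bookType"/>
  <!-- References to our own definitions must carry the bk prefix -->
  <complexType name="bookType">
    <sequence>
      <element name="title" type="string"/>
    </sequence>
    <attribute name="isbn" type="string"/>
  </complexType>
</schema>
```

Both styles describe the same schema; the choice only affects which references need a prefix inside the schema document.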
The targetNamespace attribute lets you define, independently of the namespace declarations, which namespace is described in this schema. If you need to reference objects belonging to this namespace, which is usually the case except when using a pure Russian doll design, you need to provide a namespace declaration in addition to the targetNamespace.
The final two attributes in the example (elementFormDefault and attributeFormDefault) are a facility provided by W3C XML Schema to control, within a single schema, whether attributes and elements are considered by default to be qualified (in a namespace). This differentiation between qualified and unqualified can be indicated by specifying the default values, as above, but also when defining the element or attribute, by adding a form attribute of value qualified or unqualified.
It is important to note that only local elements and attributes can be specified as unqualified. All globally defined elements and attributes must always be qualified.
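As a sketch of the per-declaration override, the following schema keeps qualified as the default for elements but marks one local element as unqualified. The note element is hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
            xmlns="http://example.org/ns/books/"
            targetNamespace="http://example.org/ns/books/"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
  <!-- book is global, so it is always qualified -->
  <xsd:element name="book">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Local element: follows elementFormDefault, hence qualified -->
        <xsd:element name="title" type="xsd:string"/>
        <!-- Local element overriding the default: belongs to no namespace -->
        <xsd:element name="note" type="xsd:string" form="unqualified" minOccurs="0"/>
      </xsd:sequence>
      <!-- Local attribute: follows attributeFormDefault, hence unqualified -->
      <xsd:attribute name="isbn" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

A matching instance would need to put book and title in the http://example.org/ns/books/ namespace while leaving note and isbn without a namespace, which in practice means using a prefix rather than a default namespace declaration in the instance document.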
Importing definitions from external namespaces
W3C XML Schema, not unlike XSLT and XPath, uses namespace prefixes within the value of some attributes to identify the namespace of data types, elements, attributes, etc. For instance, we've used this feature all along in our examples to identify the W3C XML Schema predefined datatypes. This mechanism can be extended to import definitions from any other namespace and so reuse them in our schemas.
Reusing definitions from other namespaces is done through a three-step process. This process needs to be done even for the XML 1.0 namespace in order to declare attributes such as xml:lang. First, the namespace must be defined as usual.
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" targetNamespace="http://example.org/ns/books/" xmlns:xml="http://www.w3.org/XML/1998/namespace" elementFormDefault="qualified" >
Then W3C XML Schema needs to be informed of the location at which it can find the schema corresponding to the namespace. This is done using an xsd:import element.
<xsd:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="myxml.xsd"/>
W3C XML Schema now knows that it should attempt to find any reference belonging to the XML namespace in a schema located at myxml.xsd. We can now use the external definition.
<xsd:element name="title"> <xsd:complexType> <xsd:simpleContent> <xsd:extension base="xsd:string"> <xsd:attribute ref="xml:lang"/> </xsd:extension> </xsd:simpleContent> </xsd:complexType> </xsd:element>
You may wonder why we've chosen to reference the xml:lang attribute from the XML namespace rather than creating an attribute with a type defined in that namespace. We've done so because there is an important difference between referencing an attribute (or an element) and referencing a datatype when namespaces are concerned.
- Referencing an element or an attribute imports the whole thing with its name and namespace.
- Referencing a datatype imports only its definition, leaving you with the task of giving a name to the element or attribute you're defining, and places your definition in the target namespace (or no namespace if your attribute or element is unqualified).
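The difference can be sketched with two alternative attribute declarations. We use the predefined xsd:language datatype for the second form, which is our choice for illustration; the first form assumes that the XML namespace has been declared and imported as described above.

```xml
<!-- Alternative 1: reference to the attribute itself.
     Instances carry xml:lang, name and namespace included. -->
<xsd:attribute ref="xml:lang"/>

<!-- Alternative 2: reference to a datatype only.
     We choose the name ourselves; instances carry a plain lang attribute. -->
<xsd:attribute name="lang" type="xsd:language"/>
```

Both alternatives constrain the value to a language code, but only the first yields the attribute that XML-aware tools recognize as xml:lang.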
Including unknown elements
To finish this section about namespaces, we need to see how, as promised in the introduction, we can open our schema to unknown elements, attributes and namespaces. This is done through xsd:any and xsd:anyAttribute, allowing, respectively, the inclusion of any element or attribute.
For instance, if we want to extend the definition of our description type to any XHTML tag, we could declare
<xsd:complexType name="descType" mixed="true"> <xsd:sequence> <xsd:any namespace="http://www.w3.org/1999/xhtml" minOccurs="0" maxOccurs="unbounded" processContents="skip"/> </xsd:sequence> </xsd:complexType>
xsd:anyAttribute gives the same functionality for attributes.
The type descType is now mixed content and accepts an unbounded number of any
elements from the http://www.w3.org/1999/xhtml namespace. The
processContents attribute is set to skip, telling a W3C XML Schema
processor that no validation of these elements should be attempted. The other permissible
values for this attribute are strict, asking to validate these elements, or
lax, asking the processor to validate them when possible. The
namespace attribute accepts a whitespace-separated list of URIs, as well as
the special values ##any (any namespace), ##local (non-qualified elements),
##targetNamespace (the target namespace) or ##other (any namespace other
than the target).
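As a sketch of the attribute counterpart, descType could additionally be opened to attributes from foreign namespaces. The choice of ##other and lax here is ours, made for illustration.

```xml
<xsd:complexType name="descType" mixed="true">
  <xsd:sequence>
    <xsd:any namespace="http://www.w3.org/1999/xhtml"
             minOccurs="0" maxOccurs="unbounded"
             processContents="skip"/>
  </xsd:sequence>
  <!-- Accept any attribute from a namespace other than the target namespace;
       validate it only when a declaration for it can be found -->
  <xsd:anyAttribute namespace="##other" processContents="lax"/>
</xsd:complexType>
```

Note that xsd:anyAttribute appears after the content model, in the same position as ordinary attribute declarations.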
We've now covered most of the features of W3C XML Schema, but we still need to have a glance at some extensions that you can use within your instance documents. In order to differentiate these other features, a separate namespace, http://www.w3.org/2000/10/XMLSchema-instance, is used, usually associated with the prefix xsi.
The xsi:noNamespaceSchemaLocation and xsi:schemaLocation attributes allow you to tie a document to its W3C XML Schema. This link is not mandatory, and other indications can be given using application-dependent mechanisms (such as a parameter on a command line), but it does help W3C XML Schema aware tools to locate a schema. Depending on whether namespaces are used, the link will be either
<book isbn="0836217462" xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:noNamespaceSchemaLocation="file:library.xsd">
or, as below (note the syntax: the namespace URI and the URI of the schema are separated by whitespace within the same attribute)
<book isbn="0836217462" xmlns="http://example.org/ns/books/" xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:schemaLocation="http://example.org/ns/books/ file:library.xsd">
The other use of xsi attributes is to provide information about how an element corresponds to a schema. These attributes are xsi:type, which lets you define the simple or complex type of an element, and xsi:null, which lets you specify a null value for an element (that has to be defined as nullable="true" in the schema). You don't need to declare these attributes in your schema to be able to use them in an instance document.
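Both attributes can be sketched with fragments based on our character example. We assume here that since is declared with the xsd:date type and that nameType is the restriction of xsd:string shown earlier; the declarations below are illustrative rather than taken from the full schema.

```xml
<!-- Schema fragments (assumed declarations) -->
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="since" type="xsd:date" nullable="true"/>

<!-- Instance fragment -->
<character xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance">
  <!-- xsi:type asks for validation against nameType, a type derived from xsd:string -->
  <name xsi:type="nameType">Snoopy</name>
  <!-- xsi:null marks the element as null; it must then be empty -->
  <since xsi:null="true"/>
</character>
```

The type named in xsi:type has to be the declared type of the element or a type derived from it, which is why nameType is acceptable for an element declared as xsd:string.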