Using W3C XML Schema - Part 2
This is the second part of our comprehensive tutorial and reference on W3C XML Schemas. If you have not already read the first installment of Using XML Schemas, we advise you to do so before reading this article.
Content Types
In the first part of this series we examined the default content type behavior, modeled after data-oriented documents, in which complex type elements contain only elements and attributes, and simple type elements contain character data without attributes.
The W3C XML Schema Definition Language also supports the definition of empty content elements, and simple content elements (those that contain only character data) with attributes.
Empty content elements are defined using a regular
xsd:complexType construct and by purposefully omitting the
definition of a child element. The following construct defines an empty
book element accepting an isbn attribute:
<xsd:element name="book">
  <xsd:complexType>
    <xsd:attribute name="isbn" type="isbnType"/>
  </xsd:complexType>
</xsd:element>
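As a rough illustration, and assuming that isbnType accepts the ISBN used in the examples of this article, instances checked against this declaration could look like the sketch below. Because the content type is empty, the element may carry neither child elements nor character data, not even whitespace.

```xml
<!-- Valid: only the isbn attribute, no content at all -->
<book isbn="0836217462"/>

<!-- Invalid: character data is not allowed in an empty content element -->
<book isbn="0836217462">Being a Dog Is a Full-Time Job</book>
```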
Simple content elements, i.e. character data elements with attributes,
can be derived from simple types using xsd:simpleContent. The
book element defined above can thus be extended to accept a text value
using:
<xsd:element name="book">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="isbn" type="isbnType"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
Note the location of the attribute definition, showing that the extension is achieved through the addition of the attribute. This definition will accept the following XML element:
<book isbn="0836217462">
  Funny book by Charles M. Schulz.
  Its title (Being a Dog Is a Full-Time Job) says it all!
</book>
W3C XML Schema supports mixed content through the mixed attribute
of the xsd:complexType element. Consider
<xsd:element name="book">
  <xsd:complexType mixed="true">
    <xsd:all>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="author" type="xsd:string"/>
    </xsd:all>
    <xsd:attribute name="isbn" type="xsd:string"/>
  </xsd:complexType>
</xsd:element>
which will validate an XML element such as
<book isbn="0836217462">
  Funny book by <author>Charles M. Schulz</author>.
  Its title (<title>Being a Dog Is a Full-Time Job</title>) says it all!
</book>
Unlike DTDs, W3C XML Schema mixed content doesn't modify the constraints on the sub-elements, which can be expressed in the same way as in element-only content models. While this is a significant improvement over XML 1.0 DTDs, note that the values of the character data, and its location relative to the child elements, cannot be constrained.
W3C XML Schema provides several flexible XPath-based features for
describing uniqueness constraints and the corresponding reference constraints.
The first of these, a simple uniqueness declaration, is declared with the
xsd:unique element. The following declaration, within the
context of our book document, indicates that the character name must
be unique.
<xsd:unique name="charNameMustBeUnique">
  <xsd:selector xpath="character"/>
  <xsd:field xpath="name"/>
</xsd:unique>
The location of the xsd:unique element in the schema gives
the context node in which the constraint holds. By inserting
xsd:unique under our book element, we specify that the
character name has to be unique in the context of a single book only.
The first of the two XPath expressions in the uniqueness constraint,
defined by the xsd:selector element, is evaluated relative to the
context node. Its purpose is to select the elements to which the
uniqueness constraint applies; the nodes to which the selector points must
be element nodes.
The second path, specified in the xsd:field element, is
evaluated relative to the element identified by the
xsd:selector and can be an element or an attribute node. This
is the node whose value will be checked for uniqueness. Uniqueness over a
combination of several values can be specified by adding other
xsd:field elements within xsd:unique.
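A combined constraint of this kind might be sketched as follows. The choice of fields is only an assumption made for illustration: it treats two characters as duplicates only when both their name and their since child elements match.

```xml
<!-- Sketch: uniqueness over the pair (name, since) of each character -->
<xsd:unique name="charNameAndSinceMustBeUnique">
  <xsd:selector xpath="character"/>
  <!-- Each field is evaluated relative to the selected character element -->
  <xsd:field xpath="name"/>
  <xsd:field xpath="since"/>
  <!-- An attribute would be addressed with @, e.g. xpath="@id"
       (a hypothetical attribute, not part of our book example) -->
</xsd:unique>
```

With this declaration two characters may share a name, provided their since values differ.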
The second constraint construct, xsd:key, is similar to
xsd:unique, except that the value specified as unique can be
used as a key. This means that it has to be present in every selected
element and cannot be null, and that it can be referenced. To use the
character name as a key, we can replace xsd:unique with
xsd:key.
<xsd:key name="charNameIsKey">
  <xsd:selector xpath="character"/>
  <xsd:field xpath="name"/>
</xsd:key>
The third construct, xsd:keyref, allows us to define a
reference to a key. To show its usage, we introduce the
friend-of element, to be used against characters.
<character>
  <name>Snoopy</name>
  <friend-of>Peppermint Patty</friend-of>
  <since>1950-10-04</since>
  <qualification>
    extroverted beagle
  </qualification>
</character>
To indicate that friend-of needs to refer to a character
from the same book, we write, at the same level as we defined our key
constraint, the following:
<xsd:keyref name="friendOfIsCharRef" refer="charNameIsKey">
  <xsd:selector xpath="character"/>
  <xsd:field xpath="friend-of"/>
</xsd:keyref>
These capabilities are nearly independent of the other features in a schema. They are disconnected from the definition of the datatypes. The only point anchoring them to the schema is the place where they are defined, which establishes the scope of the uniqueness constraints.
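To make the scoping concrete, here is a minimal sketch of where the two constraints could sit inside the declaration of book. The content model is an assumption reconstructed from the character example above (in particular, friend-of is limited to one occurrence, since a field may match at most one node per selected element), and the schema namespace is the one defined by the final W3C Recommendation.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="character" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="name" type="xsd:string"/>
              <xsd:element name="friend-of" type="xsd:string" minOccurs="0"/>
              <xsd:element name="since" type="xsd:date"/>
              <xsd:element name="qualification" type="xsd:string"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
      <xsd:attribute name="isbn" type="xsd:string"/>
    </xsd:complexType>

    <!-- Identity constraints follow the type definition; book is the context node -->
    <xsd:key name="charNameIsKey">
      <xsd:selector xpath="character"/>
      <xsd:field xpath="name"/>
    </xsd:key>
    <xsd:keyref name="friendOfIsCharRef" refer="charNameIsKey">
      <xsd:selector xpath="character"/>
      <xsd:field xpath="friend-of"/>
    </xsd:keyref>
  </xsd:element>
</xsd:schema>
```

Moving the same constraints to an enclosing element (a hypothetical library holding several books, for instance) would widen their scope accordingly.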
Perhaps the first step in writing reusable schemas is to document them. W3C XML Schema provides an alternative to XML comments and processing instructions that might be easier to handle for supporting tools.
Human readable documentation can be defined by
xsd:documentation elements, while information targeted at
applications should be included in xsd:appinfo elements. Both
elements must be included in an xsd:annotation element. Both
accept an optional source attribute, and xsd:documentation also
accepts an optional xml:lang attribute.
The source attribute is a URI reference that can be used to
indicate the purpose of the appinfo to the processing application.
The xsd:annotation elements can be added at the beginning of
most schema constructs, as shown in the example below. The appinfo section
demonstrates how custom namespaces and schemes might allow the binding of an
element to a Java class from within the schema.
<xsd:element name="book">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Top level element.
    </xsd:documentation>
    <xsd:documentation xml:lang="fr">
      Element racine.
    </xsd:documentation>
    <xsd:appinfo source="http://example.com/foo/">
      <bind xmlns="http://example.com/bar/">
        <class name="Book"/>
      </bind>
    </xsd:appinfo>
  </xsd:annotation>
  ...
For those who want to define a schema using several XML documents -- either to split up a large schema or to use libraries of schema snippets -- W3C XML Schema provides two mechanisms for including external schemas.
The first, xsd:include, is similar to a copy and paste of
the definitions of the included schema: it's an inclusion, and as such it
doesn't allow any overriding of definitions of the included schema. It can
be used in this way:
<xsd:include schemaLocation="character.xsd"/>
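As a sketch of how this fits together, the two documents below show an including schema and the file it pulls in. The file name library.xsd and the contents of character.xsd are assumptions made for illustration, and the schema namespace is the one defined by the final W3C Recommendation. Both documents must describe the same target namespace (here, none), or the included one must have no target namespace at all.

```xml
<!-- library.xsd (hypothetical name): the including schema -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:include schemaLocation="character.xsd"/>

  <xsd:element name="book">
    <xsd:complexType>
      <xsd:sequence>
        <!-- character is declared globally in character.xsd -->
        <xsd:element ref="character" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:attribute name="isbn" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

<!-- character.xsd: assumed contents of the included schema -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="character">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```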
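The second inclusion mechanism, xsd:redefine, is similar to
xsd:include, except that it lets you redefine the simple types,
complex types, groups and attribute groups of the included schema.
<xsd:redefine schemaLocation="character12.xsd">
  <xsd:simpleType name="nameType">
    <xsd:restriction base="nameType">
      <xsd:maxLength value="40"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:redefine>
Note that the definitions that are redefined must be placed in
the xsd:redefine element, and that a redefined type must be derived
from itself: the base of the restriction above is the original nameType
from character12.xsd, so the new maxLength can only tighten the original
definition.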
We've already seen many features that can be used together with
xsd:include and xsd:redefine to create libraries
of schemas. We've seen how we can reference previously defined elements;
how we can define datatypes by derivation and use them; and how we can
define and use groups of attributes. We've also seen the parallel between
elements and objects and datatypes and classes. There are other features
borrowed from object oriented design that can be used to create reusable
schemas.
The first feature derived from object oriented design is the substitution
group. Unlike the features we've seen so far, a substitution group isn't
defined explicitly through a W3C XML Schema element but through referencing
a common element (called the head), using a
substitutionGroup attribute. The head element doesn't hold any
specific declaration but must be global. All the elements within a
substitution group need to have a type that is either the same type as the
head element, or can be derived from it. Then they can all be used in place
of the head element. In the following example the element "surname" can be
used anywhere an element "name" has been defined.
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="surname" type="xsd:string" substitutionGroup="name"/>
Now we can also define a generic "name-elt" element, head of a
substitution group, that cannot be used directly but must be used in one
of its derived forms. This is done by declaring the element as
abstract, analogously to abstract classes in object oriented languages. The
following example defines name-elt as an abstract element that
should be replaced by either name or surname everywhere it is
referenced.
<xsd:element name="name-elt" type="xsd:string" abstract="true"/>
<xsd:element name="name" type="xsd:string" substitutionGroup="name-elt"/>
<xsd:element name="surname" type="xsd:string" substitutionGroup="name-elt"/>
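The effect on instance documents can be sketched as follows. The character declaration used here is a simplified assumption that contains nothing but a reference to the abstract head.

```xml
<!-- Schema side: the content model only mentions the abstract head -->
<xsd:element name="character">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="name-elt"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

<!-- Instance side -->
<!-- Valid: name is a member of the substitution group -->
<character><name>Snoopy</name></character>

<!-- Valid: so is surname -->
<character><surname>Brown</surname></character>

<!-- Invalid: name-elt is abstract and cannot appear itself -->
<character><name-elt>Snoopy</name-elt></character>
```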
We could, on the other hand, wish to control derivation performed on a
datatype. W3C XML Schema supports this through the final
attribute in an xsd:complexType or xsd:element
element. This attribute can take the values restriction,
extension and #all to block derivation by restriction, by
extension, or any derivation. The following snippet would, for instance,
forbid any derivation of the characterType complex type.
<xsd:complexType name="characterType" final="#all">
The final attribute works at the level of a whole type or element.
To control the derivation of simple types, W3C XML Schema also provides
a finer-grained mechanism that operates on each facet. This attribute is
called fixed, and when its value is set to true, the
facet cannot be further modified (but other facets can still be added or
modified). The following prevents the maximum length of our nameType simple
type from being redefined.
<xsd:simpleType name="nameType">
  <xsd:restriction base="xsd:string">
    <xsd:maxLength value="32" fixed="true"/>
  </xsd:restriction>
</xsd:simpleType>
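The two derivations below sketch what the fixed facet allows and forbids. The type names shortNameType and nonEmptyNameType are hypothetical and only serve the illustration.

```xml
<!-- Rejected: maxLength is fixed at 32 in nameType and cannot be changed -->
<xsd:simpleType name="shortNameType">
  <xsd:restriction base="nameType">
    <xsd:maxLength value="16"/>
  </xsd:restriction>
</xsd:simpleType>

<!-- Accepted: a facet that was not fixed can still be added -->
<xsd:simpleType name="nonEmptyNameType">
  <xsd:restriction base="nameType">
    <xsd:minLength value="1"/>
  </xsd:restriction>
</xsd:simpleType>
```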
Namespace support in W3C XML Schema is flexible yet straightforward. It not only allows the use of any prefix in instance documents (unlike DTDs), but also lets you open your schemas to accept unknown elements and attributes from known or unknown namespaces.
Each W3C XML Schema document is bound to a specific namespace through the
targetNamespace attribute or to the absence of namespace
through the lack of such an attribute. We need at least one schema document
per namespace we want to define (elements and attributes without namespaces
can be defined in any schema, though).
Until now we have omitted the targetNamespace attribute,
which means that we were working without namespaces. To get into namespaces,
let's imagine that our example belongs to a single namespace.
<book isbn="0836217462" xmlns="http://example.org/ns/books/">
The least intrusive way to adapt our schema is to add more attributes to
our xsd:schema element.
<xsd:schema
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns="http://example.org/ns/books/"
    targetNamespace="http://example.org/ns/books/"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified">
The namespace declarations play an important role. The first
(xmlns:xsd="http://www.w3.org/2001/XMLSchema") says not only
that we've chosen to use the prefix xsd to identify the
elements that will be W3C XML Schema instructions, but also that we will
prefix the W3C XML Schema predefined datatypes with xsd, as we
have done in all our examples thus far. Understand that we could have chosen
any prefix instead of xsd. We could even make
http://www.w3.org/2001/XMLSchema our default namespace. In this
case, we would not have prefixed the W3C XML Schema elements.
Since we are working with the http://example.org/ns/books/ namespace, we define it as our default namespace. This means that we won't prefix the references to objects (datatypes, elements, attributes, etc.) belonging to this namespace. Again we could have chosen any prefix to identify this namespace.
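The opposite choice of prefixes might be sketched as follows: W3C XML Schema becomes the default namespace and our own namespace gets a prefix. The prefix bk is an arbitrary assumption, the content model is reduced to a single title for brevity, and the schema namespace URI is the one defined by the final W3C Recommendation.

```xml
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:bk="http://example.org/ns/books/"
    targetNamespace="http://example.org/ns/books/"
    elementFormDefault="qualified">

  <!-- Schema elements and predefined datatypes are now unprefixed -->
  <element name="title" type="string"/>

  <element name="book">
    <complexType>
      <sequence>
        <!-- References to our own definitions now need the bk prefix -->
        <element ref="bk:title"/>
      </sequence>
      <attribute name="isbn" type="string"/>
    </complexType>
  </element>
</schema>
```

Both styles describe exactly the same vocabulary; the prefixes used inside the schema have no effect on instance documents.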
The targetNamespace attribute lets you define, independently
of the namespace declarations, which namespace is described in this schema.
If you need to reference objects belonging to this namespace, which is
usually the case except when using a pure Russian Doll design, you need to
provide a namespace declaration in addition to the
targetNamespace.
The final two attributes in the example (elementFormDefault
and attributeFormDefault) are a facility provided by W3C XML
Schema to control, within a single schema, whether local attributes and
elements are considered by default to be qualified (in a namespace). This
differentiation between qualified and unqualified can be indicated by
specifying the default values, as above, but also when defining the element
or attribute, by adding a form attribute of value
qualified or unqualified.
It is important to note that only local elements and attributes can be specified as unqualified. All globally defined elements and attributes must always be qualified.
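The following sketch shows the form attribute overriding the schema-wide default for one local element. It assumes the xsd:schema attributes shown above (target namespace http://example.org/ns/books/, elementFormDefault="qualified", attributeFormDefault="unqualified"); the note element and the bk prefix are hypothetical.

```xml
<!-- Schema side -->
<xsd:element name="book">
  <xsd:complexType>
    <xsd:sequence>
      <!-- Local element, qualified because of elementFormDefault -->
      <xsd:element name="title" type="xsd:string"/>
      <!-- Local element that overrides the default: no namespace in instances -->
      <xsd:element name="note" type="xsd:string" form="unqualified" minOccurs="0"/>
    </xsd:sequence>
    <!-- Local attribute, unqualified because of attributeFormDefault -->
    <xsd:attribute name="isbn" type="xsd:string"/>
  </xsd:complexType>
</xsd:element>

<!-- Instance side -->
<bk:book xmlns:bk="http://example.org/ns/books/" isbn="0836217462">
  <bk:title>Being a Dog Is a Full-Time Job</bk:title>
  <note>Unqualified local element</note>
</bk:book>
```

A prefix is used in the instance rather than a default namespace so that note can stay outside any namespace without an extra xmlns="" declaration.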
W3C XML Schema, not unlike XSLT and XPath, uses namespace prefixes within the value of some attributes to identify the namespace of data types, elements, attributes, etc. For instance, we've used this feature all along in our examples to identify the W3C XML Schema predefined datatypes. This mechanism can be extended to import definitions from any other namespace and so reuse them in our schemas.
Reusing definitions from other namespaces is done through a three-step
process. This process needs to be done even for the XML 1.0 namespace in
order to declare attributes such as xml:lang. First, the
namespace must be declared as usual.
<xsd:schema
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.org/ns/books/"
    xmlns:xml="http://www.w3.org/XML/1998/namespace"
    elementFormDefault="qualified">
Then W3C XML Schema needs to be informed of the location at which it can
find the schema corresponding to the namespace. This is done using an
xsd:import element.
<xsd:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="myxml.xsd"/>
W3C XML Schema now knows that it should attempt to find any reference belonging to the XML namespace in a schema located at myxml.xsd. We can now use the external definition.
<xsd:element name="title">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute ref="xml:lang"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
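You may wonder why we've chosen to reference the xml:lang
attribute from the XML namespace rather than creating an attribute with a
type xml:lang. We've done so because there is an important
difference between referencing an attribute (or an element) and referencing
a datatype when namespaces are concerned. A reference brings in the
attribute together with its name and its namespace, so instance documents
use xml:lang itself. A datatype only supplies the definition of the
permitted values: we would still have to name the attribute ourselves, and
that attribute would follow the namespace rules of our own schema.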
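The difference can be sketched with two attribute declarations. The second one uses the predefined xsd:language datatype and a local attribute name, lang, both chosen here only for the sake of comparison.

```xml
<!-- Reference: the attribute keeps its own name and namespace,
     so instances write xml:lang="en" -->
<xsd:attribute ref="xml:lang"/>

<!-- Datatype only: a new local attribute named lang whose value is
     checked as a language code, so instances write lang="en" -->
<xsd:attribute name="lang" type="xsd:language"/>
```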
To finish this section about namespaces, we need to see how, as promised
in the introduction, we can open our schema to unknown elements, attributes
and namespaces. This is done using xsd:any and
xsd:anyAttribute, allowing, respectively, the inclusion of any
element or attribute.
For instance, if we want to extend the definition of our description type to any XHTML tag, we could declare
<xsd:complexType name="descType" mixed="true">
  <xsd:sequence>
    <xsd:any namespace="http://www.w3.org/1999/xhtml"
             minOccurs="0" maxOccurs="unbounded"
             processContents="skip"/>
  </xsd:sequence>
</xsd:complexType>
The xsd:anyAttribute gives the same functionality for
attribute definitions.
The type descType is now mixed content and accepts any
number of elements from the http://www.w3.org/1999/xhtml
namespace. The processContents attribute is set to
skip, telling a W3C XML Schema processor that no validation of these
elements should be attempted. The other permissible values for this
attribute are strict, asking to validate these elements, or
lax, asking the processor to validate them when possible. The
namespace attribute accepts a whitespace-separated list of
URIs, as well as the special values ##any (any namespace),
##local (non-qualified elements), ##targetNamespace (the
target namespace) or ##other (any namespace other than the
target).
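As a sketch, the same type could be opened to foreign attributes as well. The choice of ##other and lax below is only an assumption about what a schema author might want here.

```xml
<xsd:complexType name="descType" mixed="true">
  <xsd:sequence>
    <xsd:any namespace="http://www.w3.org/1999/xhtml"
             minOccurs="0" maxOccurs="unbounded"
             processContents="skip"/>
  </xsd:sequence>
  <!-- Accept any attribute from a namespace other than the target namespace,
       validating it only when a declaration for it can be found -->
  <xsd:anyAttribute namespace="##other" processContents="lax"/>
</xsd:complexType>
```

The xsd:anyAttribute wildcard comes last in the type definition, after any ordinary attribute declarations.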
We've now covered most of the features of W3C XML Schema, but we still
need to take a look at some extensions that you can use within your
instance documents. To set these features apart, a
separate namespace, http://www.w3.org/2001/XMLSchema-instance, is
used, usually associated with the prefix xsi.
The xsi:schemaLocation and
xsi:noNamespaceSchemaLocation attributes allow you to tie a
document to its W3C XML Schema. This link is not mandatory, and other
indications can be given using application-dependent mechanisms (such as a
parameter on a command line), but it does help W3C XML Schema aware tools to
locate a schema.
Depending on whether namespaces are used, the link will be either
<book isbn="0836217462"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="file:library.xsd">
or, as below, a pair made of the URI of the namespace and the URI of the schema, separated by whitespace in the same attribute:
<book isbn="0836217462"
    xmlns="http://example.org/ns/books/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.org/ns/books/ file:library.xsd">
The other use of xsi attributes is to provide information
about how an element corresponds to a schema. These attributes are
xsi:type, which lets you specify the simple or complex type of
an element, and xsi:nil, which lets you specify a nil value
for an element (that has to be defined as nillable="true" in
the schema). You don't need to declare these attributes in your schema to be
able to use them in an instance document.
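A minimal sketch of both attributes follows. It uses the names defined by the final W3C Recommendation (xsi:nil, nillable and the 2001 namespaces) and assumes simplified declarations for the since and qualification elements of our character example.

```xml
<!-- Schema side: since may be nil, qualification is a plain string -->
<xsd:element name="since" type="xsd:date" nillable="true"/>
<xsd:element name="qualification" type="xsd:string"/>

<!-- Instance side -->
<character xmlns:xsd="http://www.w3.org/2001/XMLSchema"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <name>Snoopy</name>
  <!-- No date known: the element is present but explicitly nil, and must be empty -->
  <since xsi:nil="true"/>
  <!-- The instance claims a type derived from the declared xsd:string -->
  <qualification xsi:type="xsd:token">extroverted beagle</qualification>
</character>
```

The type named in xsi:type has to be the declared type or one derived from it; xsd:token qualifies because it is derived from xsd:string.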
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.