W3C XML Schema Design Patterns: Avoiding Complexity
by Dare Obasanjo
Why You Should Understand How XML Namespaces Affect WXS
Support for XML Namespaces is woven tightly into the WXS recommendation. Namespaces are used in a number of places:
- when referencing global elements, attributes, or types;
- in XPath expressions used for identity constraints;
- in determining what elements and attributes schema declarations can validate; and
- when importing and including other schema documents.
Thus, schema authors should be familiar with how namespaces work, including their effect on W3C XML Schema. I wrote two MSDN articles that address this issue: "XML Namespaces and How They Affect XPath and XSLT" provides a detailed overview of XML namespaces, and "Working with Namespaces in XML Schema" explains the ramifications of namespaces in WXS.
Why You Should Always Set elementFormDefault to "qualified"
Elements or attributes with a
namespace name are said to be "namespace qualified". It's possible to
override whether local declarations validate namespace qualified elements
and attributes or not. The
xs:schema element has the elementFormDefault and attributeFormDefault attributes, which specify whether local
declarations in the schema should validate namespace qualified elements
and attributes respectively. The valid values for either attribute are
"qualified" and "unqualified". The default value of both attributes is
"unqualified".
The form attribute on local element and attribute declarations can be
used to override the values of the elementFormDefault and
attributeFormDefault attributes specified on the
xs:schema element. This allows for fine-grained control over the way
validation of elements and attributes in the instance document operates
in relation to global or local declarations.
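As a minimal sketch of such an override (the nickname element is hypothetical and used only for illustration), the schema below sets elementFormDefault to "qualified" but marks one local declaration with form="unqualified":

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com"
           elementFormDefault="qualified">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <!-- Inherits elementFormDefault: must be namespace qualified in the instance -->
        <xs:element name="familyName" type="xs:string" />
        <!-- Hypothetical element: form overrides the schema-wide default,
             so it must appear with no namespace in the instance -->
        <xs:element name="nickname" type="xs:string" form="unqualified" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under these assumptions, a conforming instance would mix qualified and unqualified children:

```xml
<foo:person xmlns:foo="http://example.com">
  <foo:familyName>KAWAGUCHI</foo:familyName>
  <nickname>Kohsuke</nickname>
</foo:person>
```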
The following example, taken from Kohsuke's article (the "Why You Should Avoid Local Declarations" section), shows exactly how these attributes can significantly affect the outcome of validation:
This schema
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="familyName" type="xs:string" />
        <xs:element name="firstName" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
validates the following document
<foo:person xmlns:foo="http://example.com">
  <familyName> KAWAGUCHI </familyName>
  <firstName> Kohsuke </firstName>
</foo:person>
which is unlikely to be what the schema author intended. And it's ugly, too. Altering the schema thus:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com"
           elementFormDefault="qualified">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="familyName" type="xs:string" />
        <xs:element name="firstName" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
allows it to validate
<person xmlns="http://example.com">
  <familyName> KAWAGUCHI </familyName>
  <firstName> Kohsuke </firstName>
</person>
or
<foo:person xmlns:foo="http://example.com">
  <foo:familyName> KAWAGUCHI </foo:familyName>
  <foo:firstName> Kohsuke </foo:firstName>
</foo:person>
Leaving the value of the attributeFormDefault attribute as
"unqualified" makes sense because most schema authors don't want to have
to namespace qualify all attributes explicitly by prefixing them.
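To make the attribute case concrete, here is a small sketch in which the id attribute is a hypothetical addition to the person example. Because attributeFormDefault is left at its default of "unqualified", the local attribute declaration validates an unprefixed attribute:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com"
           elementFormDefault="qualified">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="familyName" type="xs:string" />
      </xs:sequence>
      <!-- Hypothetical local attribute; unqualified by default -->
      <xs:attribute name="id" type="xs:string" />
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance can then write the attribute without a prefix. Had attributeFormDefault been "qualified", the instance would need a prefixed attribute such as foo:id, since a default namespace declaration does not apply to attributes.

```xml
<person xmlns="http://example.com" id="p1">
  <familyName>KAWAGUCHI</familyName>
</person>
```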
Why You Should Use Attribute Groups
An attribute group definition is a way to create a named collection of attribute declarations and attribute wildcards. Attribute groups increase the modularity of schemas. You can declare a commonly used set of attributes in a single location and then reference them from other schemas.
When Kohsuke's article describes attribute groups as an alternative to global attribute declarations, it may give the incorrect impression that the two are mutually exclusive. A globally declared attribute is an individual, reusable attribute declaration. An attribute group is a modular cluster of attributes; the attribute declarations in an attribute group can be either local attribute declarations or references to global declarations.
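The sketch below illustrates how the two can coexist. All names (lang, commonAttributes, title) are hypothetical; the attribute group holds one local declaration and one reference to a global declaration:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com"
           xmlns:ex="http://example.com"
           elementFormDefault="qualified">

  <!-- A global attribute declaration: individually reusable -->
  <xs:attribute name="lang" type="xs:language" />

  <!-- An attribute group mixing both kinds of declaration -->
  <xs:attributeGroup name="commonAttributes">
    <!-- Local attribute declaration -->
    <xs:attribute name="id" type="xs:ID" />
    <!-- Reference to the global declaration above -->
    <xs:attribute ref="ex:lang" />
  </xs:attributeGroup>

  <!-- An element whose type pulls in the whole group -->
  <xs:element name="title">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attributeGroup ref="ex:commonAttributes" />
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Note that a globally declared attribute is always namespace qualified in the instance, so in this sketch the lang attribute would have to be written with a prefix bound to http://example.com, while id would not.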
Why You Should Use Model Groups
A model group definition is a mechanism for creating named groups of elements using the all, choice, or sequence compositors. Model groups are useful for reusing groups of elements without resorting to type derivation. However, model groups are not a replacement for complex types; they cannot contain attribute declarations and they cannot be specified as the type of an element declaration. Additionally, derivation of model groups is much more limited than derivation of complex types.
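As an illustrative sketch (the nameParts group and the author element are hypothetical names), a named model group can be defined once and referenced from inside a complex type, which is also where any attributes have to be declared:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com"
           xmlns:ex="http://example.com"
           elementFormDefault="qualified">

  <!-- Named model group: elements only, no attributes allowed here -->
  <xs:group name="nameParts">
    <xs:sequence>
      <xs:element name="familyName" type="xs:string" />
      <xs:element name="firstName" type="xs:string" />
    </xs:sequence>
  </xs:group>

  <xs:element name="author">
    <xs:complexType>
      <xs:sequence>
        <!-- Reuse the group by reference -->
        <xs:group ref="ex:nameParts" />
        <xs:element name="email" type="xs:string" minOccurs="0" />
      </xs:sequence>
      <!-- Attributes belong to the complex type, not to the group -->
      <xs:attribute name="id" type="xs:ID" />
    </xs:complexType>
  </xs:element>
</xs:schema>
```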
Why You Should Use The Builtin Simple Types
A major benefit of WXS over DTDs in XML 1.0 is the existence of datatypes. The ability to specify that the values of elements or attributes are strings, dates, or numeric data enables schema authors to specify and validate the contents of XML data in an interoperable and platform independent manner. Given the number of built-in datatypes (44 by my count), it may be wise for schema authors to standardize on a subset of the built-in types to avoid information overload.
In most cases users can do without the subtypes of xs:string (e.g. xs:ENTITY or xs:language), the subtypes of xs:integer (e.g. xs:short or xs:unsignedByte), or the Gregorian date types (e.g. xs:gMonthDay or xs:gYearMonth). Eliminating these types reduces the amount of information to a more easily managed amount.
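For illustration only, a schema that standardizes on a handful of built-in types might look like the following sketch. The order element and its children are hypothetical, and the particular subset chosen here is an assumption rather than a recommendation from the recommendation itself:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com"
           elementFormDefault="qualified">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <!-- A small working subset of the built-in simple types -->
        <xs:element name="customer" type="xs:string" />
        <xs:element name="quantity" type="xs:integer" />
        <xs:element name="unitPrice" type="xs:decimal" />
        <xs:element name="orderDate" type="xs:date" />
        <xs:element name="giftWrapped" type="xs:boolean" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```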
Why You Should Use Complex Types
A complex type definition is used to specify a content model consisting of elements and attributes. An element declaration can specify its content model by referring to a named or anonymous complex type. Named complex types can be referenced by name from the schema they are defined in or by external schema documents; anonymous complex types must be defined within the declaration for the element which uses the type. Additionally the content models of named complex types can be extended or restricted using WXS inheritance mechanisms.
Complex types are similar to model group definitions, with two main differences. First, complex type definitions can include attributes in the content models they define. Second, it's possible to use type derivation with complex types, which isn't the case with named model groups. In his article, Kohsuke advocates using a combination of anonymous complex types, model group definitions, and attribute groups to specify the content model of an element instead of named complex types. He does so in an attempt to avoid dealing with what he sees as the complexity of named complex types. However, I'd counter that using three mechanisms instead of one to specify the content model of an element is actually more prone to confusion. Thus, in addition to the fact that named complex types allow for reuse of content models, they're also the most straightforward way of specifying the content model of an element.
Anonymous complex types should only be used if references to the type will not be needed outside the element declaration and there is no need for type derivation. It is important to note that it is not possible to derive a new type from an anonymous complex type. In general, schemas that make heavy use of anonymous types are likely to have problems with uniformity and consistency.
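A short sketch of the named approach follows. PersonType, EmployeeType, and the department element are hypothetical names; the point is that a named type can be referenced by several element declarations and extended, neither of which is possible with an anonymous type:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com"
           xmlns:ex="http://example.com"
           elementFormDefault="qualified">

  <!-- Named complex type: elements and attributes in one definition -->
  <xs:complexType name="PersonType">
    <xs:sequence>
      <xs:element name="familyName" type="xs:string" />
      <xs:element name="firstName" type="xs:string" />
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID" />
  </xs:complexType>

  <!-- Derivation by extension: appends to the base content model -->
  <xs:complexType name="EmployeeType">
    <xs:complexContent>
      <xs:extension base="ex:PersonType">
        <xs:sequence>
          <xs:element name="department" type="xs:string" />
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Element declarations refer to the types by name -->
  <xs:element name="person" type="ex:PersonType" />
  <xs:element name="employee" type="ex:EmployeeType" />
</xs:schema>
```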
Why You Should Not Use Notation Declarations
Kohsuke's admonition to avoid notation declarations is spot on. They exist only to provide backward compatibility with DTDs, except they are not backward compatible with DTD notations. Pretend they do not exist. I certainly do.
Why You Should Use Substitution Groups Carefully
Substitution Groups provide a mechanism similar to subtype polymorphism in programming languages. One or more elements can be marked as being substitutable for a global element (also called the head element), which means that members of this substitution group are interchangeable with the head element in a content model. For example, for an Address substitution group with members USAddress and UKAddress, the generic element Address can be used in the content model, or it can be substituted by a USAddress or a UKAddress. The only requirement is that the members of the substitution group must be of the same type or be in the same type hierarchy as the head element.
The following is an example schema and the instance which it validates:
example.xsd:
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.example.com"
    xmlns:ex="http://www.example.com"
    elementFormDefault="qualified">
  <xs:element name="book" type="xs:string" />
  <xs:element name="magazine" type="xs:string" substitutionGroup="ex:book" />
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="ex:book" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
example.xml:
<library xmlns="http://www.example.com">
  <magazine>MSDN Magazine</magazine>
  <book>Professional XML Databases</book>
</library>
The content model of the library element says that it can
hold one or more book elements. Since magazine
elements are in the book substitution group, it's valid for
magazine elements to appear in the instance XML where
book elements are expected.
Substitution groups make content models more flexible and allow
extensibility in directions the schema author may not have
anticipated. This flexibility is a two-edged sword: although it allows
greater extensibility, it makes processing documents based on such schemas
more difficult. For instance, the code that processes the
library element must not only handle its child
book elements but magazine elements as well. If
the instance document specified additional schemas via the
xsi:schemaLocation attribute, the processing application could have to
deal with even more members of the book substitution group as
children of the library element.
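To sketch how that might happen, suppose a second schema document, here given the hypothetical file name more.xsd and the hypothetical namespace http://www.example.org/more, imports example.xsd and adds a new member to the book substitution group:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.org/more"
           xmlns:ex="http://www.example.com"
           elementFormDefault="qualified">
  <xs:import namespace="http://www.example.com" schemaLocation="example.xsd" />
  <!-- Hypothetical new member of the book substitution group -->
  <xs:element name="journal" type="xs:string" substitutionGroup="ex:book" />
</xs:schema>
```

An instance could then point at the extra schema, and a processor that chooses to honor the hint would accept journal wherever book is expected:

```xml
<library xmlns="http://www.example.com"
         xmlns:more="http://www.example.org/more"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.example.org/more more.xsd">
  <book>Professional XML Databases</book>
  <more:journal>XML Journal</more:journal>
</library>
```

Since xsi:schemaLocation is only a hint, whether the additional schema is actually loaded depends on the processor and how it is configured.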
Another complication is that members of a substitution group can be of
a type derived from the substitution group's head. Writing code to
properly handle any derived type generically is difficult, especially
since there are two opposite notions of derivation. The first,
restriction, restricts the range of values in the content model. The
second, extension, adds elements or attributes to the content
model. Certain attributes on element declarations can be used to give
schema authors more control over element substitutions in instance
documents and reduce the likelihood of unexpected substitutions in XML
instance documents. The block attribute is used to specify
whether elements whose types use a certain derivation method can
substitute for the element in an instance document, while the
final attribute is used to specify whether elements whose
types use a certain derivation method can declare themselves to be part of
the target element's substitution group. The default values of the
block and final attributes for all element
declarations in a schema can be specified via the
blockDefault and finalDefault attributes of the
root xs:schema element. By default all substitutions are
allowed without limitation.
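The following sketch shows both attributes at work. All type and element names (BookType, IllustratedBookType, novel, periodical, comic) are hypothetical and chosen only to illustrate the two controls:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com"
           xmlns:ex="http://www.example.com"
           elementFormDefault="qualified">

  <xs:complexType name="BookType">
    <xs:sequence>
      <xs:element name="title" type="xs:string" />
    </xs:sequence>
  </xs:complexType>

  <!-- Derived from BookType by extension -->
  <xs:complexType name="IllustratedBookType">
    <xs:complexContent>
      <xs:extension base="ex:BookType">
        <xs:sequence>
          <xs:element name="illustrator" type="xs:string" />
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- final="extension": an element whose type extends BookType
       may not name book as its substitutionGroup head -->
  <xs:element name="book" type="ex:BookType" final="extension" />

  <!-- Allowed: novel has the same type as the head element -->
  <xs:element name="novel" type="ex:BookType" substitutionGroup="ex:book" />

  <!-- block="extension": comic may join the group in the schema, but it
       may not replace periodical in an instance, because its type is
       derived by extension -->
  <xs:element name="periodical" type="ex:BookType" block="extension" />
  <xs:element name="comic" type="ex:IllustratedBookType"
              substitutionGroup="ex:periodical" />
</xs:schema>
```

In this sketch, final is enforced when the schema is compiled, whereas block takes effect when an instance is validated. Setting blockDefault or finalDefault on xs:schema would apply the same value to every element declaration that does not specify its own.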