XML.com: XML From the Inside Out


W3C XML Schema Design Patterns: Avoiding Complexity
by Dare Obasanjo

Why You Should Favor key/keyref/unique Over ID/IDREF For Identity Constraints

DTDs provide a mechanism for specifying that an attribute's type is ID, i.e., that its value must be unique within the document and must match the Name production in XML 1.0. IDs in XML 1.0 can be referenced by attributes of type IDREF or IDREFS. For compatibility with DTDs, WXS provides the xs:ID, xs:IDREF, and xs:IDREFS types.

WXS identity constraints specify unique values, keys, or references to keys using XPath expressions defined within the scope of an element declaration. Compared feature for feature, the identity constraint mechanisms offer more than ID/IDREF. First, there is no limit on the values or types that can be used as part of an identity constraint, whereas IDs are restricted to a specific lexical form (e.g., 7 is not a valid ID because it starts with a digit). A more important benefit is scope: an ID value has to be unique within the entire document, but the values governed by a WXS identity constraint do not. The symbol space for IDs is the whole document, while the symbol space for a key or unique constraint is the set of nodes its XPath expressions select within the element where the constraint is declared. This is particularly useful when uniqueness is needed in two overlapping value spaces with different scopes in the same XML document. For example, consider an XML document that contains room numbers and table numbers for a hotel. Some numbers are likely to overlap (e.g., there is a room 18 and a table 18), but each number should be unique within its own value space.
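The hotel scenario can be sketched as a schema that declares two separate keys on one element. The element and attribute names (hotel, room, table, number) are hypothetical. Because each key's selector picks out a different set of elements, a room and a table can both be numbered 18. Two rooms numbered 18 would still be a validation error.

```xml
<!-- Hypothetical hotel schema: hotel, room, table, and number are
     illustrative names, not taken from any real vocabulary -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

 <xs:element name="hotel">
  <xs:complexType>
   <xs:sequence>
    <xs:element name="room" maxOccurs="unbounded">
     <xs:complexType>
      <xs:attribute name="number" type="xs:positiveInteger" use="required"/>
     </xs:complexType>
    </xs:element>
    <xs:element name="table" maxOccurs="unbounded">
     <xs:complexType>
      <xs:attribute name="number" type="xs:positiveInteger" use="required"/>
     </xs:complexType>
    </xs:element>
   </xs:sequence>
  </xs:complexType>

  <!-- room numbers must be unique among rooms only -->
  <xs:key name="RoomKey">
   <xs:selector xpath="room"/>
   <xs:field xpath="@number"/>
  </xs:key>

  <!-- table numbers must be unique among tables only -->
  <xs:key name="TableKey">
   <xs:selector xpath="table"/>
   <xs:field xpath="@number"/>
  </xs:key>
 </xs:element>

</xs:schema>

<!-- Valid: room 18 and table 18 belong to separate symbol spaces -->
<hotel>
 <room number="18"/>
 <table number="18"/>
</hotel>
```

With xs:ID, neither value would be allowed, because an ID cannot start with a digit. Renaming the values to something like r18 and t18 would only work around the single document-wide symbol space.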

The WXS family of ID types is not exactly compatible with the DTD ID types. First, the xs:ID, xs:IDREF, and xs:IDREFS types can be applied to both elements and attributes in WXS, whereas their DTD equivalents can only be applied to attributes. Second, there is no restriction on how many attributes of type xs:ID can appear on an element, although such a restriction exists for ID attributes in DTDs.
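The following fragment applies xs:ID and xs:IDREF to elements rather than attributes, which a DTD cannot express. The names catalog, product, sku, and relatedSku are hypothetical, and the fragment assumes the usual xs prefix binding. The instance after it is valid because every relatedSku value matches some sku value in the document.

```xml
<!-- Hypothetical fragment: element content typed as xs:ID / xs:IDREF -->
<xs:element name="catalog">
 <xs:complexType>
  <xs:sequence>
   <xs:element name="product" maxOccurs="unbounded">
    <xs:complexType>
     <xs:sequence>
      <xs:element name="sku" type="xs:ID"/>
      <xs:element name="relatedSku" type="xs:IDREF"
                  minOccurs="0" maxOccurs="unbounded"/>
     </xs:sequence>
    </xs:complexType>
   </xs:element>
  </xs:sequence>
 </xs:complexType>
</xs:element>

<!-- instance: the ID values are element content, not attributes -->
<catalog>
 <product><sku>p-100</sku><relatedSku>p-200</relatedSku></product>
 <product><sku>p-200</sku></product>
</catalog>
```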

Why You Should Use Chameleon Schemas Carefully

The target namespace of a schema document identifies the namespace name of the elements and attributes that can be validated against the schema. A schema without a target namespace can typically only validate elements and attributes that have no namespace name. However, if a schema without a target namespace is included in a schema with a target namespace, the included schema takes on the target namespace of the including schema. This technique is commonly called the chameleon schema design pattern.
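This sketch shows a chameleon include. The file names common.xsd and main.xsd and the NoteType and note names are hypothetical. The included document declares no target namespace. Once it is included, its NoteType definition belongs to http://www.example.com, so the including schema refers to it with the ex prefix.

```xml
<!-- common.xsd (hypothetical): no targetNamespace attribute -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
 elementFormDefault="qualified">
 <xs:complexType name="NoteType">
  <xs:sequence>
   <xs:element name="text" type="xs:string"/>
  </xs:sequence>
 </xs:complexType>
</xs:schema>

<!-- main.xsd (hypothetical): the included components take on
     the http://www.example.com target namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
 targetNamespace="http://www.example.com"
 xmlns:ex="http://www.example.com"
 elementFormDefault="qualified">
 <xs:include schemaLocation="common.xsd"/>
 <xs:element name="note" type="ex:NoteType"/>
</xs:schema>
```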

Kohsuke's article claims that the chameleon schema pattern does not work, which is incorrect. Michael Leditschke posted a full rebuttal of this claim on XML-DEV. It shows that the design pattern does work and is useful for creating reusable modules of type definitions and declarations.

There is a problem with combining chameleon schemas with identity constraints. QName references to the type definitions, groups, and declarations in the chameleon schema are coerced into the namespace of the including schema. The same is not done for the XPath expressions used by xs:key, xs:keyref, and xs:unique identity constraints. Consider the following schema:


<xs:schema
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 elementFormDefault="qualified">

 <xs:element name="Root">

  <xs:complexType>
    <xs:sequence>
     <xs:element name="person" type="PersonType" maxOccurs="unbounded" />
    </xs:sequence>
  </xs:complexType>

  <xs:key name="PersonKey">
   <xs:selector xpath="person"/>
   <xs:field xpath="@name"/>
  </xs:key>

  <xs:keyref name="BestFriendKey" refer="PersonKey">
   <xs:selector xpath="person"/>
   <xs:field xpath="@best-friend"/>
  </xs:keyref>

 </xs:element>

 <xs:complexType name="PersonType">
  <xs:simpleContent>
   <xs:extension base="xs:string">
    <xs:attribute name="best-friend" type="xs:string" />
    <xs:attribute name="name" type="xs:string" />
   </xs:extension>
  </xs:simpleContent>
 </xs:complexType>

</xs:schema>

If this schema is included in another schema with a target namespace, the XPath expressions in both the key and the keyref stop matching. In the chameleon schema the person element is in no namespace, but once the schema is included, person takes on the including schema's target namespace. The selector expression person, however, still refers to a person element in no namespace, so it selects nothing and neither constraint is enforced. Nothing signals this failure, because processors are not obliged to check that the path expressions in an identity constraint actually select any nodes.
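This sketch shows the silent failure. It assumes the chameleon schema above is saved under the hypothetical name people.xsd. The instance repeats the key value Ann, and its best-friend values Zed and Bob match no person's name. Under the behavior described above, the unprefixed person selector matches no elements in the example.com namespace, so a processor would report none of these problems.

```xml
<!-- Hypothetical including schema; people.xsd is the chameleon schema -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
 targetNamespace="http://www.example.com"
 elementFormDefault="qualified">
 <xs:include schemaLocation="people.xsd"/>
</xs:schema>

<!-- Instance: duplicate key "Ann" and dangling references "Zed" and "Bob";
     the selector "person" looks for a no-namespace element, finds none,
     and so neither the key nor the keyref is checked -->
<Root xmlns="http://www.example.com">
 <person name="Ann" best-friend="Zed">Ann Smith</person>
 <person name="Ann" best-friend="Bob">Ann Jones</person>
</Root>
```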

The lesson is that identity constraints should not be used in chameleon schemas.

Why You Should Not Use Default Or Fixed Values, Especially For Types Of xs:QName

The primary complaint against default and fixed values is that validation inserts new data into the source XML, thus changing the data. This means that an unvalidated document whose schema declares default values is incomplete. Tying the actual content of the XML document to the validation process is unwise, since a schema may not always be available. It is also unwise to assume that consumers of the document will always perform validation.
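Here is a minimal illustration of the problem. The price element and its currency attribute are hypothetical. The same element carries different data depending on whether the consumer validated it.

```xml
<!-- hypothetical declaration: currency defaults to USD -->
<xs:element name="price">
 <xs:complexType>
  <xs:simpleContent>
   <xs:extension base="xs:decimal">
    <xs:attribute name="currency" type="xs:string" default="USD"/>
   </xs:extension>
  </xs:simpleContent>
 </xs:complexType>
</xs:element>

<!-- the document as written, and as any non-validating consumer sees it -->
<price>9.99</price>

<!-- the same element as reported after validation: the default is supplied -->
<price currency="USD">9.99</price>
```

Declaring the attribute with use="required" instead of a default keeps the currency in the document itself. Validating and non-validating consumers then see the same data.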

The xs:QName type has additional validation problems caused by the fact that it has no canonical form. Consider this schema and XML instance:


 <xs:schema
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 targetNamespace="http://www.example.com"
 xmlns:ex="http://www.example.com"
 xmlns:ex2="ftp://ftp.example.com"
 elementFormDefault="qualified">

 <xs:element name="Root">
  <xs:complexType>
    <xs:sequence>
     <xs:element name="Node" type="xs:QName" default="ex2:FtpSite" />
    </xs:sequence>
  </xs:complexType>
 </xs:element>

</xs:schema>

<Root xmlns="http://www.example.com" 
  xmlns:ex2="smtp://smtp.example.org" 
  xmlns:foo="ftp://ftp.example.com">
 <Node />
</Root>

What value should be inserted into the Node element upon validation? Should it be "ex2:FtpSite", even though the ex2 prefix is mapped to a different namespace in the instance document than in the schema? Maybe it should be "foo:FtpSite", because the prefix foo is mapped to the same namespace that ex2 was mapped to in the schema. But then what would happen if no namespace declaration existed for the ftp://ftp.example.com namespace? Would one have to be inserted? None of these questions can be answered without contradicting someone's reasonable view of what the correct behavior should be. It is best to avoid xs:QName default values, because different implementations are unlikely to agree on the relevant semantics.
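One way to sidestep these questions is to drop the default and let each instance state the QName explicitly under its own prefix bindings. This sketch assumes the instance author can be required to supply the value. It reuses the names from the example above.

```xml
<!-- schema side: no default, so the processor never has to invent a QName -->
<xs:element name="Node" type="xs:QName"/>

<!-- instance side: foo is bound in the instance itself, so the value
     unambiguously names FtpSite in the ftp://ftp.example.com namespace -->
<Root xmlns="http://www.example.com"
  xmlns:foo="ftp://ftp.example.com">
 <Node>foo:FtpSite</Node>
</Root>
```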

Why You Should Use Restriction And Extension Of Simple Types

Restriction of a simple type involves constraining the facets of the type, thus reducing the set of values it permits. Typical restrictions include specifying a maximum length for a string value, specifying a date range, or enumerating the list of permitted values. Types constrained in this manner are very commonly used by schema authors and account for most uses of type derivation in WXS. Such types can serve as the type of both elements and attributes.
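The three kinds of restriction just mentioned can be sketched as follows. The type, element, and attribute names are hypothetical. The fragment assumes a schema with no target namespace, as in the article's own fragments, so unprefixed type references resolve.

```xml
<!-- maximum length for a string value -->
<xs:simpleType name="ShortName">
 <xs:restriction base="xs:string">
  <xs:maxLength value="32"/>
 </xs:restriction>
</xs:simpleType>

<!-- a date range: any date in 2003 -->
<xs:simpleType name="DateIn2003">
 <xs:restriction base="xs:date">
  <xs:minInclusive value="2003-01-01"/>
  <xs:maxInclusive value="2003-12-31"/>
 </xs:restriction>
</xs:simpleType>

<!-- an enumerated list of permitted values -->
<xs:simpleType name="Color">
 <xs:restriction base="xs:string">
  <xs:enumeration value="red"/>
  <xs:enumeration value="green"/>
  <xs:enumeration value="blue"/>
 </xs:restriction>
</xs:simpleType>

<!-- restricted types work for elements and attributes alike -->
<xs:element name="nickname" type="ShortName"/>
<xs:attribute name="color" type="Color"/>
```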

Extension of a simple type creates a complex type (i.e., an element content model) with simple content and attributes. Extension applies to any element declaration whose content is a simple type but which also carries one or more attributes. Because such content models occur commonly in XML documents, derivation by extension is another commonly used feature.
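The following sketch shows simple content extension. It uses the same technique as the PersonType definition earlier. The WeightType and weight names and the unit attribute are hypothetical, and the fragment assumes no target namespace.

```xml
<!-- a decimal value extended with a required unit attribute -->
<xs:complexType name="WeightType">
 <xs:simpleContent>
  <xs:extension base="xs:decimal">
   <xs:attribute name="unit" type="xs:string" use="required"/>
  </xs:extension>
 </xs:simpleContent>
</xs:complexType>

<xs:element name="weight" type="WeightType"/>

<!-- matching instance: simple content plus an attribute -->
<weight unit="kg">2.5</weight>
```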

As with complex types, there are named and anonymous simple types. Named simple types can be referenced by name from the schema they are defined in or from external schema documents. Anonymous simple types are defined inline within the declaration of the element or attribute that uses them. Only named types can serve as the base of a type derivation, since a derivation refers to its base type by name.

A common misconception is that anonymous types with the same structure are the same type. In other words, assuming that this schema fragment


<!-- fragment A -->

<xs:element name="quantity">
 <xs:simpleType>
  <xs:restriction base="xs:positiveInteger">
   <xs:maxExclusive value="100"/>
  </xs:restriction>
 </xs:simpleType>
</xs:element>

<xs:element name="size">
 <xs:simpleType>
  <xs:restriction base="xs:positiveInteger">
   <xs:maxExclusive value="100"/>
  </xs:restriction>
 </xs:simpleType>
</xs:element>

is equivalent to

<!-- fragment B -->

<xs:simpleType name="underHundred">
 <xs:restriction base="xs:positiveInteger">
  <xs:maxExclusive value="100"/>
 </xs:restriction>
</xs:simpleType>

<xs:element name="size" type="underHundred"/> 

<xs:element name="quantity" type="underHundred"/>

is incorrect with regard to whether both element declarations have the same type. Various features of WXS may require element declarations to have the same type, including substitution groups, key/keyref pairs, and type derivation. For instance, a keyref must be of the same type as a key. For all of these features, WXS treats the element declarations in fragment A as having two different types and those in fragment B as having the same type.
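Here is the substitution group case as a concrete sketch. It assumes the underHundred type from fragment B is in scope, and the element names amount, count, and total are hypothetical. A substitution group member must have a type that is the same as, or validly derived from, the head's type, so structural equivalence is not enough.

```xml
<!-- head of the substitution group, using the named type -->
<xs:element name="amount" type="underHundred"/>

<!-- allowed: same named type as the head -->
<xs:element name="count" type="underHundred" substitutionGroup="amount"/>

<!-- rejected: this anonymous type has the same facets as underHundred,
     but it is a distinct type derived from xs:positiveInteger,
     not from underHundred -->
<xs:element name="total" substitutionGroup="amount">
 <xs:simpleType>
  <xs:restriction base="xs:positiveInteger">
   <xs:maxExclusive value="100"/>
  </xs:restriction>
 </xs:simpleType>
</xs:element>
```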

Why You Should Use Extension Of Complex Types

Extension of a complex type involves adding extra attributes or elements to the content model of the base type. Elements added via extension are treated as if they were appended, in sequence, to the content model of the base type. This technique is useful for extracting the common aspects of a set of complex types and then reusing them by extending the base type definition. The following schema fragment shows how extension enables reuse of the common parts of a mailing address. It is taken from the discussion of complex type extension in the WXS Primer.


<xs:complexType name="Address">
  <xs:sequence>
   <xs:element name="name"   type="xs:string"/>
   <xs:element name="street" type="xs:string"/>
   <xs:element name="city"   type="xs:string"/>
  </xs:sequence>
 </xs:complexType>

 <xs:complexType name="USAddress">
  <xs:complexContent>
   <xs:extension base="Address">
    <xs:sequence>
     <xs:element name="state" type="USState"/>
     <xs:element name="zip"   type="xs:positiveInteger"/>
    </xs:sequence>
   </xs:extension>
  </xs:complexContent>
 </xs:complexType>

 <xs:complexType name="UKAddress">
  <xs:complexContent>
   <xs:extension base="Address">
    <xs:sequence>
     <xs:element name="postcode" type="UKPostcode"/>
    </xs:sequence>
    <xs:attribute name="exportCode" type="xs:positiveInteger" fixed="1"/>
   </xs:extension>
  </xs:complexContent>
 </xs:complexType>

In this schema the Address type defines the information common to addresses in general; its derived types add information specific to addresses from the United States and United Kingdom, respectively. The ability to reuse and build upon content models using extension is a powerful and useful feature of WXS that promotes modularity and content uniformity.
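In an instance document, an element declared with the base type can use xsi:type to select a derived type. This sketch declares a hypothetical shipTo element with type Address and selects UKAddress. It assumes the schema has no target namespace and that the UKPostcode type (not shown in the fragment) accepts the sample value. The postcode element added by extension follows the elements inherited from Address.

```xml
<!-- hypothetical declaration using the base type -->
<xs:element name="shipTo" type="Address"/>

<!-- instance: xsi:type substitutes the derived UKAddress type -->
<shipTo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:type="UKAddress" exportCode="1">
 <name>Helen Zoe</name>
 <street>47 Eden Street</street>
 <city>Cambridge</city>
 <postcode>CB1 1JR</postcode>
</shipTo>
```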

There is a caveat for type-aware processors that deal with types derived by extension, and it concerns the elements that extension adds to a content model. In the future, type-aware languages such as XQuery or XSLT 2.0 may be able to process XML elements and attributes polymorphically. For instance, an application could process all elements whose type is Address or is derived from Address, handling only the information common to all of these types. However, a query such as

//*[. instance of element(*, Address)]/city

could return unexpected results for a derived type that extends the content model in the following way:


 <xs:complexType name="BadAddress">
  <xs:complexContent>
   <xs:extension base="Address">
    <xs:sequence>
     <!-- address format has two city entries, one for the neighborhood
          and another for the actual city -->
     <xs:element name="city" type="xs:string"/>
     <xs:element name="state" type="xs:string"/>
     <xs:element name="country" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="exportCode" type="xs:positiveInteger" fixed="1"/>
   </xs:extension>
  </xs:complexContent>
 </xs:complexType>

Although the example is contrived and the scenario seems unlikely, it demonstrates a real risk: code written against Address expects one city per address but receives two. Paul Prescod has given a more detailed exposition of this potential problem on XML-DEV.
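This sketch shows an instance that triggers the problem. It assumes a hypothetical shipTo element declared with type Address and a schema with no target namespace. A query for the city children of Address-typed elements returns two city elements for this single address.

```xml
<shipTo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:type="BadAddress" exportCode="1">
 <name>A. Resident</name>
 <street>12 Main Street</street>
 <city>Kensington</city>  <!-- city inherited from Address -->
 <city>London</city>      <!-- second city added by BadAddress -->
 <state>Greater London</state>
 <country>UK</country>
</shipTo>
```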

