The Beginning of the Endgame
A Look at the Changes in the Pre-CR W3C XML Schemas Draft
This article looks at those changes in the recent Pre-CR draft of W3C XML Schemas that will most affect developers and users. Requirements for data interchange with database systems have been important during W3C XML Schema's development. The recent changes also give better support for markup languages and for schema construction.
The Candidate Recommendation (CR) drafts are slated to appear hot on the heels of the current drafts. The XML Schema Working Group was aware that authors, implementers, schema writers, and technical evaluators needed to know the most recent changes, especially since they include some syntax changes that will affect schemas using type derivation.
Now anyType Is Top
The old ur-type (i.e., the supertype) can now be used as a type in declarations. It has been given the friendlier and less literary name anyType to reflect that in XML the top-level type can be thought of as the union of every possible subtype. In the unified model underneath W3C XML Schemas, even brand new complex types are considered restrictions of the ur-type.
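As a minimal sketch of what using anyType in a declaration might look like, the fragment below declares an element that accepts any content. The element name payload is hypothetical and chosen only for illustration.

```xml
<!-- Hypothetical element: its content is not constrained beyond being well-formed -->
<xsd:element name="payload" type="xsd:anyType" />
```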
In the Datatypes draft, the schema declarations use anySimpleType, the ur-type of all simple types. This ur-type has been provided to allow bootstrapping of the declarations of the primitive built-in types in the schema for schemas, and also to help in reasoning about the simple type system. It is not available for use in schemas.
New Namespace Identifier
Type names belong to the built-in datatypes namespace; the drafts use the typical prefix xsd:. Schema declarations use the same namespace. The namespace URI for the current draft has changed from that used by previous drafts; this indicates that the usage of various elements has changed substantially. The schema namespace URI reference is now http://www.w3.org/2000/10/XMLSchema.
The attributes that W3C XML Schemas defines for use in document instances use the namespace URI reference http://www.w3.org/2000/10/XMLSchema-instance, and the drafts use the typical prefix xsi:.
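The two namespaces can be sketched as follows, assuming the conventional xsd: and xsi: prefixes. The element names note and price and the type name SalePrice are hypothetical, and the use of xsi:type here is an illustration rather than something taken from the draft text. The first fragment is a schema document:

```xml
<!-- Schema document: declarations live in the schema namespace -->
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
  <xsd:element name="note" type="xsd:string" />
</xsd:schema>
```

The second fragment is a separate instance document that uses an attribute from the instance namespace:

```xml
<!-- Instance document: SalePrice is a hypothetical type assumed to be defined in the governing schema -->
<price xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
       xsi:type="SalePrice">9.99</price>
```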
New Syntax for Type Declarations
The most visible changes in the new schema draft are syntactic changes to the elements complexType and simpleType. These do not alter functionality.
There are four changes to complexType:
- the old content attribute has been replaced by various elements and attributes, in particular simpleContent and complexContent;
- the old derivedBy attribute, used to specify whether a type is derived by extension or restriction from a base type, has been replaced by subelements extension and restriction, available on complexContent and simpleContent; similarly the attribute base has been moved (this change also applies to simpleType);
- a Boolean attribute mixed is now available on complexType and complexContent to specify that general data content is allowed in addition to elements;
- empty elements can be defined by complexTypes with no values.
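The mixed attribute from the list above might be used as in the following sketch. The type name para and the child element emph are hypothetical, and the content model is written in the syntax as it later stabilized, so details such as the occurrence attributes may differ in this draft.

```xml
<!-- Hypothetical mixed-content type: character data may appear around emph children -->
<xsd:complexType name="para" mixed="true">
  <xsd:sequence>
    <xsd:element name="emph" type="xsd:string"
                 minOccurs="0" maxOccurs="unbounded" />
  </xsd:sequence>
</xsd:complexType>
```

An instance such as <para>Some <emph>important</emph> text</para> would then be allowed, with text interleaved between the child elements.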
Here's an example declaration in the old syntax, from the old Primer.
<xsd:element name='internationalPrice'>
  <xsd:complexType base='xsd:decimal' derivedBy='extension'>
    <xsd:attribute name='currency' type='xsd:string' />
  </xsd:complexType>
</xsd:element>
And the equivalent declaration in the new syntax, taken from the Primer.
<xsd:element name="internationalPrice">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:decimal">
        <xsd:attribute name="currency" type="xsd:string" />
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
An example of a declaration for an empty element:
<xsd:element name="a">
  <xsd:complexType>
    <xsd:complexContent>
      <xsd:restriction base="xsd:anyType" />
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>
Alarming as this may be compared to the equivalent declaration using XML DTDs,
<!ELEMENT a EMPTY >
one can expect that empty elements will usually belong to types which also have attributes. XML Schemas allow many permutations of empty elements not available in DTDs.
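As a sketch of the more typical case, an empty element that carries an attribute could be declared by placing the attribute inside the restriction. The element name br and the attribute name clear are hypothetical, and the fragment assumes that attribute declarations are permitted inside restriction in the same way they are inside extension.

```xml
<!-- Hypothetical empty element with one attribute and no content -->
<xsd:element name="br">
  <xsd:complexType>
    <xsd:complexContent>
      <xsd:restriction base="xsd:anyType">
        <xsd:attribute name="clear" type="xsd:string" />
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>
```

The corresponding DTD would need both an EMPTY element declaration and a separate ATTLIST declaration.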
Comment
This change is needed because W3C XML Schemas 1.0 doesn't allow the use of attributes to select the type information of elements. In order for the schema for schemas to represent the W3C XML Schema language well, the common XML idiom of using attributes to subtype the type identified by the element name cannot be supported. This involves more than idiom: when subelements are required rather than attributes, either the subelements must wrap the contents, or the subelements must appear as the first siblings to select particular content sequences. Both of these solutions have problems of scale (combinatorial explosions when there are several attributes with different values) and effect (nested elements do not indicate which of their ancestors they relate to).
In my view, this makes W3C XML Schemas 1.0 not necessarily suitable for defining idiomatic, user-oriented markup languages. Start tags sometimes need to act as simple field names, as for data from a database, but at other times more like a parameterized function, a template-ized class definition, or a shell command with named arguments. W3C XML Schemas 1.0 undoubtedly fits database-style uses better than markup uses; or data where there needs to be ad hoc overriding of type facets.
However, it is important not to make too much of this point, as it just means that XML schemas are no more powerful than DTDs in this regard. Because attributes cannot be used to select a subtype, we can carry on using those attributes, but in our schema we must use the looser type made from the union of the different subtypes. On the whole, W3C XML Schemas have a much wider range of options than DTDs, which may rapidly degenerate into use of an ANY declared content type in the same situation.
This is one area that I hope a W3C XML Schemas 1.1 or 2.0 may fix as soon as possible.