The W3C XML Schema Specification in Context
by Rick Jelliffe
W3C XML Schema and ISO SGML Extended Facilities (Meta-DTDs and Lexical Types)
A W3C XML Schema is a high-level specification of an architecture. W3C XML Schemas could be implemented as
- a transformation on the document to add xsi:type attributes, based on the type derivation mechanism;
- a transformation on the schema to derive an effective schema, expressed according to the ISO HyTime Architectural Forms Definition Requirements;
- an architectural parse of the document using the effective schema as a meta-DTD and the xsi:type attribute as the element form.
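
The first of these steps can be illustrated with a minimal sketch. It assumes a hypothetical lookup table from element names to type names (in practice this would be derived from the schema's type derivation rules) and the 2001 XML Schema instance namespace URI; neither comes from the article.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the xsi prefix (not stated in the article).
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI_NS)

# Hypothetical element-name -> type-name table; a real one would be
# computed from the schema rather than written by hand.
ELEMENT_TYPES = {"order": "OrderType", "item": "ItemType"}


def add_xsi_types(root):
    """Add an xsi:type attribute to every element with a known type."""
    key = f"{{{XSI_NS}}}type"
    for elem in root.iter():
        type_name = ELEMENT_TYPES.get(elem.tag)
        # Leave elements alone if they are unknown or already typed.
        if type_name is not None and key not in elem.attrib:
            elem.set(key, type_name)
    return root


doc = ET.fromstring("<order><item>widget</item><note>rush</note></order>")
add_xsi_types(doc)
print(ET.tostring(doc, encoding="unicode"))
```

After this pass, an architectural processor could in principle read the xsi:type attribute as the element form; the `note` element is left untouched because the illustrative table has no entry for it.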
It has not yet been proven that all W3C XML Schema constraints can be expressed using meta-DTDs and the other standard features of the ISO SGML Extended Facilities (given in the Annexes to the ISO HyTime standard). Consequently, an architectural validation system using meta-DTDs in ISO SGML markup declaration syntax may not completely validate every W3C XML Schema instance. In particular, the use of namespaces complicates understanding of the transformations required. The converse certainly does not hold: not every schema definable using Architectural Forms has an equivalent W3C XML Schema. Attribute renaming, for example, cannot be performed in W3C XML Schema. The tag/type distinction is the same as the element-form/architecture distinction: an abstract element type is a "base" (architectural) element.
W3C XML Schema provides lexical capabilities similar to those of the ISO SGML Extended Facilities Lexical Definition Requirements, but uses a non-standard regular expression syntax.
W3C XML Schema and Perl Regular Expressions
| Perl Regular Expressions | W3C XML Schema Regular Expressions | Comments |
|---|---|---|
| `^` = beginning of string | `^` = character `^` only | All regular expression matches start from the beginning of the string. For substring matching use `.*substring.*` |
| `$` = end of string | `$` = character `$` only | All regular expression matches end at the end of the string. |
| Zero-width assertions, look-ahead and look-behind, back references | Not available | |
| Non-greedy `+` and `*` | Not available | |
| `\c` | An XML NAME character | |
| `\i` | An XML initial NAME (i.e., SGML NAMESTRT) character | |
| `\033` and `\xAB` | XML Numeric Character Reference must be used | |
| `\p{}` | `\p{}` | The character classes allowed are the Unicode Consortium's character classes. |
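
The anchoring rows of the table can be illustrated with a small sketch. Python's `re` module follows Perl-style conventions and is not a W3C XML Schema regular expression engine, so the helper below (a hypothetical name, `matches_whole_string`) only emulates the whole-string behaviour by using `re.fullmatch`; it does not handle `\i`, `\c`, or `\p{}`.

```python
import re


def matches_whole_string(pattern: str, value: str) -> bool:
    """Emulate implicit anchoring: the pattern must cover the entire value."""
    return re.fullmatch(pattern, value) is not None


# A Perl-style search finds "cat" anywhere inside the string.
assert re.search(r"cat", "concatenate") is not None

# With whole-string matching, the same pattern is rejected ...
assert not matches_whole_string(r"cat", "concatenate")

# ... unless the surrounding text is allowed for explicitly.
assert matches_whole_string(r".*cat.*", "concatenate")
```

The last assertion is the `.*substring.*` idiom from the comments column: because every match is implicitly anchored at both ends, substring matching has to be spelled out in the pattern itself.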