Working with a Metaschema
If in its entire lifespan W3C XML Schema (WXS) were used merely to validate document after earnest little document, it would have proven its worth. Happily, schemas are useful for much more than validation. XML applications can extrapolate all sorts of functionality from an XML vocabulary: everything from document authoring and GUI-building to marshaling, workflow, and process management.
WXS excels at just this sort of modeling. It provides a structural template that describes in detail each type and relationship: just the information an application would need, say, to build a new instance document from a data stream or to create an intuitive GUI for data entry. Given the tremendous complexity of WXS, however, applications which consume schemas face a daunting processing challenge. Often the full power of the language is neither needed nor wanted, as modeling requirements may be relatively simple, and developers don't want to be responsible for every possible wrinkle in a schema. If only we could constrain a candidate schema to use just a subset of the full WXS vocabulary...
Oh, wait, we can. "WXS vocabulary" is the tip-off: a schema is just an XML document, after all, and it can be validated like any other. What we need, in other words, is a schema for our schema.
In this article we'll investigate the uses of metaschemas
and the techniques for creating them. This will bring us in close
contact with the existing WXS metamodel, an interesting study in
and of itself. We'll consider several strategies for bending this
metamodel to our application's purposes, and we'll see which
strategies best suit which requirements. (To tip the hand a bit,
the prize will go to the WXS redefine component as a
way of redefining parts of the WXS metamodel itself.)
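As soon as the prefix "meta" finds its way into the room, conversation tends to get a bit stilted. After all, models are also known as "metadata," and so terms such as "metamodel" and "meta-metadata" can be slinking around in the same discussion. The OMG's Meta-Object Facility (MOF) helpfully defines formal levels for various kinds of data. Each level shown here describes and governs the previous one:

| MOF Level | Describes | In This Article |
|---|---|---|
| Information | The data itself | Instance documents |
| Model | Information | Candidate schemas |
| Metamodel | Models | The metaschema |
| Meta-metamodel | Metamodels | Not covered here |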
Where most schema discussions focus on the model level, we are more concerned with working at the metamodel level. The descriptors consumed by our application will be called "candidate schemas" and live at the model level; they can validate information, and they can themselves be validated as instance documents against our metaschema.
Before we create our own metamodels, note that WXS already has a standard metamodel. If this is surprising, ask yourself how your favorite parser makes sure that a given schema is itself valid. Many parsers have this metamodel encoded in their own logic, but the metamodel is also expressed in normative "schemas for schemas," which are ordinary WXS documents. In fact, you can validate a WXS document explicitly if you have these metaschemas handy. ("Our schema documents" as shown below are modifications of the standard ones, with schema-location links between them so they can be referenced locally by a parser.)
| Metamodel | Source | Our Schema Document |
|---|---|---|
| WXS Part 1 — Structures | WXS 1.0 Recommendation, Part 1, Appendix A | XMLSchema.xsd |
| WXS Part 2 — Data Types | WXS 1.0 Recommendation, Part 2, Appendix A | XMLSchema2.xsd |
| XML+Namespaces | http://www.w3.org/2001/xml.xsd | XML+Namespaces.xsd |
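As a minimal sketch of explicit validation, a schema can point a validating parser at the local metaschema copy just as any instance document would, with an xsi:schemaLocation hint. The target namespace and the phone element here are hypothetical, and whether a given parser honors the hint for the WXS namespace (rather than falling back on its built-in metamodel) is an assumption to check against your tool.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema, treated here as an instance document.
     The schemaLocation hint pairs the WXS namespace with the local
     copy of the Part 1 metaschema listed in the table above. -->
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2001/XMLSchema XMLSchema.xsd"
    targetNamespace="http://example.com/contact">

  <xs:element name="phone" type="xs:string"/>

</xs:schema>
```

The xsi:schemaLocation attribute is legal on the schema element because WXS components accept attributes from other namespaces.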
The WXS metamodel is dense; we won't attempt to map the whole of it here. To illuminate some useful areas, UML diagrams will show which definitions -- top-level elements, complex types and groups -- depend on other definitions. The notation is crude, focusing entirely on dependencies, and leaving out all cardinality and much other information. Suffice it to say that these had to simmer on the stove a long time before they were edible.
Don't be too spooked by the first overview diagram that shows the first few levels of
the model starting from schema. Note that symbols
recognizable from general-purpose schema design are few and far
between. There are many intermediates, such as
schemaTop and redefinable, that are of no
use to a model designer, but that must be understood in order
to leverage the WXS metamodel.
Before we look at specific techniques, let's define what we're after more concretely. For any application which acts dynamically based on a schema -- the most intuitive example is probably a GUI builder -- users of the application are expected to provide a schema as a way of defining their requirements: what data-entry forms to build, for example. The application defines its own schema flavor, if you will, which might enforce rules such as:
all models)

Note that some of these rules imply restrictions on the standard metamodel, while some imply extensions. The application places these constraints on candidate schema to express business rules, to facilitate users' understanding by limiting redundant modeling options -- or perhaps just to limit the scope of development for a 1.0 version.
The goal, then, is to establish a metaschema for the application such that any candidate schema (1) is valid under WXS proper and (2) observes the application's own rules.
Once the standard metaschemas are in hand, we can see several techniques for leveraging the WXS metamodel:
- We could extend or restrict the metaschema, creating our own types where necessary, for instance a mySchema or myLocalSimpleType. This is a non-starter, though, because it flunks the first of our criteria: models which observe this derived metaschema would not function as ordinary WXS schemas because our derived types would be unknown to generic WXS tools.
- We could define metamodel information in our own namespace, and allow candidate schemas to use both the WXS namespace and our own. This is simple enough and is explicitly allowed under WXS by the openAttrs base type, which allows schema components to include attributes from other namespaces. This technique works well enough for extending the metamodel, but it doesn't address requirements for restriction.
- We could rewrite the WXS metaschema to suit our purposes. That is, we could simply edit XMLSchema.xsd to change component definitions. This feels a bit icky, and it certainly poses the risk of breaking compatibility with WXS proper. It is a valid approach, however, so long as one exercises great care in making changes.
- WXS provides a means of incrementally changing existing schemas, in order to create new versions. This is the redefine component, and it turns out to be a potent means of leveraging the WXS metamodel itself. Only certain components can be redefined -- remember the redefinable group from the overview diagram. Still, this is a preferable approach to creating modified metaschema documents, for maintenance as well as aesthetic reasons.
- Pattern-based validation is always an option, and it's an especially strong one here. In implementing various rules, we'll face the usual limitations of WXS in expressing document-scope constraints; with redefine as our best option for reuse, we're additionally hobbled. Constraints expressed in XPath and asserted via XSLT or Schematron can establish rules that none of the above techniques can manage.
We'll now look at a series of simple examples that illustrate most of the techniques described above.
redefine

Let's say we're building a component that creates a graphical form for entry of application data. We want to use WXS as our type model to define the shape of this data, but the business requirements have been bounded so that the following WXS constructs are unnecessary:

- Derivation of complex types
- Conjunctions (all) in content models (we only want sequences and choices)
- Complex types nested inside local elements

Each of these constraints can be expressed with a separate redefinition of the WXS metamodel. Therefore we build our own metaschema whose target is the standard WXS namespace. It includes a single redefine element:
```xml
<?xml version='1.0' encoding='UTF-8'?>
<xs:schema
    targetNamespace="http://www.w3.org/2001/XMLSchema"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:redefine schemaLocation="XMLSchema.xsd">
    <!-- redefinitions of WXS components go here -->
  </xs:redefine>

</xs:schema>
```
A redefine can hold any number of
redefinitions, so we'll add each of the three as children
of the empty element shown above. First, we'll nix
complex-type derivation. Here's another piece of the WXS
metamodel, showing how type extension is implemented:

Our target is the complexTypeModel: we
redefine this to exclude complexContent and
simpleContent as modeling options:
```xml
<xs:group name="complexTypeModel">
  <xs:choice>
    <xs:sequence>
      <xs:group ref="xs:typeDefParticle" minOccurs="0"/>
      <xs:group ref="xs:attrDecls"/>
    </xs:sequence>
  </xs:choice>
</xs:group>
```
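To make the effect concrete, here is a sketch of a candidate-schema fragment that this redefinition should reject. The type names USAddress and Address and the zip element are hypothetical, and the fragment assumes the candidate's default namespace is its target namespace.

```xml
<!-- Hypothetical fragment of a candidate schema.
     Valid under WXS proper, but complexContent is no longer an option
     in the redefined complexTypeModel, so validation against the
     redefined metaschema should fail here. -->
<xs:complexType name="USAddress">
  <xs:complexContent>
    <xs:extension base="Address">
      <xs:sequence>
        <xs:element name="zip" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```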
Removal of all content models is similarly
straightforward. The target now is the
typeDefParticle, which as the previous
diagram shows is a focal point of the content modeling
system.

Our redefinition simply fails to list all as a member of the group:
```xml
<xs:group name="typeDefParticle">
  <xs:choice>
    <xs:element name="group" type="xs:groupRef"/>
    <xs:element ref="xs:choice"/>
    <xs:element ref="xs:sequence"/>
  </xs:choice>
</xs:group>
```
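A sketch of what the redefined typeDefParticle should turn away, using a hypothetical Address type:

```xml
<!-- Hypothetical fragment of a candidate schema.
     xs:all is not listed in the redefined typeDefParticle, so this
     type should be rejected. Replacing xs:all with xs:sequence
     (and accepting a fixed element order) makes it acceptable. -->
<xs:complexType name="Address">
  <xs:all>
    <xs:element name="city" type="xs:string"/>
    <xs:element name="state" type="xs:string"/>
  </xs:all>
</xs:complexType>
```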
Finally we attack the localElement
component, forbidding deep hierarchies of complex-type
elements by insisting that a local element have only
simple type. A redefine of a complex type
has the odd appearance of a type restricting itself:
```xml
<xs:complexType name="localElement">
  <xs:complexContent>
    <xs:restriction base="xs:localElement">
      <xs:sequence>
        <xs:element ref="xs:annotation" minOccurs="0"/>
        <xs:choice minOccurs="0">
          <xs:element name="simpleType" type="xs:localSimpleType"/>
        </xs:choice>
        <xs:group ref="xs:identityConstraint" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```
(Note that the distinction in the standard metamodel between
topLevelElement and localElement
is critical here: without it we'd potentially be
prohibiting complex-type elements, period, which would
make for some pretty trivial document models.)
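The following sketch shows the kind of nesting the localElement restriction is meant to catch. The Customer type and its element names are hypothetical.

```xml
<!-- Hypothetical fragment of a candidate schema.
     The inline complexType under the local "address" element is what
     the restricted localElement no longer permits. -->
<xs:complexType name="Customer">
  <xs:sequence>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="address">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="city" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:complexType>
```

As far as I can tell, the restriction only removes the inline complexType child. The type attribute of a local element is untouched, so a declaration such as `<xs:element name="address" type="Address"/>` that points at a named complex type would still pass. Ruling that out would likely take a pattern-based check.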
The completed metaschema is GUIBuilder.xsd.
See also a valid candidate schema Valid.xsd, as
well as three that each flunk one of the constraints: DerivedType.xsd,
UsesAConjunction.xsd,
and NestedTypes.xsd.
Note that all of the candidate schemas are valid
under normal WXS — just change the
schemaLocation from
"GUIBuilder.xsd" to "XMLSchema.xsd" to
prove it.
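For orientation, a candidate in the spirit of Valid.xsd might look like the sketch below. This is not the actual file: the namespace, type and element names are hypothetical. It stays within the three constraints by using no derived complex types, no all groups, and no complex types nested in local elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical candidate schema. The schemaLocation hint asks the
     parser to validate this document against the GUI builder's
     metaschema instead of the standard one. -->
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://example.com/contact"
    targetNamespace="http://example.com/contact"
    xsi:schemaLocation="http://www.w3.org/2001/XMLSchema GUIBuilder.xsd">

  <xs:element name="contact" type="Contact"/>

  <xs:complexType name="Contact">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:choice>
        <xs:element name="email" type="xs:string"/>
        <xs:element name="phone" type="xs:string"/>
      </xs:choice>
    </xs:sequence>
    <xs:attribute name="preferred" type="xs:boolean" default="false"/>
  </xs:complexType>

</xs:schema>
```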
Perhaps our GUI builder can handle choice and union
types, although this obviously adds complexity. Two or
more parallel interface panels must be presented to the
user, which is a solvable problem. What to say about
these panels, though? If the application is asked to
allow entry of either a city and state or a ZIP code,
for example, how can the application indicate to the
user which panel means what? The most intuitive
approach would be for the candidate schema to name each
of the possible choices in some descriptive way. WXS
doesn't allow all possible children of
choice to be named, though, and, even if it
did, component names are not generally meant for
end-user consumption.
Here an attribute from a separate namespace is the natural choice for extending the content model. A very simple schema is developed for the new "named choices" namespace, and the candidate schema simply references this along with the normal WXS metaschema.
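A minimal sketch of this arrangement follows. The namespace URI, the nc prefix, the label attribute, and the file name NamedChoices.xsd are all assumptions made for illustration, not the article's actual documents. First, the schema for the new namespace, which declares a single global attribute:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical NamedChoices.xsd -->
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.com/named-choices">

  <!-- Human-readable label for one branch of a choice -->
  <xs:attribute name="label" type="xs:string"/>

</xs:schema>
```

A candidate schema can then label each branch of the city-and-state versus ZIP code choice:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical candidate schema using the label attribute -->
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:nc="http://example.com/named-choices"
    xsi:schemaLocation="http://www.w3.org/2001/XMLSchema XMLSchema.xsd
                        http://example.com/named-choices NamedChoices.xsd">

  <xs:complexType name="Location">
    <xs:choice>
      <xs:sequence nc:label="City and state">
        <xs:element name="city" type="xs:string"/>
        <xs:element name="state" type="xs:string"/>
      </xs:sequence>
      <xs:element name="zip" type="xs:string" nc:label="ZIP code"/>
    </xs:choice>
  </xs:complexType>

</xs:schema>
```

Generic WXS tools ignore the foreign attribute, so the candidate still works as an ordinary schema, while the GUI builder can read nc:label to caption each panel.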
Let's say a given processor can't work with missing
attribute values. The requirement is set that any
attribute in the model must either be required or must
provide a default value. This is not so easy to express
by redefining the attribute component type. We've
stumbled across a general weakness of WXS: constraints
that make one part of the content model depend on values
elsewhere in the instance (co-occurrence constraints)
cannot be expressed.
In XPath, by contrast, it is dead simple to express this rule, and using XSLT it is just as easy to enforce it. The validating transform below can be applied to the candidate schema (see documents ExplicitAttributes.xsl and MissingDefault.xsd).
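```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <!-- Suppress the built-in rule that copies text nodes to the output. -->
  <xsl:template match="text()"/>

  <!-- Report any attribute that is optional (explicitly, or because
       "use" is omitted) and has no default value. -->
  <xsl:template match="xs:attribute
      [(@use='optional' or not(@use)) and not(@default)]">
    <xsl:text>ERROR: Must provide a default value for optional attribute </xsl:text>
    <xsl:value-of select="@name"/>
    <!-- End each message with a newline so multiple errors stay readable. -->
    <xsl:text>.&#10;</xsl:text>
  </xsl:template>

</xsl:transform>
```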
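The same rule can be sketched in Schematron, the other pattern-based option mentioned earlier. This version assumes an ISO Schematron processor (hence the purl.oclc.org namespace); it is an equivalent I am supplying for comparison, not one of the article's documents.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">

  <!-- Bind the xs prefix for use in the XPath expressions below. -->
  <sch:ns prefix="xs" uri="http://www.w3.org/2001/XMLSchema"/>

  <sch:pattern>
    <sch:rule context="xs:attribute">
      <!-- Passes when the attribute is not optional, or has a default.
           Fails in exactly the cases the XSLT template above reports. -->
      <sch:assert test="@use = 'required' or @use = 'prohibited' or @default">
        Must provide a default value for optional attribute <sch:value-of select="@name"/>.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

</sch:schema>
```

The assertion is phrased positively, as Schematron expects: it states what must hold, where the XSLT template matches what must not.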
This system isn't perfect. There are many ways in which I'd like to leverage the WXS metamodel that are either closed to me or just too complicated to be worth the trouble. This isn't a shortcoming in WXS, as I see it; if the type model were as pliable as I'd like it to be, it just wouldn't be W3C XML Schema and wouldn't have the tremendous descriptive power and precision that I also want.
Where they are feasible, redefinitions of schema components offer an elegant way to tailor the WXS model to the needs of an application. XPath/XSLT validation can provide another option, but it's important to see past logistics and remember that the WXS metamodel is as stiff as it is for a reason. If you find yourself demanding features in your application's candidate schemas that make them invalid under WXS proper, or changing so many things that the metamodel is unrecognizable, you should probably be building your metamodel from scratch or working from a different starting point.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.