XML.com: XML From the Inside Out


Working with a Metaschema

October 02, 2002

If in its entire lifespan W3C XML Schema (WXS) were used merely to validate document after earnest little document, it would have proven its worth. Happily, schemas are useful for much more than validation. XML applications can extrapolate all sorts of functionality from an XML vocabulary: everything from document authoring and GUI-building to marshaling, workflow, and process management.

WXS excels at just this sort of modeling. It provides a structural template that describes in detail each type and relationship: just the information an application would need, say, to build a new instance document from a data stream or to create an intuitive GUI for data entry. Given the tremendous complexity of WXS, however, applications which consume schemas face a daunting processing challenge. Often the full power of the language is neither needed nor wanted, as modeling requirements may be relatively simple, and developers don't want to be responsible for every possible wrinkle in a schema. If only we could constrain a candidate schema to use just a subset of the full WXS vocabulary...

Oh, wait, we can. "WXS vocabulary" is the tip-off: a schema is just an XML document, after all, and it can be validated like any other. What we need, in other words, is a schema for our schema.
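To make the point concrete, here is a minimal sketch that treats a schema as plain XML and lists its top-level components using only Python's standard library. The sample schema and the function name top_level_components are invented for illustration; they do not come from any WXS tool.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical candidate schema, invented for this illustration.
CANDIDATE = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order" type="OrderType"/>
  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID" use="required"/>
  </xs:complexType>
</xs:schema>
"""


def top_level_components(schema_text):
    """Return (kind, name) pairs for the direct children of xs:schema."""
    root = ET.fromstring(schema_text)
    if root.tag != f"{{{XS}}}schema":
        raise ValueError("not a WXS schema document")
    # Tags look like "{namespace}local"; keep only the local part.
    return [(child.tag.split("}")[1], child.get("name")) for child in root]


print(top_level_components(CANDIDATE))
```

Nothing here is schema-aware: the parser sees elements and attributes, exactly as it would in any other instance document. That is what makes validating a schema against another schema possible.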

In this article we'll investigate the uses of metaschemas and the techniques for creating them. This will bring us in close contact with the existing WXS metamodel, an interesting study in and of itself. We'll consider several strategies for bending this metamodel to our application's purposes, and we'll see which strategies best suit which requirements. (To tip the hand a bit, the prize will go to the WXS redefine component as a way of redefining parts of the WXS metamodel itself.)

Talking the Talk

As soon as the prefix "meta" finds its way into the room, conversation tends to get a bit stilted. After all, models are also known as "metadata," and so terms such as "metamodel" and "meta-metadata" can be slinking around in the same discussion. The OMG's Meta-Object Facility (MOF) helpfully defines formal levels for various kinds of data. Each level shown here describes and governs the previous one:

  • The information level includes raw data: non-schema XML documents, database rows, or object instances in memory are all examples.
  • The model level describes the information: examples are XML schema, RDB schema, and UML models.
  • The metamodel level governs the "shape" of models themselves. Any modeling language has a grammar and structure that prescribes what can be expressed in a model, and how to express it. A metaschema is an expression of a metamodel.
  • There is a meta-metamodel layer, which is in fact the focus of the MOF specification, and which addresses issues of portability between metamodels such as UML, IDL, and WXS. We'll not try to breathe such thin air today.

Where most schema discussions focus on the model level, we are more concerned with working at the metamodel level. The descriptors consumed by our application will be called "candidate schema," and live at the model level; they can validate information and can be validated as instance documents based on our metaschema.

Discovering the WXS Metamodel

Before we create our own metamodels, note that WXS already has a standard metamodel. If this is surprising, ask yourself how your favorite parser makes sure that a given schema is itself valid. Many parsers have this metamodel encoded in their own logic, but the metamodel is also expressed in normative "schemas for schemas," which are ordinary WXS documents. In fact, you can validate a WXS document explicitly if you have these metaschemas handy. ("Our schema documents" as shown below are modifications of the standard ones, with schema-location links between them so they can be referenced locally by a parser.)

Metamodel               | Source                                     | Our Schema Document
WXS Part 1: Structures  | WXS 1.0 Recommendation, Part 1, Appendix A | XMLSchema.xsd
WXS Part 2: Data Types  | WXS 1.0 Recommendation, Part 2, Appendix A | XMLSchema2.xsd
XML+Namespaces          | http://www.w3.org/2001/xml.xsd             | XML+Namespaces.xsd

The WXS metamodel is dense; we won't attempt to map the whole of it here. To illuminate some useful areas, UML diagrams will show which definitions -- top-level elements, complex types and groups -- depend on other definitions. The notation is crude, focusing entirely on dependencies, and leaving out all cardinality and much other information. Suffice it to say that these had to simmer on the stove a long time before they were edible.

Don't be too spooked by the first overview diagram that shows the first few levels of the model starting from schema. Note that symbols recognizable from general-purpose schema design are few and far between. There are many intermediates, such as schemaTop and redefinable, that are of no use to a model designer, but that must be understood in order to leverage the WXS metamodel.


Before we look at specific techniques, let's define what we're after more concretely. For any application which acts dynamically based on a schema -- the most intuitive example is probably a GUI builder -- users of the application are expected to provide a schema as a way of defining their requirements: what data-entry forms to build, for example. The application defines its own schema flavor, if you will, which might enforce rules such as:

  • Refusal to accept derived complex types
  • Insistence on sequence-based models for child content (no "all" models)
  • Flattening the composition hierarchy to allow only two levels: parent and child element
  • Insisting on names for certain schema components
  • Insisting on default values for all optional attributes

Note that some of these rules imply restrictions on the standard metamodel, while some imply extensions. The application places these constraints on candidate schema to express business rules, to facilitate users' understanding by limiting redundant modeling options -- or perhaps just to limit the scope of development for a 1.0 version.

The goal, then, is to establish a metaschema for the application such that any candidate schema (1) is valid under WXS proper and (2) observes the application's own rules.
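To show what a few of the sample rules listed earlier mean in practice, here is a procedural sketch that checks three of them by walking a candidate schema as ordinary XML. This is not the metaschema technique the article goes on to develop; it is only an illustration under stated assumptions. The function name check_candidate, the messages, and the sample schema are all hypothetical, and the check covers condition (2) only: it assumes the candidate has already been found valid under WXS proper.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical candidate that breaks three of the sample rules.
BAD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Base">
    <xs:all>
      <xs:element name="a" type="xs:string"/>
    </xs:all>
    <xs:attribute name="lang" type="xs:string"/>
  </xs:complexType>
  <xs:complexType name="Derived">
    <xs:complexContent>
      <xs:extension base="Base"/>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
"""


def check_candidate(schema_text):
    """Return a list of rule violations (empty if none were found)."""
    root = ET.fromstring(schema_text)
    problems = []
    # Rule: no derived complex types. Derivation always appears inside
    # xs:complexContent or xs:simpleContent.
    for tag in ("complexContent", "simpleContent"):
        for _ in root.iter(XS + tag):
            problems.append(f"derived complex type via xs:{tag}")
    # Rule: sequence-based content only, so no xs:all.
    for _ in root.iter(XS + "all"):
        problems.append("xs:all content model")
    # Rule: every optional attribute must declare a default.
    for parent in root.iter():
        if parent is root:
            continue  # global attribute declarations carry no 'use'
        for attr in parent.findall(XS + "attribute"):
            optional = attr.get("use", "optional") == "optional"
            if optional and attr.get("default") is None:
                label = attr.get("name") or attr.get("ref")
                problems.append(f"optional attribute '{label}' has no default")
    return problems


for problem in check_candidate(BAD):
    print(problem)
```

Hand-written checks like these work, but each new rule means more code to write and maintain. Expressing the same constraints declaratively, in a schema that validates candidate schemas, hands that work to a standard validator.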
