DSDL Interoperability Framework
1. What's DSDL?
DSDL ("Document Schema Definition Languages") is a project of the ISO/IEC JTC 1/SC 34 (chair, Jim Mason) Working Group 1 (chair, Charles Goldfarb). The word "document" is meant to be read as "XML document-oriented applications", as opposed to data-oriented applications. "Languages" is used in the plural form because DSDL is not intended to point to a One True Schema Language.
DSDL is chaired by Martin Bryan, and its editors include James Clark, Murata Makoto, Rick Jelliffe, Martin Bryan, Diederik Gerth van Wijk, Ken Holman and me (Eric van der Vlist).
DSDL is necessary because other XML schema languages (primarily W3C XML Schema) do not meet the needs of "document heads", and document validation is too complex to be done using a single language. Our goal is to propose a set of specifications which will include a framework, several schema languages (including Relax NG and Schematron), a datatype system, and other pieces needed for document validation.
2. Why an interoperability framework?
Why does DSDL need an Interoperability Framework? The quick answer is that the Interoperability Framework is the glue between all the pieces of DSDL. The chief design principle of DSDL is to split the issue of describing and validating documents into simpler issues: grammar based validation, rule based validation, content selection, datatypes, and so on. Different types of validations and transformations, defined inside or outside the DSDL project, often need to be associated with each other. The framework allows for the integration of these validations and transformations.
Examples of such mixing include localization of numeric or date formats, prevalidation canonicalization to simplify the expression of a schema, independent content separated into different documents to be validated independently, aggregation of complex content into a single text node, separation of structured simple content into a set of elements, and so on.
3. At the beginning: two complementary proposals
The DSDL interoperability framework is a work in progress. Its first wave gave birth to two different proposals, based on two different and complementary approaches: Rick Jelliffe's Schemachine and my XVIF.
3.1. Rick Jelliffe's Schemachine
We can think of Rick Jelliffe's Schemachine as "traditional" in the sense that his proposal is a continuation of XPipe or the W3C "XML-Pipeline" Note. It describes pipes of transformations and validations applied to full documents.
3.1.1. Schemachine basics
Rick Jelliffe gives the following description of his proposal:
It is based on XML Pipeline structures, but with rearrangement and renaming.
It is embedded in a Schematron-like superstructure with titles and phases.
It can be implemented minimally -- all validators and translators are command-line executable programs, and the framework document is translated into BAT files or Bourne shell scripts (i.e., validators etc. are treated as black boxes).
It aims at validation rather than declarative description per se. (In particular, the further down a transformation chain that data gets, the more difficult it will be to tie the effect of a schema to the original document.)
It supports both validation of explicit structure and validation of complex data values.
It leaves issues of simple datatyping to particular validators, viewing validation as a tree of processes.
It supports both in-band (@exclude) and out-of-band (@haltOnFail) signaling.
3.1.2. Schemachine example
A couple of short examples are better than a long explanation.
<schemachine xmlns="....">
  <title>Example Schema</title>
  <pass>
    <validate engine="schemachine:xsd" />
    <validate engine="schemachine:schematron">
      <param name="schema" href="a Schematron schema"/>
    </validate>
  </pass>
</schemachine>
This first example passes a document through a W3C XML Schema validation followed by a Schematron validation.
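To make the "validators as black boxes" idea concrete, here is a minimal Python sketch of such a pass. It is an illustration only, not Schemachine itself: the names `run_pass`, `fake_xsd` and `fake_schematron` are hypothetical, the two validators are toy stand-ins for real engines, and the `halt_on_fail` flag is only my assumption about how an out-of-band signal such as @haltOnFail might behave.

```python
from typing import Callable, List, Tuple

# A validator is modelled as a black box: it receives the document text
# and returns True when the document is considered valid.
Validator = Callable[[str], bool]


def run_pass(document: str,
             validators: List[Tuple[str, Validator]],
             halt_on_fail: bool = True) -> List[Tuple[str, bool]]:
    """Run each validator in order and collect (name, result) pairs.

    halt_on_fail is an assumed reading of @haltOnFail: when True, the
    pass stops at the first validator that reports a failure.
    """
    report = []
    for name, validate in validators:
        ok = validate(document)
        report.append((name, ok))
        if not ok and halt_on_fail:
            break
    return report


# Toy stand-ins for the W3C XML Schema and Schematron steps (hypothetical).
def fake_xsd(doc: str) -> bool:
    return doc.startswith("<")


def fake_schematron(doc: str) -> bool:
    return "<title>" in doc


report = run_pass("<doc><title>t</title></doc>",
                  [("xsd", fake_xsd), ("schematron", fake_schematron)])
print(report)
```

The point of the sketch is that the framework only needs to know the order of the steps and their pass/fail results; what happens inside each validator is left to the engine.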
<schemachine xmlns="....">
  <title>Another Example Schema</title>
  <ns prefix="html" url="..." />
  <pass>
    <select engine="schemachine:namespace_selector">
      <param name="pattern">html:body</param>
      <output name="htmlbody" />
    </select>
    <validate engine="schemachine:relax_ng">
      <param name="schema" href="...."/>
      <param name="feasible">true</param>
      <input name="htmlbody"/>
    </validate>
  </pass>
</schemachine>
Here the document is passed through a "selector" which selects the html:body element. The output of the selection is used as the input of a Relax NG validation.
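The select-then-validate flow can be sketched in Python with the standard library's ElementTree. This is a rough illustration under stated assumptions: the namespace URI `http://example.org/ns/html` is a placeholder (the example above elides the real one), `select` and `has_paragraph` are hypothetical names, and `has_paragraph` is a toy check standing in for a real Relax NG validator.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, chosen only for this sketch.
HTML_NS = "http://example.org/ns/html"
NAMESPACES = {"html": HTML_NS}


def select(document: str, pattern: str, namespaces: dict) -> list:
    """Return the elements of the document that match an ElementTree path."""
    root = ET.fromstring(document)
    return root.findall(pattern, namespaces)


def has_paragraph(body: ET.Element) -> bool:
    """Toy validator: the selected body must contain at least one html:p child."""
    return body.find("html:p", NAMESPACES) is not None


doc = (
    f'<report xmlns:html="{HTML_NS}">'
    "<html:html><html:body><html:p>Hello</html:p></html:body></html:html>"
    "</report>"
)

# The selector keeps only the html:body subtree...
selected = select(doc, ".//html:body", NAMESPACES)
# ...and only that subtree is handed to the validator.
results = [has_paragraph(body) for body in selected]
print(results)
```

The rest of the document never reaches the validator, which is what lets a partial schema be applied to one part of a larger document.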
3.1.3. Schemachine features
Rick Jelliffe carefully crafted a proposal with all the features needed to validate complex documents. Some concepts (e.g., phases) are inherited from Schematron, and Schemachine has all the bells and whistles needed to fly:
Phases let users define different validation phases.
Selectors are filters which retain only the part of a document on which a partial schema will be applied.
Validators are containers to invoke schema validation.
Tokenizers split a text node into a set of elements.
Titles let you define info for the validation report.
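Of these features, the tokenizer is perhaps the easiest to picture in code. The following Python sketch shows one possible reading of "splitting a text node into a set of elements"; the function name `tokenize`, the `token` element name and the comma separator are all assumptions made for illustration and are not taken from the Schemachine proposal.

```python
import xml.etree.ElementTree as ET


def tokenize(element: ET.Element,
             separator: str = ",",
             token_tag: str = "token") -> ET.Element:
    """Return a copy of the element whose text is split into child elements."""
    result = ET.Element(element.tag, dict(element.attrib))
    for piece in (element.text or "").split(separator):
        child = ET.SubElement(result, token_tag)
        child.text = piece.strip()  # drop whitespace around each token
    return result


source = ET.fromstring("<colors>red, green, blue</colors>")
tokenized = tokenize(source)
xml_text = ET.tostring(tokenized, encoding="unicode")
print(xml_text)
```

Once the text has been turned into elements, an ordinary grammar-based schema can validate each token as if the structure had been in the document all along.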
3.2. My own XVIF
While Jelliffe has come up with a solid proposal that is obviously easy to implement, I wanted to explore more adventurous fields and felt that a proof of concept was needed to check the dangers and potential of my ideas.
XVIF ("XML Validation Interoperability Framework") is both a framework proposal and a prototype written in Python. It is available under an MPL open source license.
3.2.1. XVIF basics
XVIF has both similarities with and differences from the approach taken by the Schemachine. It's designed to be used within a "host language" -- which could be a schema language (Relax NG, W3C XML Schema, Schematron), a transformation language (XSLT, Regular Fragmentations, STX) or a "pipelining" language (XVIF could be embedded within the structure of the Schemachine, Ant, XPipe). The current version of the prototype implements XVIF only within Relax NG. XVIF defines "micro-pipes" of transformations and validations applied locally on the "current" node. It integrates tightly with hosting languages: for Relax NG, XVIF pipes are patterns; for XSLT they would be extension elements. XVIF has fallback mechanisms to ensure that a schema or transformation can be read by non-XVIF-aware processors. It is currently minimalist: bells and whistles will be added if it flies. XVIF takes advantage of the structures of the host language for complex features. Finally, it's focused on defining the basic building blocks. Shortcuts will be added later on where needed, and verbosity isn't an issue at this stage.
3.2.2. XVIF example
Let's look at our first example of XVIF:
<?xml version="1.0" encoding="utf-8"?>
<element xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
         name="foo">
  <if:pipe>
    <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/,/"/>
    <if:validate type="http://relaxng.org/ns/structure/1.0">
      <if:apply>
        <oneOrMore>
          <choice>
            <value>foo</value>
            <value>bar</value>
          </choice>
        </oneOrMore>
      </if:apply>
    </if:validate>
  </if:pipe>
</element>
This example defines a Relax NG schema where the implicit "start" pattern is an element with name "foo," and whose content is validated by a pattern "if:pipe". This is a micro-pipe of transformations and validations applied to all the elements, text nodes and attributes found in the "foo" element.
The pipe itself is a transformation, splitting text nodes using the regular expression /,/, and a Relax NG validation applied to the result of this transformation.
A text node will thus be interpreted as a comma separated list of values, and the list validates against a Relax NG schema expecting one or more values equal to "foo" or "bar".
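The effect of this micro-pipe on a single text node can be approximated in a few lines of Python. This is only a sketch of the behaviour described above, not code from the XVIF prototype: `split_transform`, `validate_tokens` and `micro_pipe` are hypothetical names, and stripping whitespace around each token is an assumption meant to mimic how a Relax NG value is normally compared.

```python
import re

# The two values accepted by the <choice> in the schema above.
ALLOWED = {"foo", "bar"}


def split_transform(text: str) -> list:
    """Transformation step: split the text node on the regular expression /,/."""
    return re.split(",", text)


def validate_tokens(tokens: list) -> bool:
    """Validation step: one or more tokens, each equal to "foo" or "bar"."""
    return len(tokens) >= 1 and all(token.strip() in ALLOWED for token in tokens)


def micro_pipe(text: str) -> bool:
    """Apply the transformation, then validate its result."""
    return validate_tokens(split_transform(text))


for sample in ["foo,bar,foo", "foo", "foo,baz", ""]:
    print(repr(sample), micro_pipe(sample))
```

With these definitions, "foo,bar,foo" and "foo" are accepted, while "foo,baz" and the empty string are rejected, since splitting an empty string yields a single empty token that matches neither value.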