XML.com: XML From the Inside Out

An Introduction to Schematron
by Eddie Robertsson

Schematron processing

There are currently a few different Schematron processors available. In general these processors are divided into two groups: XSLT-based and XPath-based processors.

Since the Schematron specification is built on top of XSLT and XPath, all you really need to perform Schematron validation is an XSLT processor. Validation is then performed in two stages: the Schematron schema is, first, turned into a validating XSLT stylesheet that is, second, applied to the XML instance document to get the validation results. Since XSLT processors are available in most programming languages and on most platforms and operating systems, this validation technique will be explained in more detail in the next section.

For more API-like validation there are currently two Schematron processors built on top of XPath. The first is a Java implementation by Ivelin Ivanov that is part of the Cocoon project; it can be accessed through the SourceForge.net website. The second is Daniel Cazzulino's Schematron.NET, which uses the Microsoft .NET platform and is also available at SourceForge.

XPath implementations of Schematron are generally faster than the XSLT approach because they skip the extra step of creating a validating XSLT stylesheet, and because they offer less functionality. In particular, the functions in Schematron that are XSLT-specific (for example, document() and key()) are unavailable in an XPath implementation, so constraints between XML instance documents cannot be checked. Since Schematron is still a fairly young schema language, implementations differ in functionality, and most XPath implementations support only a subset of Schematron.
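
To make the contrast concrete, the sketch below evaluates an assertion directly against the parsed instance, with no intermediate stylesheet. It is not how the Cocoon or Schematron.NET processors are implemented. It uses Python's standard xml.etree.ElementTree, which supports only a small XPath subset, and the rule format (a context path plus a Python predicate), the sample Person document and its element names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; element and attribute names are assumed.
PERSON_XML = """
<Person Title="Mr">
    <Name>Eddie</Name>
    <Gender>Female</Gender>
</Person>
"""

# Each rule pairs a context path (ElementTree's XPath subset) with assertions.
# An assertion is (predicate, message); the predicate stands in for an XPath test.
RULES = [
    (".", [
        (lambda p: p.get("Title") != "Mr" or p.findtext("Gender") == "Male",
         'If the Title is "Mr" then the gender of the person must be "Male".'),
    ]),
]


def validate(xml_text, rules):
    """Return the messages of all assertions that fail."""
    root = ET.fromstring(xml_text)
    failures = []
    for context, assertions in rules:
        for node in root.findall(context):
            for predicate, message in assertions:
                if not predicate(node):
                    failures.append(message)
    return failures


if __name__ == "__main__":
    for message in validate(PERSON_XML, RULES):
        print("Assertion fails:", message)
```

Because the rules are applied straight to the document tree, there is nothing to compile first, which is where the speed advantage comes from. The price is that anything the XPath engine lacks is simply unavailable.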

Schematron processing using XSLT

Schematron processing using XSLT is trivial to implement and works in two steps:

  1. The Schematron schema is first turned into a validating XSLT stylesheet by transforming it with an XSLT stylesheet provided by Academia Sinica Computing Centre. These stylesheets (schematron-basic.xsl, schematron-message.xsl and schematron-report.xsl) can be found at the Schematron site, and each generates a different kind of output. For example, schematron-basic.xsl generates simple text output like that in the example already shown.
  2. This validating stylesheet is then used on the XML instance document and the result will be a report that is based on the rules and assertions in the original Schematron schema.

This means that it is very easy to set up a Schematron processor, because the only things needed are an XSLT processor and one of the Schematron stylesheets. Here is how to validate the example used above, where the XML instance document is called Person.xml and the Schematron schema is called Person.sch. The example uses Saxon as the XSLT processor:

>saxon -o validate_person.xsl Person.sch schematron-basic.xsl

>saxon Person.xml validate_person.xsl

From pattern "Check structure":

From pattern "Check co-occurrence constraints":
    Assertion fails: "If the Title is "Mr" then the gender of the person must be "Male"." at
        /Person[1]
            <Person Title="Mr">...</>
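
The two commands above can be wrapped in a small script so that both stages always run in order. The following is a minimal sketch, assuming a saxon launcher on the PATH that accepts the same arguments as in the commands above. The function names and the dry_run switch are hypothetical and not part of any Schematron distribution.

```python
import subprocess


def build_commands(schema, instance, skeleton="schematron-basic.xsl",
                   validator="validate_person.xsl"):
    """Return the two Saxon command lines for the two-stage validation."""
    # Stage 1: turn the Schematron schema into a validating stylesheet.
    compile_cmd = ["saxon", "-o", validator, schema, skeleton]
    # Stage 2: apply the validating stylesheet to the instance document.
    validate_cmd = ["saxon", instance, validator]
    return [compile_cmd, validate_cmd]


def run_validation(schema, instance, dry_run=True):
    """Run both stages in order, or only print them when dry_run is set."""
    for cmd in build_commands(schema, instance):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop if a stage fails


if __name__ == "__main__":
    run_validation("Person.sch", "Person.xml")
```

With dry_run left at its default the script only prints the commands, which is a safe way to check the file names before running Saxon for real.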

ISO Schematron

Version 1.5 of Schematron was released in early 2001 and the next version is currently being developed as an ISO standard. The new version, ISO Schematron, will also be used as one of the validation engines in the DSDL (Document Schema Definition Languages) initiative.

ISO Schematron evaluates the functionality implemented in existing implementations of Schematron 1.5. Functionality that is not implemented at all or only in a few implementations will be evaluated for removal while some requested features will be added. Some of these new features are briefly explained below, but it should be noted that these changes are not final. More information can be found in the Schematron upgrade document.

Include mechanism

An include mechanism will be added to ISO Schematron that will allow a Schematron schema to include Schematron constructs from different documents.

Variables using <let>

In Schematron it is common for a rule to contain many assertions that test the same information. If the information is selected by long and complicated XPath expressions, this has to be repeated in every assertion that uses the information. This is both hard to read and error prone.

In ISO Schematron a new element let is added to the content model of the rule element that allows information to be bound to a variable. The let element has a name attribute to identify the variable and a value attribute used to select the information that should be bound to the variable. The variable is then available in the scope of the rule where it is declared and can be accessed in assertion tests using the $ prefix.

For example, say that a simple time element should be validated so that its value always matches the HH:MM:SS format, where 0<=HH<=23, 0<=MM<=59 and 0<=SS<=59:

<time>21:45:12</time>

Using the new let element this can be implemented like this in ISO Schematron:

<sch:rule context="time">
    <sch:let name="hour" value="number(substring(.,1,2))"/>
    <sch:let name="minute" value="number(substring(.,4,2))"/>
    <sch:let name="second" value="number(substring(.,7,2))"/>

    <!-- CHECK FOR VALID HH:MM:SS -->
    <sch:assert test="string-length(.)=8 and substring(.,3,1)=':' and substring(.,6,1)=':'">The time element should contain a time in the format HH:MM:SS.</sch:assert>
    <sch:assert test="$hour>=0 and $hour&lt;=23">The hour must be a value between 0 and 23.</sch:assert>
    <sch:assert test="$minute>=0 and $minute&lt;=59">The minutes must be a value between 0 and 59.</sch:assert>
    <sch:assert test="$second>=0 and $second&lt;=59">The seconds must be a value between 0 and 59.</sch:assert>
</sch:rule>
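
To see what the rule computes, its logic can be mirrored in ordinary code. The sketch below is an illustration only, not part of Schematron. The helper xpath_number is a hypothetical approximation of the XPath 1.0 number() function, and the slices correspond to the 1-based substring() calls in the let elements.

```python
import math
import re

# XPath 1.0 number(): optional whitespace, optional minus, digits with an
# optional fraction. Anything else converts to NaN.
_XPATH_NUMBER = re.compile(r"\s*-?(\d+(\.\d*)?|\.\d+)\s*")


def xpath_number(text):
    """Approximate XPath 1.0 number(): NaN for text that is not a number."""
    return float(text) if _XPATH_NUMBER.fullmatch(text) else math.nan


def check_time(value):
    """Return the messages of the assertions that fail for one time value."""
    # Equivalent of the three let bindings (XPath substring() is 1-based).
    hour = xpath_number(value[0:2])
    minute = xpath_number(value[3:5])
    second = xpath_number(value[6:8])

    failures = []
    if not (len(value) == 8 and value[2:3] == ":" and value[5:6] == ":"):
        failures.append("The time element should contain a time in the format HH:MM:SS.")
    # Comparisons with NaN are false, so non-numeric parts fail, as in XPath.
    if not (0 <= hour <= 23):
        failures.append("The hour must be a value between 0 and 23.")
    if not (0 <= minute <= 59):
        failures.append("The minutes must be a value between 0 and 59.")
    if not (0 <= second <= 59):
        failures.append("The seconds must be a value between 0 and 59.")
    return failures


if __name__ == "__main__":
    for sample in ("21:45:12", "25:45:12", "ab:45:12"):
        print(sample, check_time(sample))
```

Each part is extracted once and then reused by name, which is the repetition the let element removes from the assertion tests.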

<value-of> in assertions

A change requested by many users is to allow value-of elements in the assertions so that value information can be shown in the result. The value-of element has a select attribute specifying an XPath expression that selects the correct information.

In the schema above, the assertion that checks the hour, for example, could then be written so that the output contains the erroneous value:

<sch:assert test="$hour>=0 and $hour&lt;=23">Invalid hour: <sch:value-of select="$hour"/>. The value should be between 0 and 23.</sch:assert>

The following instance

<time>25:45:12</time>

would then generate this output:

Assertion fails: "Invalid hour: 25. The value should be between 0 and 23."
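
How a processor might assemble such a message can be sketched as follows. This is an assumption-laden illustration rather than any processor's actual code: render_message is a hypothetical helper, it handles only select expressions of the form $name, and the namespace URI bound to the sch prefix is the one associated with ISO Schematron, which the draft under discussion may not use.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for the sch prefix (the ISO Schematron namespace).
SCH_NS = "http://purl.oclc.org/dsdl/schematron"

ASSERT_XML = (
    '<sch:assert xmlns:sch="' + SCH_NS + '" '
    'test="$hour>=0 and $hour&lt;=23">Invalid hour: '
    '<sch:value-of select="$hour"/>. The value should be between 0 and 23.'
    '</sch:assert>'
)


def render_message(assert_xml, variables):
    """Build the message text, replacing each value-of with a variable value."""
    elem = ET.fromstring(assert_xml)
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "{" + SCH_NS + "}value-of":
            # Only simple variable references such as "$hour" are handled.
            name = child.get("select", "").lstrip("$")
            parts.append(str(variables[name]))
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    print('Assertion fails: "%s"' % render_message(ASSERT_XML, {"hour": 25}))
```

A real processor would evaluate the select attribute as a full XPath expression in the context of the rule; the dictionary lookup here merely stands in for that evaluation.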

Abstract patterns

Abstract patterns are a very powerful new feature that allows the user to identify a specific pattern in the data and make assertions about this pattern. If we keep to the example above the abstract pattern that should be validated is the definition of a time with three parts: hour, minute and second. In ISO Schematron an abstract pattern like the following can be written to validate this time abstraction:

<sch:pattern name="Time" abstract="true">
    <sch:rule context="$time">
        <sch:assert test="$hour>=0 and $hour&lt;=23">The hour must be a value between 0 and 23.</sch:assert>
        <sch:assert test="$minute>=0 and $minute&lt;=59">The minutes must be a value between 0 and 59.</sch:assert>
        <sch:assert test="$second>=0 and $second&lt;=59">The seconds must be a value between 0 and 59.</sch:assert>
    </sch:rule>
</sch:pattern>

Instead of validating the concrete elements used to define the time, this abstract pattern works on the abstraction of what makes up a time: hours, minutes and seconds.

If the XML document uses the syntax below to describe a time

<time>21:45:12</time>

the concrete pattern that realises the abstract one above would look like this:

<sch:pattern name="SingleLineTime" is-a="Time">
    <sch:param formal="time" actual="time"/>
    <sch:param formal="hour" actual="number(substring(.,1,2))"/>
    <sch:param formal="minute" actual="number(substring(.,4,2))"/>
    <sch:param formal="second" actual="number(substring(.,7,2))"/>
</sch:pattern>

If the XML instead uses a different syntax to describe a time, the abstract pattern can still be used for the validation, and the only thing that needs to change is the concrete implementation. For example, if the XML looks like this

<time>
    <hour>21</hour>
    <minute>45</minute>
    <second>12</second>
</time>

the concrete pattern would instead be implemented as follows:

<sch:pattern name="MultiLineTime" is-a="Time">
    <sch:param formal="time" actual="time"/>
    <sch:param formal="hour" actual="hour"/>
    <sch:param formal="minute" actual="minute"/>
    <sch:param formal="second" actual="second"/>
</sch:pattern>

This means that the abstract pattern that performs the actual validation will stay the same independent of the actual representation of the data in the XML document.
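
One way to picture the relationship between the abstract and concrete patterns is as macro-style substitution: each $formal reference in the abstract pattern is replaced by the text of the matching actual parameter. The sketch below assumes that reading of the draft. The dictionary layout and the instantiate function are hypothetical and exist only to show the expansion.

```python
import re

# The abstract "Time" pattern, reduced to its context and assertion tests.
ABSTRACT_TIME = {
    "context": "$time",
    "asserts": [
        ("$hour>=0 and $hour<=23", "The hour must be a value between 0 and 23."),
        ("$minute>=0 and $minute<=59", "The minutes must be a value between 0 and 59."),
        ("$second>=0 and $second<=59", "The seconds must be a value between 0 and 59."),
    ],
}

# Formal-to-actual mappings taken from the two concrete patterns.
SINGLE_LINE_TIME = {
    "time": "time",
    "hour": "number(substring(.,1,2))",
    "minute": "number(substring(.,4,2))",
    "second": "number(substring(.,7,2))",
}
MULTI_LINE_TIME = {"time": "time", "hour": "hour", "minute": "minute", "second": "second"}


def instantiate(abstract, params):
    """Expand an abstract pattern by substituting every $formal reference."""
    def substitute(expr):
        # Unknown names are left untouched rather than raising an error.
        return re.sub(r"\$([A-Za-z_][\w-]*)",
                      lambda m: params.get(m.group(1), m.group(0)), expr)

    return {
        "context": substitute(abstract["context"]),
        "asserts": [(substitute(test), msg) for test, msg in abstract["asserts"]],
    }


if __name__ == "__main__":
    for params in (SINGLE_LINE_TIME, MULTI_LINE_TIME):
        concrete = instantiate(ABSTRACT_TIME, params)
        print(concrete["context"], "->", concrete["asserts"][0][0])
```

Both expansions share the same three assertions and differ only in the expressions that locate the hour, minute and second.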

The include mechanism makes it possible to define a separate Schematron schema that defines the validation rules as abstract patterns. Multiple "concrete schemas" can then be defined for each instance document that uses a different syntax for the abstractions. Each of these "concrete schemas" simply includes the schema with the abstract patterns and defines the mapping from the abstraction to the concrete elements.


