Combining RELAX NG and Schematron
This article explains how to integrate two powerful XML schema languages, RELAX NG and Schematron. Embedding Schematron rules in RELAX NG is very simple because a RELAX NG validator ignores all elements not in the RELAX NG namespace (http://relaxng.org/ns/structure/1.0). This means that Schematron rules can be embedded at any level in a RELAX NG schema, wherever markup is allowed.
Here is a very simple RELAX NG schema that only defines one
element, Root:
<?xml version="1.0" encoding="UTF-8"?>
<element name="Root" xmlns="http://relaxng.org/ns/structure/1.0">
<text/>
</element>
Now if a Schematron rule should have the Root element
as its context, this rule could be added as an embedded Schematron
rule within the element element that defines the pattern
for Root:
<?xml version="1.0" encoding="UTF-8"?>
<element name="Root" xmlns="http://relaxng.org/ns/structure/1.0">
<sch:pattern name="Test constraints on the Root element"
xmlns:sch="http://www.ascc.net/xml/schematron">
<sch:rule context="Root">
<sch:assert test="test-condition">Error message when
the assertion condition is broken...</sch:assert>
</sch:rule>
</sch:pattern>
<text/>
</element>
The Schematron rules embedded in a RELAX NG schema are inserted on the pattern level and must be declared in the Schematron namespace (http://www.ascc.net/xml/schematron).
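To make the namespace separation concrete, here is a minimal Python sketch that locates the embedded Schematron patterns in the schema above purely by their namespace. It uses only the standard library's xml.etree.ElementTree; the function name embedded_patterns and the SCHEMA constant are illustrative assumptions, not part of any RELAX NG or Schematron tool.

```python
import xml.etree.ElementTree as ET

SCH_NS = "http://www.ascc.net/xml/schematron"

# The example schema from above, with the embedded Schematron pattern.
SCHEMA = """<element name="Root" xmlns="http://relaxng.org/ns/structure/1.0">
  <sch:pattern name="Test constraints on the Root element"
               xmlns:sch="http://www.ascc.net/xml/schematron">
    <sch:rule context="Root">
      <sch:assert test="test-condition">Error message when
        the assertion condition is broken...</sch:assert>
    </sch:rule>
  </sch:pattern>
  <text/>
</element>"""


def embedded_patterns(schema_text):
    """Return every sch:pattern element found anywhere in the schema."""
    root = ET.fromstring(schema_text)
    return root.findall(f".//{{{SCH_NS}}}pattern")


patterns = embedded_patterns(SCHEMA)
for pattern in patterns:
    rule = pattern.find(f"{{{SCH_NS}}}rule")
    print(pattern.get("name"), "->", rule.get("context"))
```

Because the lookup is by namespace only, it does not matter where in the RELAX NG schema the pattern is placed.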
Although RELAX NG has better support for co-occurrence constraints than W3C XML Schema (WXS), there are still many types of co-occurrence constraints that it cannot express. An example of such a co-occurrence constraint is when the relationship between two (or more) element/attribute values is expressed as a mathematical expression.
As an example, we use a schema that defines a very simple international purchase order. This purchase order specifies the following:
The date of the order
An address to which the purchased products will be delivered
The items being purchased (including an id, a name, a quantity, and a price with currency information)
Payment details including type of payment and total amount payable with currency information
Here is an example of an XML representation of such a purchase order:
<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder date="2002-10-22">
<deliveryDetails>
<name>John Doe</name>
<address>123 Morgue Street, Death Valley</address>
<phone>+61 2 9546 4146</phone>
</deliveryDetails>
<items>
<item id="123-XY">
<productName>Coffin</productName>
<quantity>1</quantity>
<price currency="AUD">2300</price>
<totalAmount currency="AUD">2300</totalAmount>
</item>
<item id="112-AA">
<productName>Shovel</productName>
<quantity>2</quantity>
<price currency="AUD">75</price>
<totalAmount currency="AUD">150</totalAmount>
</item>
</items>
<payment type="Prepaid">
<amount currency="AUD">2450</amount>
</payment>
</purchaseOrder>
A real life purchase order would be much more complex, but for the purpose of this article, this example is sufficient. A RELAX NG schema for the purchase order could look like this:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<start>
<ref name="purchaseOrder"/>
</start>
<define name="purchaseOrder">
<element name="purchaseOrder">
<attribute name="date">
<data type="date"/>
</attribute>
<ref name="deliveryDetails"/>
<element name="items">
<oneOrMore>
<ref name="item"/>
</oneOrMore>
</element>
<ref name="payment"/>
</element>
</define>
<define name="deliveryDetails">
<element name="deliveryDetails">
<element name="name"><text/></element>
<element name="address"><text/></element>
<element name="phone"><text/></element>
</element>
</define>
<define name="item">
<element name="item">
<attribute name="id">
<data type="string">
<param name="pattern">\d{3}-[A-Z]{2}</param>
</data>
</attribute>
<element name="productName"><text/></element>
<element name="quantity">
<data type="int"/>
</element>
<element name="price">
<ref name="currency"/>
</element>
<element name="totalAmount">
<ref name="currency"/>
</element>
</element>
</define>
<define name="payment">
<element name="payment">
<attribute name="type">
<choice>
<value>Prepaid</value>
<value>OnArrival</value>
</choice>
</attribute>
<element name="amount">
<ref name="currency"/>
</element>
</element>
</define>
<define name="currency">
<attribute name="currency">
<choice>
<value>AUD</value>
<value>USD</value>
<value>SEK</value>
</choice>
</attribute>
<data type="int"/>
</define>
</grammar>
This RELAX NG schema makes sure that all the required elements and attributes are present, and that some of these have the correct datatype. For example, all price information must have an integer value; the id of an item must be three digits, followed by a hyphen, followed by two uppercase letters; and the currency value must be one of AUD, USD or SEK. However, in a real world scenario it is more likely that you need to check more than the structure and the datatypes to make sure the purchase order is valid.
For the purchase order, the following constraints cannot be checked by RELAX NG, but they would all be very useful for complete validation of the data:
Each item specifies quantity, price and the totalAmount for that item. To make sure that the data is valid, the value of the totalAmount element must be equal to quantity * price.
Both the price element and the totalAmount element specify a currency, and for this data to be valid, the price and totalAmount elements must have the same currency value.
The payment section of the purchase order specifies an amount element whose value must equal the sum of all the items' totalAmount values.
Every item's currency value must equal the currency value of the amount element in the payment section.
Schematron can easily check all of these constraints, and the context definition in the language provides a logical grouping of the constraints. The first two constraints apply to each item element in the purchase order, and hence this element is the context. Here is an example of how you can specify the Schematron rule needed to express these constraints:
<sch:pattern name="Check that the pricing and currency of an item is correct."
xmlns:sch="http://www.ascc.net/xml/schematron">
<sch:rule context="purchaseOrder/items/item">
<sch:assert test="number(price) * number(quantity) = number(totalAmount)">
The total amount for the item doesn't add up to (quantity * price).</sch:assert>
<sch:assert test="price/@currency = totalAmount/@currency">
The currency in price doesn't match the currency in totalAmount.
</sch:assert>
</sch:rule>
</sch:pattern>
The Schematron rule specifies its context as all item
elements with a parent items element and a
grandparent purchaseOrder. For each of
the item elements that match this criterion, the first
assertion checks that the value of the price child
element multiplied by the value of the quantity child
element matches the value of the totalAmount child
element. The second assertion makes sure that the currency value of
the price child element matches the currency value of
the totalAmount child element.
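For readers who think more easily in procedural code, the two assertions can be mirrored in a short Python sketch. This is only an illustration of what the rule checks, not how a Schematron processor works internally; the function name check_items, the failure messages, and the deliberately broken second item in the sample data are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# A trimmed purchase order; the second item is deliberately invalid.
ORDER = """<purchaseOrder date="2002-10-22">
  <items>
    <item id="123-XY">
      <productName>Coffin</productName>
      <quantity>1</quantity>
      <price currency="AUD">2300</price>
      <totalAmount currency="AUD">2300</totalAmount>
    </item>
    <item id="112-AA">
      <productName>Shovel</productName>
      <quantity>2</quantity>
      <price currency="AUD">75</price>
      <totalAmount currency="USD">100</totalAmount>
    </item>
  </items>
</purchaseOrder>"""


def check_items(order_text):
    """Return (item id, message) pairs for every failed assertion."""
    failures = []
    root = ET.fromstring(order_text)
    # Same context as the rule: every item under purchaseOrder/items.
    for item in root.findall("items/item"):
        price = item.find("price")
        total = item.find("totalAmount")
        quantity = int(item.findtext("quantity"))
        # First assertion: price * quantity = totalAmount.
        if int(price.text) * quantity != int(total.text):
            failures.append((item.get("id"), "total amount mismatch"))
        # Second assertion: both elements use the same currency.
        if price.get("currency") != total.get("currency"):
            failures.append((item.get("id"), "currency mismatch"))
    return failures


failures = check_items(ORDER)
print(failures)
```

Only the second item is reported, once for each assertion it breaks.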
The last two constraints both apply to the amount element in the payment section, so this element is the context for the Schematron rule that checks them. Here is an example of how this rule can be specified:
<sch:pattern name="Check that the total amount is correct and that the currencies match"
xmlns:sch="http://www.ascc.net/xml/schematron">
<sch:rule context="purchaseOrder/payment/amount">
<sch:assert
test="number(.) = sum(/purchaseOrder/items/item/totalAmount)">
The total purchase amount doesn't match the cost of all items.
</sch:assert>
<sch:assert
test = "not(/purchaseOrder/items/item/totalAmount/@currency != @currency)">
The currency in at least one of the items doesn't match the
currency for the total amount.
</sch:assert>
</sch:rule>
</sch:pattern>
The first assertion checks that the sum of all the item elements' totalAmount values is equal to the value of the context node (which is the amount element) by using XPath's sum() function. The second assertion makes sure that all the items' currency values match the currency value for the amount element. Note that the following (similar) assertion does not perform the same check:
<sch:assert test = "/purchaseOrder/items/item/totalAmount/@currency = @currency"
>...</sch:assert>
This assertion checks that at least one of the items' currency values matches the currency in the amount element. However, in this case we want to make sure that all the items' currency values match, and hence we negate both the assertion expression (using XPath's not() function) and the operator used inside the assertion ('=' becomes '!='). This technique is often used when writing Schematron rules that must hold for every node in a set.
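The difference between the two forms corresponds to the difference between "any" and "all". The following Python sketch models the node-set of item currencies as a plain list; the variable names and sample values are assumptions chosen for illustration.

```python
amount_currency = "AUD"
item_currencies = ["AUD", "USD", "AUD"]  # one item uses a different currency

# XPath "node-set = value" is true if at least one node matches.
at_least_one_matches = any(c == amount_currency for c in item_currencies)

# XPath "not(node-set != value)" is true only if no node differs.
all_match = not any(c != amount_currency for c in item_currencies)

print(at_least_one_matches)  # True
print(all_match)             # False
```

Note that the negated form is also true for an empty node-set, because there is no node that differs.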
Now that all the Schematron rules are defined, the only remaining task is to insert them into the main RELAX NG schema. As already mentioned, a RELAX NG schema allows any element not in the RELAX NG namespace to appear anywhere in the schema where markup is allowed. However, to keep the RELAX NG schema well organized and easy to read, it is recommended that you embed the Schematron rules in one of two places:
Insert all the embedded Schematron rules at the beginning of the RELAX NG schema as a child of the top-level element. Then you always know that if you have embedded rules, they will be specified together and in the same place.
Specify each Schematron rule on the element
pattern that specifies the context of the embedded
rule. In the previous example this means that one of
the Schematron rules would be embedded on the element
pattern for the item element and the
other on the element pattern for
the amount element in the payment
section.
I prefer to embed each Schematron rule in the element that defines the context, but it is really up to the developer which method to use. Another good rule to follow is to always declare the Schematron namespace on the top-level element in the RELAX NG schema. That way you know that if the top-level element contains a declaration for the Schematron namespace, then the schema contains embedded Schematron rules. The complete RELAX NG schema for the purchase order with embedded Schematron rules might look like this:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
xmlns:sch="http://www.ascc.net/xml/schematron">
<start>
<ref name="purchaseOrder"/>
</start>
<define name="purchaseOrder">
<element name="purchaseOrder">
<attribute name="date">
<data type="date"/>
</attribute>
<ref name="deliveryDetails"/>
<element name="items">
<oneOrMore>
<ref name="item"/>
</oneOrMore>
</element>
<ref name="payment"/>
</element>
</define>
<define name="deliveryDetails">
<element name="deliveryDetails">
<element name="name"><text/></element>
<element name="address"><text/></element>
<element name="phone"><text/></element>
</element>
</define>
<define name="item">
<element name="item">
<sch:pattern
name="Check that the pricing and currency of an item is correct.">
<sch:rule context="purchaseOrder/items/item">
<sch:assert
test="number(price) * number(quantity) = number(totalAmount)">
The total amount for the item doesn't add up to (quantity * price).
</sch:assert>
<sch:assert
test="price/@currency = totalAmount/@currency">
The currency in price doesn't match the currency in totalAmount.
</sch:assert>
</sch:rule>
</sch:pattern>
<attribute name="id">
<data type="string">
<param name="pattern">\d{3}-[A-Z]{2}</param>
</data>
</attribute>
<element name="productName"><text/></element>
<element name="quantity">
<data type="int"/>
</element>
<element name="price">
<ref name="currency"/>
</element>
<element name="totalAmount">
<ref name="currency"/>
</element>
</element>
</define>
<define name="payment">
<element name="payment">
<attribute name="type">
<choice>
<value>Prepaid</value>
<value>OnArrival</value>
</choice>
</attribute>
<element name="amount">
<sch:pattern
name="Check that the total amount is correct and that the currencies match">
<sch:rule context="purchaseOrder/payment/amount">
<sch:assert
test="number(.) = sum(/purchaseOrder/items/item/totalAmount)">
The total purchase amount doesn't match the cost of all items.
</sch:assert>
<sch:assert
test="not(/purchaseOrder/items/item/totalAmount/@currency != @currency)">
The currency in at least one of the items doesn't match the
currency for the total amount.
</sch:assert>
</sch:rule>
</sch:pattern>
<ref name="currency"/>
</element>
</element>
</define>
<define name="currency">
<attribute name="currency">
<choice>
<value>AUD</value>
<value>USD</value>
<value>SEK</value>
</choice>
</attribute>
<data type="int"/>
</define>
</grammar>
Like most other XML schema languages, RELAX NG lacks the ability to specify constraints between XML instance documents. In many XML applications, this is a very useful functionality. A typical example would be to check if a certain ID reference has a corresponding ID in a different document. For the purchase order example in the preceding section, this could be a simple database file where all the available products are listed. Typically a simple database would contain the following information:
Date when the database was updated
One or more products
Each product has an id, a name, a description, a price and the number of items in stock
A sample XML instance document for the database would look like this:
<?xml version="1.0" encoding="UTF-8"?>
<products lastUpdated="2002-10-22">
<product id="123-XY">
<productName>Coffin</productName>
<description>Standard coffin, Size 200x80x50cm</description>
<numberInStock>4</numberInStock>
<price currency="AUD">2300</price>
</product>
<product id="112-AA">
<productName>Shovel</productName>
<description>Plastic grip shovel</description>
<numberInStock>2</numberInStock>
<price currency="AUD">75</price>
</product>
</products>
With the corresponding RELAX NG schema:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<start>
<ref name="products"/>
</start>
<define name="products">
<element name="products">
<attribute name="lastUpdated">
<data type="date"/>
</attribute>
<oneOrMore>
<ref name="product"/>
</oneOrMore>
</element>
</define>
<define name="product">
<element name="product">
<attribute name="id">
<data type="string">
<param name="pattern">\d{3}-[A-Z]{2}</param>
</data>
</attribute>
<element name="productName"><text/></element>
<element name="description"><text/></element>
<element name="numberInStock">
<data type="int"/>
</element>
<element name="price">
<ref name="currency"/>
</element>
</element>
</define>
<define name="currency">
<attribute name="currency">
<choice>
<value>AUD</value>
<value>USD</value>
<value>SEK</value>
</choice>
</attribute>
<data type="int"/>
</define>
</grammar>
Looking back at the purchase order in the preceding section, each purchased item was specified like this:
<item id="123-XY">
<productName>Coffin</productName>
<quantity>1</quantity>
<price currency="AUD">2300</price>
<totalAmount currency="AUD">2300</totalAmount>
</item>
Since there also exists a database for each product available for purchase, there are now at least two more constraints that can be checked for each purchase order:
Make sure that each item's id exists as a product id in the database
Make sure that the quantity ordered is less than or equal to the total number of products in stock for each item in the purchase order
Since these constraints require checks between XML documents, they
can only be checked by Schematron processors that support
XSLT's document() function (or similar functionality). If
a Schematron processor based on XSLT is used, this is not a problem;
but most XPath implementations of Schematron do not have this type of
functionality. If you use an XSLT implementation, the Schematron rule
for the first constraint can be specified like this:
<sch:pattern name="Check that the item exists in the database."
xmlns:sch="http://www.ascc.net/xml/schematron">
<sch:rule context="purchaseOrder/items/item">
<sch:assert test = "document('Products.xml')/products/product/@id = @id"
>The item doesn't exist in the database.</sch:assert>
</sch:rule>
</sch:pattern>
Here the document() function is used to access the XML
instance document that contains the available products. Once
the document() function has retrieved the external
document, you can use normal XPath expressions to select the nodes of
interest. In this example, the id of all
the product elements with a parent products
is compared to the id of the item that is
currently being checked. If an item
element's id value does not exist in the database
(Products.xml), the assertion will fail.
The easiest way to check the second constraint is to use a different rule where the context is restricted using predicates. Here is an example of how this can be specified:
<sch:pattern name="Check that there are enough items in stock for the purchase."
xmlns:sch="http://www.ascc.net/xml/schematron">
<sch:rule
context="purchaseOrder/items/item[@id = document('Products.xml')/products/product/@id]">
<sch:assert
test="number(document('Products.xml')/products/product[@id = current()/@id]/numberInStock)
>= number(quantity)">
There are not enough items of this type in stock for this quantity.
</sch:assert>
</sch:rule>
</sch:pattern>
This rule is a bit more complicated than the previous ones. The
first thing that is different is that the context specification for
this rule is using a predicate to limit the number of elements
checked. In this case, the predicate is used because instead of
selecting all the item elements in the document, only
the item elements with an id that exists in
the database should be selected. This ensures that when the processor
checks the assertion, it is certain that the item being validated
exists in the database.
The assertion test itself also uses a predicate, in conjunction with the document() function. Here the predicate selects the product element whose id matches the id of the item element that is currently being checked. The assertion then checks that the numberInStock child element (of product) has a value that is greater than or equal to the value of the quantity child element (of item).
Now we know how the rule selects the context node, and how the
assertion performs the validation, but what is the reason for the
added restriction on which item elements are selected?
Why can't the context simply be all the item elements in
the document and then the assertion for both the above constraints can
be included in the same rule?
The answer has its roots in the fact that a Schematron assertion will fire if its test condition evaluates to false. Part of the assertion expression looks like this:
document('Products.xml')/products/product[@id = current()/@id]
This part of the assertion selects the product element from the database that has the same id as the item currently being checked. If no such product exists, the expression selects an empty node-set, number() converts it to NaN, and the comparison evaluates to false, so the assertion fails. This is not the desired result, since this assertion should only check that there are enough products in stock to make the purchase. However, by specifying a rule that only selects the item elements that do exist in the database, this situation will never occur.
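The failure mode for a missing product can be reproduced with ordinary floating-point arithmetic. The helper xpath_number below is a hypothetical, rough stand-in for XPath's number() applied to a node-set (modelled as a list of strings); Python's float() accepts some strings that XPath does not, so treat it as a sketch only.

```python
import math


def xpath_number(node_values):
    """Roughly mimic XPath number() on a node-set given as a list of strings."""
    if not node_values:
        return math.nan  # an empty node-set converts to NaN
    try:
        # XPath uses the string value of the first node in the set.
        return float(node_values[0].strip())
    except ValueError:
        return math.nan


in_stock_missing = xpath_number([])    # no product with a matching id
in_stock_found = xpath_number(["4"])   # product found, 4 in stock
quantity = xpath_number(["1"])

# Any comparison involving NaN is false, so the assertion would fail.
print(in_stock_missing >= quantity)  # False
print(in_stock_found >= quantity)    # True
```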
Another important issue when defining the context of a rule is that
an element can only be used once as the context for each pattern. This
means that if more than one rule is specified in the same pattern with
the same context element, only the first matching rule is used. If a
pattern defines multiple rules with the same context element, the most
restrictive rule must be specified first, followed by the other rules
in descending order, based on the restrictive features of each
rule. For programmers, this is analogous to how a long if-else chain
is specified: you start with the most restrictive condition and finish
with the most general condition. If done in reverse order, the first
statement will always be true and the others will never execute. To
illustrate, we will take a look at how to specify the above two rules
in one pattern, since both rules use the same context
(the item element).
<sch:pattern name="Combined pattern."
xmlns:sch="http://www.ascc.net/xml/schematron">
<sch:rule context="purchaseOrder/items/item">
...
</sch:rule>
<sch:rule
context="purchaseOrder/items/item[@id = document('Products.xml')/products/product/@id]">
...
</sch:rule>
</sch:pattern>
If the rules were specified in the above order (which is the order
in which they were defined and specified in the example), validation
would not be performed correctly. The reason is that both rules
specify the same context element and in this case the most general
rule (context="purchaseOrder/items/item") is specified
first. Since this rule will match all the item elements, there will
not be any item elements left to match the second rule. To make this
work as expected, the rules must be specified in the reverse order
(the most restrictive rule first):
<sch:pattern name="Combined pattern."
xmlns:sch="http://www.ascc.net/xml/schematron">
<sch:rule
context="purchaseOrder/items/item[@id = document('Products.xml')/products/product/@id]">
...
</sch:rule>
<sch:rule context="purchaseOrder/items/item">
...
</sch:rule>
</sch:pattern>
Now validation will be performed as expected. Since the most
restrictive rule (selects only the item elements that do
exist in the database) is specified first, the second rule will still
be applied to all item elements that do not exist in the
database. This means that the assertion in the second rule can be
simplified to always fail (test="false()") because if the
assertion is ever checked, it is certain that it is an invalid item
that does not exist in the database.
Here is the complete specification of the pattern for the two constraints after the appropriate changes have been made:
<sch:pattern name="Check each item against the database."
xmlns:sch="http://www.ascc.net/xml/schematron">
<sch:rule
context="purchaseOrder/items/item[@id = document('Products.xml')/products/product/@id]">
<sch:assert
test="number(document('Products.xml')/products/product[@id = current()/@id]/numberInStock)
>= number(quantity)">
There are not enough items of this type in stock for this quantity.
</sch:assert>
</sch:rule>
<sch:rule context="purchaseOrder/items/item">
<sch:assert test="false()"
>The item doesn't exist in the database.</sch:assert>
</sch:rule>
</sch:pattern>
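The if-else analogy can be spelled out in a short Python sketch of the same two-rule pattern. The documents are trimmed versions of the purchase order and product database, with one understocked item and one unknown item added for illustration; the function name check_against_database and the messages are assumptions, and the sketch passes the database in as a string instead of loading Products.xml.

```python
import xml.etree.ElementTree as ET

PRODUCTS = """<products lastUpdated="2002-10-22">
  <product id="123-XY"><numberInStock>4</numberInStock></product>
  <product id="112-AA"><numberInStock>2</numberInStock></product>
</products>"""

ORDER = """<purchaseOrder>
  <items>
    <item id="123-XY"><quantity>1</quantity></item>
    <item id="112-AA"><quantity>3</quantity></item>
    <item id="999-ZZ"><quantity>1</quantity></item>
  </items>
</purchaseOrder>"""


def check_against_database(order_text, products_text):
    """Apply the two rules in order; each item is handled by one rule only."""
    stock = {
        product.get("id"): int(product.findtext("numberInStock"))
        for product in ET.fromstring(products_text).findall("product")
    }
    failures = []
    for item in ET.fromstring(order_text).findall("items/item"):
        item_id = item.get("id")
        if item_id in stock:
            # First (most restrictive) rule: the item exists in the database.
            if stock[item_id] < int(item.findtext("quantity")):
                failures.append((item_id, "not enough items in stock"))
        else:
            # Second (general) rule: only reached by items not in the database,
            # so it fails unconditionally, like test="false()".
            failures.append((item_id, "item not in database"))
    return failures


failures = check_against_database(ORDER, PRODUCTS)
print(failures)
```

Swapping the two branches would reproduce the ordering problem described above: the general branch would catch every item and the stock check would never run.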
The complete RELAX NG schema with embedded Schematron rules for both co-occurrence constraints and the database checks will look like this:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
xmlns:sch="http://www.ascc.net/xml/schematron">
<start>
<ref name="purchaseOrder"/>
</start>
<define name="purchaseOrder">
<element name="purchaseOrder">
<attribute name="date">
<data type="date"/>
</attribute>
<ref name="deliveryDetails"/>
<element name="items">
<oneOrMore>
<ref name="item"/>
</oneOrMore>
</element>
<ref name="payment"/>
</element>
</define>
<define name="deliveryDetails">
<element name="deliveryDetails">
<element name="name"><text/></element>
<element name="address"><text/></element>
<element name="phone"><text/></element>
</element>
</define>
<define name="item">
<element name="item">
<sch:pattern name="Validate each item.">
<sch:rule
context="purchaseOrder/items/item[@id = document(
'Products.xml')/products/product/@id]">
<sch:assert
test="number(document('Products.xml')
/products/product[@id = current()/@id]/numberInStock) >= number(quantity)">
There are not enough items of this type in stock for this quantity.
</sch:assert>
<sch:assert
test="number(price) * number(quantity) = number(totalAmount)">
The total amount for the item doesn't add up to (quantity * price).
</sch:assert>
<sch:assert test="price/@currency = totalAmount/@currency"
>The currency in price doesn't match the currency in totalAmount.
</sch:assert>
</sch:rule>
<sch:rule context="purchaseOrder/items/item">
<sch:assert test="false()"
>The item doesn't exist in the database.</sch:assert>
</sch:rule>
</sch:pattern>
<attribute name="id">
<data type="string">
<param name="pattern">\d{3}-[A-Z]{2}</param>
</data>
</attribute>
<element name="productName"><text/></element>
<element name="quantity">
<data type="int"/>
</element>
<element name="price">
<ref name="currency"/>
</element>
<element name="totalAmount">
<ref name="currency"/>
</element>
</element>
</define>
<define name="payment">
<element name="payment">
<attribute name="type">
<choice>
<value>Prepaid</value>
<value>OnArrival</value>
</choice>
</attribute>
<element name="amount">
<sch:pattern
name="Check that the total amount is correct and that the currencies match">
<sch:rule
context="purchaseOrder/payment/amount">
<sch:assert
test="number(.) = sum(/purchaseOrder/items/item/totalAmount)">
The total purchase amount doesn't match the cost of all items.
</sch:assert>
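<sch:assert
test="not(/purchaseOrder/items/item/totalAmount/@currency != @currency)">
The currency in at least one of the items doesn't match the
currency for the total amount.
</sch:assert>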
</sch:rule>
</sch:pattern>
<ref name="currency"/>
</element>
</element>
</define>
<define name="currency">
<attribute name="currency">
<choice>
<value>AUD</value>
<value>USD</value>
<value>SEK</value>
</choice>
</attribute>
<data type="int"/>
</define>
</grammar>
One of WXS's major advantages over previous schema languages is the ability to specify an extensive selection of datatypes, not only for attributes but also for elements with text content. In RELAX NG it is possible to use all the datatypes from WXS by specifying them as the datatype library. Unfortunately, this ability to control the text content of an element disappears if the element is defined to have mixed content (child elements mixed with text content). With the help of embedded Schematron rules, it is possible to apply basic text validation even to mixed content elements.
An example of this is source XML data that should be transformed into high-quality PDF documents. A very simple paragraph in the final document can be represented in XML like this:
<p>This is <b>ok</b> but this is<b> not</b> ok</p>
In this case it is very important where the space characters around the b elements are situated. If a space character is situated inside the b element, the bold font will make the space character bigger than it is supposed to be. For this reason it is important that the text content inside the b element does not start or end with a space character. For the same reason, the text preceding the b element should always end with a space character and the text following the b element should always start with a space character. In the above example, the spaces around the first b element are correctly located, while those around the second b element are not.
The RELAX NG schema for the above example is very simple:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<start>
<element name="p">
<mixed>
<zeroOrMore>
<element name="b">
<text/>
</element>
</zeroOrMore>
</mixed>
</element>
</start>
</grammar>
The Schematron rules that are needed to check the extra constraints on the text content can be implemented like this:
<sch:pattern name="Check spaces around b tags">
<sch:rule
context="p/node()[following-sibling::node()[1][self::b]][preceding-sibling::node()[1][self::b]]">
<sch:assert test="substring(., string-length(.)) = ' '">
A space must be present before the b tag.
</sch:assert>
<sch:assert test="starts-with(., ' ')">
A space must be present after the b tag.
</sch:assert>
</sch:rule>
<sch:rule context="p/node()[following-sibling::node()[1][self::b]]">
<sch:assert test="substring(., string-length(.)) = ' '">
A space must be present before the b tag.
</sch:assert>
</sch:rule>
<sch:rule context="p/node()[preceding-sibling::node()[1][self::b]]">
<sch:assert test="starts-with(., ' ')">
A space must be present after the b tag.
</sch:assert>
</sch:rule>
<sch:rule context="p/b">
<sch:assert test="not(starts-with(., ' '))">
The text in the b tag cannot start with a space.
</sch:assert>
<sch:assert test="substring(., string-length(.)) != ' '">
The text in the b tag cannot end with a space.
</sch:assert>
</sch:rule>
</sch:pattern>
The Schematron rules that check these constraints are divided into four parts (each part is one rule with a separate context), which are explained in the order they are declared:
For all child nodes of the p element where both the nearest preceding sibling and the nearest following sibling are b elements, check that a space character is present immediately after the preceding b element and that a space character is present immediately before the following b element.
For all child nodes of the p element where the nearest following sibling is a b element, check that a space character is present immediately before the b element.
For all child nodes of the p element where the nearest preceding sibling is a b element, check that a space character is present immediately after the b element.
For all child b elements, check that the text content does not begin or end with a space character.
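The same four checks can be sketched in Python to show what the rules are meant to catch. The sketch relies on how xml.etree.ElementTree exposes mixed content (the text before the first child is in p.text, the text after each child is in that child's tail); the function name check_spacing and the message strings are assumptions, and, like the rules, it skips a check when there is no text on that side of a b element.

```python
import xml.etree.ElementTree as ET


def check_spacing(p_text):
    """Return messages for misplaced spaces around b elements in a p element."""
    p = ET.fromstring(p_text)
    failures = []
    before = p.text or ""        # text preceding the first b element
    for b in p.findall("b"):
        content = b.text or ""
        after = b.tail or ""     # text following this b element
        if before and not before.endswith(" "):
            failures.append("space missing before b")
        if after and not after.startswith(" "):
            failures.append("space missing after b")
        if content.startswith(" "):
            failures.append("b starts with a space")
        if content.endswith(" "):
            failures.append("b ends with a space")
        before = after           # this tail precedes the next b element
    return failures


result = check_spacing("<p>This is <b>ok</b> but this is<b> not</b> ok</p>")
print(result)
```

For the sample paragraph, only the second b element is reported: the text before it does not end with a space, and its own content starts with one.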
The complete RELAX NG schema with embedded Schematron rules looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
xmlns:sch="http://www.ascc.net/xml/schematron">
<start>
<element name="p">
<sch:pattern name="Check spaces around b tags">
<sch:rule
context="p/node()[following-sibling::node()[1][self::b]][preceding-sibling::node()[1][self::b]]">
<sch:assert
test="substring(., string-length(.)) = ' '">
A space must be present before the b tag.
</sch:assert>
<sch:assert
test="starts-with(., ' ')">
A space must be present after the b tag.
</sch:assert>
</sch:rule>
<sch:rule context="p/node()[following-sibling::node()[1][self::b]]">
<sch:assert
test="substring(., string-length(.)) = ' '">
A space must be present before the b tag.
</sch:assert>
</sch:rule>
<sch:rule context="p/node()[preceding-sibling::node()[1][self::b]]">
<sch:assert test="starts-with(., ' ')">
A space must be present after the b tag.
</sch:assert>
</sch:rule>
<sch:rule context="p/b">
<sch:assert test="not(starts-with(., ' '))">
The text in the b tag cannot start with a space.
</sch:assert>
<sch:assert
test="substring(., string-length(.)) != ' '">
The text in the b tag cannot end with a space.
</sch:assert>
</sch:rule>
</sch:pattern>
<mixed>
<zeroOrMore>
<element name="b">
<text/>
</element>
</zeroOrMore>
</mixed>
</element>
</start>
</grammar>
This is of course a very simple example in which you only check for
space characters. In a more advanced example you also need to check
for other whitespace characters (like tabs), and the fact that the
last b element should not be followed by a space if the
immediately following character is a punctuation character. However,
the example still gives you an idea of the things you can do with
Schematron and mixed content.
Since Schematron, like RELAX NG, is namespace-aware, it is no problem to embed Schematron rules in a RELAX NG schema that defines one or more namespaces for the document. In the preceding section, it was shown how Schematron schemas should be set up to use namespaces by using the ns element. For embedded Schematron rules, this works exactly the same way. Instead of only embedding the Schematron rule that defines the extra constraint, you also need to embed the ns elements that define the namespaces used. The example from Namespaces and Schematron is reused here, but now RELAX NG defines the structure, while Schematron checks the co-occurrence constraint. The instance example used was:
<ex:Person Title="Mr" xmlns:ex="http://www.topologi.com/example">
<ex:Name>Eddie</ex:Name>
<ex:Gender>Male</ex:Gender>
</ex:Person>
A RELAX NG schema for the above would look like this:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
ns="http://www.topologi.com/example">
<start>
<element name="Person">
<element name="Name"><text/></element>
<element name="Gender">
<choice>
<value>Male</value>
<value>Female</value>
</choice>
</element>
<attribute name="Title"/>
</element>
</start>
</grammar>
The Schematron rule that needs to be embedded to check the
co-occurrence constraint (if the Title attribute is "Mr", then the
value of the Gender element must be "Male") will look like this (note
the use of the ex prefix):
<sch:pattern name="Check co-occurrence constraint">
<sch:rule context="ex:Person[@Title='Mr']">
<sch:assert test="ex:Gender = 'Male'">
If the Title is "Mr" then the gender of the person must be "Male".
</sch:assert>
</sch:rule>
</sch:pattern>
If this rule were embedded on its own, the Schematron validation
would fail because the prefix ex is not mapped to a
namespace URI. For the rule to work, the ns element
that defines this mapping must also be embedded:
<sch:ns prefix="ex"
uri="http://www.topologi.com/example"
xmlns:sch="http://www.ascc.net/xml/schematron"/>
I always insert these Schematron namespace mappings at the start of the host schema. This means that they are always declared in the same place and it is easy to see which mappings are included without having to search through the entire schema. The complete RELAX NG schema with the embedded rules would then look like this:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
ns="http://www.topologi.com/example"
xmlns:sch="http://www.ascc.net/xml/schematron">
<!-- Include all the Schematron namespace mappings at the top -->
<sch:ns prefix="ex" uri="http://www.topologi.com/example"/>
<start>
<element name="Person">
<sch:pattern name="Check co-occurrence constraint">
<sch:rule context="ex:Person[@Title='Mr']">
<sch:assert test="ex:Gender = 'Male'">
If the Title is "Mr" then the gender of the person must be "Male".
</sch:assert>
</sch:rule>
</sch:pattern>
<element name="Name"><text/></element>
<element name="Gender">
<choice>
<value>Male</value>
<value>Female</value>
</choice>
</element>
<attribute name="Title"/>
</element>
</start>
</grammar>
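To illustrate what the embedded rule adds, consider the following variation of the instance, which is made up for this illustration. The RELAX NG grammar should accept it, since Female is one of the allowed values for Gender, but the embedded Schematron assertion is expected to fail because Title is "Mr" while Gender is not "Male".

```xml
<ex:Person Title="Mr" xmlns:ex="http://www.topologi.com/example">
  <!-- Structurally valid: Name, then Gender, plus a Title attribute -->
  <ex:Name>Eddie</ex:Name>
  <!-- Allowed by the RELAX NG choice, but conflicts with Title="Mr" -->
  <ex:Gender>Female</ex:Gender>
</ex:Person>
```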
Since embedded Schematron rules are not part of the RELAX NG specification, most RELAX NG processors will not recognize them or enforce the constraints they express. In fact, the embedded Schematron rules will be completely ignored by the processor since they are declared in a namespace other than RELAX NG's. This means that, in order to use the Schematron rules for validation, this functionality must be added. Currently there are two options for how this can be achieved:
The embedded rules are extracted from the RELAX NG schema and concatenated into a Schematron schema. This schema can then be used for normal Schematron validation of the XML instance document. Since both RELAX NG and Schematron use XML syntax, it is fairly easy to perform this extraction using XSLT. This technique will be described in detail in the following section.
The RELAX NG processor can be modified to allow embedded Schematron-like rules and perform the validation as part of the normal RELAX NG validation. This technique is used in Sun's MSV, which has an add-on that will validate XML instance documents against RELAX NG schemas annotated with rules and assertions. However, if this option is used, the way the rules are embedded in the RELAX NG schema is slightly different from the method described in this article.
More information about these differences is provided in the documentation included in the download of the MSV add-on.
It should be noted that the rules and assertions specified using this method have little to do with Schematron beyond using the same element names.
To extract the embedded Schematron rules from the RELAX NG schema, the RNG2Schtrn.xsl stylesheet can be used. This stylesheet will also extract Schematron rules that have been declared in RELAX NG modules that are included in or referenced from the base schema.
The result from the script is a complete Schematron schema that can be used to validate the XML instance document using a Schematron processor as described in the section Introduction to Schematron. The XML instance document is then validated against the RELAX NG schema using a normal RELAX NG processor that will ignore all the embedded rules. This means that validation results are available from both Schematron validation and RELAX NG validation and if needed the results can be merged into one report. The whole process is described in the following figure:
As shown in the figure, there are two distinct paths in the validation process. If timing requirements are important, each path can therefore be implemented as a separate process and the two can be executed in parallel.
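The extraction step can be sketched as a small XSLT 1.0 stylesheet. This is not the RNG2Schtrn.xsl stylesheet itself, whose contents are not reproduced here, but a minimal illustration of the idea. It assumes the Schematron 1.5 namespace used throughout this article, copies only ns and pattern elements, and, unlike the stylesheet described above, does not follow included or referenced RELAX NG modules.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal sketch: collect embedded Schematron elements from a
     RELAX NG schema into a standalone Schematron schema. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rng="http://relaxng.org/ns/structure/1.0"
    xmlns:sch="http://www.ascc.net/xml/schematron"
    exclude-result-prefixes="rng">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <sch:schema>
      <!-- Namespace mappings first, wherever they were declared -->
      <xsl:copy-of select="//sch:ns"/>
      <!-- Then every embedded pattern, in document order -->
      <xsl:copy-of select="//sch:pattern"/>
    </sch:schema>
  </xsl:template>

</xsl:stylesheet>
```

Because xsl:copy-of also copies the namespace nodes that are in scope, the copied elements may carry the RELAX NG default namespace declaration. This should be harmless, since the elements themselves remain in the Schematron namespace.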
A batch file that would (using the Win32 executable of Jing and Saxon) validate an XML instance document against both a RELAX NG schema and its embedded Schematron rules can look like this:
echo Running Jing validation on Sample.xml...
jing PurchaseOrder.rng Sample.xml
echo Creating Schematron schema from PurchaseOrder.rng...
saxon -o PurchaseOrder.sch PurchaseOrder.rng RNG2Schtrn.xsl
echo Running Basic Schematron validation on file Sample.xml...
saxon -o validate.xsl PurchaseOrder.sch schematron-basic.xsl
saxon Sample.xml validate.xsl
So, first, the XML instance document is validated against the RELAX NG schema using Jing, and then it is validated with the embedded Schematron rules using Saxon. An output example could look like this:
Running Jing validation on Sample.xml...
Error at URL "file:/C:/Sample.xml", line number 7: unknown element "BogusElement"
Creating Schematron schema from PurchaseOrder.rng...
Running Basic Schematron validation on file Sample.xml...
From pattern "Check that each team is registered in the tournament":
Assertion fails: "The item doesn't exist in the database." at
/purchaseOrder[1]/items[1]/item[2]
<item id="112-AX">...</>
Done.
The Topologi Schematron Validator is a free graphical validator that can validate an XML instance document against a RELAX NG schema with embedded Schematron rules.
Schematron is a very good complement to RELAX NG, and there is little that cannot be validated by the combination of the two. This article has shown how to embed Schematron rules in a RELAX NG schema as well as providing guidelines for how to perform validation. A Java implementation of Schematron that works as a wrapper around Xalan can be downloaded from Topologi. This implementation also contains classes to perform RELAX NG validation (using Jing) with embedded Schematron rules.
It is up to each project and use-case to evaluate if embedding Schematron rules in RELAX NG schemas is a suitable technique to achieve more powerful validation. Following is a list of some advantages to take into account:
By combining the power of RELAX NG and Schematron, the limit for what can be performed in terms of validation is raised to a new level.
Many of the constraints that previously had to be checked in the application can now be moved out of the application and into the schema.
Since Schematron lets you provide your own error messages (the content of the assertion elements), you can ensure that each message is as explanatory as needed.
And some disadvantages:
In time-critical applications, the overhead of processing the embedded Schematron rules may be too high. This is especially true if XSLT implementations of Schematron are used in conjunction with the extraction method described in the preceding section. Extensive use of XSLT's document() function is also very resource-demanding and time-consuming.
Since the extraction of Schematron rules from a RELAX NG schema is performed with XSLT, embedded Schematron rules are only supported in RELAX NG schemas that use the full XML syntax, not the compact syntax.
The ability to combine embedded Schematron rules with a different schema language is not unique to RELAX NG and should be possible in all XML schema languages that use XML syntax and have an extensibility mechanism. The only thing needed is to modify the XSLT extractor stylesheet to accommodate the extension mechanism in the host XML schema language used.
I would like to thank Rick Jelliffe and Mike Fitzgerald for comments and suggestions on this article.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.