XML.com: XML From the Inside Out

Combining RELAX NG and Schematron

February 11, 2004

Embedding Schematron Rules in RELAX NG

This article explains how to integrate two powerful XML schema languages, RELAX NG and Schematron. Embedding Schematron rules in RELAX NG is simple because a RELAX NG validator ignores all elements not in the RELAX NG namespace (http://relaxng.org/ns/structure/1.0). This means that Schematron rules can be embedded in any element and at any level of a RELAX NG schema.

Here is a very simple RELAX NG schema that defines only one element, Root:


<?xml version="1.0" encoding="UTF-8"?>
<element name="Root" xmlns="http://relaxng.org/ns/structure/1.0">
   <text/>
</element>

Now, if a Schematron rule should have the Root element as its context, the rule can be embedded inside the RELAX NG <element> element that defines the pattern for Root:


<?xml version="1.0" encoding="UTF-8"?>
<element name="Root" xmlns="http://relaxng.org/ns/structure/1.0">
  <sch:pattern name="Test constraints on the Root element"
               xmlns:sch="http://www.ascc.net/xml/schematron">
    <sch:rule context="Root">
      <sch:assert test="test-condition">Error message when
        the assertion condition is broken...</sch:assert>
    </sch:rule>
  </sch:pattern>
  <text/>
</element>

Schematron rules are embedded in a RELAX NG schema at the pattern level: each embedded block is a complete sch:pattern element, and it must be declared in the Schematron namespace (http://www.ascc.net/xml/schematron).
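
Because embedded rules can sit at any level, the same pattern also fits inside a full grammar. The following is a minimal sketch, assuming the sch prefix is declared once on the root grammar element instead of on each pattern; the define name Root is chosen here for illustration, and test-condition remains a placeholder, as in the example above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the sch prefix is declared once on the root element -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:sch="http://www.ascc.net/xml/schematron">
  <start>
    <ref name="Root"/>
  </start>
  <define name="Root">
    <element name="Root">
      <!-- Ignored by the RELAX NG validator: not in the RELAX NG namespace -->
      <sch:pattern name="Test constraints on the Root element">
        <sch:rule context="Root">
          <sch:assert test="test-condition">Error message when
            the assertion condition is broken...</sch:assert>
        </sch:rule>
      </sch:pattern>
      <text/>
    </element>
  </define>
</grammar>
```

Declaring the prefix once keeps each embedded pattern shorter, at the cost of making the patterns depend on the enclosing element for their namespace binding.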

Co-occurrence constraints

Although RELAX NG has better support for co-occurrence constraints than W3C XML Schema (WXS), there are still many types of co-occurrence constraints that it cannot express. One example is a constraint in which the relationship between two (or more) element or attribute values is a mathematical expression.

As an example, we use a schema that defines a very simple international purchase order. This purchase order specifies the following:

  • The date of the order

  • An address to which the purchased products will be delivered

  • The items being purchased (including an id, a name, a quantity, and a price with currency information)

  • Payment details including type of payment and total amount payable with currency information

Here is an example of an XML representation of such a purchase order:

<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder date="2002-10-22">
  <deliveryDetails>
    <name>John Doe</name>
    <address>123 Morgue Street, Death Valley</address>
    <phone>+61 2 9546 4146</phone>
  </deliveryDetails>
  <items>
    <item id="123-XY">
      <productName>Coffin</productName>
      <quantity>1</quantity>
      <price currency="AUD">2300</price>
      <totalAmount currency="AUD">2300</totalAmount>
    </item>
    <item id="112-AA">
      <productName>Shovel</productName>
      <quantity>2</quantity>
      <price currency="AUD">75</price>
      <totalAmount currency="AUD">150</totalAmount>
    </item>
  </items>
  <payment type="Prepaid">
    <amount currency="AUD">2450</amount>
  </payment>
</purchaseOrder>	

A real-life purchase order would be much more complex, but this example is sufficient for the purpose of this article. A RELAX NG schema for the purchase order could look like this:

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <ref name="purchaseOrder"/>
  </start>
  <define name="purchaseOrder">
    <element name="purchaseOrder">
      <attribute name="date">
        <data type="date"/>
      </attribute>
      <ref name="deliveryDetails"/>
      <element name="items">
        <oneOrMore>
          <ref name="item"/>
        </oneOrMore>
      </element>
      <ref name="payment"/>
    </element>
  </define>
  <define name="deliveryDetails">
    <element name="deliveryDetails">
      <element name="name"><text/></element>
      <element name="address"><text/></element>
      <element name="phone"><text/></element>
    </element>
  </define>
  <define name="item">
    <element name="item">
      <attribute name="id">
        <data type="string">
          <param name="pattern">\d{3}-[A-Z]{2}</param>
        </data>
      </attribute>
      <element name="productName"><text/></element>
      <element name="quantity">
        <data type="int"/>
      </element>
      <element name="price">
        <ref name="currency"/>
      </element>
      <element name="totalAmount">
        <ref name="currency"/>
      </element>
    </element>
  </define>
  <define name="payment">
    <element name="payment">
      <attribute name="type">
        <choice>
          <value>Prepaid</value>
          <value>OnArrival</value>
        </choice>
      </attribute>
      <element name="amount">
        <ref name="currency"/>
      </element>
    </element>
  </define>
  <define name="currency">
    <attribute name="currency">
      <choice>
        <value>AUD</value>
        <value>USD</value>
        <value>SEK</value>
      </choice>
    </attribute>
    <data type="int"/>
  </define>
</grammar>	

This RELAX NG schema makes sure that all the required elements and attributes are present, and that some of them have the correct datatype. For example, all price information must have an integer value; the id of an item must be three digits, followed by a hyphen, followed by two uppercase letters; and the currency value must be one of AUD, USD, or SEK. However, in a real-world scenario you will likely need to check more than the structure and the datatypes to make sure the purchase order is valid.

For the purchase order, the following constraints cannot be checked by RELAX NG, but they would all be very useful for complete validation of the data:

  1. Each item specifies a quantity, a price, and a totalAmount. For the data to be valid, the value of the totalAmount element must equal quantity * price.

  2. Both the price element and the totalAmount element specify a currency. For the data to be valid, the price and totalAmount elements must have the same currency value.

  3. The payment section of the purchase order specifies an amount element whose value must equal the sum of all the items' totalAmount values.

  4. Every item's currency value must equal the currency value of the amount element in the payment section.
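
To make the gap concrete, here is a hypothetical purchase order (the values are invented for illustration) that should satisfy the RELAX NG schema above, since every element, attribute, and datatype is as required, yet breaks all four constraints in the list. The comments mark where each constraint is violated.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder date="2002-10-22">
  <deliveryDetails>
    <name>John Doe</name>
    <address>123 Morgue Street, Death Valley</address>
    <phone>+61 2 9546 4146</phone>
  </deliveryDetails>
  <items>
    <item id="112-AA">
      <productName>Shovel</productName>
      <quantity>2</quantity>
      <price currency="AUD">75</price>
      <!-- Constraint 1 broken: 2 * 75 = 150, not 160 -->
      <!-- Constraint 2 broken: USD does not match the AUD price -->
      <totalAmount currency="USD">160</totalAmount>
    </item>
  </items>
  <payment type="Prepaid">
    <!-- Constraint 3 broken: the only item total is 160, not 150 -->
    <!-- Constraint 4 broken: SEK does not match the item's currency -->
    <amount currency="SEK">150</amount>
  </payment>
</purchaseOrder>
```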

Schematron can easily check all of these constraints, and the context definition in the language provides a logical grouping of the constraints. The first two constraints apply to each item element in the purchase order, so this element is the context. Here is an example of how you can specify the Schematron rule needed to express these two constraints:

<sch:pattern name="Check that the pricing and currency of an item is correct."
             xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:rule context="purchaseOrder/items/item">
    <sch:assert test="number(price) * number(quantity) = number(totalAmount)">
      The total amount for the item doesn't add up to (quantity * price).
    </sch:assert>
    <sch:assert test="price/@currency = totalAmount/@currency">
      The currency in price doesn't match the currency in totalAmount.
    </sch:assert>
  </sch:rule>
</sch:pattern>

The Schematron rule specifies its context as all item elements with a parent items element and a grandparent purchaseOrder element. For each item element that matches this criterion, the first assertion checks that the value of the price child element multiplied by the value of the quantity child element equals the value of the totalAmount child element. The second assertion makes sure that the currency value of the price child element matches the currency value of the totalAmount child element.
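
Following the embedding approach from the Root example, this pattern can be placed inside the definition it constrains. The fragment below is a sketch of one possible placement, not the only one: it is meant to replace the item definition in the purchase order grammar above, and it relies on that grammar for the default RELAX NG namespace, the datatypeLibrary attribute, and the currency definition.

```xml
<define name="item">
  <element name="item">
    <!-- Embedded Schematron pattern: skipped by the RELAX NG validator -->
    <sch:pattern name="Check that the pricing and currency of an item is correct."
                 xmlns:sch="http://www.ascc.net/xml/schematron">
      <sch:rule context="purchaseOrder/items/item">
        <sch:assert test="number(price) * number(quantity) = number(totalAmount)">
          The total amount for the item doesn't add up to (quantity * price).
        </sch:assert>
        <sch:assert test="price/@currency = totalAmount/@currency">
          The currency in price doesn't match the currency in totalAmount.
        </sch:assert>
      </sch:rule>
    </sch:pattern>
    <!-- The RELAX NG content model for item is unchanged -->
    <attribute name="id">
      <data type="string">
        <param name="pattern">\d{3}-[A-Z]{2}</param>
      </data>
    </attribute>
    <element name="productName"><text/></element>
    <element name="quantity">
      <data type="int"/>
    </element>
    <element name="price">
      <ref name="currency"/>
    </element>
    <element name="totalAmount">
      <ref name="currency"/>
    </element>
  </element>
</define>
```

Keeping the rule next to the element it checks means the structural definition and the co-occurrence constraints for item can be read and maintained together.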

The last two constraints both apply to the amount element in the payment section, so this element is the context for the Schematron rule that checks them. Here is an example of how the rule can be specified:

<sch:pattern name="Check that the total amount is correct and that the currencies match"
             xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:rule context="purchaseOrder/payment/amount">
    <sch:assert
        test="number(.) = sum(/purchaseOrder/items/item/totalAmount)">
      The total purchase amount doesn't match the cost of all items.
    </sch:assert>
    <sch:assert
        test="not(/purchaseOrder/items/item/totalAmount/@currency != @currency)">
      The currency in at least one of the items doesn't match the
      currency for the total amount.
    </sch:assert>
  </sch:rule>
</sch:pattern>

The first assertion uses XPath's sum() function to check that the sum of the totalAmount values of all item elements equals the value of the context node (the amount element). The second assertion makes sure that every item's currency value matches the currency value of the amount element. Note that the following (similar) assertion does not perform the same check:

<sch:assert test="/purchaseOrder/items/item/totalAmount/@currency = @currency"
    >...</sch:assert>

This assertion checks only that at least one of the items' currency values matches the currency in the amount element, because an XPath 1.0 comparison involving a node-set is true as soon as it holds for one node in the set. In this case we want to make sure that all the items' currency values match, so we negate both the assertion expression (using XPath's not() function) and the operator used inside it ('=' becomes '!='). This technique is often used when writing Schematron rules to express a constraint that must hold for every node.
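
The same check can also be phrased with Schematron's report element, which fires when its test is true, whereas assert fires when its test is false. The following sketch is an alternative formulation rather than the one used in this article; the pattern name is invented for illustration.

```xml
<sch:pattern name="Report items whose currency differs from the payment currency"
             xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:rule context="purchaseOrder/payment/amount">
    <!-- A report fires when its test is true, so no not() wrapper is needed -->
    <sch:report test="/purchaseOrder/items/item/totalAmount/@currency != @currency">
      The currency in at least one of the items doesn't match the
      currency for the total amount.
    </sch:report>
  </sch:rule>
</sch:pattern>
```

Both forms produce a message exactly when some item's currency differs from the amount's currency; which one reads better is largely a matter of whether the rule is thought of as a requirement (assert) or as an error condition (report).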
