Schematron Query Language Binding and XSLT

October 17, 2022

qlb xslt schematron

Schematron's Query Language Binding is a little-known and underused feature of the language. Erik Siegel gives an introduction to its use.

1 Introduction
2 Query language binding in general
3 Using XSLT keys in Schematron
4 Using XSLT functions in Schematron
5 Wrap up

1 Introduction

Schematron is one of the XML validation languages. It’s a flexible and elegant language that allows you to specify rules for XML documents and the messages to emit when these rules are broken. For an introduction to Schematron, see for instance the xml.com article Validating XML with Schematron.

Any Schematron schema contains expressions: for the nodes to match, for the conditions to check, and for the values to insert from the document (in messages and several other places). It should come as no surprise that in the vast majority of cases XPath is used as the language for these expressions. What most people are probably not aware of is that Schematron is actually a container language around the language used for expressions, and that, theoretically, you could use other expression languages inside Schematron.

This concept is called Query Language Binding or QLB. Query Language Binding allows you to specify the embedded programming language used for all expressions. And some bindings, most notably the XSLT ones, also allow you to add specific code constructs, greatly expanding the scope of what you can do.

This article discusses Query Language Binding in general and then elaborates on the capabilities the XSLT type bindings provide you with.

Query Language Binding is just one of many Schematron features. All of them are described in my book about Schematron: Schematron - A language for Validating XML, XML Press, 2022. This article is an excerpt from the chapter on Query Language Binding.

2 Query language binding in general

The Query Language Binding for a Schematron schema is set using a queryBinding attribute, containing the name of the binding, on the root element. For instance, to set the Query Language Binding to xslt3:

Figure 1: A Schematron schema using the queryBinding attribute to set the Query Language Binding

<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt3"> 
  …
</schema>

The Schematron standard reserves a number of Query Language Binding names: exslt, stx, xslt, xslt2, xslt3, xpath, xpath2, xpath3, xpath31, xquery, xquery3 and xquery31. Some of these are actually defined by the standard. The others are reserved only, but their names suggest what was intended.

Despite this seemingly abundant number of bindings, the most widely used Schematron processors support only the xslt, xslt2 and xslt3 bindings. So let’s focus on those:

- The xslt Query Language Binding (which is the default if you don’t specify a queryBinding attribute) allows you to use XPath 1.0 expressions. Additionally, it allows indexes using the xsl:key element. Applied properly, this can make lookups of, for instance, identifiers significantly faster.
- The xslt2 Query Language Binding is an extension of the xslt one. It allows XPath 2.0 expressions. This gives you a lot more options for your expressions and also more standard functions. Results of expressions are no longer limited to the few basic types of XPath 1.0 (node-set, string, number and boolean) but can be any XPath 2.0 data type.

  An important additional feature is that it allows you to define your own functions in your schema, using xsl:function and embedded XSLT 2.0. These functions can then be used in expressions in your schema.
- The xslt3 Query Language Binding is an extension of the xslt2 binding. It allows the use of XPath 3.1 expressions and functions expressed in XSLT 3.0.

My advice would be, if your processor supports this, to always set the Query Language Binding to either xslt2 or, preferably, xslt3. Not specifying a binding (by not using a queryBinding attribute on the root element) means that you’re limited to XPath 1.0 for your expressions. Given the current state of technology, that’s severely limiting.
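
To give an impression of the difference, here is a minimal sketch of a schema that needs at least the xslt2 binding, because its assert uses the XPath 2.0 function matches(). The product element and its code attribute are invented for this illustration only:

<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <pattern>
    <!-- product and @code are hypothetical names, used for illustration only -->
    <rule context="product">
      <!-- matches() is an XPath 2.0 function; XPath 1.0 has no regular expressions -->
      <assert test="matches(@code, '^[A-Z][0-9]+$')">
        The code <value-of select="@code"/> must be an uppercase letter followed by digits
      </assert>
    </rule>
  </pattern>
</schema>

With the xslt binding (or without a queryBinding attribute) this schema should not be expected to work, since matches() does not exist in XPath 1.0.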

3 Using XSLT keys in Schematron

One of the things you can do with an xslt type Query Language Binding is use XSLT keys. Let’s explore this.

Referencing in XML documents is often done using identifiers. For instance the following example contains orders that reference items, by identifier:

Figure 2: Example of an XML document that contains references using identifiers.

<?xml version="1.0" encoding="UTF-8"?>
<orders>
  <item id="bolts" price="5.49">A box with 20 bolts</item>
  <item id="nuts" price="3.78">A box with 20 nuts</item>
  <!-- … many, many more items… -->
  <order>
    <ordered-item id-ref="bolts" quantity="5"/>
    <ordered-item id-ref="nuts" quantity="10"/>
  </order>
  <!-- … many, many more orders… -->
</orders>

The value of each id-ref attribute on an ordered-item element must contain the identifier of an item element, in the same document. A basic version of a Schematron schema that checks this is:

Figure 3: Schematron schema that checks the identifier references for Figure 2.

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt3"> 
  <pattern>
    <rule context="ordered-item">
      <let name="item-id" value="@id-ref"/>
      <assert test="exists(/*/item[@id eq $item-id])">
        The referenced item <value-of select="$item-id"/> does not exist
      </assert>
    </rule>
  </pattern>
</schema>

The let element stores the identifier to check in the variable $item-id. This is used in the assert to check whether an item element with the same identifier exists. Very straightforward and perfectly all right.

But what if the document is very large and contains thousands and thousands of item elements? Every ordered-item element causes the schema processor to search all the item elements, from top to bottom, again and again. That’s not very efficient and can take a long time.

A solution to this is creating a key. This is an in-memory data structure that allows fast lookup of elements by some key index value. XSLT has an instruction for this, xsl:key. With any of the XSLT-type Query Language Bindings we can use this in Schematron too. The following Schematron schema does the same as Figure 3, but much more efficiently:

Figure 4: Schematron schema that checks the identifier references for Figure 2 with a key.

<?xml version="1.0" encoding="UTF-8"?>
<!-- 1 - Define the XSLT namespace: -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" queryBinding="xslt3">  
  
  <!-- 2 - Define a key using the XSLT key instruction: -->
  <xsl:key name="item-ids" match="/*/item" use="@id"/>  
  
  <pattern>
    <rule context="ordered-item">
      <!-- 3 - Reference the key using the key() function: -->
      <assert test="exists(key('item-ids', @id-ref))">
        The referenced item <value-of select="@id-ref"/> does not exist
      </assert>
    </rule>
  </pattern>
</schema>

1 - To be able to use instructions from XSLT, we need to declare the XSLT namespace. Hence the xmlns:xsl="http://www.w3.org/1999/XSL/Transform" namespace declaration on the root element. Every element that starts with xsl: is now considered an XSLT instruction.
2 - The XSLT xsl:key instruction defines a key. It has three components:
  - The name of the key, in this case item-ids.
  - The nodes the key is about, in this case the /*/item elements.
  - The value of the key, in this case the identifier of the item, contained in its id attribute.
  What happens under the hood is that the Schematron processor creates some appropriate data structure that allows fast lookup of item elements using the value of their id attribute.
3 - The test attribute of the assert element uses the (XSLT) key() function to look up values in the key. This function takes two or three parameters:
  - The name of the key, in this case item-ids (as a string, therefore written using quotes, as 'item-ids').
  - The value to look up, in this case the id-ref attribute of the ordered-item element.
  - The third parameter, optional and unused here, allows you to limit the returned nodes to a specific part of the document (a “subtree”). You do this by specifying the root node of the part you’re interested in. The default value is the document node /.
  The key() function will perform a fast and efficient lookup and return the item element(s) associated with the given identifier. If the identifier is unknown it will return an empty sequence.
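
Because key() returns all elements that share a key value, the same key can also be used for other checks. As a minimal sketch (not part of the original schema), the following variation reports item identifiers that occur more than once in the order document shown earlier. It reuses the item-ids key; the rule and its message are assumptions made for this illustration:

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" queryBinding="xslt3">

  <xsl:key name="item-ids" match="/*/item" use="@id"/>

  <pattern>
    <!-- Only items that have an identifier are checked: -->
    <rule context="/*/item[@id]">
      <!-- key() returns every item with this identifier, the context item included: -->
      <assert test="count(key('item-ids', @id)) eq 1">
        The identifier <value-of select="@id"/> is used by more than one item
      </assert>
    </rule>
  </pattern>
</schema>

Without the key, this check would compare every item with every other item, which becomes slow quickly as the number of items grows.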

A warning before we end this topic: keys don’t come for free. Building a key takes time and you have to weigh this against the time raw lookups take (as done in Figure 3). In general, don’t use keys on small documents. The tipping point is fuzzy. If this is important to you: experiment and measure!

4 Using XSLT functions in Schematron

Separating code using functions is a very normal thing to do when programming. Schematron itself however lacks the ability to define functions. For this it relies on its Query Language Binding feature.

As an example, assume we have some separate reference document that tells us the expected price for something with a certain type. It also contains a default price, as an attribute on the root element, for everything with a type not mentioned otherwise:

Figure 5: A list with type codes and prices

<type-codes-and-prices default-price="10.0">
  <data type="A125" price="17.25"/>
  <data type="X96" price="89.34"/>
</type-codes-and-prices>

We would like to use this reference document in checking documents like the following:

Figure 6: Data containing type codes and prices

<things>
  <thing name="thing 1" type="A125" price="17.25"/>
  <thing name="thing 2" type="A125" price="17.26"/>
  <thing name="thing 3" type="X96" price="89.34"/>
  <thing name="thing 4" type="Y78" price="10.01"/>
</things>

To check the price of a thing in Figure 6, we need to look up its expected price in Figure 5, based on its type. If its type is not mentioned there, we should use the default price. We could express this as a complicated and rather long XPath expression directly in Schematron, but it’s much nicer and more maintainable to define a function for this using XSLT. Using a Query Language Binding of xslt2 or xslt3 we can do this:

Figure 7: Schematron schema that checks the prices in Figure 6 against Figure 5

<?xml version="1.0" encoding="UTF-8"?>
<!-- 1 - Define the XSLT namespace on the root element: -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" queryBinding="xslt3">

  <!-- 2 - Define a namespace for the functions as an <ns> element: -->
  <ns uri="#functions" prefix="f"/>

  <!-- 3 - Define your function using XSLT: -->
  <xsl:function name="f:get-price" as="xs:double">
    <xsl:param name="type" as="xs:string"/>
    <xsl:variable name="prices-document" as="document-node()"
      select="doc('type-codes-and-prices.xml')"/>
    <xsl:variable name="data-element-for-type" as="element(data)?"
      select="$prices-document//data[@type eq $type]"/>
    <xsl:choose>
      <xsl:when test="exists($data-element-for-type)">
        <xsl:sequence select="xs:double($data-element-for-type/@price)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence 
          select="xs:double($prices-document/type-codes-and-prices/@default-price)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <pattern>
    <rule context="thing">
      <!-- 4 - Use the defined function to get the price: -->
      <let name="expected-price" value="f:get-price(@type)"/>
      <assert test="$expected-price eq xs:double(@price)">
        The price for <value-of select="@name"/> should be 
        <value-of select="$expected-price"/>
      </assert>
    </rule>
  </pattern>

</schema>

1 - We’re going to embed XSLT code in Schematron. Therefore you have to define the XSLT namespace on the root element (xmlns:xsl="http://www.w3.org/1999/XSL/Transform").
2 - The names of user-defined functions must be in a namespace. In Schematron you have to define such a namespace using an ns element. This allows you to use this namespace in the XPath expressions in the schema. The namespace (#functions) and prefix (f) used here are arbitrary examples. You can use anything you like.
3 - Define your function(s) using the XSLT programming language. In this example the function is called f:get-price.
4 - We use the defined f:get-price() function to get the expected price from Figure 5 and use this in the assert’s test expression.
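
Keys and functions can also be combined. The following is a sketch only, a variation on the price-checking schema above that looks up the data element through a key instead of a // search. It assumes your processor passes both xsl:key and xsl:function through to the underlying XSLT engine. The key name prices-by-type is made up for this illustration, and the xs prefix is used in the same way as in the schema above:

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" queryBinding="xslt3">

  <ns uri="#functions" prefix="f"/>

  <!-- Hypothetical key name; indexes the data elements by their type attribute: -->
  <xsl:key name="prices-by-type" match="data" use="@type"/>

  <xsl:function name="f:get-price" as="xs:double">
    <xsl:param name="type" as="xs:string"/>
    <xsl:variable name="prices-document" as="document-node()"
      select="doc('type-codes-and-prices.xml')"/>
    <!-- The third argument makes key() search the reference document: -->
    <xsl:variable name="data-element-for-type" as="element(data)?"
      select="key('prices-by-type', $type, $prices-document)"/>
    <!-- Take the listed price if there is one, otherwise the default price: -->
    <xsl:sequence
      select="xs:double(($data-element-for-type/@price, $prices-document/*/@default-price)[1])"/>
  </xsl:function>

  <pattern>
    <rule context="thing">
      <let name="expected-price" value="f:get-price(@type)"/>
      <assert test="$expected-price eq xs:double(@price)">
        The price for <value-of select="@name"/> should be
        <value-of select="$expected-price"/>
      </assert>
    </rule>
  </pattern>

</schema>

The three-parameter form of key() is needed here for two reasons: the key must be applied to the reference document rather than the document being validated, and there is no context node inside an xsl:function for the two-parameter form to work with. Whether this is faster than the original version depends on the size of the reference document, so the earlier advice applies here too: experiment and measure.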

5 Wrap up

- Query Language Binding allows you, theoretically, to change the language used for expressions in a Schematron schema.
- In most cases only the xslt, xslt2 and xslt3 bindings are supported.
- You specify the Query Language Binding of a Schematron schema using the queryBinding attribute on the root element of the schema.
- If you don’t specify this attribute, the default value is xslt.
- The xslt binding limits you to XPath 1.0 expressions, which, given the current state of technology, is rather limiting.
- All three bindings allow you to use XSLT keys. The xslt2 and xslt3 bindings also allow you to define functions. These are very useful constructs in more complex schemas.
