Schematron Query Language Binding and XSLT
October 17, 2022
Table of Contents
- 1 Introduction
- 2 Query language binding in general
- 3 Using XSLT keys in Schematron
- 4 Using XSLT functions in Schematron
- 5 Wrap up
1 Introduction

Schematron is one of the XML validation languages. It’s a flexible and elegant language that allows you to specify rules for XML documents and the messages to emit when these rules are broken. An introduction to Schematron can for instance be found in the xml.com article Validating XML with Schematron.
Any Schematron schema contains expressions. For instance, expressions select the nodes to match, state the conditions to check, and insert values from the document (in messages and several other places). It should come as no surprise that in the vast majority of cases XPath is used as the language for these expressions. What most people are probably not aware of, however, is that Schematron is actually a container language around the language used for expressions. Theoretically, you could use other expression languages inside Schematron.
This concept is called Query Language Binding or QLB. Query Language Binding allows you to specify the embedded programming language used for all expressions. And some bindings, most notably the XSLT ones, also allow you to add specific code constructs, greatly expanding the scope of what you can do.
This article discusses Query Language Binding in general and then elaborates on the capabilities the XSLT type bindings provide you with.
Query Language Binding is just one of many Schematron features. All of them are described in my book about Schematron: Schematron - A language for Validating XML, XML Press, 2022. This article is an excerpt from the chapter on Query Language Binding.
2 Query language binding in general

The Query Language Binding for a Schematron schema is set using a queryBinding attribute on the root element, containing the name of the binding.
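Below is a minimal sketch of a root element that sets the binding to xslt3, the binding that all examples in this article use. The single rule is hypothetical and is only there to make the schema complete:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The queryBinding attribute on the root element selects the Query Language Binding -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt3">
  <pattern>
    <!-- Hypothetical rule: the root element should not be empty -->
    <rule context="/*">
      <assert test="exists(node())">The root element should not be empty</assert>
    </rule>
  </pattern>
</schema>
```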
The Schematron standard reserves a number of Query Language Binding names. Only some of these names are actually defined. The others are merely reserved, but from their names we can surmise what was meant.
Despite this seemingly abundant number of bindings, the most widely used Schematron processors support only the xslt, xslt2 and xslt3 bindings. So let’s focus on those:
The xslt Query Language Binding (which is the default if you don’t specify a queryBinding attribute) allows you to use XPath 1.0 expressions. Additionally, it allows you to define indexes using the xsl:key element. Applied properly, this can make lookups of, for instance, identifiers significantly faster.
The xslt2 Query Language Binding is an extension of the xslt one. It allows XPath 2.0 expressions, which gives you a lot more options for your expressions and many more standard functions. Results of expressions are no longer limited to the four XPath 1.0 types (node-set, string, number and boolean). They can be sequences of items of any data type.
An important additional feature is that it allows you to define your own functions in your schema, using xsl:function and embedded XSLT 2.0. These functions can then be used in expressions in your schema.
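The following hypothetical rule illustrates what XPath 2.0 adds. It is written for the orders document used later in this article. It relies on two things XPath 1.0 does not have: a quantified every expression and the regular-expression function matches(). Treat it as a sketch, not as one of the article’s own examples:

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <pattern>
    <rule context="order">
      <!-- every ... satisfies: true only if the condition holds for all quantities -->
      <!-- matches(): regular expression for a positive whole number -->
      <assert test="every $q in ordered-item/@quantity satisfies matches($q, '^[1-9][0-9]*$')">
        Every quantity in an order must be a positive whole number
      </assert>
    </rule>
  </pattern>
</schema>
```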
The xslt3 Query Language Binding is an extension of the xslt2 binding. It allows the use of XPath 3.1 expressions and functions written in XSLT 3.0.
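Along the same lines, here is a hypothetical rule that needs the xslt3 binding. It uses the let expression (new in XPath 3.0) and the arrow operator => (new in XPath 3.1). The element and attribute names again come from the orders example further on:

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt3">
  <pattern>
    <rule context="item">
      <!-- let ... return: binds the numeric price to $p within the test expression -->
      <!-- =>: passes @id as the first argument to upper-case() -->
      <assert test="let $p := number(@price) return $p gt 0">
        The price of item <value-of select="@id => upper-case()"/> must be positive
      </assert>
    </rule>
  </pattern>
</schema>
```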
My advice would be, if your processor supports it, to always set the Query Language Binding to either xslt2 or, preferably, xslt3. Not specifying a binding (by omitting the queryBinding attribute on the root element) means that you’re limited to XPath 1.0 for your expressions. Given the current state of technology, that’s severely limiting.
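The limitation shows up even in a simple check. XPath 1.0 has neither lower-case() nor regular expressions, so a case check has to be spelled out with translate(). The two hypothetical asserts below are for the item elements of the orders example further on. For identifiers made of ASCII letters they check the same thing. They are fragments, not complete schemas:

```xml
<!-- Default xslt binding (XPath 1.0): lower-casing by hand, ASCII letters only -->
<assert test="@id = translate(@id, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')">
  Item identifiers must be in lower case
</assert>

<!-- xslt2 or xslt3 binding: the standard lower-case() function -->
<assert test="@id eq lower-case(@id)">
  Item identifiers must be in lower case
</assert>
```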
3 Using XSLT keys in Schematron

One of the things you can do with an xslt type Query Language Binding is use XSLT keys. Let’s explore this.
Referencing in XML documents is often done using identifiers. For instance, the following example contains orders that reference items by their identifier:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<orders>
  <item id="bolts" price="5.49">A box with 20 bolts</item>
  <item id="nuts" price="3.78">A box with 20 nuts</item>
  <!-- … many, many more items… -->
  <order>
    <ordered-item id-ref="bolts" quantity="5"/>
    <ordered-item id-ref="nuts" quantity="10"/>
  </order>
  <!-- … many, many more orders… -->
</orders>
```
The value of each id-ref attribute on an ordered-item element must be the identifier of an item element in the same document. A basic version of a Schematron schema that checks this is:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt3">
  <pattern>
    <rule context="ordered-item">
      <let name="item-id" value="@id-ref"/>
      <assert test="exists(/*/item[@id eq $item-id])">
        The referenced item <value-of select="$item-id"/> does not exist
      </assert>
    </rule>
  </pattern>
</schema>
```
The let element stores the identifier to check in the variable $item-id. This is used in the assert to check whether an item element with the same identifier exists. Very straightforward and perfectly fine.
But what if the document is very large and contains thousands and thousands of item elements? Every ordered-item element causes the schema processor to search through the item elements, from top to bottom, again and again. That’s not very efficient and can take a long time.
A solution to this is creating a key: an in-memory data structure that allows fast lookup of elements by a key value. XSLT has an instruction for this, xsl:key. With an xslt, xslt2 or xslt3 Query Language Binding, we can use it in Schematron as well. The following Schematron schema does the same as the basic schema above (Figure 3), but much more efficiently:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- 1 - Define the XSLT namespace: -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        queryBinding="xslt3">
  <!-- 2 - Define a key using the XSLT key instruction: -->
  <xsl:key name="item-ids" match="/*/item" use="@id"/>
  <pattern>
    <rule context="ordered-item">
      <!-- 3 - Reference the key using the key() function: -->
      <assert test="exists(key('item-ids', @id-ref))">
        The referenced item <value-of select="@id-ref"/> does not exist
      </assert>
    </rule>
  </pattern>
</schema>
```
To be able to use instructions from XSLT, we need to declare the XSLT namespace, hence the xmlns:xsl="http://www.w3.org/1999/XSL/Transform" namespace declaration on the root element. Every element whose name starts with the xsl: prefix is now considered an XSLT instruction.
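The xsl:key instruction defines a key. It has three components:
- The name of the key, in this case item-ids (the name attribute).
- The nodes the key is about, in this case the item elements that are children of the root element (the match attribute).
- The value of the key, in this case the identifier of the item, contained in its id attribute (the use attribute).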
What happens under the hood is that the Schematron processor creates an appropriate data structure that allows fast lookup of item elements using the value of their id attribute.
The test attribute of the assert element uses the (XSLT) key() function to look up values in the key. This function takes two or three parameters:
- The name of the key, in this case item-ids (as a string, and therefore written in quotes, as 'item-ids').
- The value to look up, in this case the id-ref attribute of the ordered-item element.
- The third parameter is optional and unused here. It allows you to limit the returned nodes to a specific part of the document (a “subtree”) by specifying the root node of the part you’re interested in. Its default value is the document node. This three-parameter form requires XSLT 2.0 or later. With the xslt binding, key() takes two parameters only.
The key() function will perform a fast and efficient lookup and return the item element(s) associated with the given identifier. If the identifier is unknown, it will return an empty sequence.
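Because key() returns the item element itself, you can also use it to fetch data from the referenced item. The following is a sketch of an extended version of the rule in the key-based schema above. The report and its threshold of 1000 are hypothetical and only there to show the idea:

```xml
<rule context="ordered-item">
  <!-- Look up the referenced item once and keep it in a variable -->
  <let name="item" value="key('item-ids', @id-ref)"/>
  <assert test="exists($item)">
    The referenced item <value-of select="@id-ref"/> does not exist
  </assert>
  <!-- Hypothetical check: flag order lines whose value exceeds an arbitrary limit -->
  <!-- $item[1] guards against duplicate identifiers returning more than one item -->
  <report test="exists($item) and number(@quantity) * number($item[1]/@price) gt 1000">
    The order line for <value-of select="@id-ref"/> is worth more than 1000
  </report>
</rule>
```

Storing the result in a variable avoids performing the same key lookup twice within one rule.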
A warning before we end this topic: keys don’t come for free. Building a key takes time and you have to weigh this against the time raw lookups take (as done in Figure 3). In general, don’t use keys on small documents. The tipping point is fuzzy. If this is important to you: experiment and measure!
4 Using XSLT functions in Schematron

Separating code into functions is a very normal thing to do in programming. Schematron itself, however, lacks the ability to define functions. For this it relies on its Query Language Binding feature.
As an example, assume we have a separate reference document that tells us the expected price for something with a certain type. It also contains a default price, as an attribute on the root element, for everything with a type not mentioned otherwise.
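Here is a hypothetical sketch of such a reference document. The element and attribute names are the ones the f:get-price() function further on relies on: a type-codes-and-prices root element with a default-price attribute, containing data elements with type and price attributes. The type codes and prices are invented for illustration only:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical contents of type-codes-and-prices.xml; all values are illustrative -->
<type-codes-and-prices default-price="10.00">
  <data type="A125" price="17.25"/>
  <data type="X96" price="89.34"/>
</type-codes-and-prices>
```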
We would like to use this reference document to check documents like the following:
```xml
<things>
  <thing name="thing 1" type="A125" price="17.25"/>
  <thing name="thing 2" type="A125" price="17.26"/>
  <thing name="thing 3" type="X96" price="89.34"/>
  <thing name="thing 4" type="Y78" price="10.01"/>
</things>
```
To check the price of a thing in the document above (Figure 6), we need to look up its expected price in the reference document (Figure 5), based on its type. If its type is not mentioned there, we should use the default price. We could express this as a complicated and rather long XPath expression directly in Schematron. However, it’s much nicer and more maintainable to define a function for this using XSLT. With a Query Language Binding of xslt3 we can do this:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- 1 - Define the XSLT namespace on the root element: -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        queryBinding="xslt3">

  <!-- 2 - Define a namespace for the functions as an <ns> element: -->
  <ns uri="#functions" prefix="f"/>
  <!-- Declare the xs prefix (XML Schema types) explicitly; not every processor may do so automatically -->
  <ns uri="http://www.w3.org/2001/XMLSchema" prefix="xs"/>

  <!-- 3 - Define your function using XSLT: -->
  <xsl:function name="f:get-price" as="xs:double">
    <xsl:param name="type" as="xs:string"/>
    <xsl:variable name="prices-document" as="document-node()"
                  select="doc('type-codes-and-prices.xml')"/>
    <xsl:variable name="data-element-for-type" as="element(data)?"
                  select="$prices-document//data[@type eq $type]"/>
    <xsl:choose>
      <xsl:when test="exists($data-element-for-type)">
        <xsl:sequence select="xs:double($data-element-for-type/@price)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="xs:double($prices-document/type-codes-and-prices/@default-price)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <pattern>
    <rule context="thing">
      <!-- 4 - Use the defined function to get the price: -->
      <let name="expected-price" value="f:get-price(@type)"/>
      <assert test="$expected-price eq xs:double(@price)">
        The price for <value-of select="@name"/> should be <value-of select="$expected-price"/>
      </assert>
    </rule>
  </pattern>
</schema>
```
We’re going to use XSLT code embedded in Schematron. Therefore you have to define the XSLT namespace on the root element (xmlns:xsl="http://www.w3.org/1999/XSL/Transform").
The names of your own functions must be in a namespace. In Schematron you define such a namespace with an ns element, which allows you to use this namespace in the XPath expressions in the schema. The namespace (#functions) and prefix (f) used here are arbitrary examples. You can use anything you like, as long as it is not one of the reserved namespaces, such as the XSLT namespace.
Define your function(s) using the XSLT programming language. In this example the function is called f:get-price. It takes a type code as a string and returns the expected price as an xs:double.
We use the defined f:get-price() function to get the expected price from the reference document (Figure 5) and use this in the assert’s test expression.
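For comparison, here is a sketch of the same check written without a function, as XPath expressions directly in the rule. It assumes the same file name and document structure as f:get-price() and must sit inside a schema with an xslt2 or xslt3 binding. The ns element makes the xs prefix available:

```xml
<ns uri="http://www.w3.org/2001/XMLSchema" prefix="xs"/>
<pattern>
  <rule context="thing">
    <let name="type" value="string(@type)"/>
    <let name="prices-document" value="doc('type-codes-and-prices.xml')"/>
    <!-- The price for this type if listed, otherwise the default price: (a, b)[1] takes the first that exists -->
    <let name="expected-price"
         value="xs:double(($prices-document//data[@type eq $type]/@price,
                           $prices-document/type-codes-and-prices/@default-price)[1])"/>
    <assert test="$expected-price eq xs:double(@price)">
      The price for <value-of select="@name"/> should be <value-of select="$expected-price"/>
    </assert>
  </rule>
</pattern>
```

This works, but the lookup logic now lives inside this one rule. The function version keeps it in one named place that any rule can call.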
5 Wrap up

- Query Language Binding allows you, theoretically, to change the language used for expressions in a Schematron schema.
- In most cases only the xslt, xslt2 and xslt3 bindings are supported.
- You specify the Query Language Binding of a Schematron schema using the queryBinding attribute on the root element of the schema.
- If you don’t specify this attribute, the default value is xslt. The xslt binding limits you to XPath 1.0 expressions only, which, given the current state of technology, is rather limiting.
- All xslt type bindings allow you to use XSLT keys. The xslt2 and xslt3 bindings also allow you to define functions in XSLT. These are very useful constructs in more complex schemas.