Using Stylesheet Schemas

April 6, 2005

Bob DuCharme

Last month I promised to eventually discuss the use of schemas with XSLT 2.0: that is, XSLT 2.0's ability to read a W3C schema to discover additional information about a source tree, result tree, or interim temporary tree, and to use that information when processing a document. This month I'll talk about the use of schemas with XSLT, but not schemas for the documents you're processing. Schemas for the stylesheets themselves can add a lot to your XSLT development when the available ones are a good fit for your tools. (While I'm on the topic, though, it's great to see that one new addition to the 11 February XSLT 2.0 Working Draft is that "A non-schema-aware processor now allows all the built-in types defined in XML Schema to be used; previously only a subset of the primitive types plus xs:integer were permitted." This will allow even more type-aware XSLT processing without requiring the use of a W3C schema.)

A DTD for Stylesheets?

The XSLT 1.0 Recommendation included an appendix with a non-normative DTD Fragment for XSLT Stylesheets. It's non-normative because namespaces play such an important role in XSLT stylesheets, and DTDs don't understand namespaces; it's a fragment because several extra declarations are necessary to allow for the use of literal result elements.

Literal result elements are the elements in a stylesheet from outside of the XSLT namespace that an XSLT processor will add to the result document just the way they are. For example, in the following template rule, the h2 element is a literal result element that will be wrapped around the contents of each subtitle element from the source tree that gets added to the result tree:

<xsl:template match="subtitle">
  <h2><xsl:apply-templates/></h2>
</xsl:template>

It's easy enough for a DTD's xsl:template element declaration to list the elements such as xsl:apply-templates, xsl:choose, and xsl:element that are allowed inside of an xsl:template element, but a DTD has no way to say "and any other elements from outside of the http://www.w3.org/1999/XSL/Transform namespace." The appendix mentioned above describes some contortions that use parameter entity redefinition to allow this, but it's enough trouble that I've never heard of anyone doing it for a production environment. One simpler alternative, which has been used in production environments, is to avoid all use of literal result elements and to use the xsl:element element to insert any new elements into the result tree. Using this approach, the template rule above might be written like this:

<xsl:template match="subtitle">
  <xsl:element name="h2">
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>

This use of xsl:element instead of literal result elements allowed the use of a DTD-driven editor to edit stylesheets. It also allowed the addition of a valuable quality-control step to a system responsible for the maintenance of a large number of stylesheets, because validating each edited stylesheet before checking it into the repository greatly reduced the possibility of the runtime system choking on a bad stylesheet.
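For an existing stylesheet, the rewrite from literal result elements to xsl:element could in principle be automated with a stylesheet that treats the original stylesheet as its source document. What follows is only a minimal sketch of that idea, not something taken from the production system described above. It assumes that the literal result elements and their attributes are in no namespace and that their attributes contain no attribute value templates; an attribute such as href="{@url}" would be turned into literal text, so it would need separate handling.

```xml
<?xml version="1.0"?>
<!-- Sketch: read a stylesheet as input and rewrite the literal result
     elements inside its template rules as xsl:element instructions.
     Assumes unprefixed literal result elements and attributes, and
     no attribute value templates. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copy XSLT elements, attributes, text, and
       comments through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any element inside a template rule that is not in the XSLT
       namespace is treated as a literal result element. -->
  <xsl:template match="xsl:template//*[not(self::xsl:*)]">
    <xsl:element name="xsl:element">
      <xsl:attribute name="name">
        <xsl:value-of select="name()"/>
      </xsl:attribute>
      <!-- Each attribute becomes an xsl:attribute instruction. -->
      <xsl:for-each select="@*">
        <xsl:element name="xsl:attribute">
          <xsl:attribute name="name">
            <xsl:value-of select="name()"/>
          </xsl:attribute>
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:for-each>
      <!-- Process the children so nested literal result elements
           are rewritten as well. -->
      <xsl:apply-templates select="node()"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

Run against a stylesheet containing the first subtitle rule shown earlier, this should produce the xsl:element form of the rule. The identity rule leaves everything already in the XSLT namespace alone, so only the literal result elements change.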

Stylesheet quality control and the use of intelligent editors are still worth pursuing beyond DTDs. An important reason that XSLT stylesheets are XML documents is to let us take advantage of our favorite XML tools on the stylesheets themselves. Just because DTDs aren't a good fit with these goals, though, we don't have to give up.

Schematron for Stylesheets?

I've written before about how Schematron lets us fill some of the gaps of a DTD-driven system. While it may not help us edit XSLT stylesheets with most popular XML editors, it can help the quality control goal of keeping certain mistakes out of stylesheets. When a co-worker asked me about the possibility of doing this, I replied that it would be a good idea, but that he'd still have some Schematron rules to write and test. Then I remembered: buried in the distribution of nxml, an Emacs mode that uses RELAX NG Compact schemas to turn Emacs into a context-sensitive XML editor, is a RELAX NG schema for XSLT. (It's a RELAX NG Compact version created from a non-compact version with trang.) So, I told my co-worker that instead of writing a set of Schematron rules, he could take advantage of work already done by James Clark.

RELAX NG for Stylesheets

Among Clark's many other contributions to the XML world, he helped to invent XSLT, so he knows the syntax pretty well. He didn't write up some content models based on the DTD fragment and his own knowledge of XSLT, though; according to a header comment in his xslt.rng schema file, it "was mostly generated from the syntax summary in the XSLT Recommendation." Being generated right from the spec itself (I assume that, by "syntax summary," he meant the p elements with a class value of element-syntax in the XSLT 1.0 Recommendation) automates the process enough that we can assume that nothing was missed.

It's ironic that I had forgotten about this schema, because I had been using it all along. I've used Emacs with nxml to edit XSLT stylesheets and other XML documents as long as nxml has been available, but I never had to configure it to use the xslt.rnc schema when editing files with an ".xsl" extension because that's its default behavior.

If you don't know RELAX NG syntax, you can still use Emacs with nxml to edit documents based on your DTDs by using trang to convert your DTDs to RELAX NG Compact versions and pointing the nxml mode at those. If you're more familiar with XSLT than with RELAX NG Compact syntax, xslt.rnc is a great way to learn about RELAX NG Compact syntax. For example, to see how it addresses the issue of allowing certain XSLT elements and literal result elements from other namespaces in the content of an xsl:template element, see how it declares this element:

  element template {
    extension.atts,
    attribute match { pattern.datatype }?,
    attribute name { qname.datatype }?,
    attribute priority { number.datatype }?,
    attribute mode { qname.datatype }?,
    (param.element*, template.model)
  }

First, note how it's declaring an element called "template" and not "xsl:template". Because RELAX NG is namespace-aware, you can assign any namespace prefix you want to the http://www.w3.org/1999/XSL/Transform namespace and use that in your stylesheets. (Use of the XSLT 1.0 DTD fragment requires you to hardcode a prefix such as "xsl:" in the element declaration and then use that for all "documents" — in this case, stylesheets — that you check against that DTD.)
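To make the point about prefixes concrete, here is a minimal sketch of the earlier subtitle rule in a complete stylesheet that binds the XSLT namespace to "t" instead of "xsl". The "t" prefix is an arbitrary choice for illustration. A namespace-aware schema such as xslt.rnc should treat this stylesheet the same way it treats the "xsl:" version, while a DTD that hardcodes "xsl:template" has no declaration for an element named "t:template".

```xml
<?xml version="1.0"?>
<!-- Same template rule as before, but the XSLT namespace is bound
     to the (arbitrarily chosen) prefix "t". -->
<t:stylesheet version="1.0"
    xmlns:t="http://www.w3.org/1999/XSL/Transform">

  <t:template match="subtitle">
    <!-- h2 is still a literal result element: it is outside the
         XSLT namespace no matter which prefix that namespace uses. -->
    <h2><t:apply-templates/></h2>
  </t:template>

</t:stylesheet>
```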

The template.model pattern referenced in the template.element declaration is declared near the beginning of the schema:

template.model = 
  (instruction.category | literal-result-element | text)*

The instruction.category pattern names all the XSLT elements that a template rule can contain, and the literal-result-element pattern has this declaration:

literal-result-element =
  element * - xsl:* { literal-result-element.atts, template.model }

It shows the declaration of an element with any name, as long as it's outside of the namespace assigned to the "xsl" prefix in this stylesheet. Along with its attributes, it can contain anything that can go into a template rule.
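As a rough illustration of how these patterns fit together, the following sketch annotates a template rule with the pattern from the schema that I would expect to match each piece of its content. The div element, its class attribute, and the "Subtitle:" text are made up for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="subtitle">
    <!-- div: matches literal-result-element, because its name is
         outside the namespace bound to "xsl" -->
    <div class="subtitle">
      <!-- h2: another literal-result-element, allowed here because
           the content of a literal result element is template.model -->
      <h2>
        <!-- character data: the "text" choice in template.model -->
        Subtitle:
        <!-- xsl:apply-templates: one of the XSLT elements named by
             instruction.category -->
        <xsl:apply-templates/>
      </h2>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

Because literal-result-element refers back to template.model, the nesting can go as deep as the result document requires.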

Norm Walsh has created an alternative version of the RELAX NG XSLT schema for the creation and editing of XSLT 2.0 stylesheets. Norm had a puzzle to solve along the way, though. He wondered what he could do to have nxml automatically use the original schema for 1.0 stylesheets when he edited a stylesheet whose xsl:stylesheet element had a version attribute value of "1.0" and the schema for 2.0 stylesheets when he used nxml to edit an XSLT 2.0 stylesheet. The solution turned out to be an elegant bit of RELAX NG syntax that he hadn't used before; read about what he did on his weblog.

W3C Schema for Stylesheets

There are several W3C Schemas for XSLT 1.0 out there, so I decided to try a few. My first step with each was to use the Xerces Java ASBuilder utility, which I wrote about in the O'Reilly book XML Hacks, to check the integrity of each schema. If it couldn't parse the schema, there wasn't much point in trying to validate a stylesheet against that schema. (To use ASBuilder to check a W3C schema's integrity, make sure that xercesSamples.jar is in your classpath, add the -f option to the command line, and don't include the -i option if you're not adding a document to validate against the schema.)

The "XSLT v1.1 XSD Schema for Visual Studio.NET" available on gotdotnet.com failed this first test, but according to a Kathleen Dollard weblog entry it works with Microsoft's Visual Studio editor. (The gotdotnet.com web page describes it as an XSLT v1.0 schema, and the only 1.1-like feature I saw was the msxsl:script element, which I suppose corresponds to the xsl:script element in the aborted 1.1 version of XSLT. This undeclared "msxsl" prefix was one of the things that Xerces choked on.) I didn't see anything in this schema's declaration for the xsl:template element that looked like it would allow literal result elements, and without a copy of Visual Studio, I couldn't test my hypothesis that this schema would not allow them to be used.

When I heard of an XSLT 1.0 schema from webMethods, I couldn't find it on their web site, but I did find a copy on the web site for Austria's University of Klagenfurt. It was written before W3C Schemas became a Recommendation, and after I tried changing its namespace URL from http://www.w3.org/2000/10/XMLSchema to the http://www.w3.org/2001/XMLSchema URL specified by the Recommendation, the Xerces ASBuilder utility complained enough about it to convince me that it wasn't a robust option for checking stylesheet integrity. I had a similar experience with an xslt.xsd schema developed by Don Box in early April 2000.

A RELAX NG devotee would say "You have a RELAX NG schema that works and you need a W3C schema version of the same schema? Just use trang to create one!" trang couldn't convert the original xslt.rng schema to xslt.xsd because of a nested grammar, so I took out the definition of and reference to the top-level-extension pattern. trang then converted the schema with a few warnings, but no errors. Before the ASBuilder utility approved of the xsd version, I had to change some quotation marks that were part of an attribute value into &quot; entity references. Then, Xerces parsed a stylesheet that included literal result elements against this schema with no complaints, and it gave the right error message when I added an illegal xsl:whatever element to the stylesheet.
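A test stylesheet for this kind of check can be very small. The following is a reconstruction of the idea rather than the actual file I used, and the title rule is added just for the example: one template rule with a literal result element that the schema should accept, and one with a made-up xsl:whatever element that it should reject.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="subtitle">
    <!-- Literal result element: a usable schema must accept this. -->
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <xsl:template match="title">
    <h1>
      <!-- Not an XSLT 1.0 element: a usable schema must report it. -->
      <xsl:whatever/>
      <xsl:apply-templates/>
    </h1>
  </xsl:template>

</xsl:stylesheet>
```

With the xsl:whatever line removed, the same file serves as the error-free case.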

    

I don't know exactly what I gave up by removing the top-level-extension pattern, but I would look further into it before using the W3C schema created from xslt.rng in a production environment. I would be more likely to just use xslt.rng and a RELAX NG validator in the production environment, but if you really need to use an XSLT schema with a tool that doesn't know about RELAX NG, a W3C XSLT schema created from a RELAX NG one is one option. (Make sure to mention to the tool's developers that full RELAX NG support would be easier to implement than full W3C Schema support.)

There's one more option: the XSL Working Group has made an XSLT 2.0 W3C schema available on the W3C's web site. It's non-normative, and XSLT 2.0 is not quite finished, but ASBuilder had no complaints with the schema, Xerces had no problem with an error-free XSLT 1.0 stylesheet that included literal result elements when I checked it against this schema, and Xerces found my errant xsl:whatever element in a stylesheet when I parsed it against this schema.

Because XSLT 2.0 is mostly backward-compatible with 1.0, using the W3C's 2.0 schema to edit and validate your XSLT 1.0 stylesheets is better than using an editor that's completely unaware of XSLT stylesheet structure. It would be an interesting project for someone who wanted to learn a lot about XSLT 2.0 and W3C Schema to revise this schema to truly reflect XSLT 1.0 structure. Myself, I'll stick with James' RELAX NG schema for XSLT 1.0 stylesheets and Norm's RELAX NG schema for XSLT 2.0 stylesheets.