Using Stylesheet Schemas
April 6, 2005
Last month I promised to eventually discuss the use of schemas with XSLT 2.0: that is, XSLT 2.0's ability to read a W3C schema to discover additional information about a source tree, result tree, or interim temporary tree, and to use that information when processing a document. This month I'll talk about the use of schemas with XSLT, but not schemas for the documents you're processing. Schemas for the stylesheets themselves, when the available ones are a good fit for your tools, can add a lot to your XSLT development. (While I'm on the topic, though, it's great to see that one new addition to the 11 February XSLT 2.0 Working Draft is that "A non-schema-aware processor now allows all the built-in types defined in XML Schema to be used; previously only a subset of the primitive types plus xs:integer were permitted." This will allow even more type-aware XSLT processing without requiring the use of a W3C schema.)
A DTD for Stylesheets?
The XSLT 1.0 Recommendation included an appendix with a non-normative DTD Fragment for XSLT Stylesheets. It's non-normative because namespaces play such an important role in XSLT stylesheets, and DTDs don't understand namespaces; it's a fragment because several extra declarations are necessary to allow for the use of literal result elements.
Literal result elements are the elements in a stylesheet from outside of the XSLT namespace that an XSLT processor will add to the result document just the way they are. For example, in the following template rule, the h2 element is a literal result element that will be wrapped around the contents of each subtitle element from the source tree that gets added to the result tree:
<xsl:template match="subtitle"> <h2><xsl:apply-templates/></h2> </xsl:template>
It's easy enough for a DTD's xsl:template element declaration to list the elements such as xsl:apply-templates, xsl:choose, and xsl:element that are allowed inside of an xsl:template element, but a DTD has no way to say "and any other elements from outside of the http://www.w3.org/1999/XSL/Transform namespace." The appendix mentioned above describes some contortions that use parameter entity redefinition to allow this, but it's enough trouble that I've never heard of anyone doing it for a production environment. One simpler alternative, which has been used in production environments, is to avoid all use of literal result elements and to use the xsl:element element to insert any new elements into the result tree. Using this approach, the template rule above might be written like this:
<xsl:template match="subtitle"> <xsl:element name="h2"> <xsl:apply-templates/> </xsl:element> </xsl:template>
This use of xsl:element instead of literal result elements allowed the use of a DTD-driven editor to edit stylesheets. It also allowed the addition of a valuable quality-control step to a system responsible for the maintenance of a large number of stylesheets, because validating each edited stylesheet before checking it into the repository greatly reduced the possibility of the runtime system choking on a bad stylesheet.
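As a rough illustration of that kind of pre-check-in gate, here is a minimal Python sketch. It is a stand-in, not the DTD validation described above: it only confirms that a stylesheet is well-formed and that it follows the "no literal result elements" policy. The function name check_stylesheet and the sample stylesheets are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

def check_stylesheet(text):
    """Return a list of problems; an empty list means the stylesheet passes.

    Hypothetical helper: checks well-formedness and the xsl:element-only
    policy. It does not validate against a DTD or schema.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return ["not well-formed: %s" % err]
    problems = []
    for elem in root.iter():
        # ElementTree spells qualified names as "{namespace-uri}local-name".
        if not elem.tag.startswith("{" + XSL_NS + "}"):
            problems.append("literal result element: " + elem.tag)
    return problems

# Sample stylesheet that follows the policy.
GOOD = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="subtitle">
    <xsl:element name="h2"><xsl:apply-templates/></xsl:element>
  </xsl:template>
</xsl:stylesheet>"""

# Same stylesheet with the literal result element put back in.
BAD = GOOD.replace('<xsl:element name="h2">', "<h2>").replace(
    "</xsl:element>", "</h2>")

print(check_stylesheet(GOOD))  # []
print(check_stylesheet(BAD))   # ['literal result element: h2']
```

A check like this could run before a check-in is accepted, with any non-empty result blocking it. Real DTD or schema validation would catch far more than this sketch does.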
Stylesheet quality control and the use of intelligent editors are still worth pursuing beyond DTDs. An important reason that XSLT stylesheets are XML documents is to let us take advantage of our favorite XML tools on the stylesheets themselves. Just because DTDs aren't a good fit with these goals, though, we don't have to give up.
Schematron for Stylesheets?
I've written before about how Schematron lets us fill some of the gaps of a DTD-driven system. While it may not help us edit XSLT stylesheets with most popular XML editors, it can help the quality control goal of keeping certain mistakes out of stylesheets. When a co-worker asked me about the possibility of doing this, I replied that it would be a good idea, but that he'd still have some Schematron rules to write and test. Then I remembered: buried in the distribution of nxml, an Emacs mode that uses RELAX NG Compact schemas to turn Emacs into a context-sensitive XML editor, is a RELAX NG schema for XSLT. (It's a RELAX NG Compact version created from a non-compact version with trang.) So, I told my co-worker that instead of writing a set of Schematron rules, he could take advantage of work already done by James Clark.
RELAX NG for Stylesheets
Among Clark's many other contributions to the XML world, he helped to invent XSLT, so he knows the syntax pretty well. He didn't write up some content models based on the DTD fragment and his own knowledge of XSLT, though; according to a header comment in his xslt.rng schema file, it "was mostly generated from the syntax summary in the XSLT Recommendation." Being generated right from the spec itself (I assume that, by "syntax summary," he meant the p elements with a class value of element-syntax in the XSLT 1.0 Recommendation) automates the process enough that we can assume that nothing was missed.
It's ironic that I had forgotten about this schema, because I had been using it all along. I've used Emacs with nxml to edit XSLT stylesheets and other XML documents as long as nxml has been available, but I never had to configure it to use the xslt.rnc schema when editing files with an ".xsl" extension because that's its default behavior.
If you don't know RELAX NG syntax, you can still use Emacs with nxml to edit documents based on your DTDs by using trang to convert your DTDs to RELAX NG Compact versions and pointing the nxml mode at those. If you're more familiar with XSLT than with RELAX NG Compact syntax, xslt.rnc is a great way to learn about RELAX NG Compact syntax. For example, to see how it addresses the issue of allowing certain XSLT elements and literal result elements from other namespaces in the content of an xsl:template element, see how it declares this element:
element template { extension.atts, attribute match { pattern.datatype }?, attribute name { qname.datatype }?, attribute priority { number.datatype }?, attribute mode { qname.datatype }?, (param.element*, template.model) }
First, note how it's declaring an element called "template" and not "xsl:template". Because RELAX NG is namespace-aware, you can assign any namespace prefix you want to the http://www.w3.org/1999/XSL/Transform namespace and use that in your stylesheets. (Use of the XSLT 1.0 DTD fragment requires you to hardcode a prefix such as "xsl:" in the element declaration and then use that for all "documents" — in this case, stylesheets — that you check against that DTD.)
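To see why the prefix doesn't matter to a namespace-aware tool, consider this small Python sketch. It parses two otherwise identical stylesheets, one using the usual "xsl" prefix and one using an arbitrary "t" prefix, and compares the expanded element names. The two sample stylesheets and the expanded_names helper are assumptions invented for the illustration.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# The same stylesheet twice; only the namespace prefix differs.
WITH_XSL = """<xsl:stylesheet version="1.0" xmlns:xsl="%s">
  <xsl:template match="subtitle"><xsl:apply-templates/></xsl:template>
</xsl:stylesheet>""" % XSL_NS

WITH_T = """<t:stylesheet version="1.0" xmlns:t="%s">
  <t:template match="subtitle"><t:apply-templates/></t:template>
</t:stylesheet>""" % XSL_NS

def expanded_names(text):
    """List each element name as ElementTree's "{namespace-uri}local-name"."""
    return [elem.tag for elem in ET.fromstring(text).iter()]

# A namespace-aware parser sees the same names in both documents.
print(expanded_names(WITH_XSL) == expanded_names(WITH_T))  # True
```

A DTD, by contrast, would compare the literal strings "xsl:template" and "t:template" and treat them as two unrelated element types.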
The template.model pattern referenced in the template.element declaration is declared near the beginning of the schema:
template.model = (instruction.category | literal-result-element | text)*
The instruction.category pattern names all the XSLT elements that a template rule can contain, and the literal-result-element pattern has this declaration:
literal-result-element = element * - xsl:* { literal-result-element.atts, template.model }
It shows the declaration of an element with any name, as long as it's outside of the namespace assigned to the "xsl" prefix in this stylesheet. Along with its attributes, it can contain anything that can go into a template rule.
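The recursive shape of these two patterns can be mimicked in a few lines of Python. The sketch below is a loose analogy, not a RELAX NG validator: it treats any element outside the XSLT namespace as a literal result element and recurses into it, and it accepts an XSLT element only if its name is in a hand-typed list of the elements that the XSLT 1.0 Recommendation categorizes as instructions. That list, the function names, and the sample stylesheet are assumptions for this example and were not taken from xslt.rnc. As a simplification, the sketch does not look inside instructions and does not enforce that xsl:param elements come first.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
XSL = "{" + XSL_NS + "}"

# Hand-typed stand-in for instruction.category (an assumption, see above).
INSTRUCTIONS = {
    "apply-imports", "apply-templates", "attribute", "call-template",
    "choose", "comment", "copy", "copy-of", "element", "fallback",
    "for-each", "if", "message", "number", "processing-instruction",
    "text", "value-of", "variable",
}

def check_template_model(parent, errors, allow_param=False):
    """Rough analogue of template.model: instructions or non-XSLT elements."""
    for child in parent:
        if child.tag.startswith(XSL):
            local = child.tag[len(XSL):]
            if local == "param" and allow_param:
                continue  # xsl:param is allowed directly inside xsl:template
            if local not in INSTRUCTIONS:
                errors.append("unexpected xsl:" + local)
            # Simplification: the content of instructions is not checked.
        else:
            # Analogue of "element * - xsl:*": any name outside the XSLT
            # namespace, whose content is again a template.model.
            check_template_model(child, errors)

def check(text):
    """Check the body of every xsl:template in a stylesheet."""
    errors = []
    root = ET.fromstring(text)
    for template in root.iter(XSL + "template"):
        check_template_model(template, errors, allow_param=True)
    return errors

SHEET = """<xsl:stylesheet version="1.0" xmlns:xsl="%s">
  <xsl:template match="subtitle">
    <h2><xsl:apply-templates/><xsl:whatever/></h2>
  </xsl:template>
</xsl:stylesheet>""" % XSL_NS

print(check(SHEET))  # ['unexpected xsl:whatever']
```

The h2 element passes because it is outside the XSLT namespace, and the check then descends into it, which is how the illegal element nested inside it gets caught.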
Norm Walsh has created an alternative version of the RELAX NG XSLT schema for the creation and editing of XSLT 2.0 stylesheets. Norm had a puzzle to solve along the way, though. He wondered what he could do to have nxml automatically use the original schema for 1.0 stylesheets when he edited a stylesheet whose xsl:stylesheet element had a version attribute value of "1.0" and the schema for 2.0 stylesheets when he used nxml to edit an XSLT 2.0 stylesheet. The solution turned out to be an elegant bit of RELAX NG syntax that he hadn't used before; read about what he did on his weblog.
W3C Schema for Stylesheets
There are several W3C Schemas for XSLT 1.0 out there, so I decided to try a few. My first step with each was to use the Xerces Java ASBuilder utility, which I wrote about in the O'Reilly book XML Hacks, to check the integrity of each schema. If it couldn't parse the schema, there wasn't much point in trying to validate a stylesheet against that schema. (To use ASBuilder to check a W3C schema's integrity, make sure that xercesSamples.jar is in your classpath, add the -f option to the command line, and don't include the -i option if you're not adding a document to validate against the schema.)
The "XSLT v1.1 XSD Schema for Visual Studio.NET" available on gotdotnet.com failed this first test, but according to a Kathleen Dollard weblog entry it works with Microsoft's Visual Studio editor. (The gotdotnet.com web page describes it as an XSLT v1.0 schema, and the only 1.1-like feature I saw was the msxsl:script element, which I suppose corresponds to the xsl:script element in the aborted 1.1 version of XSLT. This undeclared "msxsl" prefix was one of the things that Xerces choked on.) I didn't see anything in this schema's declaration for the xsl:template element that looked like it would allow literal result elements, and without a copy of Visual Studio, I couldn't test my hypothesis that this schema would not allow them to be used.
When I heard of an XSLT 1.0 schema from webMethods, I couldn't find it on their web site, but I did find a copy on the web site for Austria's University of Klagenfurt. It was written before W3C Schemas became a Recommendation, and after I tried changing its namespace URL from http://www.w3.org/2000/10/XMLSchema to the http://www.w3.org/2001/XMLSchema URL specified by the Recommendation, the Xerces ASBuilder utility complained enough about it to convince me that it wasn't a robust option for checking stylesheet integrity. I had a similar experience with an xslt.xsd schema developed by Don Box in early April 2000.
A RELAX NG devotee would say "You have a RELAX NG schema that works and you need a W3C schema version of the same schema? Just use trang to create one!" trang couldn't convert the original xslt.rng schema to xslt.xsd because of a nested grammar, so I took out the definition of and reference to the top-level-extension pattern. trang then converted the schema with a few warnings, but no errors. Before the ASBuilder utility approved of the xsd version, I had to change some quotation marks that were part of an attribute value into &quot; entity references. Then, Xerces parsed a stylesheet that included literal result elements against this schema with no complaints, and it gave the right error message when I added an illegal xsl:whatever element to the stylesheet.
I don't know exactly what I gave up by removing the top-level-extension pattern, but I would look further into it before using the W3C schema created from xslt.rng in a production environment. I would be more likely to just use xslt.rng and a RELAX NG validator in the production environment, but if you really need to use an XSLT schema with a tool that doesn't know about RELAX NG, a W3C XSLT schema created from a RELAX NG one is one option. (Make sure to mention to the tool's developers that full RELAX NG support would be easier to implement than full W3C Schema support.)
There's one more option: the XSL Working Group has made an XSLT 2.0 W3C schema available on the W3C's web site. It's non-normative, and XSLT 2.0 is not quite finished, but ASBuilder had no complaints with the schema, Xerces had no problem with an error-free XSLT 1.0 stylesheet that included literal result elements when I checked it against this schema, and Xerces found my errant xsl:whatever element in a stylesheet when I parsed it against this schema.
Because XSLT 2.0 is mostly backward-compatible with 1.0, using the W3C's 2.0 schema to edit and validate your XSLT 1.0 stylesheets is better than using an editor that's completely unaware of XSLT stylesheet structure. It would be an interesting project for someone who wanted to learn a lot about XSLT 2.0 and W3C Schema to revise this schema to truly reflect XSLT 1.0 structure. Myself, I'll stick with James' RELAX NG schema for XSLT 1.0 stylesheets and Norm's RELAX NG schema for XSLT 2.0 stylesheets.