Self-Enhancing Stylesheets
Developing new stylesheets can be a chore. It would be nice if you
could tell your stylesheet to trace which tags from the source document
are not yet processed by xsl:template elements. And why not
make your stylesheet write an xsl:template match skeleton for
each unhandled tag? Unfortunately, doing this was too hard with XSLT
1.0. But XSLT 2.0 will change this, and with the help of Saxon 7.5 (or
later) you can try it out now.
XSLT gives you two ways of processing XML documents. The first is to
directly access parts of the document by XPath expressions. This is what
the XSLT 2.0 Working Draft calls pull-processing (§ 2.4). The other
way is to walk through the document in document order. Letting the
document structure drive the processing sequence is called push
processing, and this is what the xsl:template match and
xsl:apply-templates mechanisms are for. Usually both kinds of
processing are mixed in a stylesheet. When one writes a new stylesheet to
process an unknown document, coding typically begins with adding
xsl:template match rules for the tags.
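The difference between the two styles can be sketched in a few lines. The following minimal stylesheet is only an illustration: the element names glossary, title and definition are assumptions made for this example, while entry and term are the ones used later in this article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <head>
        <!-- pull: reach directly into the document with an XPath expression -->
        <!-- (glossary/title is a hypothetical path) -->
        <title><xsl:value-of select="/glossary/title"/></title>
      </head>
      <body>
        <!-- push: hand the nodes over and let the matching templates decide -->
        <xsl:apply-templates select="glossary/entry"/>
      </body>
    </html>
  </xsl:template>

  <!-- push: invoked for each entry that apply-templates selects -->
  <xsl:template match="entry">
    <h3><xsl:value-of select="term"/></h3>
    <xsl:apply-templates select="definition"/>
  </xsl:template>

  <!-- definition is a hypothetical child element of entry -->
  <xsl:template match="definition">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```

The title is pulled out by an explicit path, whereas the entries are pushed through whatever templates happen to match them. It is the push part that leaves room for tags to slip through unnoticed.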
The Simple Approach
The step-by-step way of writing your templates is not a problem unless you have to work on large or deeply structured documents containing many different tags. This was the problem I ran into when I was transforming the OpenOffice 1.0 file format. I wasn't in the mood to read the extensive DTD just to pass some element content through to HTML. So I began by implementing templates for some easy and self-explanatory tags:
<!-- process headers to h1 .. h6 by text level attribute -->
<xsl:template match="text:h">
  <xsl:element name="{concat('h',@text:level)}">
    <xsl:value-of select="."/>
  </xsl:element>
</xsl:template>

<!-- generic para processing -->
<xsl:template match="text:p">
  <p>
    <xsl:apply-templates/>
  </p>
</xsl:template>
When I asked myself which tags might have passed through my templates unrecognized, I recalled the XSLT default templates and added the following:
<xsl:template match="*">
  <xsl:comment>
    <xsl:value-of select="concat('not processed: ',name())"/>
  </xsl:comment>
  <xsl:apply-templates/>
</xsl:template>
Because XSLT falls back to the most generic matching template whenever a tag has no better-fitting template definition, this was extremely simple. It gave me a trace of all tags not processed by my more specific match attributes. If all you want is a log of unhandled tags in your output document, you're done with this solution.
Improving the Solution
The idea of letting the stylesheet write the names of the unhandled
tags into a separate document is the next obvious step. We will make it
write out not just comments, but <xsl:template
match...> fragments that match the bypassed tags, and inform us
about all of their attributes. And we want to have this code as a
stylesheet module that can easily be plugged into any stylesheet we are
currently working on.
It is very hard to achieve all this with XSLT 1.0. At a minimum you will have to use processor-specific extensions. For that reason, the following solution requires an XSLT 2.0 processor. The most advanced experimental implementation is Michael Kay's Saxon 7 processor. The version used with these examples was 7.5.1.
In the following sections we will solve, step by step, the problems that arise from our requirements. You can find the complete code samples in the self_enhancing_samples.zip download. The basic XML document is named glossary.xml. The main stylesheet, which is under construction, is new_sheet.xslt. To keep things simple, it creates a small HTML file and contains only one template to handle a tag from glossary.xml. It includes nursery_sheet.xslt, where the tracing work is done.
Figure 1 shows the data flow of the described processing.

Figure 1. Data flow between the affected documents and stylesheets.
The main stylesheet (new_sheet.xslt) looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:include href="nursery_sheet.xslt"/>
  <xsl:output method="html" version="1.0"/>
  <xsl:template match="/">
    <xsl:call-template name="tag-trace"/>
    <html>
      <head><title>trace tags</title></head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="entry">
    <h3><xsl:value-of select="term"/></h3>
  </xsl:template>
</xsl:stylesheet>
That's all that's needed inside new_sheet.xslt. The
nursery_sheet.xslt is included, and somewhere later the
<xsl:call-template name="tag-trace"/> starts
tracing. It's worth noting that we do not care about namespaces yet. If we
work on documents containing namespace declarations, they should be
defined in this sheet. This is usually done inside the
<xsl:stylesheet> tag.
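As a minimal sketch of such a declaration, here is how the text prefix used in the OpenOffice templates above could be bound. The namespace URI is an assumption based on the OpenOffice 1.0 file format and should be checked against the root element of your own input document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- the text namespace URI below is an assumption; copy the real one from your input -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:text="http://openoffice.org/2000/text">

  <!-- text:p only matches if the prefix is bound to the same URI as in the source -->
  <xsl:template match="text:p">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```

Only the URI has to agree with the source document. The prefix itself may be chosen freely in the stylesheet.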
Each time we run this transformation the unhandled tags are processed
by the tag-trace template contained in
nursery_sheet.xslt, which we will look at now.
Writing Multiple Output Documents
The idea of XSLT 1.0 was to transform a single input tree into a single
output tree. There was no mechanism to write to multiple output files.
Most XSLT processors implemented their own specific solutions for
this. The new XSLT 2.0 xsl:result-document element cleans up
the jungle of processor-specific tags and allows you to serialize an
arbitrary number of output trees to separate documents. If you are
curious about the details, you may want to look at XSLT
2.0 and XQuery 1.0 Serialization and §20 of XSLT 2.0. Here we
only take a glance at this topic.
The output is controlled by the xsl:output element, which
as in XSLT 1.0 remains optional. But if you intend to use multiple output
formats there must be multiple xsl:output elements in your
stylesheet. An output definition comes into effect when referenced from an
xsl:result-document block. Let's have a look at how it
works. First we define a named output format as a top level element.
<xsl:output method="xml" name="nursery"
standalone="yes" indent="yes"/>
At some other place in the stylesheet where serialization begins we
refer to the definition using the format attribute inside the
xsl:result-document element.
<xsl:result-document href="not_processed.xml" format="nursery">
The href attribute names the document to which the serialized
result is written. But there is one thing to remember. We are not able
to do file processing in XSLT 2.0 like we can in most programming
languages. What we are doing is serializing trees. This means that we
can't simply take the <xsl:template match="*"> rule and wrap its
content creation in an <xsl:result-document>
element. Such a template would be triggered during document processing
while we are busy constructing the primary result tree, which is
serialized to the target (HTML) document. So we are forced to disconnect
the tracing mechanism from the recursive descendant processing of the main
input document.
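To make the two pieces fit together, here is a minimal, self-contained sketch. It assumes the xsl:result-document syntax of XSLT 2.0 as it was later standardized, so the working-draft implementation in Saxon 7.5 may differ in details. The not-processed and tag element names are made up for this illustration, and the sketch simply lists every element name found in the input rather than only the unhandled ones.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- unnamed output definition: applies to the primary (HTML) result -->
  <xsl:output method="html"/>
  <!-- named output definition: used only where format="nursery" refers to it -->
  <xsl:output name="nursery" method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- secondary result tree, built once from the root and not once per element -->
    <xsl:result-document href="not_processed.xml" format="nursery">
      <!-- not-processed and tag are hypothetical element names -->
      <not-processed>
        <xsl:for-each select="distinct-values(//*/name())">
          <tag name="{.}"/>
        </xsl:for-each>
      </not-processed>
    </xsl:result-document>

    <!-- primary result tree -->
    <html>
      <body><xsl:apply-templates/></body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Because the secondary document is written in a single place, its construction is independent of the templates that fire while the HTML tree is being built.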
Analyzing the Stylesheet
What we need to do is to read the current state of our main stylesheet and compare it with the tags found in the input document. So we have to handle two input documents: the main input document, which is one of the input parameters, and the stylesheet we are working on.
To analyze which tags have been handled already, we read all
match attributes of xsl:template elements and
hold them as a list of tag names in a variable. This can be achieved with
the document() function.
<xsl:variable name="handled-tags">
  <xsl:for-each select="document($analyze)//xsl:template/@match">
    <xsl:value-of select="."/>
    <xsl:if test="position() != last()">, </xsl:if>
  </xsl:for-each>
</xsl:variable>
If we want to keep the tags handled by xsl:value-of as
well, we can easily add the following inside the variable definition.
<xsl:for-each select="document($analyze)//xsl:value-of/@select">
  <!-- leading separator, because this loop follows the list of match attributes -->
  <xsl:text>, </xsl:text>
  <xsl:value-of select="."/>
</xsl:for-each>
The result is a comma-separated list of the tag names handled by
xsl:template or xsl:value-of statements.
If we had decided to put the tag-trace template into
new_sheet.xslt itself, we could have used document('') to
get the root node of the current stylesheet. But we want to keep it
separate from the current work, which is why we have to pass the name of our
working stylesheet to the document() function with the
$analyze parameter.
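As a compact illustration of this step, the list can also be built with the XPath 2.0 string-join() function. This is only a sketch and not the code from the sample archive: the default value of the analyze parameter is an assumption, and the file is expected to sit next to new_sheet.xslt, because a relative name passed to document() as a string is resolved against the location of the stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- name of the stylesheet under construction; the default value is an assumption -->
  <xsl:param name="analyze" select="'new_sheet.xslt'"/>

  <!-- all match patterns, followed by all value-of select expressions -->
  <xsl:variable name="handled-tags"
      select="string-join((document($analyze)//xsl:template/@match,
                           document($analyze)//xsl:value-of/@select), ', ')"/>

  <!-- print the list so that it can be inspected -->
  <xsl:template match="/">
    <xsl:value-of select="$handled-tags"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against the new_sheet.xslt shown earlier, this should print /, entry, term. Note that the list holds match patterns and select expressions as written in the stylesheet, so a pattern such as / appears in it alongside the plain tag names.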