XSLT UK 2001 Report
April 8th and 9th 2001 saw the first conference dedicated to XSLT take place at Keble College in Oxford. While the basis of the conference was XSLT, this didn't stop people talking about the XSL effort in general or about other vocabularies and technologies that work with or against XSLT.
Opening Address
The conference was opened by Norm Walsh from Sun Microsystems, member of the XSL Working Group and maintainer of one of the more complex XSL applications -- the DocBook XSL family, which he talked about later in the day. Norm set the scene for the conference, reminding us of the origins of XSLT and outlining four requirements that will make XSLT and XPath as ubiquitous as XML has become:
- interoperable tools,
- cooperative specs,
- optimizations or compilations of stylesheets, and
- information set pipelines.
XSLT and the Art of Motorcycle Maintenance
Next up was David Carlisle, from NAG Ltd., one of the editors of MathML and an XSL-List regular. David gave another view of XSLT's heritage, as a functional programming language fitting into the same development path as Scheme or DSSSL. He outlined the benefits of taking a functional approach to presenting information, especially with web-based content, where random access means that you need something that allows you to process only parts of the content and still work reliably (for example, in numbering pages without having to process each page to construct the number). David had the title for his talk thrust upon him, but he still managed to bring in a reference to the seminal book "Zen and the Art of Motorcycle Maintenance" with a quote.
After a while he says, "Can I have a motorcycle when I get old enough?"
"If you take care of it."
"What do you have to do?"
"Lots of things. You've been watching me."
"Will you show me all of them?"
"Sure."
"Is it hard?"
"Not if you have the right attitudes. It's having the right attitudes that's hard."
"Oh."
After a while I see he is sitting down again. Then he says, "Dad?"
"What?"
"Will I have the right attitudes?"
"I think so," I say. "I don't think that will be any problem at all."
And so we ride on and on, down through Ukiah, and Hopland, and Cloverdale, down into the wine country...
Beginners can find XSLT difficult to deal with, especially when they come from a procedural-language background. But XSLT isn't hard if you have the right attitude.
XSLT Design Patterns
I spoke next, representing only myself and drawing on my experience answering questions on XSL-List. I outlined some of the design patterns that have emerged in the use of XSLT. Using examples from an application I worked on for Xi advise bv, I spoke about four levels of design patterns:
- application level: combining stylesheets and using XSLT within a wider context -- I specifically talked about getting multiple views of the same data using XSLT
- stylesheet level: the flow of processing within the application -- I talked about the differences between push and pull, and how to combine them, and about grouping by position, in hierarchies and by value (using the Muenchian Method)
- template level: patterns in instructions such as Wendell Piez's method for repetition and David Allouche's method for normalizing strings
- XPath level: expressions for getting unique nodes, for set manipulation and for conditional XPaths, such as Oliver Becker's method
Throughout, I talked about the way that identifying these methods can help us to identify the areas where XSLT and XPath need to be developed.
XSLT Performance
We were then treated to a talk by Mike Kay that highlighted the experiences of implementers. Now at Software AG, he is a member of the XSL Working Group and another regular contributor on XSL-List, but he's probably best known as the implementer of the Saxon XSLT processor and the author of the XSLT Programmer's Reference.
Mike spoke about XSLT performance. He advised that you only need to worry about the performance of XSLT processors or stylesheets if you have business requirements that demand a certain throughput or response time, although you might also be concerned about the predictability, tunability, or scalability of a particular stylesheet.
While he didn't specifically talk about Saxon, Mike showed the basic way an XSLT processor works: taking the XML stylesheet, turning it into a tree, 'compiling' that tree, similarly taking the XML source and turning that into a tree, and then constructing the result tree (theoretically in memory, but often practically outputting it immediately).
Mike described the most important things for XSLT processor efficiency: tight code, name management, XPath queries, XSLT pattern matching, pipelining, and the storage of node sets. He discussed the issues involved in constructing a node tree for XPath/XSLT processing, especially given its differences from the DOM. (XPath node trees don't include CDATA or entity nodes, and there is different handling of whitespace.) He also outlined the Tiny Tree Model that he now uses in Saxon (after seeing a similar technique in Xalan), where transient objects are created from arrays as required. This gives real advantages, allowing run-time decisions about the kinds of access paths that should be stored (for example, you only need to store information about what a node's parent is if you need to access a node's parent).
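The array-backed idea can be sketched roughly as follows. This is a minimal Python illustration of the general technique, not Saxon's actual implementation: the class names `TinyTree` and `NodeView`, the choice of arrays (names, depths, and an optional parent array), and the `store_parents` flag are all assumptions made for the example.

```python
from xml.etree import ElementTree as ET


class TinyTree:
    """Stores element nodes in parallel lists rather than one object per node."""

    def __init__(self, xml_text, store_parents=False):
        self.names = []    # element name of node i, in document order
        self.depths = []   # nesting depth of node i
        # The parent array is optional: it is built only when the caller
        # says that upward navigation will be needed.
        self.parents = [] if store_parents else None
        self._load(ET.fromstring(xml_text), depth=0, parent=-1)

    def _load(self, element, depth, parent):
        index = len(self.names)
        self.names.append(element.tag)
        self.depths.append(depth)
        if self.parents is not None:
            self.parents.append(parent)
        for child in element:
            self._load(child, depth + 1, index)

    def node(self, index):
        """Create a transient view object for node `index` on demand."""
        return NodeView(self, index)


class NodeView:
    """Short-lived wrapper; all state lives in the TinyTree arrays."""

    def __init__(self, tree, index):
        self.tree = tree
        self.index = index

    @property
    def name(self):
        return self.tree.names[self.index]

    def children(self):
        # Descendants follow this node in document order with a greater
        # depth; children are those exactly one level deeper.
        depths = self.tree.depths
        own = depths[self.index]
        i = self.index + 1
        while i < len(depths) and depths[i] > own:
            if depths[i] == own + 1:
                yield NodeView(self.tree, i)
            i += 1

    def parent(self):
        if self.tree.parents is None:
            raise LookupError("parent links were not stored for this tree")
        p = self.tree.parents[self.index]
        return None if p < 0 else NodeView(self.tree, p)


if __name__ == "__main__":
    tree = TinyTree("<doc><sec><p/><p/></sec><sec/></doc>", store_parents=True)
    print([child.name for child in tree.node(0).children()])
    print(tree.node(2).parent().name)
```

In this sketch a tree built without `store_parents` simply cannot answer `parent()`, which mirrors the trade-off described above: the parent access path is paid for only when something needs it.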
The areas for future optimization that implementers have barely touched yet are:
- parallel execution, which should be possible as XSLT is side-effect free
- compilation of stylesheets into byte code, something picked up by Morten Jørgensen in the next talk
- global optimization of processing flow, as opposed to local optimization of XPaths
- serial transformations, if it's possible to detect those (parts of) transformations that don't require access to the entire tree
- exploiting XML schemas
There were some tips for users too:
- follow good performance engineering practice: record the time a stylesheet takes before and after making each change, and change it back if it doesn't improve
- use small documents rather than large ones
- don't assume that the processor makes a particular optimization
- minimize the number of visits to each node
- use variables
- use temporary trees (result tree fragments in XSLT 1.0)
- use keys
- don't use xsl:number
- don't care about the changes that can only give less than 10% improvement
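Two of these tips, measuring before and after each change and using keys, can be illustrated outside XSLT. The following is a hedged Python sketch rather than stylesheet code: the document shape and the function names (`build_doc`, `count_by_scan`, `count_by_key`, `timed`) are hypothetical, and the index only approximates the role that xsl:key and key() play in a stylesheet.

```python
import time
from collections import defaultdict
from xml.etree import ElementTree as ET


def build_doc(n):
    """Build a hypothetical test document with n <item> elements."""
    root = ET.Element("items")
    for i in range(n):
        ET.SubElement(root, "item", {"cat": "c%d" % (i % 50), "id": str(i)})
    return root


def count_by_scan(root, categories):
    """Rescan every item once per category: many visits to each node."""
    return {c: sum(1 for item in root.iter("item") if item.get("cat") == c)
            for c in categories}


def count_by_key(root, categories):
    """Visit each item once to build an index, then look categories up."""
    index = defaultdict(list)
    for item in root.iter("item"):
        index[item.get("cat")].append(item)
    return {c: len(index[c]) for c in categories}


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    doc = build_doc(20000)
    cats = ["c%d" % i for i in range(50)]
    before, t_before = timed(count_by_scan, doc, cats)
    after, t_after = timed(count_by_key, doc, cats)
    assert before == after  # a performance change must not alter the result
    print("scan: %.4fs  key: %.4fs" % (t_before, t_after))
```

The timings depend on the machine and the data, so the sketch prints them rather than assuming which version wins. That is the point of the first tip: measure, and revert the change if it does not help.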
The XSLT Compiler for JVM
Morten Jørgensen, from Sun Microsystems, introduced the XSLT Compiler (XSLTC). XSLTC creates "translets": Java classes that run about 30-200% faster than interpretive XSLT processors and are usually about a quarter of the size of an XSLT processor and stylesheet. Because of their size and platform independence, these translets can run on virtually anything, including handheld machines.
With XSLTC, stylesheets can be compiled into translet bundles, each one of which contains a main class and a set of auxiliary classes for elements that require special handling. These are shipped with an XSLT runtime library, containing a tailored DOM with SAX interfaces for input and output.
For authors using XSLTC, Morten outlined a few tips. The main body of a translet is a switch statement, with each case being a particular match pattern. Authors should therefore keep match patterns simple and, in particular, avoid unioned match patterns. At an application level, developers should take advantage of the cacheability of the DOMs used by XSLTC, as XML parsing can take as much as 50% of the total processing time.
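The caching advice amounts to parsing each source document once and reusing the parsed tree across transformations. The sketch below shows that idea in Python. It does not use XSLTC's API, and the `DocumentCache` class and the two stand-in "transformations" are hypothetical names invented for the illustration.

```python
from xml.etree import ElementTree as ET


class DocumentCache:
    """Keeps parsed trees so repeated transformations skip the parse step."""

    def __init__(self):
        self._trees = {}
        self.parse_count = 0  # how many times a document was actually parsed

    def get(self, key, xml_text):
        """Return the cached tree for `key`, parsing `xml_text` only once."""
        if key not in self._trees:
            self._trees[key] = ET.fromstring(xml_text)
            self.parse_count += 1
        return self._trees[key]


def titles(root):
    """Stand-in for one transformation: list the title text."""
    return [t.text for t in root.iter("title")]


def title_count(root):
    """Stand-in for a second transformation over the same document."""
    return sum(1 for _ in root.iter("title"))


if __name__ == "__main__":
    cache = DocumentCache()
    source = "<doc><title>A</title><sec><title>B</title></sec></doc>"
    print(titles(cache.get("doc1", source)))
    print(title_count(cache.get("doc1", source)))
    print("parses:", cache.parse_count)
```

A cache like this only pays off if the transformations treat the shared tree as read-only and the same documents are requested repeatedly.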
XSLTC is still alpha software, but the only outstanding features needed for conformance with XSLT 1.0 are support for simplified stylesheets (where the document element of the stylesheet is not xsl:stylesheet), the namespace axis, and id() and key() functions within match patterns.