An Interview with Michael Kay
July 7, 2004
Michael Kay is the author of Wrox Press's "XSLT Programmer's Reference," the standard reference work on XSLT, and the editor of the W3C's XSLT 2.0 specification, which is currently in Working Draft status. His Java-based Saxon XSLT processor is one of the most successful and popular XSLT processors in the language's history. The branch of Saxon supporting XSLT 1.0 is currently at release 6.5.3, and regular readers of this column will know that the 7.x branch of Saxon has been implementing more and more support for XSLT 2.0.
Michael has recently upgraded the 7.x branch to version 8.0, which is split into two versions: the free, open-source basic version known as Saxon-B and the commercial, schema-aware version known as Saxon-SA.
Michael has also recently founded his own company, Saxonica, to develop and market Saxon-SA. I discussed his new venture with him via email.
Bob DuCharme: As of today, are Saxon 7 and 8 still the only XSLT processors with any XSLT 2.0 support?
Michael Kay: Essentially, yes. Oracle has a beta release with support for a few XSLT 2.0 features, but it's very far from complete. A few other people have said, either officially or unofficially, that they are working on it, but they've not shown anything in public yet.
BD: Do you know of any use of Saxon 7 in production environments yet, even though XSLT 2.0 is still a Working Draft?
MK: One of the oddities of the open-source world is that I don't know very much about what my users are doing. I know that there are around 250 downloads of Saxon a day, of which around half are the 7.x version, but I have very little idea who is downloading it and what they are doing with it (if anything). Most of the feedback I get is either from a small group of experts who know the technology inside out and are stretching its boundaries, or from beginners who don't know where to start. There's a silent majority in between that I never hear from.
I think you need to distinguish two kinds of production environments for XSLT. There's the continuously running mission-critical web site, and there's the publishing shop that does a lot of ad-hoc one-off jobs. The impression I get is that a lot of people are using Saxon 7.x extensively for the second kind of production workload, but that most people with the first kind of environment are (quite rightly) sticking to Saxon 6.x (and XSLT 1.0) for the time being.
BD: What does the Saxonica version of Saxon offer above and beyond the features of the free version?
MK: For the moment, there is one difference: Saxon-SA, the commercial version, is schema-aware. The XSLT 2.0 specification itself identifies two conformance levels, a basic processor and a schema-aware processor, and I'm aiming to align the open-source product with the basic conformance level, and the commercial product with the schema-aware level. I expect there will be a similar distinction in XQuery as well, although the current working draft doesn't define conformance levels.
Being schema-aware means that a stylesheet (or query) can declare what type of input document it is designed to process, and what type of output document it is designed to produce. The main result is that you get better diagnostics when you get your code wrong.
Another benefit, which you start to realize when you are dealing with the more complex XML vocabularies, is that it becomes easier to write generic (or reusable) code that can process different elements with the same characteristics: as a very simple example, you can write a single template rule to process all date-valued attributes in the same way.
Part of the rationale for schema-aware XSLT and XQuery processing is that it should be possible to do more powerful optimizations, and therefore to get improved performance. For Saxon though, that's future potential rather than a reality today.
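In Saxon-SA, Kay's date-attribute example would be a single template rule in a schema-aware stylesheet. The following is not Saxon code. It is a rough Java analogue of the underlying idea that validation labels each attribute with its schema type, so code can select attributes by type instead of by name. It is a minimal sketch that uses the JDK's standard javax.xml.validation API. A ValidatorHandler validates the document, and a content handler asks the handler's TypeInfoProvider for each attribute's type. The content handler collects every attribute annotated as xs:date, whatever the attribute is called. The class name DateAttributes and the sample schema are hypothetical. For simplicity, the sketch matches only attributes declared exactly as xs:date, not user-defined types derived from it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: finds attributes by schema type, not by name.
public class DateAttributes {
    static final String XSD_NS = XMLConstants.W3C_XML_SCHEMA_NS_URI;

    public static List<String> findDateAttributes(String xsd, String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XSD_NS)
                .newSchema(new StreamSource(new StringReader(xsd)));
        // The ValidatorHandler validates SAX events and exposes the type
        // each element and attribute was validated against.
        ValidatorHandler validator = schema.newValidatorHandler();
        TypeInfoProvider types = validator.getTypeInfoProvider();
        List<String> found = new ArrayList<>();

        validator.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    // Only legal inside startElement: the type annotation of attribute i.
                    TypeInfo type = types.getAttributeTypeInfo(i);
                    boolean isDate = type != null
                            && XSD_NS.equals(type.getTypeNamespace())
                            && "date".equals(type.getTypeName());
                    if (isDate) {
                        // One code path for every date-typed attribute, whatever its name.
                        found.add(localName + "/@" + atts.getLocalName(i) + "=" + atts.getValue(i));
                    }
                }
            }
        });

        SAXParserFactory pf = SAXParserFactory.newInstance();
        pf.setNamespaceAware(true); // ValidatorHandler expects namespace-aware events
        XMLReader reader = pf.newSAXParser().getXMLReader();
        reader.setContentHandler(validator);
        reader.parse(new InputSource(new StringReader(xml)));
        return found;
    }
}
```

Consider a schema that declares an order element's placed and shipped attributes as xs:date and its id attribute as xs:string. With that schema, the method should report placed and shipped and skip id, without any attribute name appearing in the selection logic.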
BD: Because XSLT processors need an XML parser to read the stylesheet and source document, versions of Saxon that support the XSLT 1.0 Recommendation had the Ælfred XML parser included as the default parser. To enable schema-aware XSLT processing, what parser does Saxon-SA use?
MK: Saxon continues to work with any SAX2 parser. It doesn't rely on the XML parser to do schema validation -- it does that itself.
A free bonus that comes with Saxon-SA is that it includes a brand-new schema processor. It's an unfortunate fact that the XML Schema specification is extremely complex (and buggy). As a result there aren't very many implementations, and they don't always give the same answers in edge cases.
Many users have taken to validating documents (and schemas) with more than one processor, to give added confidence when the document is valid, or to get better diagnostics when it isn't. I think that increasing the choice of schema processors that's available is something the community will find valuable in itself.
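Kay's point is that Saxon reads its input through whatever SAX2 parser it is given and does schema validation itself. In JAXP terms, one standard way to hand a transformer a particular SAX2 parser is to wrap that parser's XMLReader in a SAXSource. The minimal sketch below makes some assumptions. It uses the JDK's default parser and whatever TransformerFactory.newInstance() resolves to on the classpath. That is normally the JDK's built-in XSLT 1.0 processor unless another processor, such as Saxon, is configured. The class name Sax2Input and the sample stylesheet are hypothetical. Any other SAX2 XMLReader should slot into the same place.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical example class: feeds a transformer through an explicit SAX2 parser.
public class Sax2Input {
    public static String transform(String xslt, String xml) throws Exception {
        // Obtain a SAX2 XMLReader; here the JDK's default, but any SAX2 parser could be used.
        SAXParserFactory pf = SAXParserFactory.newInstance();
        pf.setNamespaceAware(true);
        XMLReader reader = pf.newSAXParser().getXMLReader();

        // Whichever JAXP processor the classpath provides.
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));

        // SAXSource pairs the chosen parser with the input document.
        StringWriter out = new StringWriter();
        t.transform(new SAXSource(reader, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }
}
```

For example, take a text-output stylesheet whose only template writes count(//item). Given a list element with three item children, it returns 3, whichever conforming parser supplied the events.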
BD: Are there specific XSLT and XPath functions just for use with schema-related processing?
MK: When an XML document gets validated against a schema, the result isn't just a pass or fail: every element and attribute gets labeled with the schema-defined type that it validated against.
So you will have elements and attributes labeled as strings, integers, or dates, or as instances of user-defined types such as geographic coordinates, postal addresses, or taxpayer reference numbers. In a schema-aware stylesheet or query, you can write functions to process objects of a particular type, just as you would in Java or C#: the schema becomes the type system of the language. And you basically get the same benefits -- many programming errors are picked up sooner, which gives you a faster debugging turnaround, which means you can deliver working code more quickly.
At the coding level, you can declare the types of your variables and the argument types of your templates and functions, and you can write path expressions and match patterns that select nodes according to their schema type. That means, for example, that you can select "all inline elements" in an XHTML document, without having to list all the elements that are classified as inline elements. Apart from anything else, that makes your code more resilient to changes in the schema.
The other important feature is that you can ask for your result documents to be validated against a schema. In Saxon, the validation is done on the fly, so instead of getting an error message at the end of the run that says the transformation or query was successful but the output wasn't valid against the schema, you get a failure as soon as you try to write an invalid element or attribute to the output. The error message points straight to the offending place in the stylesheet or query. I've been quite startled myself to see how effectively this works.
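Saxon's on-the-fly result validation is built into the processor itself. The following sketch illustrates only the general principle and is not Saxon code. It is a minimal example using the JDK's javax.xml.validation API. Output is pushed one SAX event at a time into a ValidatorHandler that sits between the "stylesheet" and a serializer, so an invalid element raises an exception at the moment it is written. That happens instead of a report after the whole document is complete. The class name StreamingOutputCheck, the method name firstInvalid, and the sample element names are hypothetical. The loop over element names stands in for a transformation emitting its results.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.LocatorImpl;

// Hypothetical example class: validates output while it is being written.
public class StreamingOutputCheck {

    /**
     * Emits {@code <list>} containing one empty child element per name, pushing
     * each SAX event through a ValidatorHandler. Returns the name of the element
     * being written when validation failed, or null if the output was valid.
     */
    public static String firstInvalid(String xsd, String... children) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        ValidatorHandler out = schema.newValidatorHandler();
        out.setContentHandler(new DefaultHandler()); // stand-in for a serializer
        out.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // stop at the first validity error
            }
        });

        AttributesImpl noAttributes = new AttributesImpl();
        String current = "list";
        try {
            out.setDocumentLocator(new LocatorImpl()); // empty locator, as a parser would supply
            out.startDocument();
            out.startElement("", "list", "list", noAttributes);
            for (String child : children) {
                current = child; // remember what we are writing right now
                out.startElement("", child, child, noAttributes);
                out.endElement("", child, child);
            }
            current = "list";
            out.endElement("", "list", "list");
            out.endDocument();
            return null;
        } catch (SAXException e) {
            return current;
        }
    }
}
```

Suppose the schema allows only item children inside list. Then firstInvalid(xsd, "item", "bogus", "item") should report "bogus". The failure surfaces while that element is being written, before the final item is ever produced.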
BD: Where have you seen early interest in using XSLT and W3C schemas together (for example, specific industries or development communities)?
MK: I can't quantify the level of interest. But I've certainly heard from quite a few individuals who are excited by the prospect. I don't think that the community as a whole will really catch on to the benefits, or discover what a different experience it is to write schema-aware queries and transforms, until they actually try it out and see for themselves.
BD: Will the schema-awareness help people who have been using the free version of Saxon 7 for XQuery work?
MK: The schema-aware features work equally from XSLT or XQuery. At the moment, I don't get the impression anyone is using XQuery in anger -- people are playing with it to learn about it, not to do real work. But a lot of people coming to XML from the data side rather than the document side see XQuery as the future, so there's an important community to be served there.
BD: What are your plans for a Saxon-SA beta program?
MK: One of the challenges ahead is to see how much I can adapt the things that work well in an open-source world to a more conventional, commercial software model. I've never much liked the concept of beta releases. I work by producing new releases every two or three months, each of which aims to be fit for production use, and if it falls short of that then I follow it up with a maintenance release after two or three weeks. So long as the W3C specs themselves are still moving, users will want the product to keep moving too. Once the specs have stabilized, I shall probably do what I did with 6.5.x, and freeze a version for people who want stability.
BD: When do you foresee Saxon-SA 1.0 being ready?
MK: The code is finished, tested, documented, and sitting on the shelf waiting to go out: I just have to sort out a few details of the logistics and the commercial side (the bankers have to approve my licensing terms, for example). With luck, you'll be able to get an evaluation copy by the time this interview is published.
BD: Where should people go to find out more?
Two mailing lists cover Saxon: saxon-announce, where you can find out about new developments in both the free and commercial versions, and saxon-help, where you can put questions about your use of Saxon to Michael and other members of the Saxon community. Information about subscribing to both is at http://sourceforge.net/mail/?group_id=29872.