
The Future of XSLT 2.0
By Kurt Cagle
March 21, 2007

I recently wrote a weblog entry about the directions that I saw with XML, and while it has proved to be fairly popular, it has also generated a fair number of comments that really deserve more detailed examination. One of these comments, and one that I've been planning to write about for a while anyway, has to do with my statement that XSLT 2.0 is increasingly being used as a "router" language, replacing such applications as Microsoft's BizTalk Server.

In the long run, however, as the world increasingly settles on XML as its preferred data transport format, technologies such as XSLT 2.0 will likely make most of BizTalk's functionality redundant, useful at best for edge cases. Most databases now produce XML, either directly, through specialized extensions, or via an XQuery layer. Given that databases are also increasingly sequestered behind data abstraction layers, from the standpoint of an external application such databases are simply another vector for supplying XML in a given format (and, increasingly, for consuming XML sent to them).

SQL does not clearly specify how content should be serialized. Database vendors use this to their advantage, wrapping access to their databases in their own internal APIs. For the most part, even in cases such as MySQL, the primary serialization format is either an explicit API wrapper or a text output that is highly vendor-dependent.

This has resulted in a rather remarkable services industry built almost exclusively on "translating" between SQL output (or input) and some formal presentation layer. It can be argued that XSLT itself is simply another example of a translation layer; and, to be honest, if the language is used improperly, that statement is actually quite true.

But XSLT has a paradigm-altering function called document(), as well as another interesting capability called parameters. The document function can work on static XML content, but it can also use the GET protocol (through query strings) to retrieve content from web services. Parameters can be set from the hosting language to determine these web services invocations and, additionally, can be calculated from within the XSLT and passed in the same manner.
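
As an illustration, here is a minimal sketch of the pattern (the service URL and element names are invented for the example): a parameter supplied by the host is folded into a query string, and document() pulls the service's XML response into the transformation.

  <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Supplied by the hosting environment -->
    <xsl:param name="customer-id" select="'12345'"/>

    <xsl:template match="/">
      <!-- Build the GET request as a query string -->
      <xsl:variable name="service-url"
          select="concat('http://example.com/customers?id=', $customer-id)"/>
      <!-- Retrieve the web service's response and process it in place -->
      <xsl:apply-templates select="document($service-url)/customer"/>
    </xsl:template>

    <xsl:template match="customer">
      <summary><xsl:value-of select="name"/></summary>
    </xsl:template>

  </xsl:stylesheet>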

However, there are several problems with this approach. For starters, creating such query strings in the first place and passing them in is painful: you have to use the rather cumbersome call-template syntax, wrap the results in a variable, and then pass the variable into the document() function. There are few checks to handle error conditions, and once you create the output, you can't necessarily use that output as the input to some other action, because in XSLT 1.0 the output is a result tree fragment rather than a node-set.

Thus, while it has certainly been possible to use XSLT in this fashion, you have all too often been forced to rely upon inconsistently implemented extensions such as the node-set() function. Indeed, most of the really interesting things you could do with XSLT 1.0 came down to these self-same extensions, which again raised questions regarding whether there really was much benefit in using XSLT 1.0 in the first place.
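
To make the pain concrete, here is a rough sketch of the XSLT 1.0 workaround (using the EXSLT node-set() extension, which not every processor implements): a constructed fragment has to be run through the extension before it can be queried as nodes.

  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:exsl="http://exslt.org/common"
      exclude-result-prefixes="exsl">

    <xsl:template match="/">
      <!-- In 1.0 this variable holds a result tree fragment, not a node-set -->
      <xsl:variable name="fragment">
        <order id="1"><total>42</total></order>
      </xsl:variable>
      <!-- The extension converts the fragment into something we can query -->
      <xsl:value-of select="exsl:node-set($fragment)/order/total"/>
    </xsl:template>

  </xsl:stylesheet>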

Important New Features

However, much if not most of this concern evaporates with XSLT 2.0, which I believe has significantly advanced the state of the art. Among the new features: temporary trees are now genuine node sequences, so the node-set() workaround disappears; xsl:result-document lets a single transformation write multiple output documents, each with its own serialization (including plain text); and regular expressions, grouping, and user-defined functions are built in. The ability to serialize output as text means a transformation can emit source code for other environments, for example a JSP fragment such as the following:

<jsp:setProperty name="user" property="id" value='<%= "id" + idValue %>'/>

I think this feature is a useful one, because it effectively opens up XSLT to the world of generating processing logic in most web server languages.
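
As a sketch of how such a fragment might be produced (the source vocabulary here is invented), a transformation can serialize its output as plain text so the JSP delimiters pass through untouched:

  <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <!-- Emit a jsp:setProperty tag for each property element in the source -->
    <xsl:template match="property">
      <xsl:text>&lt;jsp:setProperty name="</xsl:text>
      <xsl:value-of select="../@bean"/>
      <xsl:text>" property="</xsl:text>
      <xsl:value-of select="@name"/>
      <xsl:text>" value='&lt;%= </xsl:text>
      <xsl:value-of select="@expr"/>
      <xsl:text> %&gt;'/&gt;&#10;</xsl:text>
    </xsl:template>
  </xsl:stylesheet>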

XSLT 2.0 as Router

An XML message enters a system to be processed in some manner. One of the more fundamental distinctions among programming models on the Web has to do with where intent is located; that is, where the responsibility for indicating what should be done with a message resides. In the REST model the intent resides solely within the URL: the message itself carries the associated data but does not, by itself, carry the relevant processing intent. In the RPC model, on the other hand, the responsibility for processing resides primarily within the envelope (typically a SOAP message), which may also include parameters, all of which are intended to invoke a method in some other language such as Java or C#.
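
For example (the endpoint and operation names are invented for illustration), a REST request carries its intent in the URL itself:

  http://example.com/orders/1234?action=cancel

while the RPC equivalent carries it inside the envelope:

  <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
      <cancelOrder xmlns="http://example.com/orderService">
        <orderId>1234</orderId>
      </cancelOrder>
    </soap:Body>
  </soap:Envelope>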

Now suppose, for a moment, that you created an XSLT 2.0 transformation and bound it to one or more external objects under appropriate namespaces. It is not generally possible to invoke, in a single pass, an XSLT transformation that is itself created by another transformation, regardless of the version (nor should it be, for security reasons). However, it is certainly possible to create a dual-pass system: the first pass constructs, from the incoming message, one or more transformations that invoke the appropriate external class methods; run standalone as the second pass, these in turn either send their results to the output stream or generate secondary streams that pass newly created XML to different places.
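
Generating a stylesheet from a stylesheet is typically done with xsl:namespace-alias, so that the emitted xsl elements are treated as literal output rather than as live instructions. A rough sketch of that first pass, assuming a hypothetical message vocabulary:

  <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias">

    <!-- axsl:* elements in the output become real xsl:* elements -->
    <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>

    <!-- For each incoming message, emit the second-pass stylesheet -->
    <xsl:template match="/message">
      <axsl:stylesheet version="2.0">
        <axsl:template match="/">
          <axsl:apply-templates select="{operation/@target}"/>
        </axsl:template>
      </axsl:stylesheet>
    </xsl:template>

  </xsl:stylesheet>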

There are several benefits to this approach. First, what you are passing is initially XML, to be transformed as XML. This means that while you are doing this transformation you can also apply tests, Schematron rules, for instance, that determine whether the incoming data is not only well-formed but valid against your business rules, and that protects the system from potentially serious attacks. Because Schematron's output is itself XML, you can send messages back up the pipe (if such a return channel exists) outlining, in user-friendly form, what has gone wrong. Second, it also makes it easier to stop potentially expensive server operations from being invoked when such calls exceed some threshold (such as the rate of automated requests coming from a given client). And because you process the information as XML rather than splicing it into queries or code, you are not exposing the integrity of your system to injection attacks.
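
A minimal sketch of the kind of business rule involved, expressed in Schematron (the order vocabulary is hypothetical):

  <sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
    <sch:pattern>
      <sch:rule context="order">
        <!-- Well-formedness isn't enough; the data must also make business sense -->
        <sch:assert test="number(total) &gt;= 0">
          An order total must not be negative.
        </sch:assert>
        <sch:assert test="@customer-id">
          Every order must identify its customer.
        </sch:assert>
      </sch:rule>
    </sch:pattern>
  </sch:schema>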

The first pass also makes it possible to create batches consisting of multiple jobs: once a given job (a command call, for instance, or some additional processing) completes, it is removed from the stack, and the XSLT can then choose whether to process the remainder of the job stack based upon some conditional expression emerging from the previous job's results. In other words, XSLT at that point acts as a job-control language, driven not only by the incoming data but also by the results of processing that data.
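
A rough sketch of that job-control pattern (the job vocabulary and the status test are invented): a named template processes the job at the head of the stack, then decides from that job's result whether to continue with the rest.

  <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="/jobs">
      <results>
        <xsl:call-template name="run-jobs">
          <xsl:with-param name="stack" select="job"/>
        </xsl:call-template>
      </results>
    </xsl:template>

    <xsl:template name="run-jobs">
      <xsl:param name="stack" as="element(job)*"/>
      <xsl:if test="exists($stack)">
        <!-- Process the job at the head of the stack -->
        <xsl:variable name="outcome">
          <xsl:apply-templates select="$stack[1]"/>
        </xsl:variable>
        <xsl:copy-of select="$outcome"/>
        <!-- Continue only if the previous job reported success -->
        <xsl:if test="$outcome/result/@status = 'ok'">
          <xsl:call-template name="run-jobs">
            <xsl:with-param name="stack" select="$stack[position() gt 1]"/>
          </xsl:call-template>
        </xsl:if>
      </xsl:if>
    </xsl:template>

    <!-- A stand-in job handler -->
    <xsl:template match="job">
      <result status="ok" job="{@name}"/>
    </xsl:template>

  </xsl:stylesheet>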

Such invocations could (and generally should) be done asynchronously. The first XSLT writes the initially processed XML out via result-document, and that output is later retrieved as a message from a set of queued messages by a second, asynchronous transformation. In this particular case, the effective routing could be done solely within the first XSLT, with little need to create multiple synchronous chains of transformations. This system uses XSLT as a message router. That it can also serve as a validation system is not accidental: one of the strengths of XML is that you can check a document's validity without the danger of instantiating the object it describes in live form.
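
The routing step itself can be very small. Here is a minimal sketch (the queue paths and message vocabulary are hypothetical) in which each message is written to a per-type queue with xsl:result-document:

  <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="/messages">
      <xsl:for-each select="message">
        <!-- Route each message to a directory named for its type -->
        <xsl:result-document href="queue/{@type}/{@id}.xml">
          <xsl:copy-of select="."/>
        </xsl:result-document>
      </xsl:for-each>
    </xsl:template>

  </xsl:stylesheet>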

XSLT 2.0, XQuery, and XForms

Over the years I've written a great deal about XQuery. I have to admit that I was not originally very taken with it, and saw it as a somewhat awkward replacement for XSLT. One possibility I did consider was using XQuery to retrieve content that could then be transformed by XSLT into the appropriate format, though until I saw the eXist database I assumed that these steps would occur in separate processes. With eXist, however, you can make an XQuery call to the transform:transform() extension function to invoke a transformation on an XML node. With a quick download and a little fiddling with some of the configuration files, you can switch from the default Xalan transformer to Saxon 8.9, enabling XSLT2, XPath2, and XQuery all in the same system. And then you can turn your XQueries directly into web services.

This makes for an incredible combination, in part because you never leave the XML context. You don't have to spend a large amount of time writing different configurations of transformer objects, XML Document resources, pipes, parameters, or the like. You basically end up working with XML pretty much throughout the entire process. The combination of this with the ability to work with the various server objects (request, response, session, etc.) essentially gives you the entire application context in a single XQuery program.

This becomes especially important when working with XForms. I find it increasingly difficult not to work with XForms, to be honest, even given some of the complexities involved in different implementations. With XForms, you can build the data model XML on the client side, send it up to an XQuery that will validate and process it, and that object can in turn be passed off to a transformation to generate another XForms instance, an XHTML report, or an SVG chart of some sort. XSLT2 works well for building such input templates, again giving you fine-grained conditional control over the interface capabilities you establish. I think the "X" model (XQuery + XSLT2 + XHTML + XForms) will likely prove a potent one in the future. It is already gaining traction in various industries, especially in the medical, insurance, government, and education sectors.
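
A minimal sketch of the client side of that loop (the instance vocabulary and submission URL are hypothetical): the XForms model holds the XML the user edits and posts it to an XQuery endpoint for validation and processing.

  <xf:model xmlns:xf="http://www.w3.org/2002/xforms">
    <!-- The data the user edits, kept as XML throughout -->
    <xf:instance id="order">
      <order xmlns="">
        <customer/>
        <total>0</total>
      </order>
    </xf:instance>
    <!-- POST the instance to an XQuery that validates and processes it -->
    <xf:submission id="save" method="post"
        action="http://example.com/exist/rest/db/app/process-order.xq"/>
  </xf:model>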

The Future of XSLT

Predicting the future of any technology, especially one as esoteric as XSLT, is an exercise fraught with risk. XSLT 2.0 is generally easier to learn than its predecessor, is considerably more powerful, makes most of the right moves with regard to extensibility, and already has some first-class implementations in place. Microsoft recently announced that it will be producing an XSLT 2.0 processor, and I wouldn't be surprised if other XSLT implementation owners are at least evaluating the option. It does what a good second version is supposed to do, in that it solves most of the problems of the first version without introducing a whole raft of new ones.

Overall, I see it becoming far more heavily used within the next couple of years as implementations proliferate, especially when you consider that XSLT 1.0 implementations now exist for very nearly every platform in use today, making it a remarkably successful cross-platform solution. As someone who has wrestled with runaway recursive stacks, clunky called-template invocations, and implementation headaches, I think that day can't come soon enough.
