XML.com: XML From the Inside Out

The Future of XSLT 2.0

March 21, 2007

I recently wrote a weblog entry about the directions that I saw with XML, and while it has proved to be fairly popular, it has also generated a fair number of comments that deserve more detailed examination. One of these comments (and one that I've been planning to write about for a while anyway) has to do with my statement that XSLT 2.0 is increasingly being used as a "router" language, replacing such applications as Microsoft's BizTalk Server.

However, in the long run, as the world increasingly chooses XML as the preferred data transport story, other technologies, such as XSLT 2.0, will likely end up making most of its functionality redundant and useful at best in edge cases. Most databases now produce XML, either directly, through specialized extensions, or via an XQuery layer. Given that databases are also increasingly being sequestered behind data abstraction layers, this means that from the standpoint of an external application, such databases are simply another vector for supplying XML in a given format (and increasingly, for consuming XML being sent to them).

SQL does not clearly specify how content is serialized. Database vendors use this to their advantage, wrapping access to such databases in their own internal APIs. And for the most part, even in cases such as MySQL, the primary serialization format is either an explicit API wrapper or a text output that is highly vendor-dependent.

This has resulted in a rather remarkable services industry built almost exclusively on "translating" between SQL output (or input) and some formal presentation layer. It can be argued that XSLT itself is simply another example of a translation layer; and, to be honest, if the language is used improperly, that statement is actually quite true.

But XSLT has a paradigm-altering function called document(), as well as another interesting capability called parameters. The document() function can work on static XML content, but it can also issue HTTP GET requests (through query strings) to retrieve content from web services. Parameters can be set from the hosting language to determine these web service invocations and, additionally, can be calculated from within the XSLT and passed in the same manner.
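As a rough illustration of that combination, the following XSLT 1.0 sketch builds a query string from two parameters and hands it to document(). The parameter names, the example.com URL, and the shape of the response are all assumptions made up for the example, and whether a given processor will actually fetch an http: URL depends on the implementation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameters; the hosting language can override
       these defaults before running the transformation. -->
  <xsl:param name="serviceUrl" select="'http://example.com/orders'"/>
  <xsl:param name="customerId" select="'42'"/>

  <xsl:template match="/">
    <!-- document() retrieves the assembled URL and returns the
         parsed response as a node-set. -->
    <xsl:variable name="orders"
        select="document(concat($serviceUrl, '?customer=', $customerId))"/>
    <report>
      <!-- Copy whatever root element the (assumed) service returned. -->
      <xsl:copy-of select="$orders/*"/>
    </report>
  </xsl:template>

</xsl:stylesheet>
```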

However, there are several problems with this approach. For starters, creating such query strings in the first place and passing them in is painful because you have to use the rather cumbersome call-template syntax, wrap the results in a variable, then pass the variable into the document() function. There are few checks to handle error conditions, and once you create the output, you can't necessarily use that output as the input to some other action. This is because the output is a result tree fragment rather than a node-set.
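The pattern looks something like the sketch below, again in XSLT 1.0. The template name build-url, its parameters, and the URL are hypothetical; the point is only to show how much ceremony is involved in producing a single string.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="customerId" select="'42'"/>

  <!-- Hypothetical named template that assembles a query string. -->
  <xsl:template name="build-url">
    <xsl:param name="base"/>
    <xsl:param name="id"/>
    <xsl:value-of select="concat($base, '?customer=', $id)"/>
  </xsl:template>

  <xsl:template match="/">
    <!-- Step 1: call the template and capture its output in a variable. -->
    <xsl:variable name="url">
      <xsl:call-template name="build-url">
        <xsl:with-param name="base" select="'http://example.com/orders'"/>
        <xsl:with-param name="id" select="$customerId"/>
      </xsl:call-template>
    </xsl:variable>
    <!-- Step 2: $url holds a result tree fragment, so take its string
         value before passing it to document(). -->
    <xsl:copy-of select="document(string($url))/*"/>
  </xsl:template>

</xsl:stylesheet>
```

Note that nothing here guards against the request failing; in XSLT 1.0 the behavior in that case is left largely to the processor.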

Thus, while it has certainly been possible to use XSLT in this fashion, you have all too often been forced to rely upon inconsistently implemented extensions such as the node-set() function. Indeed, most of the really interesting things you could do with XSLT 1.0 came down to these self-same extensions, which again raised questions regarding whether there really was much benefit in using XSLT 1.0 in the first place.
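To make the limitation concrete, here is a minimal XSLT 1.0 sketch that relies on the EXSLT version of node-set(). It assumes a processor that supports the http://exslt.org/common namespace; MSXML, for example, exposes an equivalent function under its own msxsl namespace instead, which is exactly the inconsistency described above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    extension-element-prefixes="exsl">

  <xsl:template match="/">
    <!-- In XSLT 1.0 this variable holds a result tree fragment,
         not a node-set. -->
    <xsl:variable name="items">
      <item>alpha</item>
      <item>beta</item>
    </xsl:variable>

    <list>
      <!-- select="$items/item" would be an error in XSLT 1.0;
           the extension function converts the fragment first. -->
      <xsl:for-each select="exsl:node-set($items)/item">
        <entry><xsl:value-of select="."/></entry>
      </xsl:for-each>
    </list>
  </xsl:template>

</xsl:stylesheet>
```

In XSLT 2.0 the same variable is an ordinary temporary tree, so $items/item can be used directly and the extension is no longer needed.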

Important New Features

However, much if not most of this concern evaporates with XSLT 2.0, which I believe has significantly advanced the state of the art. Here are some of the new features:

  • xsl:function. This element makes it possible to create XSLT functions that can then be placed in special namespaces and invoked from within XPath expressions, which makes it far easier to modularize XSLT functionality in order to turn XSLT into a formal "programming language."

  • Formal XPath extension mechanism. XSLT now has a formal (and consistent) means of invoking methods written in other languages from within an XSLT expression.

  • unparsed-text-available(). This function solves one of the biggest problems of working with the document() function, namely dealing with situations where the URL is unable to retrieve content, by checking a URL to ensure that it is in fact capable of retrieving something. The companion doc-available() function performs the same check for XML documents.

  • unparsed-text(). This function solves another problem: loading non-XML content into an XSLT transformation. It works on textual content in any encoding the processor supports; binary data is not handled directly. SOAP web services pass a fair amount of information in the headers, and this content should be passable as a bundle to the transformation; this means that XSLT can in fact be used to process these.

  • Sequences. One of the reasons why XSLT 2.0 took so long to get out the door was the limitations of XPath 1.0. It turns out not to be possible to legally create an internal node-set() function. After considerable effort, what emerged was the decision to support general sequences of items that could be either atomic values or XML nodes. This change enabled more sophisticated groupings: set operations (union, intersection, difference) and collapsing lists, including numeric iterations.

  • Numeric iterations. You can now use expressions such as (1 to 10) that will return increasing iterative values, reducing the need for recursive expressions dramatically and consequently simplifying the code base for any number of different operations.

  • Regular expressions. Both XSLT 2.0 and XPath 2.0 contain support for regular expressions, as well as a number of string functions for taking advantage of regexes. For instance, the tokenize() function can split a string into a sequence based on a regular expression (or straight text), making it much easier to split apart lines and fields in CSV files, extract data from irregular phone number formats, perform actions if two words are within a given number of characters of one another, and so forth. This also makes it possible to use XSLT for general schema validation, and gives a considerable leg up in the generation of rich Schematron output.

  • xsl:result-document and xsl:output. The xsl:result-document element makes it possible to send content (and not necessarily just XML content) to a file or web service, independent of the final output mechanism used by the transformation itself. The two limitations that xsl:result-document faces are that these are asynchronous POST events, and that you can control only a very limited number of HTTP headers (depending upon the implementation).

  • Inline control keywords. XSLT 2.0 now supports a number of XQuery extensions (not the entire set, but a fair number) for doing things, such as iterating with a for loop or performing various actions based upon conditional statements, directly within XPath. This can reduce file sizes considerably, and generally makes for code that is somewhat easier to read.

  • Character maps. With XSLT 2.0, you can now create character maps that let you map certain characters to some output form. Character maps replace the rather cumbersome (and often poorly used) disable-output-escaping to ensure that specific characters (such as the less-than "<" symbol) are preserved properly in output. This proves very useful for creating intermediate XML structures that can nonetheless be processed through other XSLT calls. It is even more useful for generating output files that resemble XML but are not quite identical, such as JSP pages, which might have inline <% %> elements:

<jsp:setProperty name="user" property="id" value="<%= "id" + idValue %>"></jsp:setProperty>
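Several of the other features in the list above can be seen working together in one short stylesheet. The sketch below is only an illustration under stated assumptions: the my: namespace, the my:fields function, the csvUrl parameter, and the data.csv file are all invented for the example, and the CSV handling is deliberately naive (no quoted fields). It uses xsl:function, unparsed-text-available() and unparsed-text(), tokenize() with regular expressions, a numeric range, and inline for and if expressions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/ns/functions"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical parameter: location of a CSV file, resolved
       relative to the stylesheet. -->
  <xsl:param name="csvUrl" as="xs:string" select="'data.csv'"/>

  <!-- xsl:function: callable from any XPath expression as my:fields(). -->
  <xsl:function name="my:fields" as="xs:string*">
    <xsl:param name="line" as="xs:string"/>
    <!-- tokenize() splits on a regular expression and returns a
         sequence of strings. -->
    <xsl:sequence select="tokenize($line, '\s*,\s*')"/>
  </xsl:function>

  <!-- Named entry point, so no source document is required. -->
  <xsl:template name="main">
    <rows>
      <xsl:choose>
        <!-- Check that the resource can be read before loading it. -->
        <xsl:when test="unparsed-text-available($csvUrl)">
          <!-- Split the text into lines and skip blank ones. -->
          <xsl:for-each
              select="tokenize(unparsed-text($csvUrl), '\r?\n')[normalize-space()]">
            <xsl:variable name="fields" select="my:fields(.)"/>
            <!-- Inline if and for expressions inside XPath. -->
            <row status="{if (count($fields) ge 2) then 'ok' else 'short'}"
                 upper="{string-join(for $f in $fields return upper-case($f), '|')}">
              <!-- Numeric range instead of a recursive template. -->
              <xsl:for-each select="1 to count($fields)">
                <xsl:variable name="i" select="."/>
                <field n="{$i}"><xsl:value-of select="$fields[$i]"/></field>
              </xsl:for-each>
            </row>
          </xsl:for-each>
        </xsl:when>
        <xsl:otherwise>
          <error>Could not read <xsl:value-of select="$csvUrl"/></error>
        </xsl:otherwise>
      </xsl:choose>
    </rows>
  </xsl:template>

</xsl:stylesheet>
```

An XSLT 2.0 processor such as Saxon can run this by naming main as the initial template. The equivalent in XSLT 1.0 would need recursive templates for both the line splitting and the field counter, plus an extension function to load the text at all.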
