XML.com: XML From the Inside Out



August 06, 2003


The EXSLT project specifies a set of standard extension functions for XSLT that, once implemented by all vendors of XSLT processors, will make it possible to write portable XSLT applications. Most of the major XSLT processors already have some form of support for EXSLT, with the notable exception of MSXML.

This article describes a third-party implementation of EXSLT for MSXML4 by Dimitre Novatchev. The author had no access to internal product interfaces and had to overcome some serious difficulties, which until now had prevented the development of any such third-party implementation.

How to extend an XSLT processor

There are several ways to implement extensions to an XSLT processor:

  • Modify the implementation of the XSLT processor, recompile, rebuild, test and redeploy it.
  • Implement one or more extension elements.
  • Provide a library of inline extension functions.
  • Provide external extension objects, whose methods will be referenced as extension functions.

The first and second options require either that the source code of the XSLT processor (and the right to modify and extend it) be available, or that documentation be provided explaining how to implement extension elements. Neither is true in the case of MSXML.

Following the third option would require MSXML programmers to include inline scripting code, written in a language like JavaScript, in their stylesheets. This is inconvenient and differs from the way extension functions are implemented and used in other XSLT processors.

The decision to choose the last option reflects its advantages:

  • A namespace prefix is associated with an extension object.
  • The code of this object is not inlined in the transformation and in fact the XSLT programmer may not know anything about it.
  • This is the way to reference and use extension functions in most XSLT processors.

What EXSLT modules to implement

There are eight different modules of functions specified by EXSLT:

  • Common. Covers common, basic extension elements and functions.
  • Dates and Times. Covers date and time-related extension elements and functions.
  • Dynamic. Covers extension elements and functions that deal with the dynamic evaluation of strings containing XPath expressions.
  • Functions. Extension elements and functions that allow users to define their own functions for use in expressions and patterns in XSLT.
  • Math. Covers extension elements and functions that provide facilities to do with maths.
  • Regular Expressions. Covers extension elements and functions that provide facilities to do with regular expressions.
  • Sets. Covers those extension elements and functions that provide facilities to do with set manipulation.
  • Strings. Covers extension elements and functions that provide facilities to do with string manipulation.

Dynamic and Functions require access to the internals (code and data structures) of the XSLT processor, something not achievable in the case of MSXML. Math is too trivial and already has implementations in XSLT 1.0 and XSLT 2.0 (see The FXSL Functional Programming Library for XSLT1 and The FXSL Functional Programming Library for XSLT2). Functions, Dates and Times, Regular Expressions and part of Strings will be covered by the XSLT 2.0 standard (see XQuery 1.0 and XPath 2.0 Functions and Operators). There are also pure XSLT 1.0 libraries covering dates and times (for example, the date_time XSLT 1.0 template library available as part of XSelerator).

From the standpoint of immediate usability in XSLT 1.0, the most useful EXSLT function is common:node-set(). Other necessary and useful functions, which cannot be implemented in XSLT 1.0, are those from the Sets module.

So I decided to implement the following EXSLT functions:

  • common:node-set()
    And all functions from the Sets module:
  • set:intersection()
  • set:difference()
  • set:distinct()
  • set:leading()
  • set:trailing()
  • set:has-same-node()

The Big Problem

Have you ever wondered why, for more than two years, there has been no attempt at a third-party implementation of EXSLT for MSXML? Try to produce one and you will find that there is a big obstacle, which no one until now has been able to remove.

In the object model of MSXML there is one method for obtaining a node set as the result of evaluating an XPath expression. This method is selectNodes(), a member of the IXMLDOMNode object, defined as follows:

HRESULT selectNodes( BSTR expression, IXMLDOMNodeList ** resultList);

As can be seen, in the MSXML object model a node set is represented by an IXMLDOMNodeList object. A node is represented by an IXMLDOMNode object.

selectNodes() can be issued only against a "current node" -- some IXMLDOMNode object.

The problem is that there is no documented way to create an IXMLDOMNodeList, except as returned by selectNodes(). This means that one cannot perform even such simple tasks as getting the union of two IXMLDOMNode nodes.


How then can we get a subset of a node set using only the MSXML object model? Impossible. But this is exactly what is needed in order to implement any of the six functions in the EXSLT Sets module.

I must confess that it was exactly the challenge of the impossible task that attracted me. Someone who didn't know better told me that implementing the Sets module was impossible without asking Microsoft to provide a more powerful interface, containing methods that can create an IXMLDOMNodeList from any collection of IXMLDOMNode objects. It was also implied that an XSLT specialist was ill-equipped to attack, solve or even understand this problem.

What happened next is described below.

The Solution: Steal an IXMLDOMNodeList

As explained above, using the MSXML4 object model it is impossible to create an IXMLDOMNodeList other than one returned by the selectNodes() method, and this limitation makes it impossible to implement any of the EXSLT Sets functions.

The solution is to try to create an IXMLDOMNodeList outside of the MSXML object model. In XSLT there are no such limitations. It is straightforward to obtain the result of evaluating any XPath expression. So why not perform an XSLT transformation, which will evaluate any XPath expression we need and hand its result to us?

A nice idea, but there is a major flaw in it -- an XSLT transformation always produces copies of the original nodes, not the nodes themselves. This is probably the point at which everyone else gave up in despair.

A transformation can evaluate any XPath expression internally and have access to the resulting node set, but it cannot "pass it back", it can only produce copies of the original nodes.

Can a transformation pass the result node set to any piece of code at all? Yes, it can pass it to another template it calls or instantiates or to an extension function.

This seems absolutely unusable in our case. We called the transformation, so how can it call us? Even if this were possible, we must still return and would lose the valuable node set that the transformation passed to us.

The answer is simple: we store it in a property of our extension object. When the transformation returns to our code that started it, the node set will still be the value of this property.

This manipulation is reflected in the following picture:

Figure 1: How to create a desired, new IXMLDOMNodeList
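
The capture trick can be modeled outside MSXML. The following is a minimal Python sketch, not the author's implementation: the Caller class, the result attribute and the aux_transform function are hypothetical stand-ins for the extension object, its property and the auxiliary transformation. Only the method name storeXPathResult comes from the stylesheets in this article.

```python
import copy
import xml.etree.ElementTree as ET


class Caller:
    """Hypothetical stand-in for the extension object bound to `caller`."""

    def __init__(self):
        self.result = None  # plays the role of the property that keeps the node set

    def storeXPathResult(self, nodes):
        # Keep references to the very nodes the transformation selected.
        self.result = list(nodes)
        return ""  # whatever is written to the output is irrelevant


def aux_transform(nodes, predicate, caller):
    """Toy 'auxiliary transformation': selects nodes but emits only copies."""
    selected = [n for n in nodes if predicate(n)]
    caller.storeXPathResult(selected)            # hand the originals to the caller
    return [copy.deepcopy(n) for n in selected]  # normal output: copies only


doc = ET.fromstring("<r><a>1</a><a>2</a><a>3</a></r>")
caller = Caller()
output = aux_transform(list(doc), lambda n: int(n.text) > 1, caller)

# The output holds copies, but the stored property holds the original nodes.
assert all(o is not n for o, n in zip(output, caller.result))
assert caller.result[0] is doc[1] and caller.result[1] is doc[2]
```

The point of the sketch is only the control flow: the code that starts the transformation ignores its output and reads the node set back from the object it passed in.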

Initial algorithms

Having solved the big problem, it's time for the complete implementation. The XSLT solutions in the "Aux. Transform" box can be really simple and compact. Thus for set:intersection() we can have:

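<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:caller="urn:example:caller">
  <!-- xmlns:caller is a placeholder URI; use the one the extension object is registered under -->
  <xsl:param name="ns1" select="/.."/>
  <xsl:param name="ns2" select="/.."/>
  <xsl:variable name="vCnt" select="count($ns2)"/>

  <xsl:template match="intersect">
    <xsl:value-of
      select="caller:storeXPathResult($ns1[count(. | $ns2) = $vCnt])"/>
  </xsl:template>
</xsl:stylesheet>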
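
The predicate count(. | $ns2) = $vCnt works because the XPath union operator merges nodes by identity: adding a node that already belongs to $ns2 does not change the size of the union. A small Python model of that idiom follows. It is an illustrative sketch only; the helper names union and intersection are assumptions, and node sets are represented as plain lists of ElementTree elements.

```python
import xml.etree.ElementTree as ET


def union(a, b):
    """Node-set union by identity, like the XPath `|` operator."""
    seen, out = set(), []
    for n in list(a) + list(b):
        if id(n) not in seen:
            seen.add(id(n))
            out.append(n)
    return out


def intersection(ns1, ns2):
    # Mirrors $ns1[count(. | $ns2) = count($ns2)]: a node already in ns2
    # leaves the size of the union unchanged.
    cnt = len(union(ns2, []))
    return [n for n in ns1 if len(union([n], ns2)) == cnt]


doc = ET.fromstring("<r><a>x</a><a>x</a><a>y</a></r>")
a1, a2, a3 = doc
# a1 and a2 have the same text but are different nodes, so only a3 is shared.
print([n.text for n in intersection([a1, a3], [a2, a3])])  # prints ['y']
```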

For set:distinct() we can have the following XSLT implementation:

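<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:caller="urn:example:caller">
  <!-- xmlns:caller is a placeholder URI; use the one the extension object is registered under -->

  <xsl:param name="ns1" select="/.."/>
  <xsl:template match="makeDistinct" name="makeDistinct">
    <xsl:param name="pDistinct" select="/.."/>
    <xsl:param name="pNodes" select="$ns1"/>

    <xsl:choose>
      <xsl:when test="$pNodes">
        <xsl:variable name="pnewDistinct"
                      select="$pDistinct | $pNodes[1]"/>
        <xsl:call-template name="makeDistinct">
          <xsl:with-param name="pDistinct" select="$pnewDistinct"/>
          <xsl:with-param name="pNodes"
            select="$pNodes[position() > 1][not(. = $pnewDistinct)]"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="caller:storeXPathResult($pDistinct)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>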
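
The recursion in makeDistinct moves the first remaining node into the result and then drops every remaining node whose string value equals that of a node already kept. The same steps can be sketched in Python. This is a model for illustration, assuming node sets are lists in document order; string_value and distinct are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def string_value(node):
    """XPath string-value of an element: all descendant text concatenated."""
    return "".join(node.itertext())


def distinct(nodes, found=()):
    # Same recursion as the makeDistinct template: keep the first node,
    # then filter out the remaining nodes with an already-seen string value.
    if not nodes:
        return list(found)
    new_found = list(found) + [nodes[0]]
    values = {string_value(n) for n in new_found}
    rest = [n for n in nodes[1:] if string_value(n) not in values]
    return distinct(rest, new_found)


doc = ET.fromstring("<r><a>x</a><a>y</a><a>x</a><a>z</a><a>y</a></r>")
print([n.text for n in distinct(list(doc))])  # prints ['x', 'y', 'z']
```

Unlike the intersection, this comparison is by value rather than by node identity, which is why the result keeps the first node carrying each value.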

Finally, for set:leading() we could have:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:caller="urn:example:caller">
  <!-- xmlns:caller is a placeholder URI; use the one the extension object is registered under -->
  <xsl:param name="ns1" select="/.."/>
  <xsl:param name="ns2" select="/.."/>
  <xsl:template match="leading" name="leading">
    <xsl:param name="pNodes" select="$ns1"/>
    <xsl:param name="pANode" select="$ns2[1]"/>
    <xsl:param name="pLeading" select="/.."/>
    <xsl:choose>
      <xsl:when test="not($pNodes) or not($pANode)">
        <xsl:value-of select="caller:storeXPathResult($pNodes)"/>
      </xsl:when>
      <xsl:when test="count($pANode | $pNodes[1]) = 1
                or count($pANode | $pNodes) != count($pNodes)">
        <xsl:value-of select="caller:storeXPathResult($pLeading)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="pnewLeading"
                      select="$pLeading | $pNodes[1]"/>
        <xsl:call-template name="leading">
          <xsl:with-param name="pLeading" select="$pnewLeading"/>
          <xsl:with-param name="pNodes"
                          select="$pNodes[position() > 1]"/>
          <xsl:with-param name="pANode" select="$pANode"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
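
The three branches of the leading template can be read as follows: return the first set unchanged when either set is empty, return what has been collected so far when the reference node is reached or is not in the set at all, and otherwise collect the current node and continue. A Python model of that logic, written as a loop instead of a recursion, might look like this. It assumes node sets are lists in document order; the function name leading mirrors the template, but the code is only a sketch.

```python
import xml.etree.ElementTree as ET


def leading(ns1, ns2):
    """Nodes of ns1 that precede the first node of ns2."""
    if not ns1 or not ns2:
        return list(ns1)           # first branch: nothing to compare against
    target = ns2[0]
    if not any(n is target for n in ns1):
        return []                  # reference node is not in ns1 at all
    result = []
    for n in ns1:
        if n is target:            # reached the reference node: stop
            break
        result.append(n)           # otherwise collect and continue
    return result


doc = ET.fromstring("<r><a/><b/><c/><d/></r>")
a, b, c, d = doc
print([n.tag for n in leading([a, b, c, d], [c, d])])  # prints ['a', 'b']
```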

The XSLT implementation of the other functions of the Sets module is similar -- set:difference() is coded in a similar way to set:intersection(), set:trailing() is similar to set:leading(), and set:has-same-node() returns true if set:intersection() is non-empty.
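
The stylesheets for the remaining three functions are not shown here, so the following Python sketch is based only on the relationships just described and on the EXSLT definitions, not on the author's code. In XSLT terms the difference is commonly written as $ns1[count(. | $ns2) != count($ns2)]. The helper names are hypothetical and node sets are modeled as lists in document order.

```python
import xml.etree.ElementTree as ET


def contains(ns, node):
    """Identity-based membership test for a node set."""
    return any(n is node for n in ns)


def difference(ns1, ns2):
    # Nodes of ns1 that are not in ns2 (the negation of the intersection test).
    return [n for n in ns1 if not contains(ns2, n)]


def has_same_node(ns1, ns2):
    # True exactly when the intersection is non-empty.
    return any(contains(ns2, n) for n in ns1)


def trailing(ns1, ns2):
    """Nodes of ns1 that follow the first node of ns2."""
    if not ns2:
        return list(ns1)
    target = ns2[0]
    for i, n in enumerate(ns1):
        if n is target:
            return list(ns1[i + 1:])
    return []  # reference node is not in ns1


doc = ET.fromstring("<r><a/><b/><c/><d/></r>")
a, b, c, d = doc
print([n.tag for n in difference([a, b, c], [b])])   # prints ['a', 'c']
print([n.tag for n in trailing([a, b, c, d], [b])])  # prints ['c', 'd']
print(has_same_node([a, b], [c]))                    # prints False
```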
