EXSLT for MSXML
by Dimitre Novatchev
|
Pages: 1, 2, 3
Optimization
We could stop here, but the algorithms are not fast and robust, especially in some worst case scenarios. In particular, deep recursion typically can eat up significant amounts of memory, slow down the execution from linear to exponential, and eventually crash an XSLT processor. (See Use recursion effectively in XSL and Two-stage recursive algorithms in XSLT).
Therefore, in looking to improve our initial algorithms we try to limit the depth of recursion or to eliminate recursion completely.
Limiting recursion -- the Divide and Conquer (DVC) approach
One powerful way of limiting the depth of recursion is the Divide and Conquer (DVC) method (see Algorithms in C++, Part 1-4, by Robert Sedgewick, Addison-Wesley (ISBN 0-201-35088-2) and DVC Algorithms in XSLT). The idea is to split the input into two or more parts, apply the algorithm on each part and then combine the results. A DVC-style algorithm has a maximum recursion depth of Log2(N), which allows very long inputs to be processed without any stack overflow problems -- e.g. the maximum recursion depth for processing a list of 1000000 (1M) elements is only 19.
A DVC-style set:distinct() implementation looks like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:set="http://exslt.org/sets" xmlns:caller="urn:my-object"> <xsl:param name="ns1" select="/.."/> <xsl:template match="distinct"> <xsl:param name="pNodes" select="$ns1"/> <xsl:if test="$pNodes"> <xsl:choose> <xsl:when test="not($pNodes[2])"> <xsl:value-of select="caller:storeXPathResult($pNodes)"/> </xsl:when> <xsl:otherwise> <xsl:variable name="vHalfLng" select="floor(count($pNodes) div 2)"/> <xsl:variable name="vHalf1" select="$pNodes[position() <= $vHalfLng]"/> <xsl:variable name="vHalf2" select="$pNodes[position() > $vHalfLng]"/> <xsl:variable name="vDist1" select="set:distinct($vHalf1)"/> <xsl:variable name="vDist2" select="set:distinct($vHalf2)"/> <xsl:value-of select="caller:storeXPathResult($vDist1 | $vDist2 [not(. = $vDist1)])"/> </xsl:otherwise> </xsl:choose> </xsl:if> </xsl:template> </xsl:stylesheet> |
With longer input node sets this transformation is much faster than the previous and it scales well in processing huge inputs.
Binary search
The initial algorithm for set:leading() incrementally
unifies all nodes from the start of the node set until the designated
(second argument) node is reached. This has all the problems of deep
recursion.
A more direct and efficient solution exists, one based on the classic binary search algorithm. Here we try to find a subset of the original node set, such that the designated (second argument) node is the last node in this subset. When such a subset is found, we simply return:
$accumulatedLeading | thisSubset[position() != last()] |
The $accumulatedLeading is the union of all subsets
preceding the current subset. It is much more efficient to accumulate
whole subsets than to accumulate node by node. The binary search algorithm
is similar to a DVC-style algorithm but is more efficient because at any
time only one of the two parts is recursively processed.
To find such a subset we divide the original node set into two halves
and take the one which contains the designated node. If the designated
node is not the last, we do the same recursively to the chosen (half)
subset. Here's a somewhat simplified version of the Binary Search
set:leading() code:
<xsl:stylesheet version="1.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform" xmlns:caller="urn:my-object"> <xsl:param name="ns1" select="/.."/> <xsl:param name="ns2" select="/.."/> <xsl:template name="binLeading" match="leading"> <xsl:param name="pNodes" select="$ns1"/> <xsl:param name="pANode" select="$ns2[1]"/> <xsl:param name="pLeading" select="/.."/> <xsl:choose> <xsl:when test="count($pANode | $pNodes[last()]) = 1"> <xsl:value-of select="caller:storeXPathResult($pLeading | $pNodes[position() != last()])"/> </xsl:when> <xsl:otherwise> <xsl:variable name="vcntHalf" select="floor(count($pNodes) div 2)"/> <xsl:variable name="vHalf1" select="$pNodes[position() <= $vcntHalf]"/> <xsl:choose> <xsl:when test="count($pANode | $vHalf1) = count($vHalf1)"> <xsl:call-template name="binLeading"> <xsl:with-param name="pNodes" select="$vHalf1"/> <xsl:with-param name="pANode" select="$pANode"/> <xsl:with-param name="pLeading" select="$pLeading"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:call-template name="binLeading"> <xsl:with-param name="pNodes" select="$pNodes[position() > $vcntHalf]"/> <xsl:with-param name="pANode" select="$pANode"/> <xsl:with-param name="pLeading" select="$pLeading | $vHalf1"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:otherwise> </xsl:choose> </xsl:template> </xsl:stylesheet> |
Eliminating recursion
Not surprisingly, we can even further improve efficiency if we manage to eliminate recursion altogether.
Let's take as example set:distinct():
- We copy every node from the argument node set
$ns1into an element of a temporary tree. This element has the same string value as the node being copied. It also has an attribute"p", the value of which is the position of the original node in the argument node-set. - Then using the Muenchian method for grouping (see Grouping using xsl:key and the key() function -- the Muenchian method) we find in this tree every first in document order element that has some unique value.
- We create a pipe-delimited string
$vidsof the original positions (the value of the"p"attribute) of all such first elements. - Finally, the set of distinct elements is:
$ns1[contains($vIds, concat('|', position(), '|'))]
Here's the code for set:distinct():
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" xmlns:caller="urn:my-object"> <xsl:param name="ns1" select="/.."/> <xsl:key name="kVal" match="e" use="."/> <xsl:template match="distinct"> <xsl:variable name="vrtfDoc"> <xsl:for-each select="$ns1"> <e p="{position()}"><xsl:value-of select="."/></e> </xsl:for-each> </xsl:variable> <xsl:variable name="vDoc" select="msxsl:node-set($vrtfDoc)"/> <xsl:variable name="vIds"> <xsl:for-each select="$vDoc/e[generate-id() = generate-id(key('kVal', .)[1]) ]"> <xsl:if test="position() = 1">|</xsl:if> <xsl:value-of select="concat(@p,'|')"/> </xsl:for-each> </xsl:variable> <xsl:value-of select="caller:storeXPathResult ( $ns1[contains($vIds, concat('|', position(), '|') ) ] )"/> </xsl:template> </xsl:stylesheet> |
In a similar way we derive new algorithms for
set:intersection() and set:difference() by
constructing a pipe-delimited string of concatenated node ids.
Then the result (e.g. for set:difference()) is
$ns1[not(contains($vIds, concat('|', generate-id(), '|')))]