XML.com

Revving the XSLT 1.0 Engines: Are they all the same?

January 26, 2017

Rick Jelliffe

A rough guide to the relative performance of XSLT systems on some simple XSLT 1.0 test cases.

XSLT will soon be 20 years old. That is plenty long enough to consider implementations mature. But are they any good? XSLT does not, in general, have a good reputation for efficiency. This is of particular interest to me, not only because I write a good deal of XSLT but also because it can impact the continued success of Schematron, which is (usually) based on XSLT.

Benchmarks are enormously difficult things to get right. If you get some optimization setting wrong, or happen to use a version with a bug, you can produce misleading and unfair results. If your test happens to activate a particular optimization in a product, your readers may get an unrealistic impression of the performance of that product.

But we, as consumers and designers, need information if we are to produce good systems and make good decisions.  For this, initially we don’t need benchmarks that tell us that some system is 5% better than some other system: we need coarse timings that show whether different products may be an order of magnitude faster or slower, and therefore warrant more detailed benchmarking.  We need to have a general feel for the performance of systems, which we can get from simple benchmarks. 

Big Question

Do all XSLT engines have performance in the same order of magnitude?

If some simple benchmark shows that XSLT engines vary in their performance by more than an order of magnitude, then it suggests that a decent chunk of our development time, as the primary optimization, should go into selecting the XSLT engine best suited to our stylesheets and documents.

And, as a follow-on from that, it would indicate that choosing an XSLT engine merely because it comes as the default with your chosen platform will not necessarily produce a happy result if you have hard real-time performance requirements.

And, finally, it would indicate that care needs to be taken if evaluating the performance of XSLT compared to some alternative technology: you can game the results against XSLT by choosing an XSLT engine that is not efficient for some XPaths.

Engines

XSLT engines broadly fall into two classes:  

  • small, fast XSLT 1.0 engines, usually written in some C-family compiled language. The main examples are the Microsoft XSLT engine that comes built into Windows, and the free libxslt library (xsltproc);
  • Java-based XSLT implementations that are heading towards XQuery and XSLT 3.0. Saxon's and Altova's engines are in this camp.

I tested these systems. I did not enable any optimizations or special fiddles (though I double-checked some runs with different JVM memory settings, just in case there was some hidden gotcha there): I just measured the performance after a simple install and with the simplest command lines. The issue of how well systems could perform if tuned and so on is interesting, but not relevant for this discussion: I wanted to get a feel for out-of-the-box performance.

  • LibXSLT – an open-source C implementation, the most recent version as of 2017. You have to download four packages and add them to the appropriate path. Even then, there was a problem with iconv that I could not resolve; I don't think it makes a difference to the timings.
  • Microsoft XSLT – I used the version that comes with default .NET on Windows 10.
  • Xalan-J – an XSLT 1.0 engine implemented in Java. I also attempted to load Xalan-C, but the instructions for the Windows binaries were not detailed enough for me to figure out what to do.
  • Saxon for Java – I used the priceless Home Edition v9.7, which may not have some of the optimizations of the commercial editions.
  • Saxon for .NET – I used the priceless Home Edition v9.7.
  • Altova – The command line version of their Raptor XML server distribution.

Method

I wrote Powershell scripts to run and measure the tests on my laptop, using the command-line interface. None of the tests or systems seemed to take advantage of multi-threading. The tests were a series of runs designed to expose where time might be taken. I ran the tests a few times and took the last result, out of habit rather than necessity. The Powershell script is printed at the end; it may be useful for newcomers wanting to see how to invoke XSLT on files.

  • Simplest (or Load): this takes a one-element XML document and generates an output XML document with a single element <OK/>. This is the smallest possible XML transformation, and mainly measures the amount of time it takes to fire up the engine. (This file dummy.xml is printed at the end.)
  • Read ODF: I pulled out the content.xml file of the most recent ODF specification from OASIS, as an example of a larger document. It is about 7.5MB of UTF-8. This test reads the document in and generates <OK/>. This is a test of reading time.
  • Copy ODF: this test reads the same ODF XML file and writes it to output using xsl:copy-of. This is a test of basic copying and writing.
  • Traverse ODF: this test reads the same ODF XML file and then uses apply-templates to replicate the input to the output (except that namespaces are stripped). This tests the efficiency of the template mechanism.
  • MultiMode: this test is not too far from what Schematron compiles to: a stylesheet with many sparse modes, featuring predicates that test for the presence of children.
  • Count ODF: this test is the same as the Traverse ODF test, except that it adds a count number at the start of each content section, with the count being quite a taxing XPath to calculate: <xsl:value-of select="count(preceding::*[following-sibling::*[1]])" /> The purpose of this test is to measure tree traversal performance. I would not expect preceding::* to be particularly optimized, even though I might expect following-sibling::*[1] could be. (A sketch of this kind of stylesheet appears after this list.)
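
The actual test stylesheets are linked at the end of this article rather than printed inline. As an illustration only, here is a minimal sketch of the sort of stylesheet the Traverse ODF and Count ODF tests describe; it is a reconstruction based on the descriptions above, not the actual test3.xsl or test4.xsl used in the runs.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: replicate the input without namespaces (Traverse ODF);
     the xsl:value-of line is the extra step of the Count ODF variant. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8"/>

  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*"/>
      <!-- Count ODF only: the deliberately taxing count at the start of the content -->
      <xsl:value-of select="count(preceding::*[following-sibling::*[1]])"/>
      <xsl:apply-templates select="node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Strip attribute namespaces as well -->
  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>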

Results

The timings for the Load, Read ODF, Copy ODF, and Traverse ODF tests, in milliseconds, are below.

Transposing this table and putting it into diagram form, you can see what an impact the Virtual Machine startup time has. It is often much greater than the time taken to load the 7.5 MB file.

[Graph of Basic Test Times]

If you want an estimate of the performance of a server-based system rather than a command-line transformation, subtract much of the blue “Simplest” bar.

Next we add the MultiMode test. 

[Graph of timings including the MultiMode test]

If we add the results of the Count ODF test to that same chart, the time taken to perform the task dwarfs the startup, load, and write times to the extent that they do not even show up in the graph!

[Graph of timings including Count ODF]

Timings are in milliseconds.

[Table: XSLT benchmark timings, full results]

Conclusion

So for the big question, the answer is "No, not all XSLT engines have performance in the same order of magnitude". Each test had an order of magnitude difference between the fastest and slowest result.

For these benchmarks:

  • For each test, the Microsoft engine was fastest. 
  • If we ignore JVM startup time, libxslt had similar performance to some Java systems.
  • For Count ODF (large documents with complex traversal), the Saxon Java processor took only twice as long as the Microsoft processor, and was three times faster than libxslt.
  • Xalan-J performed relatively badly on the more taxing tests.

Discussion

  • Anyone evaluating XSLT-based systems, or comparing XSLT-based systems with some other language, should be very mindful of which implementation they use, since there can easily be an order of magnitude difference.
  • If you are processing small files with Java, you need a framework that prevents repeated invocations of the JVM: it would be interesting to repeat the tests using a system such as Nailgun, which pre-launches JVMs. This is an advantage of XSLT transform servers or Schematron validation servers compared to command-line invocation.
  • Stick with XSLT 1.0 if your documents and transformations are quite simple, and you have severe performance requirements.  Check out EXSLT extensions on those systems.
  • The current Schematron pipeline, with its multiple invocations of XSLT, may add more latency than users deserve when validating small instances; for running on XSLT 2.0 systems, it would be better to merge the pipeline into a single XSLT stylesheet (see the sketch after this list).
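
As a rough illustration of that last point, XSLT 2.0 lets you chain what are currently separate passes inside a single stylesheet by holding each intermediate result in a variable (a temporary tree) and processing it in its own mode. This is only a sketch of the general technique; the mode names are placeholders, not the actual passes of the Schematron skeleton.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: chaining two pipeline stages inside one XSLT 2.0 stylesheet.
     "stage1" and "stage2" are placeholder modes, not Schematron's real passes. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- First pass: result kept in memory instead of being written to disk -->
    <xsl:variable name="pass1">
      <xsl:apply-templates select="." mode="stage1"/>
    </xsl:variable>
    <!-- Second pass runs over the in-memory result of the first -->
    <xsl:apply-templates select="$pass1" mode="stage2"/>
  </xsl:template>

  <!-- The template rules of each merged stylesheet would live here,
       each set kept in its own mode -->

</xsl:stylesheet>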

Personally, I could never move back to XSLT 1.0, because of the programming improvements of XSLT 2.0. The timings do not test the XPath implementation comprehensively at all, but they do at least give a feel for some of the current limits of performance. The source code for the tests is available here, so you can try out different engines yourself on scripts that reflect your actual requirements.

It is important to state that these simple benchmarks cannot be taken as indications of the general performance of any system: that would take more comprehensive benchmarks, and benchmarks that test more representative XPaths. The fact that Xalan-J is, on the face of it, not optimized for the preceding axis does not mean it is not optimized for the particular patterns and use cases it was developed for. And of course performance is only one aspect of an XSLT engine: integration into larger systems, robustness, and other quality measures are part of the trade-off.

Files

dummy.xml

<?xml version="1.0" encoding="UTF-8"?>
<DUMMY/>

RunXsltTransform.ps1

(Note that some long lines have been broken to fit.)

# RunXsltTransform - Powershell script to run the same XSLT script using multiple engines
# To run, don't forget to  Set-ExecutionPolicy Unrestricted
# Adjust the scripts for the locations of the engines on your system
# Adjust the scripts for the current working directory
# To run the tests for an engine, edit the engine in the MAIN region at the end. 
# (Note that some engines do not use the current working directory, and may prefer 
# absolute paths for the script or the input document.) 

function RunXsltTransform {
  <#
  .SYNOPSIS
   Facade and dispatcher to make it simple to test multiple XSLT engines
  .DESCRIPTION
  Facade and dispatcher to make it simple to test multiple XSLT engines. 
  Run a single XSLT transform
  .EXAMPLE
  RunXsltTransform msx tttt.xsl input.xml  output.xml  error.xml 
  .PARAMETER engine
  The XSLT engine to use. Just one.
  .PARAMETER script
  The XSLT script to run.
  .PARAMETER infile
  The XML input file
  .PARAMETER outfile
  The XML output file
  .PARAMETER errorfile
  The error file
  #>
  [CmdletBinding()]
  param
  (
    [Parameter(Mandatory=$true, position=0)]
    [ValidateNotNullOrEmpty()]
    [string] $engine,
    [Parameter(Mandatory=$true, position=1)]
    [ValidateNotNullOrEmpty()]
    [string] $script,
    [Parameter(Mandatory=$true, position=2)]
    [ValidateNotNullOrEmpty()]
    [string] $infile,
    [Parameter(Mandatory=$true, position=3)]
    [ValidateNotNullOrEmpty()]
    [string] $outfile,
    [Parameter(Mandatory=$true, position=4)]
    [ValidateNotNullOrEmpty()]
    [string] $errorfile
    )
$env:Path += ";C:\Program Files\bin\libxslt-1.1.26.win32\bin\;
                                         C:\Program Files\bin\zlib-1.2.5\bin;
                                         C:\Program Files\bin\libxml2-2.7.8.win32\bin;
                                         C:\Program Files\bin\iconv-1.9.2.win32\bin"

$LIBXSLTEXE ="C:\Program Files\bin\libxslt-1.1.26.win32\bin\xsltproc.exe"
$ICONVEXE = "C:\Program Files\bin\iconv-1.9.2.win32\bin\iconv.exe"
$SAXONJAR = "C:\Program Files\Saxonica\SaxonHE9-7-0-14J\saxon9he.jar"
$SAXONEXE = "C:\Program Files\Saxonica\SaxonHE9.7N\bin\Transform.exe"
$XALANJAR = "C:\Program Files\xalan\xalan-j_2_7_2\serializer.jar;
                                           C:\Program Files\xalan\xalan-j_2_7_2\xalan.jar;
                                           C:\Program Files\xalan\xalan-j_2_7_2\xercesImpl.jar;
                                           C:\Program Files\xalan\xalan-j_2_7_2\xml-apis.jar"
$XALANEXE = "C:\Program Files\xalan\XALANCPKG-11-31-VC100\bin\Xalan.exe"
$EXSELTEXE ="C:\Program Files\Exselt\Exselt XSLT 3.0 Processor\Exselt.exe"
$ALTOVAEXE = "C:\Program Files\Altova\RaptorXMLServer2017\bin\RaptorXML.exe"
   
 cd d:/ricko/tmp 
 del $errorfile 2>&1>>  $null
 del $outfile  2>&1>>  $null

 echo $engine
 echo $script
 echo $infile
 echo $outfile
 echo $errorfile
 
 switch ( $engine ) {
 
    libxslt {
        try {
            &$LIBXSLTEXE  $script  $infile  | &$ICONVEXE -f UTF-16 -t UTF-8  > $outfile   
        }
        catch {
           $_.Exception.Message  >>  $errorfile
        }
        finally { 
        }
    }
  
    saxonj {
        try {
            java -jar $SAXONJAR ("-s:" + $infile)  ("-xsl:" + $script) ("-o:" + $outfile)  
                                                2>&1 >>  $errorfile
        }
        catch {
           $_.Exception.Message  >>  $errorfile
        }
        finally { 
        }
    }
   
    saxonn {
        try {
            &$SAXONEXE    ("-s:" + $infile)  ("-xsl:" + $script) ("-o:" + $outfile)   
                                                  2>&1 >>  $errorfile           
        }
        catch {
           $_.Exception.Message  >>  $errorfile
        }
        finally {
        }
    }
    

    msx  {
        try { 
            $scriptUri = New-Object System.Uri( $script )
            $XsltSettings = New-Object System.Xml.Xsl.XsltSettings
            $XsltSettings.EnableDocumentFunction = 1
            $xslt = New-Object System.Xml.Xsl.XslCompiledTransform
            $xslt.Load($scriptUri.AbsolutePath , $XsltSettings, 
                                        (New-Object  System.Xml.XmlUrlResolver))
            $xslt.Transform($infile, $outfile)   2>&1 >>  $errorfile       
        }
        catch {
           $_.Exception.Message  >>  $errorfile
        }
        finally {
        }
    }
    xalanj {
        try {
            java -cp $XALANJAR  org.apache.xalan.xslt.Process -IN $infile -XSL $script 
                                             -OUT $outfile   2>&1 >>  $errorfile 
        }
        catch {
           $_.Exception.Message  >>  $errorfile
        }
        finally {
        }
    }
    
    xalanc {
        try {
            &$XALANEXE  $infile $script -o $outfile  2>&1 >>  $errorfile 
        }
        catch {
           $_.Exception.Message  >>  $errorfile
        }
        finally {
        }
    }
   
    
    altova {
        try {
            &$ALTOVAEXE xslt --xslt-version=1 ("--input=" + $infile) ("--output=" + $outfile) 
                                      ("--log-output=" + $errorfile)  $script  2>&1 >>  $errorfile 
        }
        catch {
           $_.Exception.Message  >>  $errorfile
        }
        finally {
        }
    }
    
    default {
        "Not a supported XSLT engine " + $enginefile
    }
 }
}
 
#region MAIN
cd d:/ricko/tmp 

Measure-command { RunXsltTransform "altova"  "test.xsl"   "dummy.xml"  
                                     "D:\ricko\tmp\altova.txt"  "D:\ricko\tmp\altovaerr.txt" }

Measure-command { RunXsltTransform "altova"  "test.xsl"   "content.xml"  
                                      "D:\ricko\tmp\altova.txt"  "D:\ricko\tmp\altovaerr.txt" }

Measure-command { RunXsltTransform "altova"  "test2.xsl"   "content.xml"  
                                     "D:\ricko\tmp\altova.txt"  "D:\ricko\tmp\altovaerr.txt" }

Measure-command { RunXsltTransform "altova"  "test3.xsl"   "content.xml"   
                                     "D:\ricko\tmp\altova.txt"  "D:\ricko\tmp\altovaerr.txt" }

Measure-command { RunXsltTransform "altova"  "test4.xsl"   "content.xml"   
                                     "D:\ricko\tmp\altova.txt"  "D:\ricko\tmp\altovaerr.txt" }

#endregion

XSLT Scripts

test.xsl test2.xsl test3.xsl test4.xsl Multimode XSLT
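
The stylesheets themselves are available from the links above. As a guide to what the Simplest test needs, a stylesheet along the following lines would do; this is a sketch, not necessarily identical to the actual test.xsl.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a "Simplest" stylesheet: ignore the input and emit <OK/> -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8"/>
  <xsl:template match="/">
    <OK/>
  </xsl:template>
</xsl:stylesheet>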