
XSLT Processor Benchmarks

March 28, 2001

Eugene Kuznetsov and Cyrus Dolph

Overview

Table of Contents

XML performance and XSLT
Scoring System
Result Verification
The Tests
Results
Acknowledgments

XSLTMark is a benchmark for the comprehensive measurement of XSLT processor performance. It consists of forty test cases designed to assess important functional areas of an XSLT processor. The latest release, version 2.0, has been used to assess ten different processors. This article describes the benchmark methodology and provides a brief overview of the results.

Important Update (2001/04/04). Since the first publication of this article, an error in the methodology has been discovered. Although it does not affect the "headline news," it is significant. For full details, please read the complete explanation in the "Results" section below. The performance chart has been updated to reflect the corrections.

XML performance and XSLT

The performance of XML processing in general is of considerable concern to customers and engineers alike. With more and more XML-encoded data being transmitted and processed, the ability to both predict and improve XML performance is critical to delivering scalable and reliable solutions. While XSLT is a big part of delivering on the overall value proposition of XML (by allowing XML-XML data interchange and XML-HTML content presentation), it also presents the greatest performance challenge. Early anecdotal evidence showed wide disparities in real-life results, and no comprehensive benchmark tools were available to obtain more systematic assessments and comparisons. We first created and started using XSLTMark internally at DataPower in mid-2000 and released it publicly in November. By using a well-balanced test base it is possible to make predictions about likely application behavior and to select the best processor for a given application. Considerable improvements have been made by some XSLT engines over the past six months, but it is also clear that further performance improvements will be required to support the growth of XML.

Scoring System

We considered many possible scoring systems for measuring XSLT performance. A survey of existing non-XML benchmark platforms revealed that many benchmarks use abstract unit-less scores to rate performance. These scores are often composed of weighted averages of separate benchmark components that would not be otherwise aggregated. While the abstract scoring method is excellent for relative studies of performance, it lacks the value of a "real-world" number in a standard unit.

Most of the other efforts to assess XSLT performance have centered around execution time measurements for a small number of test cases (where lower scores are better). These numbers are again great for relative comparison, but they are hard to assess in an absolute sense (i.e., in relation to other types of computer processing).

For these reasons, and because XML is increasingly becoming part of the network, XSLTMark uses kilobytes-per-second as its overall score, where kilobytes are the average of input and output document size. This provides a score that is tied to both the document size and the time expended for processing. The variations between scores for different test cases are then attributable to the complexity and specifics of processing performed by the stylesheet and the structure of the input document. By examining the detailed data for the individual cases a great deal of additional knowledge can be gleaned. We conducted some preliminary tests to obtain nodes-per-second measurements, but in the end we settled on kilobytes-per-second as the best way to characterize real-world performance.
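As a rough illustration of the score described above, the per-test-case calculation can be sketched in Python. This is not XSLTMark's own code: the function name, the sample numbers, and the choice of 1 KB = 1024 bytes are assumptions made for the example.

```python
def throughput_kb_per_s(input_bytes, output_bytes, elapsed_seconds, iterations=1):
    """Return KB/s, where KB is the average of input and output document size.

    Hypothetical helper for illustration; assumes 1 KB = 1024 bytes.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    avg_kb = (input_bytes + output_bytes) / 2 / 1024
    # Each iteration processes one document of avg_kb kilobytes.
    return avg_kb * iterations / elapsed_seconds


# Made-up numbers: 100 iterations of a 40 KB -> 60 KB transform in 2 seconds.
score = throughput_kb_per_s(40 * 1024, 60 * 1024, 2.0, iterations=100)
print(score)
```

Because the kilobyte count averages input and output size, a stylesheet that expands or shrinks its input substantially is credited for both sides of the transformation.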

The first two releases of XSLTMark reported as their total score an aggregate KB/s measurement, calculated from the total kilobytes processed and the total execution time. We like this calculation because it strictly measures the overall performance of the processor on a very broad range of tasks. It is important to understand that this aggregate score gives more weight to computationally-intensive test cases: since the score is based on the total execution time, the "slower" test cases will have a greater effect on the score. This contrasts with an arithmetic mean of individual test case scores, which is weighted in favor of "faster" test cases.
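The difference in weighting between the two averages can be seen in a small Python sketch. The function names and the two sample test cases are invented for illustration and are not taken from the benchmark results.

```python
def aggregate_score(cases):
    """Total KB processed divided by total seconds, over (kb, seconds) pairs."""
    total_kb = sum(kb for kb, _ in cases)
    total_seconds = sum(seconds for _, seconds in cases)
    return total_kb / total_seconds


def arithmetic_mean_score(cases):
    """Plain average of the individual KB/s scores."""
    return sum(kb / seconds for kb, seconds in cases) / len(cases)


# Hypothetical cases: a "fast" one (1000 KB/s) and a "slow" one (250 KB/s).
cases = [(1000.0, 1.0), (1000.0, 4.0)]
aggregate = aggregate_score(cases)      # dominated by the 4-second case
mean = arithmetic_mean_score(cases)     # pulled up by the 1000 KB/s case
print(aggregate, mean)
```

With these numbers the aggregate score sits much closer to the slow case's throughput, while the arithmetic mean sits closer to the fast case's.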

XSLTMark 2.0 introduces a geometric mean score in addition to the aggregate score. We include this measurement because it provides an average of test case scores in a manner that is not weighted by the qualities of individual test cases. Specifically, scaling the throughput of a single test case results in a scaling of the geometric mean by a factor that does not depend on which test case is scaled.
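The scaling property of the geometric mean can be checked numerically. The following Python sketch uses made-up scores; it only demonstrates the mathematical property and is not the benchmark's own scoring code.

```python
import math


def geometric_mean(scores):
    """n-th root of the product of n positive scores, computed via logarithms."""
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Hypothetical per-test-case throughputs in KB/s.
scores = [100.0, 400.0, 1600.0]
base = geometric_mean(scores)

# Double the throughput of one test case at a time and record how much
# the geometric mean changes relative to the unscaled baseline.
ratios = []
for i in range(len(scores)):
    scaled = list(scores)
    scaled[i] *= 2
    ratios.append(geometric_mean(scaled) / base)

print(base, ratios)
```

Each ratio equals the cube root of 2 (up to floating-point rounding) regardless of which of the three test cases was doubled, which is the sense in which the geometric mean does not favor fast or slow cases.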

In order to support both C/C++ and Java processors, XSLTMark uses wall clock time (elapsed real world time, rather than CPU seconds) as obtained by gettimeofday() or Java's System.currentTimeMillis(). This means that benchmarking must occur on an unloaded system, and tests should execute a sufficient number of iterations to avoid real time clock granularity and interrupt effects. Considerable time was invested in ensuring that this approach produced precise and accurate measurements.
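A minimal sketch of such a wall-clock measurement loop is shown below in Python. It is an illustration only: time.perf_counter() stands in for gettimeofday() and System.currentTimeMillis(), and the doubling strategy, the function name, and the min_seconds threshold are assumptions rather than a description of the actual XSLTMark drivers.

```python
import time


def time_iterations(work, min_seconds=0.2, start_iterations=1):
    """Run work() repeatedly until the elapsed wall-clock time is long enough.

    The iteration count doubles until one timed batch lasts at least
    min_seconds, so clock granularity becomes small relative to the
    measurement. Returns (iterations, elapsed_seconds) for the final batch.
    """
    iterations = start_iterations
    while True:
        start = time.perf_counter()
        for _ in range(iterations):
            work()
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return iterations, elapsed
        iterations *= 2


# Stand-in workload; a real driver would invoke the XSLT processor here.
iterations, elapsed = time_iterations(lambda: sum(range(1000)), min_seconds=0.05)
print(iterations, elapsed)
```

Because this measures elapsed real time rather than CPU time, any other activity on the machine during the timed batch inflates the result, which is why an unloaded system matters.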

Result Verification

The spotty compliance of many XSLT processors meant that we had to spend considerable time manually verifying the output of various tests. DataPower's internal projects also required that results be verified, so basic compliance checking was built into XSLTMark early on. The intent is not to provide a compliance test suite; although XSLTMark is comprehensive in its functional area coverage and presents a balanced performance assessment, it is not comprehensive enough for a full compliance suite. We look forward to the compliance efforts of OASIS and W3C. XSLTMark's compliance testing exists to ensure that largely incomplete processors do not receive unfairly high benchmark scores. This is especially important because implementing many parts of the XSLT specification correctly means a certain performance penalty. Often a processor that does well on a subset of cases but fails many others will be considerably slower by the time it achieves full compliance. (This was the case for Transformiix, Mozilla's XSLT processor, which has made great progress in compliance but at a cost to performance.)

Result verification is achieved by normalizing the output using DataPower's "dgnorm" (for "normalizer") tool. This simple C program is a SAX processor that removes insignificant whitespace, handles HTML peculiarities, alphabetically sorts attributes and does some other processing to make the output of XSLTMark stylesheets directly accessible to "diff" and byte-wise comparison. After normalization, a simple comparison of a reference result and the output is performed. (Purists correctly protest that dgnorm is not a general XML normalizer; it is only suitable for normalizing the results of XSLTMark test cases.)
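The normalize-then-compare idea can be sketched in a few lines of Python. This is not dgnorm: it is a toy stand-in built on the standard xml.etree.ElementTree module, it ignores HTML peculiarities entirely, and it treats every whitespace-only text node as insignificant, which is an assumption that does not hold for XML in general.

```python
import xml.etree.ElementTree as ET


def normalize(xml_text):
    """Return a serialization with whitespace-only text dropped and attributes sorted."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Assumption: whitespace-only text and tails are insignificant.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
        # Re-insert attributes in alphabetical order so serialization is stable.
        ordered = sorted(elem.attrib.items())
        elem.attrib.clear()
        elem.attrib.update(ordered)
    return ET.tostring(root, encoding="unicode")


def same_result(reference, output):
    """Compare two documents after normalization."""
    return normalize(reference) == normalize(output)


reference = '<r b="2" a="1">\n  <c>x</c>\n</r>'
output = '<r a="1" b="2"><c>x</c></r>'
print(same_result(reference, output))
```

Two outputs that differ only in indentation or attribute order compare equal after this step, while a difference in element content still shows up as a mismatch.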

It should be noted that sometimes there's more than one correct result, in which case it's still necessary to verify all "CHK OUTPUT" lines to make sure that they reflect a real compliance problem. This is why some benchmark results have a few manually corrected scores. Previous XSLTMark releases triggered comments from a number of prominent XSLT implementers, and some of the thorny compliance ambiguities have been resolved in the current version. The number test case was difficult to assess due to ambiguity in the XSLT specification and widespread disagreement among processors, so we omitted the associated reference file; hence the "NO REFERENCE" found in the detailed results.

The Tests

The two main kinds of XSL application are XML to HTML and XML to XML conversion. These have different performance profiles, so both types of tests are included in XSLTMark. Real-life use cases range from simply filling in an almost static HTML template to complex processing or conversion of business documents. The four major components of XSLT processing are

  1. XSLT template pattern matching and template instantiation
  2. XSLT control structures and parameter passing
  3. XPath selection of nodesets and predicates
  4. XPath library functions such as string and nodeset operations

XSLTMark test cases assess the performance of processors in all four of these areas, and Table 1 gives a breakdown of test cases versus XSLT components. Some test cases attempt to isolate specific processing phases (which is not always possible), while others are balanced and therefore more realistic. The performance of the output phase of the processing is also very important, especially when the processing itself is highly optimized.

Table 1: The Test Cases

Test Case | Input Size | Input Description | Stylesheet | Notes
alphabetize | M | 100-row database table | select, control | Sorts the input tree according to element name.
attsets | S | sales report | general | Tests node-copying using named attribute sets.
avts | M | 100-row database table | select | Tests attribute-value template expansion.
axis | S | - | select | Tests XPath selection along the different axes.
backwards | S | - | control | Reverses order of elements in input document.
bottles | S | initial size parameter | function, control | Generates "99 bottles of beer on the wall" song.
breadth | S | broad and shallow tree | select, control | Performs a search for a unique element in a large tree.
brutal | S | - | select, function, control | Executes many functions, sorts, etc.
chart | S | sales report | select, control | Generates an HTML chart of some sales data.
creation | M | 100-row database table | general | Tests xsl:element and xsl:attribute.
current | S | - | select | Tests complex XPath node selection.
dbonerow | L | 10000-row database table | select, control | Selects a single row from a very large table.
dbtail | M | 100-row database table | select | Prints a table by traversing the following-sibling axis.
decoy | S | 100-row database table | match | Same template as patterns, with some decoy templates thrown in.
depth | S | narrow and deep tree | select, control | Performs a search for a unique element in a large tree.
encrypt | M | 100-row database table | function | Performs a Rot-13 operation on all element names and text nodes.
functions | M | 100-row database table | function | Tests a variety of number and string functions.
game | S | baseball game stats | select, function, control | Produces an HTML table of the data.
html | S | - | select, control | Literal result element as stylesheet example from XSLT spec.
identity | L | 1000-row database table | control | The identity transform.
inventory | S | Inventory data | select, control | Produces an HTML table of the data.
metric | S | Data in metric notation | function | Converts metric units to English units.
number | S | - | function | Tests format-number() function.
oddtemplate | S | - | match, select | Tests a variety of complex match patterns.
patterns | M | 100-row database table | match | Stylesheet contains extremely simple templates with tough patterns.
prettyprint | M | 100-row database table | control, function | Formats the input into legal HTML.
priority | S | - | select, control | Pops the first element off a priority queue and returns the queue.
products | S | Product data | select, control | Produces an HTML table from the data.
queens | S | initial size parameter | function, control | Solves the "8 Queens" problem. (Stylesheet by Oren Ben-Kiki, used with permission.)
reverser | S | The Gettysburg Address | function, control | Stylesheet copies input with text-node strings reversed.
stringsort | L | 1000-row database table | control | Performs a sort based on string keys.
summarize | S | "Queens" stylesheet | function | Reports information about an XSL stylesheet.
total | S | sales report | select, function | Reports on sales data.
tower | S | initial size parameter | control, function | Solves the Towers of Hanoi problem.
trend | S | Numerical data | select, function | Computes trends in the input data.
union | S | - | match, select | Performs complex pattern matching.
xpath | S | - | match | Performs complex pattern matching.
xslbench1 | S | test1.xml from XSLBench | general | This test case is "test1.xsl" from Kevin Jones' XSLBench test suite, used with permission.
xslbench2 | L | A Midsummer Night's Dream | match, select | This test case is "test2.xsl" from Kevin Jones' XSLBench test suite, used with permission.
xslbench3 | L | A Midsummer Night's Dream | select, function | This test case is "test3.xsl" from Kevin Jones' XSLBench test suite, used with permission.

 

Results

Full breakdown of results.

Of the processors included in this release of the benchmark, MSXML, Microsoft's C/C++ implementation, is the fastest overall. The three leading Java processors, XT, Oracle and Saxon, have surpassed the other C/C++ implementations to take 2nd through 4th place respectively. This suggests that high-level optimizations are more important than the implementation language in determining overall performance. The C/C++ processors tend to show more variation in their performance from test case to test case, scoring some very high marks alongside some disappointing performance. XSLTC aside, the C/C++ processors won first place in 33 of the 40 test cases, in some cases scoring two to three times as well as their Java competitors (attsets, dbonerow). This suggests that there is a lot of potential to be gained from using C/C++, but that consistent results might be harder to obtain.

The alpha release of Sun's XSLTC is the first compiler to be included in an XSLTMark benchmark. Although its performance on certain test cases is very promising, XSLTC's compliance is still quite spotty. We look forward to future releases of this software.

XSLT processors have improved substantially since the last XSLTMark release in November. It is especially encouraging to see such dramatic changes in certain processors, such as Xalan Java, over so short a time. Unfortunately, although most of the processors we reviewed were very fast on certain test cases, all of them suffered from poor performance in other areas. In addition, only Saxon was able to correctly execute all forty test cases. Ultimately, no processor has emerged as a clear winner in both performance and compliance.

Update (2001/04/04): Since the publication of this article it has been brought to our attention (thanks to Dan Holmsand and Michael Kay) that some of the older XSLTMark drivers did not follow the convention of excluding XML input parse time from the transformation time measurement. For some of the test cases this can have a significant effect on the score. This discrepancy between the stated intent (of measuring only the transformation time) and the actual source code had existed since the first release in October 2000, but was not detected until now. The XSLTMark drivers for the following processors suffered this additional performance penalty when benchmarked: XT, Saxon, XalanJ, Sablotron and XalanC. Preliminary tests show that taking this into account does not alter the overall balance of power in XSLT performance or dislodge any of the top performers from their superpower status, but it does affect the results significantly.

Changes will be made to put all processors on a level playing field again, but it cannot be done as promptly as desired. The difficulty lies in the fact that some processors simply do not have an easily accessible API for separate parsing followed by transformation, so it is not a quick change to fix their drivers to separate the parse step. For example, passing a DOM via the standard TRAX API to Saxon will still force it to rebuild the tree for every iteration.

Since it will take some time for the next set of results to be published, we wanted to promptly advise everyone of this error and provide an updated bar chart where the results for the two groups of processors are explicitly segregated. It is still perfectly reasonable to compare processors within a group, but caution should be exercised comparing between groups. Other than making this distinction clear we decided not to modify the results in any way. The next XSLTMark version will include updated drivers and may even change the scoring methodology for all processors.
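The distinction between the two groups of drivers described in this update can be sketched as two timing functions in Python. The function names and the parse and transform callables are hypothetical placeholders; they do not correspond to any particular processor's API or to the actual XSLTMark driver source.

```python
import time


def time_transform_only(parse, transform, source, iterations):
    """Parse once outside the timed region; time only the transformations."""
    tree = parse(source)
    start = time.perf_counter()
    for _ in range(iterations):
        transform(tree)
    return time.perf_counter() - start


def time_parse_and_transform(parse, transform, source, iterations):
    """Re-parse the input on every iteration, so parse time is included."""
    start = time.perf_counter()
    for _ in range(iterations):
        transform(parse(source))
    return time.perf_counter() - start
```

A driver of the second kind reports a lower KB/s score for the same processor, by an amount that depends on how large the parse cost is relative to the transformation itself, which is why scores from the two groups should not be compared directly.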

Table 2: Best Score and Processor

Test Case | Best Score (KB/s) | Top Processor | Top P+T Processor | Top T Processor
alphabetize | 241.09 | MSXML 3.0 | XT 19991105 | MSXML 3.0
attsets | 603.86 | XSLTC Alpha 4 | XT 19991105 | XSLTC Alpha 4
avts | 2348.15 | XSLTC Alpha 4 | XT 19991105 | XSLTC Alpha 4
axis | 429.43 | LibXSLT 0.5.0 | XT 19991105 | LibXSLT 0.5.0
backwards | 299.86 | LibXSLT 0.5.0 | Xalan-J | LibXSLT 0.5.0
bottles | 355.75 | XT 19991105 | XT 19991105 | LibXSLT 0.5.0
breadth | 909.73 | XSLTC Alpha 4 | XT 19991105 | XSLTC Alpha 4
brutal | 605.96 | MSXML 3.0 | XT 19991105 | MSXML 3.0
chart | 399.52 | MSXML 3.0 | XT 19991105 | MSXML 3.0
creation | 1984.80 | XSLTC Alpha 4 | XT 19991105 | XSLTC Alpha 4
current | 155.22 | LibXSLT 0.5.0 | XT 19991105 | LibXSLT 0.5.0
dbonerow | 1344.57 | LibXSLT 0.5.0 | XT 19991105 | LibXSLT 0.5.0
dbtail | 3254.32 | XSLTC Alpha 4 | XT 19991105 | XSLTC Alpha 4
decoy | 559.24 | MSXML 3.0 | XT 19991105 | MSXML 3.0
depth | 524.35 | LibXSLT 0.5.0 | XT 19991105 | LibXSLT 0.5.0
encrypt | 586.50 | XSLTC Alpha 4 | XT 19991105 | XSLTC Alpha 4
functions | 403.39 | XSLTC Alpha 4 | XT 19991105 | XSLTC Alpha 4
game | 706.09 | LibXSLT 0.5.0 | XT 19991105 | LibXSLT 0.5.0
html | 313.64 | LibXSLT 0.5.0 | XT 19991105 | LibXSLT 0.5.0
identity | 2875.92 | XSLTC Alpha 4 | XT 19991105 | XSLTC Alpha 4
inventory | 554.71 | XSLTC Alpha 4 | XT 19991105 | XSLTC Alpha 4
metric | 192.89 | LibXSLT 0.5.0 | Saxon 6.2.1 | LibXSLT 0.5.0
number | 160.94 | MSXML 3.0 | Saxon 6.2.1 | MSXML 3.0
oddtemplate | 119.20 | LibXSLT 0.5.0 | XT 19991105 | LibXSLT 0.5.0
patterns | 616.22 | MSXML 3.0 | XT 19991105 | MSXML 3.0
prettyprint | 255.78 | XT 19991105 | XT 19991105 | LibXSLT 0.5.0
priority | 350.65 | XSLTC Alpha 4 | XT 19991105 | XSLTC Alpha 4
products | 250.78 | MSXML 3.0 | Saxon 6.2.1 | MSXML 3.0
queens | 7.30 | XT 19991105 | XT 19991105 | Oracle XSLT 2.0
reverser | 92.40 | Xalan-Java 2.0 | Xalan-Java 2.0 | MSXML 3.0
stringsort | 838.98 | MSXML 3.0 | XT 19991105 | MSXML 3.0
summarize | 362.26 | XT 19991105 | XT 19991105 | -
total | 840.69 | XSLTC Alpha 4 | XT 19991105 | XSLTC Alpha 4
tower | 180.57 | XT 19991105 | XT 19991105 | MSXML 3.0
trend | 58.49 | XT 19991105 | XT 19991105 | Oracle XSLT 2.0
union | 183.11 | XSLTC Alpha 4 | XT 19991105 | XSLTC Alpha 4
xpath | 136.45 | XT 19991105 | XT 19991105 | LibXSLT 0.5.0
xslbench1 | 1088.67 | MSXML 3.0 | XT 19991105 | MSXML 3.0
xslbench2 | 1019.73 | LibXSLT 0.5.0 | XT 19991105 | LibXSLT 0.5.0
xslbench3 | 36813.96 | XSLTC Alpha 4 | XT 19991105 | XSLTC Alpha 4
Aggr_Results | 442.61 | MSXML 3.0 | XT 19991105 | MSXML 3.0

Acknowledgments

We wish to thank Oren Ben-Kiki for permission to use the queens.xsl stylesheet and Kevin Jones for XSLBench and the permission to use the three xslbench stylesheets. We are also grateful for the support from XSLT engine implementers who provided us with XSLTMark drivers or helped analyze compliance issues: Michael Kay, Andrew Kimball, Berin Loritsch, Steve Muench. Also, thanks to the many other members of the XML Community for their helpful bug reports and suggestions.

Additional Notes

If you are implementing or integrating an XSLT processor that does not currently have an XSLTMark driver, consider writing one and getting it included in the next release. It is important to support as many processors as possible in a variety of implementation languages on a number of platforms. To our knowledge, XSLTMark has been used so far to test Java, C/C++, Perl, and Python implementations of XSLT processors on Windows, Solaris, HP-UX, and Linux platforms. Some processors currently have drivers, but their benchmark results are not being released at this time, including

  • Oracle C++ (no license/permission)
  • Smart Transcoder (discontinued)
  • Unicorn XSLT (no API, permission)
  • Napa (no API)

We welcome comments on the benchmark as well as suggestions for how it can become a better community resource. One of the areas of particular interest to us is assessing the memory footprint of different processors, something that's difficult to measure but of considerable value in production deployments.

For additional information and benchmark download, see http://www.datapower.com/XSLTMark/