XSLT Processor Benchmarks
March 28, 2001
XSLTMark is a benchmark for the comprehensive measurement of XSLT processor performance. It consists of forty test cases designed to assess important functional areas of an XSLT processor. The latest release, version 2.0, has been used to assess ten different processors. This article describes the benchmark methodology and provides a brief overview of the results.
Important Update (2001/04/04). Since the first publication of this article, errors in the methodology have been discovered. Although they don't affect the "headline news," they are significant. For full details please be sure to read the complete explanation below in the "Results" section. The performance chart has been updated to reflect the corrections.
The performance of XML processing in general is of considerable concern to customers and engineers alike. With more and more XML-encoded data being transmitted and processed, the ability to both predict and improve XML performance is critical to delivering scalable and reliable solutions. While XSLT is a big part of delivering on the overall value proposition of XML (by allowing XML-to-XML data interchange and XML-to-HTML content presentation), it also presents the greatest performance challenge. Early anecdotal evidence showed wide disparities in real-life results, and no comprehensive benchmark tools were available for more systematic assessments and comparisons. We first created and started using XSLTMark internally at DataPower in mid-2000 and released it publicly in November. With a well-balanced test base it is possible to make predictions about likely application behavior and to select the best processor for a given application. Some XSLT engines have improved considerably over the past six months, but it is also clear that further performance gains will be required to support the growth of XML.
We considered many possible scoring systems for measuring XSLT performance. A survey of existing non-XML benchmark platforms revealed that many benchmarks use abstract unit-less scores to rate performance. These scores are often composed of weighted averages of separate benchmark components that would not be otherwise aggregated. While the abstract scoring method is excellent for relative studies of performance, it lacks the value of a "real-world" number in a standard unit.
Most of the other efforts to assess XSLT performance have centered around execution time measurements for a small number of test cases (where lower scores are better). These numbers are again great for relative comparison, but they are hard to assess in an absolute sense (i.e., in relation to other types of computer processing).
For these reasons, and because XML is increasingly becoming part of the network, XSLTMark uses kilobytes-per-second as its overall score, where kilobytes are the average of input and output document size. This provides a score that is tied to both the document size and the time expended for processing. The variations between scores for different test cases are then attributable to the complexity and specifics of processing performed by the stylesheet and the structure of the input document. By examining the detailed data for the individual cases a great deal of additional knowledge can be gleaned. We conducted some preliminary tests to obtain nodes-per-second measurements, but in the end we settled on kilobytes-per-second as the best way to characterize real-world performance.
The first two releases of XSLTMark used as a total score an aggregate KB/s measurement, computed from the total execution time and total kilobytes processed. We like this calculation because it strictly measures the overall performance of the processor on a very broad range of tasks. It is important to understand that this aggregate score gives more weight to computationally intensive test cases -- since the score is based on the total execution time, the "slower" test cases have a greater effect on the score. This contrasts with an arithmetic mean of individual test case scores, which is weighted in favor of "faster" test cases.
XSLTMark 2.0 introduces a geometric mean score in addition to the aggregate score. We include this measurement because it provides an average of test case scores in a manner that is not weighted by the qualities of individual test cases. Specifically, scaling the throughput of a single test case results in a scaling of the geometric mean by a factor that does not depend on which test case is scaled.
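As a concrete sketch of how the two scores differ, consider two hypothetical test cases with the same data volume but very different speeds. The numbers below are invented for illustration and are not actual XSLTMark results:

```python
from math import prod

# Hypothetical per-test data: (name, average of input and output size in KB,
# wall-clock seconds).  Purely illustrative values.
tests = [
    ("fast_case", 500.0, 0.5),   # 1000 KB/s
    ("slow_case", 500.0, 5.0),   # 100 KB/s
]

# Per-test throughput in KB/s.
scores = [kb / secs for _, kb, secs in tests]

# Aggregate score: total KB over total time.  Dominated by the slow case,
# because that case contributes most of the total execution time.
aggregate = sum(kb for _, kb, _ in tests) / sum(secs for _, _, secs in tests)

# Geometric mean: doubling any single test's throughput multiplies this
# score by the same factor (2 ** (1/n)), regardless of which test it is.
geomean = prod(scores) ** (1 / len(scores))

print(aggregate)  # ~181.8 KB/s -- pulled toward the slow case
print(geomean)    # ~316.2 KB/s -- sqrt(1000 * 100)
```

Note how the aggregate score sits much closer to the slow case's 100 KB/s, while the geometric mean treats both cases symmetrically.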
In order to support both C/C++ and Java processors, XSLTMark uses wall-clock time (elapsed real-world time, rather than CPU seconds) as obtained by gettimeofday() or Java's System.currentTimeMillis(). This means that benchmarking must occur on an unloaded system, and tests should execute a sufficient number of iterations to avoid real-time clock granularity and interrupt effects. Considerable time was invested in ensuring that this approach produced precise and accurate measurements.
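A minimal sketch of this approach, in Python rather than the C or Java of the actual drivers, might look like the following; the benchmark function and the stand-in workload are illustrative assumptions, not XSLTMark code:

```python
import time

def benchmark(transform, min_elapsed=1.0):
    """Run `transform` repeatedly until at least `min_elapsed` wall-clock
    seconds have passed, then return the mean time per iteration.

    Wall-clock measurement (the analog of gettimeofday() or Java's
    System.currentTimeMillis()) only makes sense on an unloaded machine,
    and a single short run is lost in clock granularity -- hence the
    repetition over many iterations.
    """
    iterations = 0
    start = time.time()          # wall clock, not CPU time
    while True:
        transform()
        iterations += 1
        elapsed = time.time() - start
        if elapsed >= min_elapsed:
            return elapsed / iterations

# Illustrative stand-in for an actual XSLT transformation.
per_run = benchmark(lambda: sum(range(10000)), min_elapsed=0.2)
print("seconds per iteration:", per_run)
```

Dividing total elapsed time by the iteration count averages out timer granularity and stray interrupts over the whole run.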
The spotty compliance of many XSLT processors meant that we had to spend considerable time manually verifying the output of various tests. DataPower's internal projects also required that results be verified, so basic compliance checking was built into XSLTMark early on. The intent is not to provide a compliance test suite; although XSLTMark is comprehensive in its functional area coverage and presents a balanced performance assessment, it is not comprehensive enough for a full compliance suite. We look forward to the compliance efforts of OASIS and the W3C. XSLTMark's compliance testing exists to ensure that largely incomplete processors do not receive unfairly high benchmarks. This is especially important because implementing many parts of the XSLT specification correctly carries a certain performance penalty. Often a processor that does well on a subset of cases but fails many others will be considerably slower by the time it achieves full compliance. (This was the case for Transformiix, Mozilla's XSLT processor, which has made great progress in compliance but at a cost to performance.)
Result verification is achieved by normalizing the output using DataPower's "dgnorm" (for "normalizer") tool. This simple C program is a SAX processor that removes insignificant whitespace, handles HTML peculiarities, alphabetically sorts attributes, and performs some other processing to make the output of XSLTMark stylesheets directly amenable to "diff" and byte-wise comparison. After normalization, a simple comparison of a reference result and the output is performed. (Purists correctly protest that dgnorm is not a general XML normalizer; it is suitable only for normalizing the results of XSLTMark test cases.)
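The effect of such normalization can be sketched in a few lines of Python. This toy normalize function only illustrates the idea (whitespace stripping plus attribute sorting, with no escaping or HTML handling); it is not a reimplementation of dgnorm:

```python
import xml.etree.ElementTree as ET

def normalize(xml_text):
    """Rough sketch of a dgnorm-style normalizer: drop insignificant
    whitespace and serialize attributes in sorted order, so that two
    logically identical documents compare byte-for-byte equal."""
    def render(elem):
        # Serialize attributes in alphabetical order.
        attrs = "".join(' %s="%s"' % (k, elem.attrib[k])
                        for k in sorted(elem.attrib))
        # Strip whitespace-only text and tail nodes.
        text = (elem.text or "").strip()
        body = text + "".join(render(c) + (c.tail or "").strip()
                              for c in elem)
        return "<%s%s>%s</%s>" % (elem.tag, attrs, body, elem.tag)
    return render(ET.fromstring(xml_text))

a = normalize('<r b="2" a="1">\n  <x/>\n</r>')
b = normalize('<r a="1" b="2"><x/></r>')
print(a == b)  # attribute order and whitespace differences vanish
```

After normalization, both documents serialize identically, so a plain string (or "diff") comparison suffices.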
It should be noted that sometimes there is more than one correct result, in which case it is still necessary to inspect all "CHK OUTPUT" lines manually to determine whether they reflect a real compliance problem. This is why some benchmark results have a few manually corrected scores. Previous XSLTMark releases drew comments from a number of prominent XSLT implementers, and some of the thorny compliance ambiguities have been resolved in the current version. The number test case was difficult to assess due to ambiguity in the XSLT specification and widespread disagreement among processors, so we omitted the associated reference file; hence the "NO REFERENCE" found in the detailed results.
The two main kinds of XSLT application are XML-to-HTML and XML-to-XML conversion. These have different performance profiles, so both types of tests are included in XSLTMark. Real-life use cases range from simply filling in an almost-static HTML template to complex processing or conversion of business documents. The four major components of XSLT processing are
- XSLT template pattern matching and template instantiation
- XSLT control structures and parameter passing
- XPath selection of nodesets and predicates
- XPath library functions such as string and nodeset operations
XSLTMark test cases assess the performance of processors in all four of these areas, and Table 1 gives a breakdown of test cases versus XSLT components. Some test cases attempt to isolate specific processing phases (which is not always possible), while others are balanced and therefore more realistic. The performance of the output phase of the processing is also very important, especially when the processing itself is highly optimized.
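As a rough illustration (not an actual XSLTMark test case), a single small stylesheet can touch all four areas; the row and cell element names here are hypothetical:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- (1) template pattern matching and instantiation -->
  <xsl:template match="row[@id]">
    <!-- (2) control structures and parameter passing -->
    <xsl:call-template name="render">
      <xsl:with-param name="label" select="@id"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="render">
    <xsl:param name="label"/>
    <!-- (3) XPath nodeset selection with a predicate -->
    <xsl:for-each select="cell[position() &lt; 3]">
      <!-- (4) XPath library functions (string operations) -->
      <xsl:value-of select="concat($label, ': ', normalize-space(.))"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

A processor's cost profile depends on how much of each of these four kinds of work a stylesheet performs, which is what the test case breakdown in Table 1 captures.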
Table 1: The Test Cases
|Test Case||Input Size||Input Description||Stylesheet||Notes|
|alphabetize||M||100-row database table||select, control||Sorts the input tree according to element name.|
|attsets||S||sales report||general||Tests node-copying using named attribute sets.|
|avts||M||100-row database table||select||Tests attribute-value template expansion.|
|axis||S|| ||select||Tests XPath selection along the different axes.|
|backwards||S|| ||control||Reverses the order of elements in the input document.|
|bottles||S||initial size parameter||function, control||Generates "99 bottles of beer on the wall" song.|
|breadth||S||broad and shallow tree||select, control||Performs a search for a unique element in a large tree.|
|brutal||S|| ||select, function, control||Executes many functions, sorts, etc.|
|chart||S||sales report||select, control||Generates an HTML chart of some sales data.|
|creation||M||100-row database table||general||Tests xsl:element and xsl:attribute.|
|current||S|| ||select||Tests complex XPath node selection.|
|dbonerow||L||10000-row database table||select, control||Selects a single row from a very large table.|
|dbtail||M||100-row database table||select||Prints a table by traversing the following-sibling axis.|
|decoy||S||100-row database table||match||Same template as patterns, with some decoy templates thrown in.|
|depth||S||narrow and deep tree||select, control||Performs a search for a unique element in a large tree.|
|encrypt||M||100-row database table||function||Performs a Rot-13 operation on all element names and text nodes.|
|functions||M||100-row database table||function||Tests a variety of number and string functions.|
|game||S||baseball game stats||select, function, control||Produces an HTML table of the data.|
|html||S|| ||select, control||Literal result element as stylesheet example from the XSLT spec.|
|identity||L||1000-row database table||control||The identity transform.|
|inventory||S||inventory data||select, control||Produces an HTML table of the data.|
|metric||S||Data in metric notation||function||Converts metric units to English units.|
|number||S|| ||function||Tests the format-number() function.|
|oddtemplate||S|| ||match, select||Tests a variety of complex match patterns.|
|patterns||M||100-row database table||match||Stylesheet contains extremely simple templates with tough patterns.|
|prettyprint||M||100-row database table||control, function||Formats the input into legal HTML.|
|priority||S|| ||select, control||Pops the first element off a priority queue and returns the queue.|
|products||S||Product data||select, control||Produces an HTML table from the data.|
|queens||S||initial size parameter||function, control||Solves the "8 Queens" problem. (Stylesheet by Oren Ben-Kiki, used with permission.)|
|reverser||S||The Gettysburg Address||function, control||Stylesheet copies input with text-node strings reversed.|
|stringsort||L||1000-row database table||control||Performs a sort based on string keys.|
|summarize||S||"Queens" stylesheet||function||Reports information about an XSL stylesheet.|
|total||S||sales report||select, function||Reports on sales data.|
|tower||S||initial size parameter||control, function||Solves the Towers of Hanoi problem.|
|trend||S||Numerical data||select, function||Computes trends in the input data.|
|union||S|| ||match, select||Performs complex pattern matching.|
|xpath||S|| ||match||Performs complex pattern matching.|
|xslbench1||S||test1.xml from XSLBench||general||This test case is "test1.xsl" from Kevin Jones' XSLBench test suite, used with permission.|
|xslbench2||L||A Midsummer Night's Dream||match, select||This test case is "test2.xsl" from Kevin Jones' XSLBench test suite, used with permission.|
|xslbench3||L||A Midsummer Night's Dream||select, function||This test case is "test3.xsl" from Kevin Jones' XSLBench test suite, used with permission.|
Of the processors included in this release of the benchmark, MSXML, Microsoft's C/C++ implementation, is the fastest overall. The three leading Java processors, XT, Oracle, and Saxon, have surpassed the other C/C++ implementations to take second through fourth place respectively. This suggests that high-level optimizations matter more than the implementation language in determining overall performance. The C/C++ processors tend to show more variation in their performance from test case to test case, scoring some very high marks alongside some disappointing results. Setting XSLTC aside, the C/C++ processors won first place in 33 of the 40 test cases, in some cases scoring two to three times as high as their Java competitors (attsets, dbonerow). This suggests that there is a lot of performance to be gained from using C/C++, but that consistent results may be harder to obtain.
The alpha release of Sun's XSLTC is the first compiler to be included in an XSLTMark benchmark. Although its performance on certain test cases is very promising, XSLTC's compliance is still quite spotty. We look forward to future releases of this software.
XSLT processors have improved substantially since the last XSLTMark release in November. It is especially encouraging to see such dramatic changes in certain processors, such as Xalan Java, over so short a time. Unfortunately, although most of the processors we reviewed were very fast on certain test cases, all of them suffered from poor performance in other areas. In addition, only Saxon was able to correctly execute all forty test cases. Ultimately, no processor has emerged as a clear winner in both performance and compliance.
Update (2001/04/04): Since the publication of this article it has been brought to our attention (thanks to Dan Holmsand and Michael Kay) that some of the older XSLTMark drivers did not follow the convention of excluding XML input parse time from the transformation time measurement. For some of the test cases this can have a significant effect on the score. This discrepancy between the stated intent (measuring only the transformation time) and the actual source code had existed since the first release in October 2000, but was not detected until now. The XSLTMark drivers for the following processors suffered this additional performance penalty when benchmarked: XT, Saxon, XalanJ, Sablotron, and XalanC. Preliminary tests show that taking this into account does not alter the overall balance of power in XSLT performance or dislodge any of the top performers, but it does affect the results significantly.
Changes will be made to put all processors on a level playing field again, but this cannot be done as promptly as desired. The difficulty is that some processors simply do not have easily accessible APIs for separate parsing followed by transformation, so fixing their drivers to separate out the parse step is not a quick change. For example, passing a DOM via the standard TRAX API to Saxon will still force it to rebuild the tree for every iteration.
Since it will take some time for the next set of results to be published, we wanted to promptly advise everyone of this error and provide an updated bar chart where the results for the two groups of processors are explicitly segregated. It is still perfectly reasonable to compare processors within a group, but caution should be exercised comparing between groups. Other than making this distinction clear we decided not to modify the results in any way. The next XSLTMark version will include updated drivers and may even change the scoring methodology for all processors.
Table 2: Best Score and Processor
|Test Case||Best Score||Top Processor||Top Processor (parse + transform group)||Top Processor (transform-only group)|
|alphabetize||241.09||MSXML 3.0||XT 19991105||MSXML 3.0|
|attsets||603.86||XSLTC Alpha 4||XT 19991105||XSLTC Alpha 4|
|avts||2348.15||XSLTC Alpha 4||XT 19991105||XSLTC Alpha 4|
|axis||429.43||LibXSLT 0.5.0||XT 19991105||LibXSLT 0.5.0|
|backwards||299.86||LibXSLT 0.5.0||Xalan-Java 2.0||LibXSLT 0.5.0|
|bottles||355.75||XT 19991105||XT 19991105||LibXSLT 0.5.0|
|breadth||909.73||XSLTC Alpha 4||XT 19991105||XSLTC Alpha 4|
|brutal||605.96||MSXML 3.0||XT 19991105||MSXML 3.0|
|chart||399.52||MSXML 3.0||XT 19991105||MSXML 3.0|
|creation||1984.80||XSLTC Alpha 4||XT 19991105||XSLTC Alpha 4|
|current||155.22||LibXSLT 0.5.0||XT 19991105||LibXSLT 0.5.0|
|dbonerow||1344.57||LibXSLT 0.5.0||XT 19991105||LibXSLT 0.5.0|
|dbtail||3254.32||XSLTC Alpha 4||XT 19991105||XSLTC Alpha 4|
|decoy||559.24||MSXML 3.0||XT 19991105||MSXML 3.0|
|depth||524.35||LibXSLT 0.5.0||XT 19991105||LibXSLT 0.5.0|
|encrypt||586.50||XSLTC Alpha 4||XT 19991105||XSLTC Alpha 4|
|functions||403.39||XSLTC Alpha 4||XT 19991105||XSLTC Alpha 4|
|game||706.09||LibXSLT 0.5.0||XT 19991105||LibXSLT 0.5.0|
|html||313.64||LibXSLT 0.5.0||XT 19991105||LibXSLT 0.5.0|
|identity||2875.92||XSLTC Alpha 4||XT 19991105||XSLTC Alpha 4|
|inventory||554.71||XSLTC Alpha 4||XT 19991105||XSLTC Alpha 4|
|metric||192.89||LibXSLT 0.5.0||Saxon 6.2.1||LibXSLT 0.5.0|
|number||160.94||MSXML 3.0||Saxon 6.2.1||MSXML 3.0|
|oddtemplate||119.20||LibXSLT 0.5.0||XT 19991105||LibXSLT 0.5.0|
|patterns||616.22||MSXML 3.0||XT 19991105||MSXML 3.0|
|prettyprint||255.78||XT 19991105||XT 19991105||LibXSLT 0.5.0|
|priority||350.65||XSLTC Alpha 4||XT 19991105||XSLTC Alpha 4|
|products||250.78||MSXML 3.0||Saxon 6.2.1||MSXML 3.0|
|queens||7.30||XT 19991105||XT 19991105||Oracle XSLT 2.0|
|reverser||92.40||Xalan-Java 2.0||Xalan-Java 2.0||MSXML 3.0|
|stringsort||838.98||MSXML 3.0||XT 19991105||MSXML 3.0|
|summarize||362.26||XT 19991105||XT 19991105|| |
|total||840.69||XSLTC Alpha 4||XT 19991105||XSLTC Alpha 4|
|tower||180.57||XT 19991105||XT 19991105||MSXML 3.0|
|trend||58.49||XT 19991105||XT 19991105||Oracle XSLT 2.0|
|union||183.11||XSLTC Alpha 4||XT 19991105||XSLTC Alpha 4|
|xpath||136.45||XT 19991105||XT 19991105||LibXSLT 0.5.0|
|xslbench1||1088.67||MSXML 3.0||XT 19991105||MSXML 3.0|
|xslbench2||1019.73||LibXSLT 0.5.0||XT 19991105||LibXSLT 0.5.0|
|xslbench3||36813.96||XSLTC Alpha 4||XT 19991105||XSLTC Alpha 4|
|Aggr_Results||442.61||MSXML 3.0||XT 19991105||MSXML 3.0|
We wish to thank Oren Ben-Kiki for permission to use the queens.xsl stylesheet and Kevin Jones for XSLBench and the permission to use the three xslbench stylesheets. We are also grateful for the support from XSLT engine implementers who provided us with XSLTMark drivers or helped analyze compliance issues: Michael Kay, Andrew Kimball, Berin Loritsch, Steve Muench. Also, thanks to the many other members of the XML Community for their helpful bug reports and suggestions.
If you are implementing or integrating an XSLT processor that does not currently have an XSLTMark driver, consider writing one and getting it included in the next release. It is important to support as many drivers as possible, in a variety of implementation languages, on a number of platforms. To our knowledge, XSLTMark has so far been used to test Java, C/C++, Perl, and Python implementations of XSLT processors on Windows, Solaris, HP-UX, and Linux. Some processors currently have drivers, but their benchmark results are not being released at this time, including
- Oracle C++ (no license/permission)
- Smart Transcoder (discontinued)
- Unicorn XSLT (no API, permission)
- Napa (no API)
We welcome comments on the benchmark as well as suggestions for how it can become a better community resource. One of the areas of particular interest to us is assessing the memory footprint of different processors, something that's difficult to measure but of considerable value in production deployments.
For additional information and benchmark download, see http://www.datapower.com/XSLTMark/