XML Parser Benchmarks: Part 1

StAX Parser Benchmark

Many recent Java XML applications (for example, Apache's AXIOM) let you choose the StAX implementation they run on. Since there are already a handful of StAX implementations available, we compared their reading performance in the following benchmarks.
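
As a rough illustration of what choosing a StAX implementation looks like in code, the sketch below asks the standard javax.xml.stream.XMLInputFactory for whichever implementation the JVM resolves and reads the root element with it. The class name StaxFactoryDemo, its helper methods, and the sample document are hypothetical and exist only for this example; they are not taken from the benchmark code.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of the benchmark suite.
final class StaxFactoryDemo {

    // Returns the class name of the StAX factory the JVM resolves.
    // newInstance() consults, among other lookup steps, the system property
    // "javax.xml.stream.XMLInputFactory" before falling back to the platform default.
    static String resolvedFactory() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    // Returns the local name of the root element of the given document.
    static String rootName(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                reader.nextTag(); // advance from START_DOCUMENT to the first START_ELEMENT
                return reader.getLocalName();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("StAX factory in use: " + resolvedFactory());
        System.out.println("Root element: " + rootName("<order><item/></order>"));
    }
}
```

To swap implementations without touching the code, the same system property can name another factory class, for example -Djavax.xml.stream.XMLInputFactory=com.ctc.wstx.stax.WstxInputFactory, assuming the Woodstox jar is on the classpath.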


Figure 3: Benchmark results for the StAX parsers and small documents

Figure 4: Benchmark results for the StAX parsers and medium-sized documents

Figure 5: Benchmark results for the StAX parsers and large documents

Figures 3-5 show the benchmarks of the five StAX implementations. In all but the last benchmark, the Javolution and Woodstox parsers deliver the best results. Sun's SJSXP lags behind for small documents but outperforms all other parsers on the very large 4 MB XML file. The BEA implementation is slightly better than the SJSXP for small documents, but for XML files bigger than 10 KB it is overtaken by the SJSXP. Oracle's StAX implementation ranks last on the two biggest documents, where it performs about equal to the BEA implementation.
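
For readers who want a feel for how a read-throughput figure of this kind can be obtained, here is a minimal timing sketch. It is not the XMLTest harness behind the figures above; the class StaxThroughputSketch, the warm-up strategy, and the iteration count are assumptions made for illustration, and a careful measurement would need a proper benchmarking tool.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical timing sketch; not the harness used for the published figures.
final class StaxThroughputSketch {

    // Pulls every event from the document once and returns the event count.
    static long parseAll(XMLInputFactory factory, byte[] doc) {
        try {
            XMLStreamReader reader =
                factory.createXMLStreamReader(new ByteArrayInputStream(doc));
            long events = 0;
            while (reader.hasNext()) {
                reader.next();
                events++;
            }
            reader.close();
            return events;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Parses the document repeatedly and returns megabytes read per second.
    static double megabytesPerSecond(byte[] doc, int iterations) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        for (int i = 0; i < iterations; i++) {
            parseAll(factory, doc); // warm-up pass so the JIT can compile the hot loop
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            parseAll(factory, doc);
        }
        long elapsedNanos = Math.max(System.nanoTime() - start, 1L);
        double megabytes = (double) doc.length * iterations / (1024.0 * 1024.0);
        return megabytes / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        byte[] doc = "<a><b>x</b><b>y</b></a>".getBytes(StandardCharsets.UTF_8);
        System.out.println("MB/s: " + megabytesPerSecond(doc, 10_000));
    }
}
```

Running the same loop once per factory implementation and per document size gives numbers comparable in spirit, though not in rigor, to the charts.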

Conclusions

From the results of the benchmarks we can see that there are big performance differences between the parser implementations. Overall, the SAX-like implementation of LIBXML2 in C performs best in all benchmarks. For most document sizes it delivered from one-third more to twice as much throughput as its competitors. This is interesting because, as we will see in the next part of this series, the LIBXML2 DOM implementation in C uses this parser to read in data and therefore already has a performance advantage over the object model parsers in Java. A negative point of this parser is definitely the complexity of its interface. The need to handle void pointers and double pointers in the callback interface stands in sharp contrast to the rather intuitive Java StAX interfaces.
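
The difference between a callback interface and the StAX pull interface can be sketched in Java alone. The example below counts elements twice: once with Java's SAX API, used here only as a stand-in for the callback style (it is not LIBXML2's C interface and involves no pointers), and once with a StAX cursor loop. The class PushVersusPull and its method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical comparison of the two programming styles.
final class PushVersusPull {

    // Callback (push) style: the parser drives, and state lives outside the handler method.
    static int countWithSax(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
        return count[0];
    }

    // Pull style: the application drives the loop and keeps state in local variables.
    static int countWithStax(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b/><b>text</b></a>";
        System.out.println("SAX count:  " + countWithSax(xml));
        System.out.println("StAX count: " + countWithStax(xml));
    }
}
```

In the pull version the control flow reads top to bottom, which is a large part of why the StAX interfaces feel more intuitive than a callback table.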

Javolution and Woodstox are the winners among the StAX parsers. Woodstox has the advantage of being a JSR 173-conforming StAX parser, which makes it usable in more applications.

In the next part of this series we will look at the results of the object model parser benchmarks and see whether any Java parser can beat the performance of the LIBXML2 object model parser in C. This will lead to our final conclusion on which XML parser to use for our high-performance web service gateway.

Additional Resources

  • The StAX specification, JSR 173
  • Sun's XMLTest XML parser benchmark tool
  • xmlbench, an XML parser benchmark tool in C
  • Sun's StAX benchmark with XMLTest

    • Reply
      2007-05-15 08:48:35 MatthiasFarwick

      @TextScience
      You are right, parsing is not the only bottleneck. Actually, the original (unpublished) German version of this article says what translates to "one of the main bottlenecks". As we mention in the second part of this article, which will be posted in the future, we also conducted benchmarks on the XML security libraries that exist in C and Java. So we are well aware of the other main bottlenecks.


      @Alan Carlyle
      In our benchmarks we focused on the comparison between C and Java libraries. Therefore the .NET implementations were not considered. Also, the MSXML parsers in C are not open source (as far as I know), which excludes them from our university work.

    • Where's Microsoft?
      2007-05-15 05:40:03 Alan Carlyle

      I noticed you haven't included Microsoft's MSXML 4.0, MSXML 6.0, or the .NET Framework's built-in parsers.


      Any reason for this omission?


    • Parsing is NOT the bottleneck
      2007-05-11 13:19:27 TextScience

      Thanks for an enlightening article and useful work. BUT ... I don't agree with your opening premise: "Five years after the introduction of SOAP 1.0, XML parsing is still the main bottleneck in web service performance." Parsing, by itself, does not produce a result, and the work to turn parsed XML into a result is, in my experience, 80 to 90% of the job. For common processes in Web Services like XML Signing, Encryption, XSLT, and certainly any kind of transformation, parsing does not exceed 10% of the performance cost. Accelerate parsing by 10,000X in that case and you still have a maximum process acceleration of 1.1X.
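
The 1.1X figure in the comment above matches what Amdahl's law gives when only one stage of a pipeline is accelerated. The small sketch below works through that arithmetic; the class name AmdahlSketch and the 10% parsing share are assumptions taken from the comment, not measurements from the article.

```java
// Hypothetical helper illustrating the arithmetic from the comment above.
final class AmdahlSketch {

    // Overall speedup when a stage taking `fraction` of total time is made `factor` times faster.
    static double overallSpeedup(double fraction, double factor) {
        return 1.0 / ((1.0 - fraction) + fraction / factor);
    }

    public static void main(String[] args) {
        // Parsing assumed to be 10% of total cost, accelerated 10,000 times.
        System.out.printf("Overall speedup: %.3fx%n", overallSpeedup(0.10, 10_000.0));
    }
}
```

With a 10% share the result stays just above 1.11 no matter how large the factor becomes, since the remaining 90% of the work is untouched.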