XML Parser Benchmarks: Part 2
In the next benchmark, for medium-sized documents (Figure 2), LIBXML2 is still ahead of the others for documents up to 455 KB. The Oracle DOM implementation gains ground as documents get bigger and catches up with LIBXML2 at around 455 KB. Both AXIOM implementations fall further behind as document size increases. Of the three Java object model parsers, the Java 1.5 default DOM parser is always ahead of dom4j, and dom4j is always ahead of JDOM.
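The full-document benchmarks parse each document into an object tree and then walk the complete tree. On the Java side, that pattern can be sketched with the JAXP DOM API that ships with the JDK. This is a minimal sketch under stated assumptions, not the harness used to produce the figures: the class and method names (`DomWalk`, `parseAndWalk`, `countElements`) are hypothetical, and the code uses Java 7 features (multi-catch, `StandardCharsets`) rather than Java 1.5.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper illustrating "parse the whole document, then walk it".
class DomWalk {

    // Recursively visits every node below 'node' and counts the element nodes.
    static int countElements(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countElements(child);
        }
        return count;
    }

    // Builds the complete DOM tree first; only then can the walk begin.
    static int parseAndWalk(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return countElements(doc.getDocumentElement());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<order><item id=\"1\"/><item id=\"2\"><note>gift</note></item></order>";
        System.out.println("elements visited: " + parseAndWalk(xml));
    }
}
```

A timing harness would typically wrap both steps (parse and walk) and repeat them many times after a JVM warm-up, since a single run mostly measures class loading and JIT compilation.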

Figure 3: Benchmark results for object model parsers with large documents
Figure 3 shows that the AXIOM implementations perform significantly worse than all other implementations for large documents. For the 4 MB document, the C implementation of AXIOM shows a significant performance drop. LIBXML2 loses its leading position at these document sizes and is overtaken by Java 1.5 DOM, the Oracle parser, and dom4j for the 4 MB files.
Partial Document Parsing Benchmark
In the previous benchmarks we tested a complete walk through each document, so the AXIOM implementations could not exploit their main advantage: building the object tree only up to the last requested node. In the following benchmarks we requested only the first 67 elements of each document. This corresponds, for example, to the use case of checking only the header of a SOAP message.
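AXIOM builds its tree on top of a StAX pull parser, which is what lets it stop once the requested nodes have been read. The sketch below does not use AXIOM itself; it substitutes plain StAX (`javax.xml.stream`, part of the JDK since Java 6, so not available in the Java 1.5 default used in these benchmarks) to show the underlying idea of reading only the first N elements and leaving the rest of the input unparsed. The class `PartialRead` and method `firstElements` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: pull-parse only as far as the caller needs.
class PartialRead {

    // Pulls events one at a time and stops as soon as 'limit' start tags have
    // been seen; anything after that point is never tokenized.
    static List<String> firstElements(String xml, int limit) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (names.size() < limit && reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        names.add(reader.getLocalName());
                    }
                }
            } finally {
                // XMLStreamReader is not AutoCloseable, so close it explicitly.
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not read document", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String soap = "<Envelope><Header><Security/></Header>"
                + "<Body><payload>data</payload></Body></Envelope>";
        // Only the envelope and header elements are needed for this check.
        System.out.println(firstElements(soap, 3));
    }
}
```

With a limit of 3, the loop returns after reaching the `Security` element inside the header and never tokenizes the body, which is the SOAP-header use case described above.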

Figure 4: Benchmark results for the reading of only the first 67 elements in small documents
Figure 4 shows that the AXIOM implementations cannot exploit this advantage for small documents of up to 5 KB. From the 13.5 KB files onward, both implementations beat LIBXML2 and Java DOM.

Figure 5: Benchmark results for the reading of only the first 67 elements in large documents
Figure 5 shows that the two AXIOM implementations deliver the same performance for all document sizes, which is expected since they only need to read the first 67 elements. The other parsers naturally perform worse as documents grow, because they must build the whole document tree before they can walk through the elements.
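The reason for the flat AXIOM curve can be illustrated by counting work instead of measuring time. The following sketch builds hypothetical synthetic documents (a `<doc>` root with N empty `<item/>` children; not the benchmark documents) and compares how many elements a DOM parser must create before the first element can be inspected with how many elements a pull parser reads before stopping at 67. As in the previous sketch, StAX stands in for AXIOM's deferred building; the class and method names are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical comparison of tree-building vs. pull-parsing work.
class FirstElementsCost {

    static final int WANTED = 67; // elements requested, as in the partial-parsing test

    // Synthetic document: a <doc> root followed by 'items' empty <item/> children.
    static String syntheticDocument(int items) {
        StringBuilder sb = new StringBuilder("<doc>");
        for (int i = 0; i < items; i++) {
            sb.append("<item/>");
        }
        return sb.append("</doc>").toString();
    }

    // DOM: the whole tree exists before any element can be looked at.
    // Returns the number of element nodes that were created.
    static int domElementsBuilt(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("*").getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    // StAX: stops after 'wanted' start tags; returns how many were actually read.
    static int pullElementsRead(String xml, int wanted) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int seen = 0;
            try {
                while (seen < wanted && reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        seen++;
                    }
                }
            } finally {
                reader.close();
            }
            return seen;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not read document", e);
        }
    }

    public static void main(String[] args) {
        for (int items : new int[] {1_000, 100_000}) {
            String xml = syntheticDocument(items);
            System.out.printf("%,d items: DOM built %,d elements, pull parser read %d%n",
                    items, domElementsBuilt(xml), pullElementsRead(xml, WANTED));
        }
    }
}
```

The DOM count grows with the document (N + 1 elements), while the pull parser reads 67 elements regardless of size. Element counts are only a proxy for cost; real timings would need a proper benchmark harness.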
Conclusions
Based on the benchmarks presented above, LIBXML2 can be considered the overall performance winner among the object model parsers. It not only performs much better than all other parsers on documents up to 500 KB in size, but also beats the two AXIOM implementations for documents up to 5 KB when only the first part of the document is read. It does especially well on very small documents of about 1 KB, where it is up to 10 times faster than the other implementations. For really big documents above 500 KB, the default Java 1.5 DOM parser and the Oracle DOM parser in C are alternatives.
But as the partial document parsing benchmarks show, it is advisable to evaluate which kinds of XML processing you will perform most often. If you find that in most cases you only need to alter parts at the beginning of an XML document, you should consider using the Java AXIOM implementation. Because the C implementation of AXIOM is only at version 0.96 and shows a significant performance drop for large documents, we recommend waiting for future releases of that parser. dom4j performs slightly worse than the Java 1.5 default DOM implementation, but has a more convenient API.
Of course, development time also plays a significant role in deciding which parser to choose. With all the tested C parsers you have to be very careful not to introduce memory leaks, which slows down development. On the other hand, the JDOM and dom4j APIs in particular are very convenient to use.
Together with other benchmarks we performed on security operations such as encryption and signing, the benchmarks in this article gave us the confidence to use the LIBXML2 parser in C, along with C security libraries, for our high-performance web service security stack. The C libraries also use less memory than a full-fledged JVM, which is an advantage on the small security appliances we want to use.