XML Parser Benchmarks: Part 2
In the next benchmark, for medium-sized documents (Figure 2), LIBXML2 is still ahead of the others for documents up to 455 KB. The Oracle DOM implementation does better as the documents get bigger and catches up with LIBXML2 for documents around 455 KB in size. Both AXIOM implementations do worse with increasing document size. Of the three Java object model parsers, the Java 1.5 default DOM parser is always ahead of dom4j, and dom4j is always ahead of JDOM.

Figure 3: Benchmark results for object model parsers with large documents
Figure 3 reveals that the AXIOM implementations do significantly worse than all other implementations for large documents. For the 4 MB document, the C implementation of AXIOM shows a performance drop. LIBXML2 loses its leading position at these document sizes and is overtaken by Java 1.5 DOM, the Oracle parser, and dom4j for the 4 MB files.
Partial Document Parsing Benchmark
In the previous benchmarks we tested a complete walk through the documents, in which the AXIOM implementations could not exploit their advantage of building the object tree only up to the last requested node. In the following benchmarks we requested only the first 67 elements of each document. This corresponds, for example, to the use case of checking only the header of a SOAP message.
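The idea of reading only the first 67 elements can be sketched with a streaming parser that stops as soon as enough elements have been seen. The snippet below is only an illustration of that access pattern using `iterparse` from Python's standard library. It is not AXIOM's API and not the benchmark code used in this article; the helper names `first_elements` and `make_document` and the synthetic envelope layout are assumptions made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def first_elements(xml_bytes, limit=67):
    """Return the tag names of the first `limit` elements, then stop parsing."""
    tags = []
    # A "start" event is reported as soon as an opening tag has been read,
    # so the loop can end without consuming the rest of the document.
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start",)):
        tags.append(elem.tag)
        if len(tags) >= limit:
            break
    return tags


def make_document(n_items):
    """Build a synthetic document (hypothetical layout): a short header plus n_items body elements."""
    body = "".join(f"<item>{i}</item>" for i in range(n_items))
    return f"<envelope><header><id>42</id></header><body>{body}</body></envelope>".encode()


# A document with fewer than 67 elements is simply read to the end.
small = first_elements(make_document(10))
# A large document is abandoned after the 67th opening tag.
large = first_elements(make_document(100_000))
print(len(small), len(large))
```

In this sketch the header elements are seen before any of the body is touched, which mirrors the SOAP header check described above.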

Figure 4: Benchmark results for the reading of only the first 67 elements in small documents
Figure 4 shows that the AXIOM implementations cannot exploit their advantage for small documents up to 5 KB in size. From the 13.5 KB files on, both implementations beat LIBXML2 and Java DOM.

Figure 5: Benchmark results for the reading of only the first 67 elements in large documents
Figure 5 shows that the two AXIOM implementations deliver the same performance for all document sizes, which is expected since they only need to read in the first 67 elements. The other parsers perform worse with growing document size because they need to build the whole document tree before they can walk through the elements.
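Why one approach stays flat while the other grows with document size can be made visible by counting how many bytes each approach pulls from its input. The following is a rough sketch using Python's standard library, not the parsers measured in this article; `CountingReader`, `make_document`, and the two `bytes_consumed_*` helpers are hypothetical names introduced for the illustration. The exact byte count for the early-stopping reader depends on the parser's internal read chunk size.

```python
import io
import xml.etree.ElementTree as ET


class CountingReader(io.BytesIO):
    """In-memory stream that records how many bytes the parser pulled from it."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def make_document(n_items):
    """Build a synthetic document with n_items child elements (hypothetical layout)."""
    items = "".join(f"<item>{i}</item>" for i in range(n_items))
    return f"<root>{items}</root>".encode()


def bytes_consumed_full_tree(data):
    """Build the complete object tree and report how much input was read."""
    source = CountingReader(data)
    ET.parse(source)
    return source.bytes_read


def bytes_consumed_first_n(data, limit=67):
    """Stop after `limit` opening tags and report how much input was read."""
    source = CountingReader(data)
    seen = 0
    for _event, _elem in ET.iterparse(source, events=("start",)):
        seen += 1
        if seen >= limit:
            break
    return source.bytes_read


for n_items in (10_000, 100_000):
    data = make_document(n_items)
    print(len(data), bytes_consumed_full_tree(data), bytes_consumed_first_n(data))
```

The full-tree reader has to consume every byte, so its cost scales with the document, while the early-stopping reader consumes a bounded prefix regardless of how long the document is.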
Conclusions
Based on the benchmarks presented above, LIBXML2 can be considered the overall performance winner among object model parsers. It not only performs much better than all other parsers on documents up to 500 KB in size, but also beats the two AXIOM implementations for documents up to 5 KB when only the first part of the document is read. It does especially well for very small documents of about 1 KB, where it is up to 10 times faster than the other implementations. For really big documents above 500 KB, the default Java 1.5 DOM parser and the Oracle DOM parser in C are alternatives.
But as the partial document parsing benchmarks show, it is advisable to evaluate which XML processing use cases you will perform most often. If you find that in most cases you only need to alter parts at the beginning of an XML document, you should consider using the Java AXIOM implementation. Given the 0.96 version status of the AXIOM implementation in C and its significant performance drop for large documents, we recommend waiting for future releases of that parser. dom4j does slightly worse than the Java 1.5 default DOM implementation but has a more convenient API.
Of course, development time also plays a significant role in deciding which parser to choose. With all tested C parsers you have to be very careful not to produce memory leaks, which slows down development. On the other hand, the JDOM and dom4j APIs in particular are very convenient to use.
Together with other benchmarks we performed on security operations such as encryption and signatures, the benchmarks in this article gave us the confidence to use the LIBXML2 parser in C, along with C security libraries, for our high-performance web service security stack. The C libraries also use less memory than a full-fledged JVM, which is an advantage on the small security appliances that we want to use.
Additional Resources
- Java Web services, Part 2: Digging into Axis2: AXIOM by Dennis Sosnoski
- Sun's XMLTest XML parser benchmark tool
- xmlbench, which is an XML parser benchmark tool for C parsers
- JAXB Comparison
2007-12-14 01:39:00 girish_inf
Why haven't you considered XML parsing through a binding architecture like JAXB? Any specific reasons for the exclusion?
For one of my projects, where we had to parse an XML document of 20 KB, JAXB was much faster than the SAX and DOM parsers.
- Re: JVM startup
2007-05-22 06:21:28 MatthiasFarwick
Actually, the startup of the JVM is not considered. The measurement takes place after the JVM is started, and even after a few seconds of parsing to get the parser warmed up.
If you consider the use case of a server that receives SOAP messages and has to parse them, the startup of the JVM is not relevant to the performance, since it should be running all day.
But the JVM itself is relevant if it is deployed on small dedicated appliances, since it uses a lot of memory compared to C applications.
- JVM startup?
2007-05-22 05:56:02 Eric Schwarzenbach
Is time taken to load and initialize the JVM counted against the timings of the Java implementations?
