XML.com: XML From the Inside Out


XML Parser Benchmarks: Part 2

May 16, 2007

In Part 1 of this series we presented the results of our event-driven parser benchmarks. They showed that the LIBXML2 SAX-like parser in C outperformed the other tested parsers. The two Java pull-parser implementations, Javolution and Woodstox, followed in second place.

In this part of the series we show how the object model parsers performed in our tests. Object model parsers use an event parser to read in the data. Their benchmarks were of special interest for our high-performance web service security gateway, because most web services security operations require that at least the header of a SOAP message be read and altered. This in-memory altering can only be done by object model parsers such as DOM implementations. The results for the AXIOM implementations are also very interesting in this context. They use a pull-parser to build the in-memory representation of an XML document only up to the last node that needs to be read or altered. The advantage is that the whole document does not have to be read into memory.
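To make the "read and alter the header" use case concrete, here is a minimal sketch using the JDK's default DOM implementation. It is not the gateway's code. The class name `SoapHeaderEdit`, the method `addHeaderEntry`, and the namespace `urn:example:gateway` are hypothetical names chosen for illustration, and the SOAP 1.1 envelope namespace is assumed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class SoapHeaderEdit {
    // Assumption: SOAP 1.1 envelope namespace.
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    // Hypothetical namespace for the element this sketch adds.
    static final String EXAMPLE_NS = "urn:example:gateway";

    /** Builds the full DOM tree, appends one element to the SOAP header, and serializes the result. */
    static String addHeaderEntry(String soap, String localName, String text) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // The whole message is parsed into memory, even though only the header is touched.
            Document doc = builder.parse(new InputSource(new StringReader(soap)));

            Node header = doc.getElementsByTagNameNS(SOAP_NS, "Header").item(0);
            if (header == null) {
                throw new IllegalArgumentException("message has no SOAP Header");
            }
            Element entry = doc.createElementNS(EXAMPLE_NS, localName);
            entry.setTextContent(text);
            header.appendChild(entry);

            // Identity transform: writes the altered tree back out as text.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Note that the body is parsed and held in memory although only the header changes, which is the cost a deferred-building model such as AXIOM tries to avoid.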

The test setup is the same as in Part 1 of this series, except that the AXIOM benchmark in C was compiled with the Microsoft C/C++ compiler. For each parser we measured the throughput in documents per second.
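As a rough illustration of what "documents per second" means for an object model parser, the following sketch times repeated parses of one in-memory document with the JDK's default DOM parser. It is not the benchmark harness used for this article. The class and method names are hypothetical, and the warm-up length is an arbitrary assumption.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

final class DomThroughput {
    /** Parses the same document repeatedly and returns the measured documents per second. */
    static double documentsPerSecond(byte[] xml, int iterations) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();

            // Warm-up so that JIT compilation does not dominate the timed loop (length is an assumption).
            for (int i = 0; i < iterations / 10 + 1; i++) {
                builder.parse(new ByteArrayInputStream(xml));
            }

            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                // Each call builds a complete DOM tree for the document.
                builder.parse(new ByteArrayInputStream(xml));
            }
            long elapsedNanos = Math.max(1L, System.nanoTime() - start);
            return iterations / (elapsedNanos / 1e9);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Parsing from a byte array keeps disk I/O out of the measurement. Results from such a loop depend heavily on the JVM, the warm-up, and the document used, so they are only comparable within one setup.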

The following list shows all tested object model parsers.

The Tested Object Model Parsers

  • LIBXML2 Tree 2.6.27 (C)
    LIBXML2 tree is a DOM-like XML parser. It uses the LIBXML2 SAX-like implementation to read in the XML data.
  • Java 1.5 Default DOM (Java)
    The default DOM implementation in Java 1.5. Uses the default SAX implementation to read in the documents.
  • Apache AXIOM Java 1.1.2, C 0.96 (Java and C)
    AXIOM is an XML object model by Apache. It was developed for Apache's web service engine AXIS2, but it is being developed further as a separate project. Currently there is a Java and a C version of the parser. The Java version uses the Woodstox StAX parser to read in the documents. The C version uses the LIBXML2 stream pull-parser. As already mentioned, AXIOM has the advantage of building the document tree in memory only up to the last node whose data is needed. The whole tree therefore has to be built only when data at the end of the document must be read or altered. The C implementation is currently at version 0.96 and therefore cannot be considered fully stable.
  • DOM4J 1.6.1 (Java)
    DOM4J is an object model parser whose API was designed for convenient use in the object-oriented context of Java.
  • JDOM 1.0 (Java)
    Like DOM4J, this parser was built out of the need for an API that is more convenient to use in an object-oriented context than the W3C DOM specification.
  • Oracle XDK DOM implementation (C)
    This parser, part of Oracle's XDK (XML Developer's Kit), implements the W3C DOM specification.
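The deferred building described for AXIOM rests on the pull principle: the application decides when to request the next event, so it can simply stop once it has what it needs. The sketch below shows that principle with the JDK's StAX API (`javax.xml.stream`). It is not AXIOM's API and it builds no tree; it only records element names. `HeaderOnlyReader` and `elementsUpTo` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class HeaderOnlyReader {
    /**
     * Pulls events until the element named stopAfter is closed and returns the
     * local names of all elements opened up to that point. Nothing after the
     * closing tag is requested from the parser.
     */
    static List<String> elementsUpTo(String xml, String stopAfter) {
        List<String> seen = new ArrayList<String>();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    seen.add(reader.getLocalName());
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && stopAfter.equals(reader.getLocalName())) {
                    break; // the rest of the document is never pulled
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
        return seen;
    }
}
```

For a message of the form `<Envelope><Header><Security/></Header><Body>…</Body></Envelope>`, calling `elementsUpTo(xml, "Header")` stops at the end of the header, so no events are produced for the body. An object model layered on such a reader can leave the remaining nodes unbuilt until someone asks for them.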

Object Model Parser Benchmarks

The following benchmarks show the results for the tested parsers that build a document model in memory. In these benchmarks AXIOM cannot exploit its advantage, because every test processes the whole document.
Figure 1: Benchmark results for the object model parsers for small documents

Figure 1 shows that LIBXML2 is much faster than all other implementations for these three small document sizes. The two AXIOM parsers perform well for very small documents, since they do not seem to have the same overhead that the DOM parsers exhibit. The Java 1.5 default DOM parser is the fastest of the three Java DOM parsers, closely followed by JDOM and DOM4J. The Oracle DOM parser seems to have a significant overhead for each document it reads, since it shows the worst performance for small documents.

Figure 2: Benchmark results for object model parsers with medium-sized documents
