Summary of XML Parser Performance Testing

May 5, 1999

Clark Cooper

We tested six XML parsers that run on a Linux system: two C parsers (C-Expat and RXP), two Java parsers (Java XP and IBM's Java XML4J) and implementations in Perl and Python based on the Expat parser. Four of the six parsers rely on James Clark's Expat parser.

The test consisted of a single program that I called XMLstats because it reads an XML document and produces a report detailing the elements of that document. I wrote the program in C, Java, Perl, and Python, using, as best I knew how, the best features of each language to do the same job. I ran the XMLstats program on five different XML test documents. All of the documents were derived from the XML 1.0 Recommendation. The file rec.xml is that document, which is about 160K in size. The file med.xml is 6 times the size of rec.xml, and big.xml is 32 times the size of rec.xml (literally, the Recommendation repeated 6 and 32 times). The files chrmed.xml and chrbig.xml contain just the text contents of rec.xml, repeated 6 and 32 times.
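
To make the idea of XMLstats concrete, here is a minimal Python sketch of an element-counting report built on the standard library's Expat binding. It is not the program used in the tests: the function name xml_stats, the fields in the report, and the sample document are assumptions chosen for illustration, and the real XMLstats report may have gathered different details.

```python
from collections import Counter
import xml.parsers.expat


def xml_stats(data: bytes) -> dict:
    """Return element counts, attribute total, and character-data total.

    Hypothetical stand-in for XMLstats; the report fields are assumptions.
    """
    elements = Counter()
    totals = {"attributes": 0, "chars": 0}

    def start_element(name, attrs):
        # Called once per start tag; attrs is a dict of attribute names to values.
        elements[name] += 1
        totals["attributes"] += len(attrs)

    def char_data(text):
        # May be called several times for one run of text, so accumulate lengths.
        totals["chars"] += len(text)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = char_data
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return {"elements": dict(elements), **totals}


if __name__ == "__main__":
    sample = b"<doc><p id='a'>Hello</p><p>World</p></doc>"
    report = xml_stats(sample)
    for name, count in sorted(report["elements"].items()):
        print(f"{name}: {count}")
    print("attributes:", report["attributes"])
    print("characters:", report["chars"])
```

Because the handlers only update counters, nearly all of the time in a program like this is spent in the parser and in the callback mechanism between the parser and the host language.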

The performance data generated by these tests is summarized in Table 1.

Table 1. XML Parser Performance Chart

Parser          rec  chrmed     med  chrbig     big
C-Expat       0.050   0.110   0.380   0.340   1.480
C-Rxp         0.100   0.320   0.740   1.060   2.937
Java-xp       2.400   2.693   4.770   4.010  12.587
Java-xml4j    3.033   3.470   6.770   5.280  19.230
Perl          1.413   3.420   8.410  10.750  32.357
Python        1.650   4.797  12.183  15.893  48.473

Figure 1. Comparison of Six XML Parsers Processing rec.xml.

There aren't really many surprises. The C parsers (especially Expat) are very fast, the scripting-language parsers are slow, and the Java parsers occupy a middle ground for larger documents. However, for smaller documents (less than 0.5 megabytes), the Perl and Python parsers are actually faster than either Java parser tested.

Figure 1 graphs the performance of the six parsers on the rec.xml file. In this sample, the two Java parsers are the slowest of the six. In Figure 2, which tests a larger sample, the Python parser is the slowest, but the two Java parsers and the Perl and Python parsers are very similar in speed. In both tests, the C parsers are extremely fast.

Figure 2. Comparison of Six XML Parsers Processing chrmed.xml.

Figure 3 graphs the performance of all six parsers against each of the five test files. Note that the Java parsers do much better on larger files than the Perl and Python implementations.
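
One way to see this in the numbers is to compare how much each parser's time grows from rec.xml to big.xml. The sketch below copies the timings from Table 1 and computes that ratio; the names TIMES, growth, and relative_to are illustrative and not part of the original benchmark.

```python
# Timings copied from Table 1, in column order: rec, chrmed, med, chrbig, big.
FILES = ["rec", "chrmed", "med", "chrbig", "big"]
TIMES = {
    "C-Expat":    [0.050, 0.110, 0.380, 0.340, 1.480],
    "C-Rxp":      [0.100, 0.320, 0.740, 1.060, 2.937],
    "Java-xp":    [2.400, 2.693, 4.770, 4.010, 12.587],
    "Java-xml4j": [3.033, 3.470, 6.770, 5.280, 19.230],
    "Perl":       [1.413, 3.420, 8.410, 10.750, 32.357],
    "Python":     [1.650, 4.797, 12.183, 15.893, 48.473],
}


def growth(times=TIMES):
    """Ratio of the big.xml time to the rec.xml time for each parser."""
    return {parser: row[-1] / row[0] for parser, row in times.items()}


def relative_to(baseline, times=TIMES):
    """Each parser's time on each file, divided by the baseline parser's time."""
    base = times[baseline]
    return {
        parser: [t / b for t, b in zip(row, base)]
        for parser, row in times.items()
    }


if __name__ == "__main__":
    for parser, factor in sorted(growth().items(), key=lambda kv: kv[1]):
        print(f"{parser:<11} big/rec = {factor:5.1f}")
    print()
    print("Slowdown relative to C-Expat:")
    for parser, ratios in relative_to("C-Expat").items():
        cells = "  ".join(f"{name}={r:6.1f}" for name, r in zip(FILES, ratios))
        print(f"{parser:<11} {cells}")
```

For a document 32 times larger, the Java timings grow by a much smaller factor than the Perl and Python timings do, which is the pattern described above.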

These tests measure only execution performance. Sometimes programmer performance is more important than parser performance. I have no numbers, but I can report that the Perl and Python programs were the easiest to write, the Java programs less so, and the C programs the most difficult.

Figure 3. Comparison of Six XML Parsers Processing Each Test File.
Charts: Kim Scott

If you are interested in the details of how this test was put together, read the companion article "Constructing the XML Parser Benchmark."