Benchmarking XML Parsers
A performance comparison of six stream-oriented XML parsers
This article compares the performance of six implementations of a single program that processes XML. Each implementation uses a different XML parser, and the implementations span four languages: C, Java, Perl, and Python. Perl and Python are each represented by a single parser, while C and Java are each represented by two.
All of these parsers run under Linux, and all are stream-oriented: rather than handing back a complete document tree, the parser calls back into the application (through callbacks or their equivalent) as it reads the document. While some of the parsers have a validating mode, all were run as non-validating parsers. When I say that a single program was implemented six times, I mean that each implementation produces (or should produce) exactly the same output for a given input document. Within that constraint, I attempted to write each implementation in the most efficient manner for the given language and parser.
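To make the callback model concrete, here is a minimal sketch in Perl using XML::Parser, the module measured below. The Start, End, and Char handler names are part of XML::Parser's actual interface; the program around them is illustrative only, not one of the benchmark implementations.

    use strict;
    use XML::Parser;

    # Register handlers; the parser invokes them as it streams
    # through the document instead of building a tree.
    my $parser = XML::Parser->new(
        Handlers => {
            Start => \&start_element,   # called for <tag attr="...">
            End   => \&end_element,     # called for </tag>
            Char  => \&char_data,       # called for text between tags
        }
    );

    $parser->parsefile($ARGV[0]);

    sub start_element {
        my ($expat, $element, %attrs) = @_;
        # React to an opening tag here.
    }

    sub end_element {
        my ($expat, $element) = @_;
        # React to a closing tag here.
    }

    sub char_data {
        my ($expat, $string) = @_;
        # React to character data here.
    }

The other parsers express the same idea under different names: the application registers handler routines or implements a handler interface, and the parser drives them.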
Full disclosure
But first, let me come clean: I'm the maintainer of one of the parsers measured here, the Perl module XML::Parser, so I have an interest in making it look good. But I'm providing everything I used to come up with my numbers, and you're welcome to download it and try it out for yourself. Also, since I'm more experienced in Perl and C than in Java and Python, gurus of those two languages may want to comb through the implementations written in them for newbie mistakes.
What motivated me to run this experiment was a discussion on the Perl-XML mailing list about the performance (or lack thereof) of XML::Parser. I asked, "How does XML::Parser compare to the competition?" and got either no answers or apples-to-oranges comparisons (for instance, comparing XML::Parser to the Unix grep utility). So I decided to take one of the sample programs contained in the XML::Parser distribution, XMLstats, and implement it using different parsers.
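To give a flavor of the benchmark task, here is a much-reduced sketch of that kind of statistics program: it merely counts how often each element name occurs. The real XMLstats gathers considerably more detail, so this sketch (including its output format) only shows the general shape of the work each implementation performs.

    use strict;
    use XML::Parser;

    my %count;    # occurrences of each element name

    # Only a Start handler is needed for a simple tally;
    # $_[1] is the element name in XML::Parser's Start callback.
    my $parser = XML::Parser->new(
        Handlers => { Start => sub { $count{ $_[1] }++ } }
    );
    $parser->parsefile($ARGV[0]);

    # Print a simple occurrence table, one element per line.
    printf "%-20s %6d\n", $_, $count{$_} for sort keys %count;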