Faster, Faster!

December 1, 2004

Edd Dumbill

While there's nothing I'd like to do more than gossip with you about the recent XML 2004 conference, I will assume that as loyal readers you followed my weblog entries and this site's coverage of the subject anyway. Suffice to say that meeting Len Bullard in person was the highlight of my week.

Conference over, there are several weeks of debate and announcements from the XML developer world to catch up with. Unsurprisingly, some strong threads of conversation spring from topics discussed at XML 2004.

Binary XML Reloaded

Mike Champion gave a presentation at XML 2004 on the overhead of using XML and on attempts to mitigate it. Various approaches have been tried, but the most concerted effort to reduce XML's overhead is currently the W3C's XML Binary Characterization (XBC) Working Group.

On XML-DEV, Champion observed that he had expected a lot more hostility to the notion of "binary XML" than he got. He asked, "Is the world ready to hear that XML 1.x text serialization is not suitable for wireless applications, is this old news, or what?"

Binary XML advocate and working group member Robin Berjon responded, saying that the working group had also expected more pushback than it received:

... during a recent presentation I tried to outrage the audience with some over-the-top pro-binary XML positions (expecting to use the push-back to moderate them) but all I got was a dozen heads nodding and not a single shocked face. Worse, when I subsequently tried to moderate my own discourse part of the audience disagreed.

... so far the negatives haven't been coming and that worries me a bit. Is everyone perfectly happy and comfy with the notion of a W3C-approved binary XML format (I doubt it)? Have people given up (on the Life, the Universe, the W3C)?

While I find it tempting to suspect that most people have given up watching the activities of W3C working groups, there's certainly something to explain here given the outrage on this topic in times past. Champion agreed with Berjon's doubts that everybody would be happy with the idea of a single W3C-blessed binary XML:

I think people have accepted that there is a problem, but there is intense skepticism that a single standard format will cover all (or even 80%) of the requirements. One data point that some people from Microsoft brought up in the Binary XML Town Hall: They have tried to come up with one binary serialization that will satisfy even their internal customers, and haven't found one. There is also extreme skepticism that a typical W3C design by committee job will come up with anything useful.

So perhaps the ultimate goal of the W3C's explorations in binary XML is not achievable? Although most agree that the scenario research the working group is now engaged in is extremely valuable, the concern is that no single solution, even one satisfying the 80:20 rule, can be found. That's certainly Champion's view: it's time to experiment for a while and then see if the best practices can be distilled into a W3C Recommendation. "Leave evolution to Darwin, not Berners-Lee."

Berjon, as you might imagine, disagreed with Champion's skepticism. We've already had the time for experimentation, says Berjon, and now is the time to distill best practice:

We've got the industry-specific standards, we've got the more generic standards, we've got deployment experience--in some cases we're talking millions of terminals. The XBC WG is just us guys coming back with the data and best practices in hand to see if W3C Recommendations can be agreed upon.

For sure there's always room for innovation in the space, but we're way past the point where it's experimental. In fact, it's been quite a while since I last saw something that caught my attention as truly innovative in the field. The question now is really about if we want one W3C standard for all--at the cost of having possibly two universal formats instead of one--or do we want XML and the three or four binary XML standards (from other groups) that'll survive market competition?

But Can XML Go Any Faster?

So much for the standards activity. Another interesting angle from Champion's talk was his assertion that there is plenty more work to do in speeding up conventional XML parsers. Jeff Rafter seized on one common strategy for speeding up XML parsing:

I have always been of the opinion that, at least for certain classes of documents, validation could be used to speed the process up. If certain VCs were considered fatal errors, then character checking would be reduced for start tags (i.e., instead of checking that a character is a valid XML character, you would check if the name is valid according to the DTD).

Rafter's "VCs" are XML validity constraints. He is referring in passing to the problem that some well-formedness checking is unnecessary overhead if one is going to validate anyway. Richard Tobin agreed, and Michael Kay noted an optimization that doesn't actually require DTDs or schemas:

You don't even need a DTD, you can just check it against the names you've already seen in the instance.

I actually tried to get some extra speed out of AElfred by using this kind of technique (basically trying to get AElfred to take advantage of the name pool that Saxon already maintains, and to reduce duplication in other areas such as namespace handling). I got a boost of about 5% on an identity transformation, but Xerces+Saxon was still 5% faster, so I gave up.

Elliotte Rusty Harold uses a similar technique to validate namespace URIs in his XOM project. He also cited the name-caching technique Kay mentioned, and Oleg Tkachenko added that the XML classes in Microsoft .NET use it too.
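To make the name-caching idea concrete, here is a minimal Python sketch. It is not how AElfred, Saxon, XOM, or .NET actually implement it: the class `NameChecker`, its `declared` and `strict` parameters, and the ASCII-only name pattern are all hypothetical, chosen for illustration. By default it follows Kay's suggestion, scanning each distinct name character by character once and answering repeats from a set of names already seen in the instance. With `strict=True` and a set of declared names, it roughly approximates Rafter's idea of treating a validity error as fatal, replacing the character scan with a lookup against the DTD's names.

```python
import re

# Hypothetical, simplified XML Name pattern: ASCII only. The real XML 1.0
# Name production admits many more Unicode ranges than this.
_NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


class NameChecker:
    """Hypothetical name pool: each distinct name is fully checked only once."""

    def __init__(self, declared=None, strict=False):
        # Names from a DTD are assumed to have been verified already when
        # the DTD itself was parsed, so they seed the cache directly.
        self._seen = set(declared or ())
        # strict=True: any name not declared up front is rejected outright,
        # treating the validity error as fatal (Rafter's variant).
        self.strict = strict
        self.full_checks = 0  # number of character-by-character scans run

    def check(self, name):
        if name in self._seen:
            return True  # fast path: one hash lookup, no per-character scan
        if self.strict:
            raise ValueError(f"undeclared name: {name!r}")
        self.full_checks += 1
        if not _NAME_RE.fullmatch(name):
            raise ValueError(f"not a valid XML name: {name!r}")
        self._seen.add(name)  # remember it for the rest of the document
        return True


if __name__ == "__main__":
    checker = NameChecker()
    for tag in ["item", "item", "price", "item"]:
        checker.check(tag)
    print(checker.full_checks)  # 2: only "item" and "price" were scanned
```

The payoff depends on how repetitive the document's vocabulary is. Typical data-oriented XML reuses a small set of element and attribute names many times, so most checks take the fast path, which is consistent with the modest gains Kay reports rather than a dramatic speedup.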

Anti-Awards for 2004

The awarding of the XML Cup at XML 2004 to the worthy recipients Robin Cover and Jean Paoli put me in mind of the Anti-Awards, which I last awarded three years ago. Although we live in a world where lunacy and reality are increasingly indistinguishable, I've decided the time is right to deliver another round of Anti-Awards this year. Thus, I'm inviting nominations of rank stupidity, absurdity, and unusual applications of XML technology. Please send them to me at

Births, Deaths and Marriages

The latest announcements from the XML-DEV mailing list:

xlinkit Rule Workbench Release

Commercial business rules system based around XLink.

Exchanger XML Editor V3.0 Released

The latest release of this commercial cross-platform XML editor includes many new features, including support for XQuery, XSLT 2.0, and SVG.

nux-1.0beta2 Release

Nux is an extension of the XOM XML library, adding XQuery and XPath support and binary (de-)serialization among other features. Open source.

Altova Ships v2005 Products

Massive announcement of the 2005 product line from the makers of XML Spy. They're also giving away iPods and XQuery engines. I know which I'd rather have.

PDF Publishing of XML Documents with Serna XML Editor

Commercial PDF add-on for Syntext's cross-platform Serna editor. Runs on Windows or Linux. Utilizes Antenna House's XSL-FO formatter under the hood.

Sedna XML DBMS Released as Open Source

Now released under the Apache 2.0 license, Sedna is a native XML DBMS implemented in C/C++ and Scheme. Ships with both Java and Scheme APIs.

DTDDoc 0.0.11 Is Out

DTDDoc is a free Ant task that documents DTDs in the style of Javadoc.

TeXML 1.2 Is Out

TeXML is an XML vocabulary for TeX. The processor transforms TeXML markup into TeX markup, escaping special and out-of-encoding characters. Version 1.2 includes bugfixes and more legible generated LaTeX code.

XEditNet--WYSIWYG XML Editor for .NET

Prerelease of a commercial XML editor.


If you were in any doubt that you missed an XML clique love-in of massive proportions, read this and feel the love ... we're so metal ... we're so old ... 143 messages to XML-DEV last week, Len rating: just delightful ... SOA a SOB, no information here ... just more confusion, but the journalist barb hurts, Mike ... and don't forget the Anti-Awards, folks!