Binary Killed the XML Star?
November 19, 2003
Binary Infoset Workshop Report
There are at least two kinds of topics of permanent conversation in the XML development community: formally settled, and formally unsettled. In other words, members of the XML development community are perpetually discussing, on the one hand, issues which have been, more or less, formally settled by the relevant standards body and, on the other, issues not yet formally settled by the relevant standards body. As the canonical example of the first kind of permathread I tend to think of XML namespaces, which really are just here to stay, plain and simple. As the canonical example of the second kind, I tend to think of binary XML, which may or may not be blessed by the W3C, but which certainly engages the XML developer community in deep and fundamental ways.
In a previous article about this topic in August ("Binary XML, Again"), I concentrated on the degree to which binary XML variants strike directly at the heart of what many XML developers take to be XML's chief advantage, that is, human (really: programmer) readability. While XML is not strongly self-descriptive in the way that many of its proponents claim, it is weakly self-descriptive in a way that many XML developers think of as advantageous, especially over against opaque binary alternatives or equivalents.
The precipitating cause of that article was the W3C Workshop on Binary Interchange of XML Information Item Sets, the report and minutes of which -- as well as about 40 position papers -- have now been published. As Liam Quin reported on XML-DEV, the workshop concluded that further work -- "of an investigative nature", as the workshop report puts it -- is required before a W3C standard could be made; but the workshop also recommended the formation of a working group.
The workshop focused on pulling together some initial sense of the requirements for a binary variant of XML, as well as some sense of the dominant use cases for a variant. Neither the workshop nor the report consolidated or synthesized the requirements in any really useful way, instead simply presenting a list of 51 requirements. Some of the interesting requirements include: a generic, rather than domain-specific, solution; storage and transmission efficiency; prioritizing decompression over compression; a minimum performance target of 10 times faster than the best current textual XML performance; packaging support (something like a binary MIME); versioned delta support; some kind of encoding negotiation; fast arbitrary access to infoset items; compatibility with existing parser APIs; arbitrary specification of serialization order; oh, and most importantly, it "must be easy to implement". Sure, why not?
Frankly -- and this isn't just the gloomy weather in Washington, DC, today talking -- I find this requirements list to be one of the most depressing XML things I've ever encountered. It seems, as much as anything, and especially for some of the biggest players, simply a way to revisit most, if not all, of the most fundamental XML design decisions. That possibility, backed by the kind of real-world power that of necessity really matters in the W3C, is simply dreadful. The only redeeming note is that the requirements list contains multiple, mutually contradictory elements, which offers some hope that the antitextualists might go off into some corner, far from the rest of us, and tie themselves into knots for a few years. What a welcome and deserved respite that would be.
One of the aforementioned requirements was that a binary XML variant should just work with existing APIs and should not create any turbulence at the application level. One clear implication of this requirement is that, for example, application code which uses SAX to parse XML should just work -- modulo changes made in the actual SAX libraries, of course -- with any binary variant. Whether or not such a thing is practical or worth the effort is a separate question.
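To see what "should just work" would mean in practice, here is a minimal sketch using Python's standard xml.sax module as a stand-in for SAX. The ElementCounter handler is ordinary application code; emit_events and the decoded tuple are hypothetical, invented here to play the part of a reader for some binary variant. No real binary format is being decoded.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class ElementCounter(ContentHandler):
    """Ordinary application code: counts elements by name."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def emit_events(node, handler):
    """Hypothetical stand-in for a binary reader: walk an already-decoded
    (name, attributes, children) tuple and fire plain SAX events."""
    name, attrs, children = node
    handler.startElement(name, AttributesImpl(attrs))
    for child in children:
        if isinstance(child, str):
            handler.characters(child)
        else:
            emit_events(child, handler)
    handler.endElement(name)


# Source 1: textual XML through the standard library parser.
text_handler = ElementCounter()
xml.sax.parseString(
    b"<purchase><item>pen</item><item>ink</item></purchase>", text_handler
)

# Source 2: the same document as a decoded tree (made up for illustration).
decoded = ("purchase", {}, [("item", {}, ["pen"]), ("item", {}, ["ink"])])
binary_handler = ElementCounter()
binary_handler.startDocument()
emit_events(decoded, binary_handler)
binary_handler.endDocument()

print(text_handler.counts == binary_handler.counts)  # True
```

The handler class is identical in both runs; only the source of events changes. That is the whole of the requirement, and also the reason all the hard questions land in the library rather than in the application.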
Bob Wyman recently started a detailed, interesting conversation on XML-DEV about this issue. Wyman points to existing efforts: Objective Systems has a "SAX-like interface for ASN.1 defined binary encodings" and "OSS Nokalva is working on a SAX interface for ASN.1 defined encodings".
One of the implementation problems in wedding SAX to binary XML variants, as Wyman puts it, is that SAX assumes all of its input is characters, whereas any sane binary encoding will encode each value in whatever way best suits its data type. As Wyman points out, converting binary into characters so that SAX event handlers can convert some of them back into various kinds of binary is "wasteful silliness". Wyman suggests three possible solutions: first, convert all binary types to strings; second, develop a SAX superset which includes data typing information along with the data itself; third, provide a way to toggle between these two modes.
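The three options are easier to compare side by side. The following is only a sketch, again in Python's xml.sax: TypedContentHandler, its typedValue callback, read_int_element, and the four-byte "record" are all hypothetical names and formats made up for illustration. None of them come from Wyman's proposal, from SAX itself, or from any ASN.1 toolkit.

```python
import struct
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class TypedContentHandler(ContentHandler):
    """Hypothetical SAX superset: adds one callback for typed values."""

    def typedValue(self, value):
        # Default behaviour falls back to option one: deliver text.
        self.characters(str(value))


class Recorder(TypedContentHandler):
    """Records the content events it receives, for comparison."""

    def __init__(self):
        super().__init__()
        self.events = []

    def characters(self, content):
        self.events.append(("characters", content))

    def typedValue(self, value):
        self.events.append(("typedValue", value))


def read_int_element(data, handler, typed=False):
    """Decode one made-up record: a 4-byte big-endian integer wrapped in a
    <quantity> element. `typed` is the toggle of option three."""
    (value,) = struct.unpack(">i", data)
    handler.startElement("quantity", AttributesImpl({}))
    if typed:
        handler.typedValue(value)       # option two: keep the native type
    else:
        handler.characters(str(value))  # option one: convert to a string
    handler.endElement("quantity")


payload = struct.pack(">i", 42)

as_text = Recorder()
read_int_element(payload, as_text, typed=False)

as_typed = Recorder()
read_int_element(payload, as_typed, typed=True)

print(as_text.events)   # [('characters', '42')]
print(as_typed.events)  # [('typedValue', 42)]
```

In the string mode the handler receives '42' and must parse it again if it wants a number, which is exactly the round trip Wyman objects to. In the typed mode the integer arrives intact, but only handlers written against the extended interface can take advantage of it.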
Simon St. Laurent suggested that, despite the "wasteful silliness", implementing Binary SAX in such a way as to maximize interoperability over against absolute efficiency is the only way to go: "Sure, it's messy, but it's a transition strategy, gets ASN.1 consumers immediate access to a lot of XML toolkits, and helps bridge the cultural gap between ASN.1 and XML."
It's not only the binarists who might want additional type information in some version of SAX. Alaric Snell pointed out that
SAX with typed data would not just be handy to people using binary encodings...people who are transporting, say, dates in XML need to write their own code in the SAX handler that says "Oooh, it's the element <taxPoint> within a <purchase> element? Then pass the string content through the DateParser I've configured to handle the format of date we use in order to convert it to a java.util.Date object for processing".
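For readers who haven't written this kind of handler, here is roughly what Snell is describing, sketched with Python's xml.sax and datetime rather than Java's SAX and java.util.Date. The PurchaseHandler class, the ISO-style date format, and the sample document are assumptions made for the example, and the sketch only treats <taxPoint> as a date when it is a direct child of <purchase>.

```python
import xml.sax
from datetime import datetime
from xml.sax.handler import ContentHandler


class PurchaseHandler(ContentHandler):
    """Hand-written typing: turn <purchase>/<taxPoint> text into a date."""

    def __init__(self, date_format="%Y-%m-%d"):
        super().__init__()
        self.date_format = date_format  # assumed format, set per application
        self.path = []                  # stack of currently open elements
        self.buffer = []                # text chunks for the current taxPoint
        self.tax_points = []

    def in_tax_point(self):
        return self.path[-2:] == ["purchase", "taxPoint"]

    def startElement(self, name, attrs):
        self.path.append(name)
        if self.in_tax_point():
            self.buffer = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so collect them all.
        if self.in_tax_point():
            self.buffer.append(content)

    def endElement(self, name):
        if self.in_tax_point():
            text = "".join(self.buffer).strip()
            parsed = datetime.strptime(text, self.date_format)
            self.tax_points.append(parsed.date())
        self.path.pop()


handler = PurchaseHandler()
xml.sax.parseString(
    b"<purchase><taxPoint>2003-11-19</taxPoint><total>9.50</total></purchase>",
    handler,
)
print(handler.tax_points)  # [datetime.date(2003, 11, 19)]
```

Everything here beyond the three callbacks is bookkeeping the application author has to carry: tracking where in the document the parser is, buffering text, and knowing out of band which elements hold dates and in what format.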
But pushing schema information into that layer constitutes a serious mistake to some, including St. Laurent:
I'd be thrilled to see ASN.1 readers which produce SAX2 events and ASN.1 writers which consume SAX2 events. I'm not happy to hear notions of PSVI-like typing polluting the SAX2 space. If you want typing, find another API - and accept the costs of doing that. If the ASN.1 community wants to reach out to the XML community, it needs to create ASN.1 tools which talk to XML tools without imposing ASN.1's own and different perspective on how data should be presented.
Clearly the antitextualists raise deep technical concerns but also a kind of social concern for the rest of us, that is, those who think XML, as it is, is good enough; or, at least, good enough often enough that the binary variants are likely to be a waste of time, at best. The W3C's workshop report suggests that a possible outcome of a binary working group would be that the W3C chooses not to endorse a recommendation. That seems more possible than likely, however. The idea of a binary variant seems like a fairly radical proposal at what is a relatively late point in the game. It's not clear that the resulting pain and retooling efforts are really worth the gains.
Many XML proponents and users came out of various binary exchange and format camps, and they are very unwilling to return to what were for them, or so it would seem, dark days. In this case, however, given the real power of those who most seem to want a binary variant, they may have to adopt a carefully tactical plan to limit the damage, rather than trying to prevent the fight altogether.