Binary Waltz, Play On

January 28, 2004

Robin Berjon

Over two months have now passed since the W3C workshop on Binary Interchange. While the binary XML debate rages on in the XML community, the cautious observer will note that the discussion is shifting from a polarised and rather sterile fight to a slightly quieter conversation in which both sides of the fence try to understand issues raised by the other. We are still a good distance away from a peaceful chat over tea, but this shift towards communication is a welcome one. This debate raises and contributes to interesting issues that will not simply go away and need to be addressed, characterized, and laid out cleanly. I, like many others, have also grown tired of the endless bickering of the "binary XML" permathread and would like us to give ourselves a chance to get around a table and put it to rest once and for all.

There Be Dragons

Binary representation of XML is a problem-space with its own healthily varied ecosystem of solutions. Some of these vary wildly, some are very similar, some standardized by various organizations, some entirely proprietary, some generic, and some very much ad hoc. But one thing that all the participants in the workshop agreed on was that while one is free to do whatever one wants with XML, the one and only place that could foster cross-domain and widely applicable recommendations for XML (with the notable exception of Relax NG) is the W3C. Thus the issue came up that not only has the W3C no specification in this space -- which is considered by some to be a good thing, by others a bad one -- but more importantly that it has formed no official opinion on the matter. The TAG has opened the binaryXML-30 issue, but it hasn't moved in a while. With Architecture of the World Wide Web in Last Call, they have other cats to skin.

The question of whether an authoritative body has formed an opinion on a given topic may seem, if not pointless, at least abstract. If a problem is solved by a given technology it is solved no matter what is said of it, and if it isn't then one must show that it can be solved before trying to standardize a solution. But looking more closely at the problem at hand, I can see two scenarios in which a careless approach, either standardizing a solution immediately or ignoring the issue entirely, would anger me and, I am sure, many of my readers.

In the first scenario, the present situation of many competing solutions is maintained, with not even a document to guide choices regarding binary interchange solutions. Since there is thought to be a genuine need for binary XML, a few solutions -- proprietary or standardized by a consortium that does not care for royalty-free technologies -- take over the market. I am a web content developer, and I want to make my content and services available to the 2 billion mobile terminals or to the many million Web-enabled digital televisions out there. All of these devices function on tight technical constraints and use binary formats for SVG, XHTML, web services, or Semantic Web agents. Yet since no consensus solution exists, the many combinations of manufacturers, vendors, operators, and technologies use different binary interchange formats. There is no doubt that the XML nebula of technologies, with its insistence on separating structure from presentation, has simplified multi-channel publishing; but each channel still requires work to adapt content to it, and that work should be simplified where possible, especially if it requires the content developer to pay for some if not all of the channels that she wants to publish to. Clearly a single royalty-free format is more desirable than this situation.

In the second scenario, the W3C or perhaps another sufficiently respected entity produces a widely accepted binary interchange format. But it has happened overly fast, with no heed paid to the benefits that having a single universal format has brought us. Again I find myself reading about a service I wish to interact with that, according to its documentation, answers queries formulated in XML. Yet some developer or manager there has decided that since there are two formats, one of which requires less processing power than the other, there is no reason why they should support the less "efficient" one. I fire up my text editor and generic HTTP client only to find out that the solution is still a few steps removed from the one I was expecting. What a number of people have dubbed "a threat to the XML brand" is first and foremost a threat to universality, in that one shifts from having to use a single format to having a choice of two.

In communication technologies, choice is only good when you're the one making it. Of course, there are always solutions involving negotiation or discovery, but these complicate the situation and are not always applicable. In the presence of a standard XML Binary Interchange format, strong rules are required to preserve interoperability. At the very least, endpoints that only accept the binary format must have very good reasons for doing so, which is to say that they must not use the binary format as an optimization of XML, but because it is their only option. All others would have to support XML, supporting the binary format only as an optional optimization.
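
To make the rule concrete, here is a minimal sketch of how an endpoint might apply it using ordinary HTTP content negotiation. Everything in it is an assumption made for illustration: no standard binary XML media type exists, so the `BINARY_XML` value is hypothetical, and the function names are invented here. The only point is that textual XML remains the default and the binary form is served solely when the client explicitly asks for it.

```python
# Hypothetical media type: no standard binary XML format is assumed to exist.
BINARY_XML = "application/x-hypothetical-binary-xml"
TEXT_XML = "application/xml"


def parse_accept(header):
    """Parse an HTTP Accept header into a {media_type: q} mapping."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # the quality value defaults to 1 when absent
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_representation(accept_header):
    """Return the media type to serve.

    Binary is chosen only when the client lists it explicitly with a
    non-zero q that is at least as high as its q for textual XML.
    Wildcards such as */* never select binary; XML is the fallback.
    """
    prefs = parse_accept(accept_header or "")
    binary_q = prefs.get(BINARY_XML, 0.0)
    if binary_q > 0.0 and binary_q >= prefs.get(TEXT_XML, 0.0):
        return BINARY_XML
    return TEXT_XML
```

A client armed with nothing but a text editor and a generic HTTP client sends no special header and gets plain XML back; a constrained device that names the hypothetical binary type gets the optimized form.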

Based on my experience as a developer, either of those scenarios is certain to cause me unhappiness, likely to make me angry, and may possibly leave me frothing at the mouth. Having had to deal with character encoding problems in CSV interchange and binary generation issues in SWF/Flash publishing, to take just two examples, I would like to avoid encountering similar problems ever again. XML goes a long way in addressing issues in both of these situations, but not all of them and not all of those encountered by the constituency that want to use it. And so we are left with a choice between two dragons. Either the W3C decides not to define a binary interchange format and we have to deal with the dragon of balkanization and ad hoc formats, or it does and we have to struggle with the dragon of inappropriate usage in the face of multiple options. The one thing that is certain is that as a community of users of W3C technology, we can't afford to simply drop the ball. We have to make a choice, and to make it we need to thoroughly think it out. We have to figure out which dragon is best for us to wrestle, which of them is most in our ability to handle, and from there get to work seriously on earning the kill.

One Format To Rule Them All

We already have a Format To Rule Many Of Them, and it's called XML. The problem, as most of the several dozen workshop participants expressed it, is that the format is so desirable that everyone wants to use it (or at least what they consider to be "it"). Yet there are a number of situations in which it is impractical, or even impossible, to use. Personally I would much prefer to live in a text-only world, but unfortunately experience tells me that it is not an option for all of us.

It should not come as a surprise that having put a diverse crowd of happy people in a big room for three days, the workshop diligently produced an endless list of requirements. Looking more closely at that list, however, it becomes apparent that many of them overlap considerably and could be consolidated, something which the workshop decided not to do on the spot.

Even after my first rough pass at consolidating it (and at spotting requirements that would be better solved in another layer), the remaining list is still rather long. A cursory glance shows that some requirements may, if not clash, at least cause some friction were a format to be defined to address them. The natural concern here is that, given a large set of requirements, even if there is agreement that producing a standard solution is the right thing to do, it may not be possible to find one that is generic enough. The workshop participants were very clear on the fact that they would much prefer to use a generic and standard format that would be somewhat less optimal for their needs rather than something perfectly adapted but entirely ad hoc. That is to say, it is not a problem if there is some friction between requirements, but we need to find out just how much friction is tolerable, and how much will make the format unusable to too large a part of the community. Again, this is not a question that can be addressed with a little benchmarking sprinkled here and there with flamebaiting or with marketing speak. It requires some level of agreement on how binary formats can be evaluated, as well as discussion between interested parties on how much optimality they are willing to sacrifice to obtain generality.

Let's Meet Again

As you can see, there are quite a few reasons to pursue work in common on this topic, whether a format is eventually defined or not. In a nutshell, I believe that that is why the workshop reached consensus on the idea that "the W3C should do further work in this area". At least, that's my reading of it.

Given the strong opinions that some of its members expressed, it is clear to me that the XML community at large has to be part of the debate. I think that the best thing to do is for all interested parties to prepare arguments to defend their positions in a way that encourages progress. We've had much name calling and fear mongering already in this discussion; it's about time things became civilized. Much interesting talk awaits us; let's get to it.