The issue is standardizing a "binary XML" for interoperability
Date:
2003-11-21 01:43:05
From:
Michael Rys
The problem is not necessarily "binary XML". The problem is the notion of making it an additional interoperability standard.
I gave the presentation of the Microsoft position at the W3C workshop mentioned above, and we certainly do not see value in standardizing a "binary XML" for interoperability (hint: it is nice to include references to sources, but it may be good to also read them). Having more than one interoperability standard format (even if they claim to be "the same") fragments the interop story and is thus counterproductive.
There is value in binary representations of Infosets, XQuery data models, etc. for internal processing (database storage, close-coupled transport from storage to APIs, and XML feeds). However, these formats will want to be highly optimized for the given architecture and performance scenarios; and these formats are not interested to sacrifice this for the sake of interop. Instead, the APIs and XML itself provide the interop layer.
Michael Rys wrote:
"However, these formats will want to be highly optimized for the given architecture and performance scenarios; and these formats are not interested to sacrifice this for the sake of interop."
It just isn't that "binary." It is simply not true that *everyone* who wants a compact binary encoding is so bit-sensitive that they would refuse to sacrifice some compression in order to get interop and reusable tool sets. In fact, I believe that a large number of people who need compression or faster parsing would view ASN.1-defined encodings as "good enough" 80% solutions, and would accept giving up the remaining 20% as the price of interop and a vastly expanded tool set.
I agree with Dare and Michael Rys that the article misrepresented Microsoft's position!
I'm a bit wishy-washy on this. I found Sun's presentation quite interesting; to paraphrase: "We want to move our customers from RMI to SOAP, but they're resisting because there is a 10x performance degradation between RMI and JAX-RPC. We did some prototypes substituting ASN.1 for XML text as the SOAP serialization format and got the performance back up to RMI levels." MS and IBM countered that there are a LOT of other reasons besides XML text parsing that might explain that gap, and a LOT of room for optimizing XML text parsing that may recover the performance without sacrificing interop.
I'm wondering about Michael Rys' point: "However, these formats will want to be highly optimized for the given architecture and performance scenarios; and these formats are not interested to sacrifice this for the sake of interop." There is a technical question of whether feasible application- and platform-neutral formats exist that are dramatically faster to parse and more compact to store and transmit than XML text. Clearly it would be a mistake to go off and start a W3C working group to standardize such a thing, but some collaborative research to see whether it is possible seemed like a good idea to the workshop participants.
Only if further research indicates that something like this is feasible would concrete requirements be specified. At that point it would be interesting to debate whether XML users (broadly defined to include users of Infoset-based technologies such as XSLT, XPath, XQuery, and DOM) would be better or worse off with alternative Infoset serialization standards.
Assuming for the sake of argument that a fast/compact alternative is feasible, I don't think the case for one and only one serialization is as open-and-shut as the MS folks believe. The overwhelming majority of real XML applications don't interoperate in any meaningful way, perhaps because they require a shared schema, or because they use different character encodings (XML parsers are only required to support UTF-8 and UTF-16), or because much more "semantic" information than a schema encodes must be shared before data can be usefully processed. Adding an alternative serialization to the mix won't "break" much in the real world, however bad it sounds in the abstract.
As I said at the workshop: if there is a format that covers all the use cases and has such a huge benefit that it outweighs the cost, then I personally (and probably Microsoft) can be swayed to change our position. The only problem I see is that we have been doing internal research on binary Infoset representations for some time now and have not found one that addresses all the requirements in an interoperable way. But then, others may have a better idea and prove me wrong...
And to Kendall: thanks for fixing the error. Now, though, most of the replies seem context-free, since the offending text no longer appears above (thanks to Dare, who has copied it below).