April Fool's Wisdom

April 13, 2005

Micah Dubinko

"April is the cruellest month" -- T. S. Eliot

XML devotees are, as a general rule, thoughtful, creative, and a bit mischievous. So when the calendar rolls around to April 1, a safe bet is that you'll find some interesting reading not only across the internet, but also on the XML-Dev mailing list. This year was no exception.

I find that humor has an important place, even in otherwise serious discussions. Arguments that might be uncomfortable, uncharacteristic, exceedingly blunt, or previously discussed-to-death become more palatable with a thin coating of silliness. Oftentimes, jokes are funny specifically because they contain a kernel of truth. Well, maybe a grain of truth. Sliver? Speck? Atom? Whatever the amount, this week's column will search it out, and highlight the connections that lead back to more familiar topics.

Fool Me Once

The first artifice came from the fertile mind of Sean McGrath, with a message bearing the serious title REST, SOAP, Speech Acts, and the mustUnderstand Model of SOA Communications, with bonus points for dovetailing into the previous discussion about REST versus SOAP. Interop is a key XML design issue--frequently highlighted as one of the reasons for using XML in the first place.

A recurring theme in XML development, as in life, is that common everyday things may be generally known, but still lack an agreed-upon definition: Life, love, and Web services to name three. Fortunately, we seem to get plenty done without the benefit of standardized definitions. We are able to meaningfully discuss such topics because each viewer forms his or her own personal definition based on experience. When there's enough overlap between these slightly different views, understanding may result. (If not, one or both of the communicating parties are probably at fault.)

Taking an ephemeral April First message too seriously is a classic blunder, so I'll avoid a full-scale rant. Nevertheless, "mustUnderstand" is a powerful concept. Perhaps two disparate systems exchanging, say, a SOAP-wrapped UBL Invoice are comparable to two individuals discussing the meaning of life or love (or Web services). Both parties might have different ideas about what they're actually talking about, but a sufficient amount of common ground can result in understanding. A mustUnderstand qualifier crystallizes the situation with guarantees about how much common ground truly is common.

So, musing about additional levels of mustUnderstand has an understandable allure. Many of us spend a great deal of our professional time getting various systems to "understand" each other.
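To make the real-world mustUnderstand concrete, here is a minimal Python sketch of what a receiver might do with a SOAP 1.1 envelope: collect every header block flagged mustUnderstand="1" that it does not recognize, and fault if any remain. The header names and their urn:example namespaces are invented for illustration, and the sketch ignores actor targeting, so treat it as an approximation rather than a conforming SOAP processor.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace; the header namespaces below are made up.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

MESSAGE = """\
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <tx:Transaction xmlns:tx="urn:example:tx" soap:mustUnderstand="1">42</tx:Transaction>
    <au:Audit xmlns:au="urn:example:audit" soap:mustUnderstand="1"/>
    <hint:Cache xmlns:hint="urn:example:hint">optional</hint:Cache>
  </soap:Header>
  <soap:Body/>
</soap:Envelope>
"""


def not_understood(envelope_xml, understood):
    """Return the tags of mandatory header blocks the receiver does not know."""
    root = ET.fromstring(envelope_xml)
    header = root.find(f"{{{SOAP_NS}}}Header")
    if header is None:
        return []
    flag = f"{{{SOAP_NS}}}mustUnderstand"
    return [block.tag for block in header
            if block.get(flag) == "1" and block.tag not in understood]


# Hypothetical receiver that only knows the Transaction header.
UNDERSTOOD = {"{urn:example:tx}Transaction"}
missing = not_understood(MESSAGE, UNDERSTOOD)
if missing:
    print("Fault: mustUnderstand headers not understood:", missing)
```

The unflagged Cache header is simply ignored, while the flagged Audit header forces a fault. That is the "guarantee about common ground" in miniature: the sender decides which parts of the message the receiver may not silently skip.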

Fool Me Twice

Another wile came from Microsofties Andrew Layman and Don Box, as a link to a paper entitled XML Performance Improvements Through Interdisciplinary Factor Assessment and Application, with bonus points for referencing the earlier discussion about binary XML. Performance is certainly on the minds of many XMLers, with the "off-by-one" publication of the XML Binary Characterization Working Draft, or XBC.

Another undercurrent behind the Interdisciplinary paper is measurement and evidence, something on which the XBC draft falls short, offering simple yes and no columns instead of convincing data. Len Bullard comments on the binary XML saga-in-progress: "The TAG is demanding benchmarks and test cases, something that hasn't been demanded of disruptive technologies such as HTML, RSS, CSS, XML, or even SOAP," and he later asks, "is the XML Binary a disruptive technology that will change the current landscape of technology companies?" As it turns out, fear runs deep in the resulting thread that a not-XML specification would disrupt--in a bad sense as opposed to the "good" Innovator's Dilemma sense--the existing web of XML specifications.

Steven DeRose provided insightful commentary on the situation. I highly recommend reading the entire text, which I will only summarize here, using the four main topics from his message.

1) What operations do you want to do? Perhaps the most important question to ask. For several tasks, a binary format may be faster. For others that involve working with the text stream, binary formats will incur extra conversion overhead.

2) Where is the data kept? Always something to keep in mind: disk versus RAM tradeoffs.

3) What does "lossless" mean? In "roundtrip" scenarios, various pieces of information may or may not get preserved. The Infoset is a commonly cited, yet controversial, set of goalposts for what to preserve. Transport issues such as byte-order-swapping might become significant.

4) Hybrid solutions. Certain kinds of optimizations can be done entirely within XML, for example, by adding in additional attributes that provide indexes into other parts of the document.

Steve concludes that "the solution space is much wider than it may appear, and the answers are more complex. Also, that it can be, and has been, done successfully. But except for really huge documents, I don't think it's usually worth the effort."
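The hybrid idea in the fourth topic, index attributes that point into other parts of the same document, might look something like the following Python sketch. The vocabulary here (catalog, index, entry, and the pos attribute) is entirely hypothetical, not something taken from DeRose's message, and a real design would more likely index byte offsets or IDs in very large documents.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: <index>/<entry> and the "pos" attribute are
# invented for this sketch; "pos" is the zero-based position of the
# matching <item> inside <items>.
DOC = """\
<catalog>
  <index>
    <entry key="soap" pos="2"/>
    <entry key="rest" pos="0"/>
  </index>
  <items>
    <item>REST notes</item>
    <item>Binary XML notes</item>
    <item>SOAP notes</item>
  </items>
</catalog>
"""


def lookup(root, key):
    """Follow the index attribute instead of scanning every <item>."""
    for entry in root.iterfind("index/entry"):
        if entry.get("key") == key:
            return root.find("items")[int(entry.get("pos"))].text
    return None


root = ET.fromstring(DOC)
print(lookup(root, "soap"))
```

The document stays ordinary, text-based XML that any parser can read; a consumer that knows about the index can jump straight to the item it wants, and one that does not can ignore the index entirely.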

In a curious bit of synchronicity, Bullard mentioned another perceived benefit of a binary format: that authors "don't expose their content to inspection." Paul Downey noted that just the day before, a technique to add "View Source" for Flash had been posted, inspired by a Lawrence Lessig talk at the FlashForward conference.

Won't Be Fooled Again

Besides the posting date and credible-sounding subject lines, closer inspection reveals some additional strands connecting these two bits of mischief with recent discussions.

Robustness: Elliotte Rusty Harold commented that part of the culture of textual XML processing is a certain amount of redundancy and paranoia, something shared by mustUnderstand processing.

Lossy Understanding: A primary decision point for the binary XML folks is nailing down the amount of lossiness that the format will have considering roundtrips to and from XML syntax. Authors might encode their intent with various XML facilities, and so a lossy conversion might translate into a lossy understanding.

Interop: Real and imagined levels of mustUnderstand exist to provide minimal guarantees about interoperability, a parallel to discussions of whether a binary XML format should be standardized, or left to individual implementations.
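The lossiness question in the second point above does not require a binary format to demonstrate. As a rough analogy, the Python sketch below parses a small document and writes it back out with the standard library's ElementTree; the sample document is made up, and the details of what gets dropped are specific to this library's defaults, not to any binary XML proposal.

```python
import xml.etree.ElementTree as ET

# A made-up document with an XML declaration, a comment before the root,
# single-quoted attributes, and extra whitespace inside the start tag.
original = "<?xml version='1.0'?><!-- reviewed --><note  id='7'><to>XML-Dev</to></note>"

root = ET.fromstring(original)
roundtripped = ET.tostring(root, encoding="unicode")

print(roundtripped)             # <note id="7"><to>XML-Dev</to></note>
print(roundtripped == original)  # False
```

Elements, attributes, and text survive the trip, while the declaration, the comment, the quote style, and the in-tag whitespace do not. Whether that counts as "lossless" depends on which of those details an author was relying on to carry intent.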

Looking ahead, what will we see next April First? Hopefully, industrious XML community members aren't planning that far ahead. Another safe bet is that the XML-Dev crowd will come up with something creative, funny, and reflective of whatever issues are burning in our collective minds next spring. Until then, we have about 11 months to brush up on our koans, limericks, haikus, Monty Python sketches, and other creative outlets.

Births, Deaths, and Marriages

W3C Workshop on XML Schema 1.0 User Experiences

The W3C is organizing a Workshop on XML Schema 1.0 User Experiences to gather concrete reports of user experience with XML Schema 1.0, and examine the full range of usability, implementation, and interoperability problems around the specification and its test suite.
When and Where: June 21-22, 2005, Oracle Conference Center, Redwood Shores, California, USA.

XML 2005 Call for Participation

Get in early; deadline is May 13.

Nux 1.1

Nux, an open source extension of the XOM and Saxon XML libraries, is available.

Documents and Data

Python processing in XML: Point, Counterpoint, Counter counterpoint.

Andrzej Jan Taramina on the real use of SOAP/WS.

Dimitre Novatchev shows how to find the deepest node via XSLT 2.0.

Michael Kay on the secret of efficient coding.