
XML 2005: Tipping Sacred Cows
If there's an overarching theme to XML Annoyances, it's a simple imperative: think! The point isn't to rant against the system, tempting though that might be. During the expansion phase of XML, early adopters converged on a specific set of practices, conventions, and "common knowledge." We'll get to a few specifics in a moment, but for now we'll affectionately call these things sacred cows.
Now, rather than technological expansion, XML is undergoing user base expansion; as a mainstream technology, XML has users of all different stripes and levels of previous experience. Lots of readers are new enough to XML to have recently run into a sacred cow or two. Without the shared formative experiences of some of the old XML pros, new users just get annoyed, usually turning to books, mailing lists, or co-workers for an explanation. All too often, the response new users get is a varyingly polite request to accept things as they are.
I disagree on principle. Often, pushing against a sacred cow yields a surprising new insight or at least a better understanding of a complex reality. With that attitude, I attended this year's major U.S. XML conference, XML 2005 in Atlanta, with a theme of "Syntax to Semantics."
Overall, the conference was less about fireworks and controversy and more about thoughtful contemplation of a maturing technology. Perusing the schedule-at-a-glance, one thing that jumps out is the sheer breadth of topics. XML is everywhere: thesauri and higher education, calendaring, health care and pharma, applications and modeling, hazardous waste management and emergency alerting protocols, financial services, and even artificial intelligence.
As Kurt Cagle writes, XML technology has become the software industry. Indeed, a number of the talks were about the integration of XML into mainstream languages. John Schneider's Wednesday talk on ECMAScript for XML (E4X) was a prime example. Already available in recent builds of Firefox and in ActionScript 3, among other places, the language includes native XML support, enabling code like the following:
<script type="text/javascript; e4x=1">
var x = <a> <b>Hello</b> <c>world!</c> </a>;
alert(x.b); // shows "Hello"
</script>
There are lots of additional features and conveniences. I highly recommend that JavaScript programmers take a look. Other languages are moving in a similar direction. Several presentations talked about XML features added to VB9, C#, and Java.
Another feature of this year's conference was a noticeably reduced level of anti-W3C XML Schema ranting, both in sessions and in the hallways. Bob DuCharme's Wednesday talk Your Schema and the Industry-Standard Schema included overviews of both W3C XML Schema and the RELAX NG language. Both have their place. Tommie Usdin's Thursday talk W3C XML Schema, RELAX NG, Schematron, or DTD: How's a User to Choose? went even further, forcefully arguing that individuals and individual projects are better off pursuing a multiple-schema language strategy from the start.
Querying: One Language or Three?
Which brings us to one of our sacred cows: for decades we've had SQL for relational databases, and soon we'll have XQuery for general XML, and SPARQL for RDF. (For more about SPARQL, check out Leigh Dodds' tutorial.) Commonly accepted wisdom is that these different query systems, each tightly coupled to its underlying data model, are all necessary rather than redundant. Erik Meijer's Wednesday talk XML Programming Refactored (The Return of the Monoids), after pointing out that DOM in Dutch means "brain-dead," went on to describe the advantages of adding XML directly into programming languages, much as described above. Then the payoff: what if it were possible to construct a generalized query language, loosely coupled enough to work with any underlying data model? The mathematical basis for this was monoids. The presentation didn't actually define this fairly abstract term, skipping straight from trivial examples like <Integer, +, 0> or <Boolean, AndAlso, True> to a fully worked representation of a generalized query. Erik's dynamic presentation style is such that I was not able to copy down the full example before he had moved on to the next slide. Whatever the details, it's a valuable topic in that it gets listeners to question their assumptions and see things in new ways.
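For readers who, like me, could have used a definition: a monoid is a set of values together with an associative operation for combining two of them and an identity value that leaves anything it is combined with unchanged. The following is a minimal sketch in plain JavaScript of how that idea can drive a query. It is not the example from the talk (which I did not manage to copy down); the names sum, all, list, and foldMap, and the orders data, are made up here for illustration.

```javascript
// A monoid, modeled as a plain object: an identity value plus an
// associative combine() operation.
const sum = { empty: 0, combine: (a, b) => a + b };        // <Integer, +, 0>
const all = { empty: true, combine: (a, b) => a && b };    // <Boolean, AndAlso, True>
const list = { empty: [], combine: (a, b) => a.concat(b) }; // lists under concatenation

// foldMap: turn each item into a monoid value, then combine them all,
// starting from the identity. The query logic knows nothing about
// where the items came from.
function foldMap(monoid, items, toValue) {
  return items.reduce(
    (acc, item) => monoid.combine(acc, toValue(item)),
    monoid.empty
  );
}

// Hypothetical data; it could as easily have come from rows, XML nodes, or triples.
const orders = [
  { customer: "Ann", total: 30, paid: true },
  { customer: "Bob", total: 12, paid: false },
  { customer: "Cy", total: 8, paid: true },
];

const grandTotal = foldMap(sum, orders, (o) => o.total);   // 50
const allPaid = foldMap(all, orders, (o) => o.paid);       // false
// A filter-and-project "query": unpaid orders contribute the empty list.
const paidCustomers = foldMap(list, orders, (o) => (o.paid ? [o.customer] : []));
// paidCustomers is ["Ann", "Cy"]
```

The point of the sketch is only that one fold, parameterized by a monoid, covers aggregation, testing, and selection alike, which hints at how a query facility might stay loosely coupled to the data model underneath.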
On the other hand, Jim Melton's talk later the same day, SQL, XQuery, and SPARQL: What's Wrong with This Picture? offered a counterpoint. Jim elaborated some of the different assumptions underlying queries against a relational system as opposed to an RDF triple store. His conclusion, applying his vast experience to the situation: SPARQL is not Yet Another Query Language. It has its role and purpose.
XML Infrastructure Developments
In his keynote, Microsoft's Soumitra Sengupta posited that innovation tends to migrate above the level where convergence happens. Coverage of ongoing work on top of the XML core certainly demonstrated these kinds of developments.
Joe Gregorio's Tuesday talk The Atom Publishing Protocol: Publishing Web Content with XML and HTTP gave an overview of protocols, starting with XML-RPC while "the ink on XML was barely dry," through other HTTP POST-centric protocols, up through Atom, now a proposed standard as of August 2005. The key advantage of the Atom protocol comes through the proper use of HTTP, including GET, PUT, and DELETE.
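As a rough illustration of what "proper use of HTTP" means here, the sketch below maps the four basic editing actions onto HTTP methods the way the Atom protocol does: POST to a collection to create an entry, then GET, PUT, and DELETE against that entry's own URI. It only builds request descriptions and sends nothing over the network. The function name atomRequest and the example.org URIs are hypothetical, and the media type shown is an assumption to check against the current draft.

```javascript
// Hypothetical collection URI; a real server advertises its own.
const COLLECTION = "http://example.org/blog/entries";
const ATOM_TYPE = "application/atom+xml"; // assumed media type for Atom entries

// Describe the HTTP request for an editing action without sending it.
function atomRequest(action, memberUri, entryXml) {
  switch (action) {
    case "create": // new entries are POSTed to the collection
      return { method: "POST", uri: COLLECTION,
               headers: { "Content-Type": ATOM_TYPE }, body: entryXml };
    case "read":   // safe, cacheable retrieval of one entry
      return { method: "GET", uri: memberUri };
    case "update": // replace the entry at its own URI
      return { method: "PUT", uri: memberUri,
               headers: { "Content-Type": ATOM_TYPE }, body: entryXml };
    case "delete": // remove the entry
      return { method: "DELETE", uri: memberUri };
    default:
      throw new Error("Unknown action: " + action);
  }
}

// Usage with a hypothetical member URI.
const member = "http://example.org/blog/entries/42";
const update = atomRequest("update", member, "<entry>...</entry>");
const removal = atomRequest("delete", member);
```

Contrast this with the POST-centric protocols, where every one of these actions would be a POST to a single endpoint with the real verb buried in the message body.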
Jon Bosak's Thursday talk UBL Update went over the latest developments around the Universal Business Language. By defining the details of common business payloads, UBL seeks to replace not entrenched EDI systems, which have a transaction cost of around $5, but rather paper-based systems, which have a transaction cost of around $30, especially when error-prone re-keying is involved. He called UBL an exercise in "brute standardization": just getting the necessary parties together, then hammering out an agreement on the details. UBL 1.0 has been final since November 2004, and the first round of 2.0 schemas will come out before 2006. On the deployment front, Denmark now mandates UBL, saving hundreds of millions of euros per year, and Sweden's rollout isn't far behind.
A huge audience packed the room to see Brian Jones of Microsoft give the Thursday session Microsoft Office Open XML Formats. The new Office 12 will use the new zipped XML formats by default, with new extensions: .docx instead of .doc, .pptx instead of .ppt, .xlsx instead of .xls, and so on. Microsoft will provide back-patches for file compatibility for Office versions back to Office 2000. It turns out that the zip format adds a great deal to file robustness, due to the way that files are stored internally. As a demo, Brian used a hex editor to chop the last few thousand bytes off a *.docx file, which corrupted it as a zip file, but Office was still able to recover all the primary content and some of the styles. The session didn't discuss any potential licensing issues with the formats.
Other interesting developments are happening around microformats, a topic previously covered by XML Annoyances. On Wednesday, the W3C's Dan Connolly presented Semantic Web Calendaring: RDF Calendar, hCalendar, and GRDDL. One benefit of microformats is that the information they contain can be readily transformed into RDF statements, and the GRDDL specification provides instructions on how to convert ordinary XML into RDF.
Kurt Cagle's Thursday talk Binding the Graphical Web (Component and Data Bindings with XBL, XHTML and SVG) covered past and present developments around various XML binding technologies, including a thoughtful discussion on different classes of abstraction. Key quote: "all programming is a metaphor."