February 25, 2004
For the past two months nearly all of my XML.com writing time has been spent combing through the main text, as well as the detritus and ephemera, of the W3C TAG's Architecture of the World Wide Web -- work which resulted in my writing seven XML-Deviant columns about this fundamentally important document.
But that work also meant that I've not been paying very careful attention to the XML-DEV mailing list or the XML developer community. What fun I have missed! Accordingly, in this XML-Deviant column I want to catch up with some of the more interesting developments, most notably the RDDL2 and genx development efforts.
Riddle Me This!
RDDL (the Resource Directory Description Language) is one of those interesting projects that people often misunderstand, for at least two reasons. First, it's an attempt to respond to the XML namespace document issue (what, if anything, should live at the end of a namespace URI), an essentially contested bit of XML technology. Second, the XML developer community drives the development of RDDL, which is to say that it's not the product of crass commercialism, so there are no marketing flacks around to beat it into everyone's head.
RDDL provides a kind of XML document suitable for placing at the end of an XML namespace URI: a document that uses typed links to describe, in both human- and machine-readable ways, a bundle of related resources such as schemas, transformations, and even bits and bobs of code.
In the middle of January, Jonathan Borden (who edits the RDDL specification together with Tim Bray) informed the XML developer community that a new version of RDDL, version 2.0 (apparently still a draft), had been posted. Borden highlighted the major change: a new syntax, cooked up by Tim Bray, that no longer depends on XLink. The new syntax, Borden said, had gone unopposed for six months and was simpler, and it seemed to be motivated by XLink's failure to "achieve traction".
Eric van der Vlist responded to Borden's note by implying, first, that the TAG is an odd place for specifications to be developed (though it's not clear why the TAG couldn't develop informal specifications, since everyone else seems to have that right) and, second, that the new syntax was rather broken. Van der Vlist's objections center mostly on what he takes to be a loss of expressivity. "... I use [RDDL] as I have shown in my examples (a rddl:resource embedding a whole <div/>)," van der Vlist said, "often with several links and even if the upgrade [to the newly proposed 2.0 version] would be feasible through a XSLT transformation, I consider that it would decrease the expressiveness of the links."
Further, as van der Vlist said, "The difference with the current syntax seems pretty much limited to a boycott of XLink which doesn't seem to be a benefit by itself to me..." Boycotts of XLink, if that's what's going on here, may be useful in some contexts, but they seem an awfully curious motivation for a new iteration of the RDDL specification. And as van der Vlist also points out, avoiding XLink is a strange thing to claim as a benefit of a new version of RDDL, especially when there is some doubt about how little XLink is really used. John Cowan, Andy Greener, and van der Vlist named four XML applications as examples of XLink adoption: XTM (Topic Maps), XBRL, SVG, and OpenOffice's document formats. XLink has also been a politically charged subject inside the W3C for some time, dear reader; for background, see three XML.com articles: Micah Dubinko's "A Hyperlink Offering" and two pieces of mine, "Introducing HLink" and "TAG Rejects HLink".
Whatever the extent of XLink's success and utility, Simon St. Laurent offered a list of reasons why it hasn't been more widely adopted: it took too long to arrive; it was orphaned by Microsoft (and by most browser developers, too); XPointer was late and complex; it overlaps with both RDF and XTM; and, perhaps most importantly, "most people still don't get/want multi-ended links". Jeff Rafter added two more reasons: the political dustup over HLink, and the intellectual property questions raised by a Sun patent bearing on XPointer, which in Rafter's estimation may have "spooked a lot of implementers". Van der Vlist also noted that XLink carried very high expectations (historically true; recall that the original W3C troika was XML, XSL, and XLink) and that extended links are too complex for many users.
Tim Bray responded to this "grumbling" with some background on RDDL 2.0's changes, citing three problems with RDDL 1.0. First, its "nature" and "purpose" attributes were mislabeled. Second, it was "abusing" the "semantics of the XLink spec...pretty severely". Third, it duplicated some features of its host specification, HTML. Bray did concede van der Vlist's point that RDDL 2.0 is less expressive than its predecessor.
Discussion of RDDL 2.0's future direction, if any, seems to be ongoing. I think it's worth pointing out that what happened here, while perhaps not ideal, is encouraging. Changes to a conceptually useful, if underused, specification were proposed; some of that specification's core users argued that the changes weren't worth their costs; and that prompted further discussion and, presumably, further development. That's how these things should work.
The second interesting bit of work on XML-DEV in the past few months is genx, a C library for generating XML. We've all struggled with XML's growing complexity, and one front in that struggle has been the increasing difficulty of generating XML programmatically. Yes, many people besides Simon St. Laurent still write XML by hand; I know people who write RDF and OWL by hand. But lots of XML these days, especially in web services contexts, is generated automatically by some machine process. Back in the day, most of us probably started out generating XML by printing strings. But as the burdens of internationalization (I18N) and canonicalization (C14N) have grown, as Uche Ogbuji has been demonstrating in his Python-XML columns of late, generating XML that way has become too error prone.
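To make the string-printing hazard concrete, here is a minimal sketch in C. It is not genx's API; the functions xml_escape_text and write_simple_element are hypothetical, written only for illustration. A naive fprintf(fp, "<company>%s</company>", name) produces malformed XML as soon as name contains a character such as &, so the sketch escapes character data before printing it. Escaping is only one piece of the problem: the sketch assumes the element name is already legal and does nothing about encoding or characters XML forbids outright, which is part of why a dedicated library is attractive.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of `in` with the
   characters that are unsafe in XML character data (&, <, >) replaced by
   entity references. The caller frees the result; NULL means malloc failed.
   Assumption: input is already valid UTF-8 with no characters XML 1.0
   forbids; this sketch does not check either. */
char *xml_escape_text(const char *in)
{
    assert(in != NULL);

    /* First pass: compute the length of the escaped output. */
    size_t len = 0;
    for (const char *p = in; *p; p++) {
        switch (*p) {
        case '&': len += 5; break; /* &amp; */
        case '<': len += 4; break; /* &lt;  */
        case '>': len += 4; break; /* &gt;  */
        default:  len += 1; break;
        }
    }

    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy, substituting entity references. */
    char *q = out;
    for (const char *p = in; *p; p++) {
        switch (*p) {
        case '&': memcpy(q, "&amp;", 5); q += 5; break;
        case '<': memcpy(q, "&lt;", 4);  q += 4; break;
        case '>': memcpy(q, "&gt;", 4);  q += 4; break;
        default:  *q++ = *p;                     break;
        }
    }
    *q = '\0';
    return out;
}

/* Hypothetical helper: write <name>text</name> plus a newline to fp,
   escaping the text first. `name` is assumed to be a legal XML name.
   Returns 0 on success, -1 on allocation or output failure. */
int write_simple_element(FILE *fp, const char *name, const char *text)
{
    char *esc = xml_escape_text(text);
    if (esc == NULL)
        return -1;
    int rc = fprintf(fp, "<%s>%s</%s>\n", name, esc, name);
    free(esc);
    return rc < 0 ? -1 : 0;
}
```

For example, write_simple_element(stdout, "company", "AT&T") prints <company>AT&amp;T</company>, where the naive fprintf would have emitted a bare & that no conforming parser will accept.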
Prompted by some discussion that sprang out of the Atom community, Tim Bray expressed interest in developing a C library for generating well-formed, canonicalized XML efficiently. He presented an initial design, which then went through several rounds of refinement and change, driven in large part by input from members of the XML developer community, mostly on XML-DEV. One of the interesting tensions in the use-case analysis was between generating Canonical XML, which is very useful in web services, and being able to add a DOCTYPE declaration, which valid XHTML requires; Canonical XML drops the document type declaration altogether. Canonical XML and XHTML both sit in the fat part of the use-case curve, though arguably of different curves. Then again, maybe not: one can imagine needing canonicalized XML for a web service while also wanting to generate, in the same service, valid XHTML for human consumption. Sure, you can use two different libraries or do some post-processing, but the tensions and interactions between these core use cases are interesting.
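For a sense of what canonicalization asks of a generator, here is a second hypothetical sketch in C, again not genx's API; struct attr and write_canonical_empty are illustrative names only. Under Canonical XML 1.0, an element's attributes are written in sorted order with double-quoted, escaped values, and an empty element is written as a start-tag/end-tag pair rather than with "/>"; the document type declaration is dropped entirely, which is why canonical output and a DOCTYPE-bearing XHTML page pull in different directions. The sketch assumes attributes without namespaces (real C14N sorts by namespace URI first, then local name, and also orders namespace declarations) and handles only a single empty element.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attribute record; namespaces are deliberately ignored. */
struct attr {
    const char *name;
    const char *value;
};

/* Order attributes by name. strcmp compares bytes as unsigned char, and
   UTF-8 byte order matches code point order, which is what C14N uses. */
static int attr_cmp(const void *a, const void *b)
{
    const struct attr *x = a, *y = b;
    return strcmp(x->name, y->name);
}

/* Write an attribute value with the escapes Canonical XML applies to
   attribute values: &, <, ", and the whitespace characters TAB, LF, CR. */
static void put_attr_value(FILE *fp, const char *v)
{
    for (; *v; v++) {
        switch (*v) {
        case '&':  fputs("&amp;", fp);  break;
        case '<':  fputs("&lt;", fp);   break;
        case '"':  fputs("&quot;", fp); break;
        case '\t': fputs("&#x9;", fp);  break;
        case '\n': fputs("&#xA;", fp);  break;
        case '\r': fputs("&#xD;", fp);  break;
        default:   fputc(*v, fp);       break;
        }
    }
}

/* Write one empty element in canonical style: sorted attributes,
   double-quoted escaped values, explicit end tag. Sorts `attrs` in place. */
void write_canonical_empty(FILE *fp, const char *name,
                           struct attr *attrs, size_t n)
{
    assert(fp != NULL && name != NULL);
    if (n > 1)
        qsort(attrs, n, sizeof *attrs, attr_cmp);

    fprintf(fp, "<%s", name);
    for (size_t i = 0; i < n; i++) {
        fprintf(fp, " %s=\"", attrs[i].name);
        put_attr_value(fp, attrs[i].value);
        fputc('"', fp);
    }
    fprintf(fp, "></%s>", name);
}
```

Given the attributes width="10" and alt="logo" in that order, the sketch writes <img alt="logo" width="10"></img>. A generator that also wants to emit valid XHTML has to let the caller opt out of exactly these rules, starting with the DOCTYPE.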
I won't summarize all the contributions of the individual participants; that's both too much work and too much low-level technical detail, a great deal of which concerned portable C practices anyway. I do commend the threads to you for casual reading. And, of course, the code is available for use, study, and critical feedback (Bray says, "the plan is to grant essentially unlimited Open-Source rights along the lines of recent Apache copyrights").
Again, I'd like to point out that this discussion and development, while perhaps not ideal, is another good example of useful software being created in a way that responds to community needs and interests. I'm not entirely sure the whole episode wasn't the geek's version of a bar bet, but genx is a handy thing to have in an increasingly tricky area.