Agile XML

August 31, 2005

"The tumult and chaos of the browser wars seem to have numbed many developers into accepting the W3C's status uncritically, but I don't know how long that acceptance will last." -- Simon St. Laurent, 1999

August brings a change in my day job responsibilities, and along with it comes a certain amount of disruption, including a stretch of time away from reading the mailing lists. Judging by the popularity of columns like XML-Deviant, many readers are in similar straits. Even high-quality lists like XML-Dev require a time investment to read, process, filter, and extract the useful bits of discussion. As the XML-Deviant column has evolved, it has shifted from a pure mailing list summary to something more focused on analysis and observation. Still, basic summaries occasionally fill an important role. Catching up often stirs up new discussion or uncovers interesting connections between topics, so let's get started.

Starting off with a typical XML-ish discussion, Bryan Rasmussen wonders out loud about the differences between typical validation (where the smallest error causes a complete failure) and typical hypermedia processing, say in a browser (where a small error is ignored and life goes on, perhaps with a broken image indicator or some such). Len Bullard responds with something that is not exactly a theory, but a pointed question: "If your windshield wipers aren't working, should your car refuse to start?" Seems sensible to me, in most situations. In the final QA check as the vehicle leaves the factory, however, sending the car back would be better than continuing on to a dealer, though maybe I'm just picky.
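The two attitudes are easy to put side by side in code. What follows is only a rough sketch, not anything posted in the thread: the `check_item` rule (an item needs a non-empty `src`) and the sample data are made up for illustration. The strict function behaves like a validator and gives up at the first error; the lenient one behaves more like a browser, noting the problem and moving on.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """Outcome of processing a batch of items."""
    accepted: list = field(default_factory=list)
    problems: list = field(default_factory=list)


def check_item(item):
    """Hypothetical rule: an item needs a non-empty 'src' value."""
    if not item.get("src"):
        raise ValueError(f"item {item.get('id')!r} has no src")


def process_strict(items):
    """Validator style: the first error aborts the whole run."""
    for item in items:
        check_item(item)  # any exception propagates immediately
    return Report(accepted=list(items))


def process_lenient(items):
    """Browser style: record the problem, skip the item, carry on."""
    report = Report()
    for item in items:
        try:
            check_item(item)
            report.accepted.append(item)
        except ValueError as exc:
            report.problems.append(str(exc))
    return report


# Made-up sample data: the middle image has an empty src.
images = [
    {"id": "a", "src": "a.png"},
    {"id": "b", "src": ""},
    {"id": "c", "src": "c.png"},
]
```

With this data, `process_lenient(images)` accepts items "a" and "c" and records one problem, while `process_strict(images)` raises `ValueError` as soon as it reaches item "b". The factory QA check is the strict function; the drive home in the rain is the lenient one.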

Does that jibe with what's going on with XHTML 2.0? Working Drafts are not coming out as fast as some would like, though they are getting looked at. The mailing list and individual bloggers were abuzz over the conformance section of the May Working Draft, which claims to require an xsi:schemaLocation attribute. Trace back far enough, and you'll find Henri Sivonen's message, elaborated on by Uche Ogbuji. Many other bloggers jumped on this, with the general tone being, "What were they drinking when they wrote this?" Fortunately, this story has a happy ending. As reported, this was a case of an RFC 2119 typo, a MUST that was really meant to be a MAY.
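For anyone who hasn't run into the attribute in question: xsi:schemaLocation lives in the XML Schema instance namespace and holds whitespace-separated pairs of namespace name and schema location. Here is a small sketch of how a processor might read it while treating it as optional. The document, its namespace, and the schema URL are placeholders on example.org, not values from the Working Draft, and `schema_location_pairs` is a name made up for this example.

```python
import xml.etree.ElementTree as ET

# The XML Schema instance namespace, in ElementTree's {uri}local notation.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
SCHEMA_LOCATION = f"{{{XSI_NS}}}schemaLocation"


def schema_location_pairs(xml_text):
    """Return (namespace, location) pairs from the root element's
    xsi:schemaLocation attribute, or an empty list if it is absent."""
    root = ET.fromstring(xml_text)
    raw = root.get(SCHEMA_LOCATION)
    if raw is None:
        return []  # under a MAY reading, absence is not an error
    tokens = raw.split()
    return list(zip(tokens[0::2], tokens[1::2]))


# Placeholder document; the namespace and schema URL are hypothetical.
WITH_HINT = """<doc xmlns="http://example.org/ns/doc"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://example.org/ns/doc http://example.org/doc.xsd">
  <title>Example</title>
</doc>"""

WITHOUT_HINT = """<doc xmlns="http://example.org/ns/doc">
  <title>Example</title>
</doc>"""
```

A MUST would turn the empty-list case into a conformance failure; a MAY leaves it as a perfectly ordinary document that simply carries no schema hint.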

Agile XML

Jim Fuller started an interesting thread, adapting the original Manifesto for Agile Software Development to fit XML, the main value points being:

XML processing over text processing (batch, make, etc...)
REST over SOA (SOAP et al)
RelaxNG over XML Schema
XPath and XSLT over XQuery

Continuing: "That is, while there is value in the items on the right, we value the items on the left more."

Michael Champion countered (in a message that seemed to slip through the archive cracks):

It's not obvious to me how REST is more agile than SOA, RELAX is more agile than XSD, etc. Certainly an agile XML developer would be ready to use RELAX NG in a scenario for which it is more appropriate than XSD if that improved customer collaboration, individual interaction, etc. .... but that's almost certainly not generally true.

Additionally, Champion made a well-received analogy to home improvement tools, linked at the end of this article.

Within hours of that discussion and with no apparent connection to the Agile XML thread, Joe Fawcett asked about his assertion that "the growth of (XML) Web services was promoted by the need to separate content from presentation." Someone just coming off the previous thread might read the question as: how much effect has agility, particularly the separation of content from presentation, had on the development of web services? Michael Kay responded that the situation is "driven more by the need to deliver XML data to applications that carry out business logic using the data, rather than merely doing presentation." Others, including Jim Fuller and Anne Thomas Manes, agreed.

"Separation of content from presentation" is a bullet item in all kinds of presentations dealing with XML, but it is a point that seldom gets looked at with a critical eye. Xasima Xirohata points this out, saying, "As far as I know it's too dangerous to say that the separation of content from presentation is advantage of XML or it's the main necessity that forced the appearance of XML." Strict separation often has benefits, but other times, the content is the presentation. Doug Rudder zeroes in on a key point, that design choices "will impact output, searching, indexing, linking, reuse, etc. In most cases, defining content for what it is, not what it looks like (in a specific output instance) is important." That brings us back to the XHTML 2.0 folks, and why they are being so careful to get things right.

Exploding Schema Processors

Roger Costello suggested that in W3C XML Schema, using maxOccurs='unbounded' is undesirable, drawing a comparison to software infinite loops. Joe English objected to the suggestion on the grounds that picking some value for an upper bound is arbitrary: "with very few exceptions, any attempt to devise a suitable upper bound for any 'maxOccurs' value is bound to involve wild-ass-guessery." Another angle that isn't readily apparent is that the common algorithm used to implement a state machine from a grammar exhibits really poor memory performance in the face of large-but-not-unbounded maxOccurs values. Practical experience shows that values over about 1,000 start to cause problems.
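A back-of-the-envelope model shows why a large-but-bounded value can cost more than 'unbounded'. The sketch below assumes the implementation unrolls a bounded repetition into that many copies of its content before building the automaton. That is my reading of the approach being described, not something the thread spells out, and the `Element`, `Sequence`, and `Repeat` classes are invented for the example. It counts the element positions left after unrolling, which is roughly the number of states a position-based automaton would need.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Element:
    name: str


@dataclass(frozen=True)
class Sequence:
    parts: Tuple["Particle", ...]


@dataclass(frozen=True)
class Repeat:
    inner: "Particle"
    min_occurs: int
    max_occurs: Optional[int]  # None stands for maxOccurs='unbounded'


Particle = Union[Element, Sequence, Repeat]


def unrolled_positions(p: Particle) -> int:
    """Count element positions after naively unrolling repetitions."""
    if isinstance(p, Element):
        return 1
    if isinstance(p, Sequence):
        return sum(unrolled_positions(part) for part in p.parts)
    inner = unrolled_positions(p.inner)
    if p.max_occurs is None:
        # Unbounded: the required copies, with the last one looping back
        # on itself. At least one copy is needed to carry the loop.
        return max(p.min_occurs, 1) * inner
    # Bounded: one copy of the content per permitted occurrence.
    return p.max_occurs * inner


item = Element("item")
unbounded = Repeat(item, 0, None)
bounded = Repeat(item, 0, 1000)
nested = Repeat(Sequence((Element("a"), Repeat(Element("b"), 0, 1000))), 0, 1000)
```

Under these assumptions `unrolled_positions(unbounded)` is 1, `unrolled_positions(bounded)` is 1000, and the nested case comes to 1,001,000. The loop is cheap; counting is what costs, and nested bounds multiply.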

Rick Jelliffe gets into even more technical details, speculating that the problem lies in the grammars in the first place, and that path-based validation techniques are what's needed next.

But Michael Kay perhaps puts it most succinctly: "computers shouldn't impose limits on people."

Wrapping Up

Can we draw any connections among these topics? For one, many discussions have been gradually moving up to a higher level. Instead of figuring out how XML Schema is supposed to work, folks are talking about overall validation strategies, and hammering out best practices for schema modeling. The discussions about Agile XML, too, indicate an increased level of self-awareness among participants.

August is usually a slow month on mailing lists, with all the Euroholidaying going on, but this year things are pretty lively. Lots of fresh and informative discussion continues. Of course, some of the traffic is permathread material, or otherwise content-free. For example, I haven't summarized any of the threads on "Web 2.0": trying to figure out a good definition for it, or explaining how it's really just an attitude, or contrasting homegrown technologies with those from standards bodies. Filter out the noise from those conversations and there's nothing left. Perhaps that realization itself is the important thing to take away from the discussion summary.

Births, Deaths, and Marriages

Canadian Semantic Web Working Symposium

Get in early: the submission deadline is January 1, 2006 for this conference in Quebec City.

Early version of the XML Query Test Suite

Over 7,000 tests, with thousands more coming.

Orbeon Presentation Server 3.0 Beta

A new release of this server-side XForms engine, now with Ajax power.

Amara XML Toolkit 1.0

This excellent collection of Python tools for XML reaches its first full release. Now requires nothing beyond Python for installation.

Preliminary Program for XML 2005

Late-breaking sessions can still be submitted by September 16, 2005. Conference is November 14-18 in Atlanta.

XSL-FO 1.1 Last Call Working Draft published

Comments due by September 16, 2005.

Documents and Data

Nomination for xml-dev quote of the year.

Joel on Naming Conventions