Caveat Incumbent

July 28, 2004

Edd Dumbill

In this week's XML-Deviant, we take a look at two conversations on the XML-DEV mailing list that highlight XML's disruptive aspect -- more specifically, the disturbance XML can cause to the dominant incumbent in a technology area in which XML is being introduced.


XHTML is the evil/liberating tool of a bunch of religious maniacs/lazy developers/visionaries intent on confusing/improving/revolutionizing the lives of hapless/worthy/troublesome web developers, aided and abetted by broken/customer-supporting/pragmatic web-browser implementations.

Or so we learn in a thread started off by Len Bullard about embedding XML in HTML and the rules for its processing. More entertaining than the mechanics of Len's trouble was the bile spilled onto the list from XHTML-haters.

It's an interesting question: should web browsers really be processing XML? Isn't their domain the broken mishmash we call HTML? Leave it to other applications to deal with XML. Joshua Allen speaks, and opens the case for the prosecution of XHTML:

I'm really suspicious of calls to rewrite a better, XML-aware browser. "Better" is always in theory; in practice you end up with tons of bugs and unintended consequences. XHTML purists have had a number of years to prove their thesis, and all they have proved is that the "pure" way results in yet more complexity for web developers, no appreciable benefit, and new bugs.

It's tough for Microsoft employees such as Allen to mention bugs, as Elliotte Rusty Harold points out:

I'm not sure what thesis you think the so-called XHTML purists are trying to prove, but the new bugs seem mostly to be in Microsoft products and the benefits are quite clear.

What benefits? Harold goes on to argue the benefits of content that is both human and machine readable.

XHTML vastly simplifies machine processing for all sorts of purposes. For instance, I could not generate RSS feeds for my web sites if they were not well-formed XML ... if more data were available in well-formed HTML, I could do more cool things with it. For instance, why should Amazon/Google/eBay, etc. have to provide separate interfaces to the same data for web services and for browsers? Why can't one suffice for both? If they were using XHTML or XML, instead of HTML, one set of pages would serve double duty.
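To make Harold's point concrete, here is a minimal sketch of what "double duty" might look like: an ordinary XML parser reads a well-formed XHTML page and emits an RSS 2.0 feed, with no tag-soup cleanup step. The page markup, the `item` class convention, and the `xhtml_to_rss` helper are all assumptions made up for illustration. None of them comes from the thread or from Harold's own sites.

```python
import xml.etree.ElementTree as ET

XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical page: well-formed XHTML whose stories are marked up as
# <div class="item"> blocks. This layout is an assumption for the sketch.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example News</title></head>
  <body>
    <div class="item">
      <h2><a href="http://example.org/first">First story</a></h2>
      <p>Summary of the first story.</p>
    </div>
    <div class="item">
      <h2><a href="http://example.org/second">Second story</a></h2>
      <p>Summary of the second story.</p>
    </div>
  </body>
</html>"""


def xhtml_to_rss(page, site_url):
    """Build an RSS 2.0 document (as a string) from a well-formed XHTML page."""
    # fromstring() raises ParseError on markup that is not well-formed,
    # which is exactly the rigor the XHTML camp is asking for.
    root = ET.fromstring(page)

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = root.findtext(
        "x:head/x:title", default="", namespaces=XHTML_NS)
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Feed generated from the XHTML page"

    # Each story block on the page becomes one RSS item.
    for div in root.iterfind(".//x:div[@class='item']", XHTML_NS):
        anchor = div.find("x:h2/x:a", XHTML_NS)
        if anchor is None:
            continue  # skip blocks that do not follow the assumed layout
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = anchor.text
        ET.SubElement(item, "link").text = anchor.get("href")
        ET.SubElement(item, "description").text = div.findtext(
            "x:p", default="", namespaces=XHTML_NS)

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(xhtml_to_rss(PAGE, "http://example.org/"))
```

The same page fed to the same parser with a single unclosed tag would fail outright, which is the trade-off the rest of the thread argues over.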

Ah, screen-scraping. "Dubious at best," says Dare Obasanjo. Worse, cries Allen, it's narcissism! Laziness might be nearer to the mark. Allen continues to the heart of his complaint:

For people who really want re-purposable data; we already have capability to do XML+XSLT+CSS. My RSS feed and OPML feed are both pure XML (no XHTML crap) and render nicely in IE and Mozilla. XHTML is a Frankenstein.

Mike Champion, who if he ever founds a monastic order will be held in history as St. Michael of the Pragmatists, agrees that developers are lazy, but suggests that this might still be preferable:

It's amazing how much ingenuity goes into doing useful things with tag soup and minimal metadata, but how much real benefit we get from all that lousy HTML via browsers, search engines, script applications, etc. But maybe it is more globally efficient to have a small group of developers learn how to make sense out of tag soup than to force the masses to deal with the very real pain of full standards compliance.

He goes on to address what is probably the root of the argument, differing opinions on what XHTML is actually for:

Still, the point of XHTML is not so much to be a stopgap but to bring some rigor to content so that all sorts of XML technology can be thrown at it. Screen-scraping is just one use case, there's also querying, transformation, syndication, content re-use (without worrying about the HTML escaping hassles), web-services enablement ... for just about every XML infrastructure spec, there's a plausible scenario in which having web content in XHTML enables all sorts of interesting things without relying on tidy or a tag soup parser to build a clean syntax or info-set.

It's time to see some give and take, says Champion, and realign some of the W3C Recommendations with reality -- a "refactoring" of the specs to accept real-world concerns and also to reinvigorate the vision enabled by standardization.

Does ebXML Simplify EDI?

"There's no XML in complexity" is the fallacious cry we often hear from eager XML adopters embarking upon the re-engineering of their domain in XML. One user is certainly suspicious of these claims about the electronic business technologies ebXML and EDI.

Is it really a fallacy? I for one was convinced by ebXML advocates at XML conferences who lauded the increased equality of opportunity that ebXML afforded.

Peter Hunsberger indicates that there's little advantage for existing EDI users with the XML-ification of EDI. Not too surprising. But he also points out that ebXML's lower cost of implementation is a key factor:

Bottom line, for me, was that if you already had a (good) EDI exchange mechanism in place there was little benefit to XML. To clarify my parenthetic qualification; it depended a lot on your tools, many were hard to adopt to new business areas. However, the lower cost of entry for XML-based technologies allows it to displace EDI technologies from the bottom up and horizontally.

The assertion that ebXML opens up electronic trading to smaller businesses is still contested, though. I'm sure we'd all like to think it does, but it may not be in simplicity of specification that it achieves its aims. According to Dale Moberg:

There was originally some hope that ebXML specifications could be used to produce a solution more attractive to small- and medium-sized businesses that would be less trouble to use ... Actually ebXML ended up defining functionality beyond what is usually defined by EDI standards. So from that standpoint, it is not simpler. Whether it promotes simpler solutions for end users is debatable, but there are vendors pursuing simplified ways to make use of ebXML under the covers (of forms) so that the complexity of ebXML is largely concealed from end users. I think it is safe to say that it is probably not simpler for implementers (by implementers, I mean the software vendors or open-source providers, not the end-user deployers).

Additionally, though not mentioned in this debate, one large factor in ebXML's favor is its ability to function over the Internet rather than over expensive closed networks. With both ebXML and XHTML, simplification isn't really the touchstone; it's more about future opportunity.

Births, Deaths, Marriages

The latest announcements from XML-DEV.


Lesser-known Java XML parser returns from the dead, fixing bugs, improving performance, and moving to an Apache license.

Oxygen XML Editor 4.2

Commercial schema-aware XML editor and XSLT editor/debugger. New features include presentation of schema information while browsing and editing an XML document.

Second Semantic Technologies for eGov Conference

Call for papers for conference focused on using semantic web technologies for e-government.

Mark Logic XML Query Engine

Free license to run a 50MB-limited version of the XQuery engine available.

RFC 3023 Redux

Death to text/xml! XPointer added as a fragment identifier for application/xml. XBase recognized for specifying base URIs.

XMLOpen 2004 Program Available

UK-based XML and open source conference. Star speaker line-up, with plenary sessions from Rick Jelliffe, Jeni Tennison, and Sean McGrath.

XQuery and XSLT Interim Working Drafts

Incorporate changes made so far due to issues received from the Last Call working drafts.

SEC Initiative to Assess Benefits of Tagged Data in Commission Filings

Will the SEC accept filings in XBRL?

XML for Binary Interchange, Addressing Machine-to-Machine Interoperability & Tactical and Mobile Computing

Ironically, space is limited at the conference with the longest title ever.


World-Wide-Wait reimplemented with SOAP ... 2.5 hours of solitude ... Mails to XML-DEV last week 128, Len rating 10% ... Riddle me this -- one of these namespace technologies is not like the other ... Stumbling up against XML-DEV's anti-verbosity measures ... you know XML has made it when ... it takes off like this.