What Next, XML?

February 16, 2005

Micah Dubinko

Edd Dumbill, who's been writing this column since stepping down as Managing Editor, has decided to move on to other pursuits, in which we wish him the very best good fortune. I've asked Micah Dubinko, an XForms expert and longtime XML developer, to take over the helm of the XML-Deviant column. -- Editor

"What next, what next? At every corner I meet my Father, my age, still alive." -- Robert Lowell

You won't get far into any XML discussion without somebody mentioning the roots of XML. XML is a cleaned-up, stripped-down version of SGML that keeps most of the good stuff and gets rid of most of the parts that make coffee.

Such thoughts naturally lead to the question of what comes after XML. In an Advogato posting, the W3C's XML Activity Lead, Liam Quin, asked 'what next?':

Should we change XML?

1. People writing software and representing structured information (whether it's a configuration file or documentation or data) -- if you're not using XML, what's stopping you?

2. People using XML: what are the edge cases, the limits, the places where you've tried to push XML and failed?

3. What (if anything) should we change?

The xml-dev list quickly picked up on this discussion in a thread ("xml 2.0 -- so it's on the way after all?") started by David Lyon's enthusiastic offer to help develop this new specification. The discussion quickly delved into participants' pet peeves over XML syntax, which prompted a response from Liam Quin: "XML is a subset (more or less) or "profile" of an older ISO standard called SGML." Only 11 messages into the thread, and the ghost of SGML rears its head. Sounds about right.

Admittedly, Quin's job is complicated by the underachievement of XML 1.1, which suffered from a lengthy development cycle and continues to suffer from lukewarm adoption. XML has become such a core technology that incremental improvements are difficult to justify. The transition from SGML to XML reinforces this notion; mainstays like HyTime and DSSSL couldn't just hop over from the SGML core to the XML core. Things had to change enough to make a clean break. This led to the success of today's XML but caused its own set of problems.

How much of a clean break would a transition to XML 2.0 need? What parts should stay, and what parts should go? According to xml-dev participants, the two hot-button issues are DTDs and human readability. Norman Walsh addresses both: "I don't think XML 2.0 parser writers will have to worry about getting parameter entity expansion right. I'd be stunned if any DTD syntax survived any revision of XML"; later Walsh added that "I don't think an XML 2.0 is worth pursuing" without human-author features like entities.

Can we divine any leading indicators from the standards bodies? As far as DTDs go, the W3C seems to be backing away, with an xml:id Candidate Recommendation that replaces one of the dwindling number of use cases for the venerable DTD. Human readability, however, is a much more complicated situation. There, the W3C has formed a Binary Characterization Working Group to study areas where XML is almost, but not quite, good enough.
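The DTD use case that xml:id takes over is declaring which attribute carries an element's ID. As a rough illustration, the sketch below looks elements up by xml:id with Python's standard ElementTree and no DTD at all. The sample document and the helper name index_by_xml_id are made up for this example, and the code does not implement the full xml:id processing rules (such as value normalization).

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is always bound to this namespace, so no declaration is needed.
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_ID = f"{{{XML_NS}}}id"

# Hypothetical sample document: note there is no DOCTYPE and no ATTLIST.
DOC = """<doc>
  <section xml:id="intro"><title>Introduction</title></section>
  <section xml:id="next"><title>What Next?</title></section>
</doc>"""


def index_by_xml_id(root):
    """Map each xml:id value to its element, rejecting duplicates."""
    index = {}
    for elem in root.iter():
        value = elem.get(XML_ID)
        if value is None:
            continue
        if value in index:
            raise ValueError(f"duplicate xml:id: {value}")
        index[value] = elem
    return index


root = ET.fromstring(DOC)
ids = index_by_xml_id(root)
print(ids["next"].find("title").text)
```

With a DTD, the same lookup would depend on an ATTLIST declaration typing the attribute as ID, which a non-validating parser is free to skip.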

Regarding so-called "binary XML," respondents are sharply divided. Microsoft's Michael Champion floats this viewpoint:

I'm sure this is anathema on this list, but hand-authored XML is just not a mainstream use case anymore, and it's going to be harder and harder to make a business case for keeping around the stuff (half the productions, I'll guess?) that exist just to facilitate it.

Fellow columnist Uche Ogbuji fires back:

If you insist on this "hand-written XML is obsolete" theory enough to use it as a case for focusing XML 2.0 on toolkits, you're going to have to come up with a lot of evidence to back up your dubious claim.

A brilliant analysis of the situation comes from Derek Denny-Brown, responding to an equally on-target Elliotte Rusty Harold, whose points were:

1. Patents are beginning to invade this space, closing off interoperability and open software.

2. The data that's transmitted in this binary format is less inspectable than data in the regular XML format.

3. Software vendors will publish tools that only consume the binary data; and therefore systems will refuse to accept the textual data.

4. Binary parsers often forgo well-formedness checks such as name characters that textual parsers make. They incorrectly assume that nobody can or will inject broken data into the system.

Denny-Brown's responses were:

1. All the more reason to create a decent standard before the space is completely closed off, barring entry of useful, innovative ideas. Easiest way to block a patent is to publish the same idea openly first.

2. True. In my actual experience with binary-xml, this is possibly the single largest weakness. Then again, by tying binary-xml to XML, you gain the significantly improved possibility of being able to alternatively send the data as text-xml. A critical component of any binary-xml design should be to emphasize that binary-xml is just an optimization of the normal text-xml path. This is a key reason why I think binary-_X_M_L_ is actually better than ASN.1 or some new format.

3. Just as today there are vendors who only consume ASCII or UTF-8 XML. This relates back to 2, and to general market pressures. If the market really only wants binary-xml, then let it go. I tend to believe that most markets will quickly see the value in being able to choose text or binary, depending on the current needs. Are you debugging or still building the system? You probably want text-xml (back to your point #2). Are you focused on tuning your system for absolute throughput, choose binary.

4. I am in complete agreement with you here as well, but this can be addressed (and should be) by conformance tests and clear requirements within the specification.

Suggested binary formats tend to fall into two categories: those that have a strict one-to-one mapping to XML (or the XML Infoset, depending on your upbringing) and those that make departures, such as IEEE floating point, in the name of efficiency. The use cases published by the W3C have some of both. The pro-binary camp would do well to decide one way or the other if they hope to make much progress.
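To make the distinction concrete, here is a small sketch that contrasts the two approaches for a single numeric value. Both "encodings" are toy stand-ins invented for this illustration, not any proposed binary XML format: one keeps the lexical text byte for byte, the other stores an IEEE 754 double and regenerates text on the way out.

```python
import struct


def encode_as_text(lexical: str) -> bytes:
    """One-to-one mapping: keep the characters exactly as written."""
    return lexical.encode("utf-8")


def decode_text(data: bytes) -> str:
    return data.decode("utf-8")


def encode_as_double(lexical: str) -> bytes:
    """Departure from the text: store an 8-byte big-endian IEEE 754 double."""
    return struct.pack(">d", float(lexical))


def decode_double(data: bytes) -> str:
    # The original spelling is gone; all we can do is print the number again.
    return repr(struct.unpack(">d", data)[0])


for lexical in ["1.5", "1.50", "1e3"]:
    via_text = decode_text(encode_as_text(lexical))
    via_double = decode_double(encode_as_double(lexical))
    print(f"{lexical!r}: text round trip {via_text!r}, double round trip {via_double!r}")
```

The double form is fixed-width and quick to load, but "1.50" and "1.5" become indistinguishable, which matters to anything that compares or signs the original characters.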

Some final food for thought: SGML was developed largely through ISO, but XML came about through the efforts of a newer and fleeter organization, the W3C. It could happen that the thing that eventually displaces XML comes from an organization newer and fleeter still. Often the future comes from an unexpected direction. Ron Bourret has similar thoughts as he refers to an earlier message about using XPath in the Unix find utility, calling it a "nice reuse of technology that came out of XML in a non-XML environment".

What next, XML? Maybe the ways it will be similar to XML-as-we-know-it will be as surprising as the ways it will be different.

Births, Deaths, and Marriages

Announcements from the XML world since the last XML-Deviant.

XTech 2005 Keynotes

Keynote speakers announced for this year's XTech conference, May 24-27 in Amsterdam. Speakers include Paula Le Dieu of the BBC, Jean Paoli of Microsoft, and Mike Shaver of the Mozilla Foundation and Oracle.

Call for Proposals: International Workshop on Topic Map Research and Applications

Abstracts for this conference in Leipzig, Germany are due May 2, 2005.

Call for Participation: Patterns in XML ChiliPLoP 2005 Hot Topic

ChiliPLoP looks to be an interesting Pattern Languages conference, this year in Carefree, AZ (an hour out from Phoenix) March 22-25. One of this year's hot topics is XML Patterns.

Call for W3C Tech Plenary Lightning Talks

Paul Downey invites anyone planning to attend this public W3C event to propose a lightning talk to him directly.

XML Standards Library 2.1

A compilation of Windows-format help files covering the full spectrum of W3C standards. Quite useful for those times you happen to be on a Windows computer.

W3C Working Group Note: Extending XLink 1.0

A short note containing some useful changes that could appear in an XLink 1.1 specification.

xml-hypertext List Shutting Down

It was fun while it lasted.

Final Committee Draft of NVDL

Makoto Murata announces the final committee draft of Namespace-based Validation Dispatching Language (ISO/IEC FCD 19757-4). NVDL is a language for dividing a multi-namespace document into single-namespace fragments and then invoking validators for these fragments, a topic near and dear to my heart.
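For readers who haven't met the idea, the following is a very loose sketch of the "divide by namespace" step, written with Python's standard ElementTree. It is not NVDL and ignores most of what the draft specifies (modes, attribute sections, and so on); the sample document and the function names are assumptions for illustration only. It simply reports each element where the namespace changes from its parent's, which is where a dispatcher would hand off to a different validator.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
XFORMS = "http://www.w3.org/2002/xforms"

# Hypothetical compound document mixing two vocabularies.
DOC = f"""<x:html xmlns:x="{XHTML}" xmlns:f="{XFORMS}">
  <x:body>
    <f:input ref="name"><f:label>Name</f:label></f:input>
    <x:p>Thanks.</x:p>
  </x:body>
</x:html>"""


def namespace_of(elem):
    """ElementTree spells qualified names as '{namespace}local'."""
    tag = elem.tag
    return tag[1:].split("}")[0] if tag.startswith("{") else ""


def local_name(elem):
    return elem.tag.rsplit("}", 1)[-1]


def fragment_roots(elem, parent_ns=None):
    """Yield (namespace, element) wherever the namespace differs from the parent's."""
    ns = namespace_of(elem)
    if ns != parent_ns:
        yield ns, elem
    for child in elem:
        yield from fragment_roots(child, ns)


root = ET.fromstring(DOC)
for ns, elem in fragment_roots(root):
    print(f"dispatch <{local_name(elem)}> to the validator for {ns}")
```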

Extreme Markup Languages 2005 Call for Participation

This year's Extreme Markup Conference is August 1-5 in Montréal. Peer Review Applications and Tutorial Submissions are due 25 March, and Paper Submissions are due 15 April.

XML-Writer: Requests?

David Megginson asks for suggestions for the XML-Writer project. "I don't plan to rewrite it into something different than it is, but I am interested in hearing people's suggestions for fixes and minor enhancements or updates."

New and Updated Tools

nux-1.0, open-source extension of the XOM and Saxon XML libraries;
PsychoPath 0.1, new and curiously named implementation of XPath 2.0 including all the Schema bits;
Amara 0.9.4, more updates to Uche Ogbuji's Pythonic XML toolkit;
Rx4RDF and Rhizome 0.4.3, RDF application stack in Python, integrated with 4Suite;
Mvp.Xml v1.0, includes XInclude and XPointer modules;
Stylus Studio 6, XML IDE with (EDI)-to-XML mapping;
oXygen XML Editor 5.1, general-purpose XML environment;
Altova DiffDog 2005, XML diff;
Fast Infoset, an early Java implementation of not-called-XML;
GNU JAXP 1.3, a libre implementation of standard XML processing APIs for Java;
TMCore05, a Topic Map engine for .NET.

Documents and Data

Various quotations and statistics from recent XML list activity.

"Sensible design, sensible question, but can't be done with the XML as described, sorry."

59% of UK businesses using XML.

Divine guidance?

Over on xsl-list, Michael Kay posts a nice explanation of some thorny namespace+URI issues.

Back of the envelope for xml-dev (Feb 1-13): 315 posts by 73 posters, 87831 words, 5127 sentences, overall Flesch-Kincaid readability grade level: 10.99.
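For the curious, the envelope can be pushed a little further. The sketch below derives a few averages from the figures above and, assuming the grade was computed with the standard Flesch-Kincaid formula (0.39 × words per sentence + 11.8 × syllables per word − 15.59), backs out the syllables-per-word figure it implies. That assumption is mine; the tool that produced the numbers isn't stated.

```python
# Figures quoted above for xml-dev, Feb 1-13.
POSTS, POSTERS = 315, 73
WORDS, SENTENCES = 87831, 5127
GRADE = 10.99

posts_per_poster = POSTS / POSTERS
words_per_post = WORDS / POSTS
words_per_sentence = WORDS / SENTENCES

# Standard Flesch-Kincaid grade level, solved for syllables per word:
#   grade = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
syllables_per_word = (GRADE + 15.59 - 0.39 * words_per_sentence) / 11.8

print(f"posts per poster:   {posts_per_poster:.1f}")
print(f"words per post:     {words_per_post:.0f}")
print(f"words per sentence: {words_per_sentence:.1f}")
print(f"implied syllables per word: {syllables_per_word:.2f}")
```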

Puzzler? Too easy for these guys.