The 12 Days of XML Christmas

December 27, 2000

Leigh Dodds

As the year 2000 draws to a close, the XML-Deviant looks back at the year's events in the XML developer community, bringing you the 12 Days of XML Christmas.

On the Twelfth day of Xmas my mailer brought to me...

Twelve Contentious Topics

The great thing about the XML community is that it's not afraid to face up to the hard questions, with XML-DEV being particularly vociferous. This year has been no different, and we've seen a number of contentious debates.

With the year barely started, Simon St. Laurent lit the fires under the first debate by suggesting that XML-DEV should explore alternatives to the W3C (see "High Drama"). Not to be outdone, Michael Champion began a discussion on the lack of clarity in many W3C specifications. Both of these topics were revisited several times throughout the year. (In fact the Deviant suspects that St. Laurent and Champion may have a wager on who can start the longest-running threads on XML-DEV.)

St. Laurent later focused on the lack of working group feedback on comments posted to the W3C mailing lists.

I probably shouldn't get this started, but I've never received an official reply to any specific comments I've posted regarding W3C specs from anyone involved in their development. (Mostly I haven't even gotten unofficial replies.)

Other XML-DEV members, keen to get in on the game, chose other promising topics. Greg FitzPatrick singled out RDF for some close attention.

Despite who or what is stupid, I guess I am not as brave as the kid who called the king naked, in saying that the syntax and model specifications are not the documents they should be if we are going to win converts to the RDF cause.

FitzPatrick selected his topic well: the perceived complexity of RDF has always been something of a concern in the XML community. Having finally beaten back the flames long enough to capture the discussion in "Being Resourceful", the Deviant had to duck for cover as Don Park stepped up to the plate. A past master at inciting heated debates, Park was last year's winner of the longest-running XML-DEV thread with his brilliantly executed SML debate. Noting that the SVG specification had recently been praised on XML-DEV, Park believed that

... it is now time for a bit of roasting to see if any loose parts falls off. If SVG is so great, I think it deserves some more peer review. If you see anything you do not like in SVG, [it's] time to speak up before it gets copper-plated.

The resulting discussion was actually very civil and provided some useful insight into the design choices made by the SVG Working Group.

Even the normally unflappable Peter Murray-Rust made a bid for the hottest thread award with his comments on the unreliable behavior of many parsers.

I still believe that undefined parser behavior is going to be a major deterrent to many people who want to take up XML. I have posted on this before. I am going to keep on about it. The most common reaction I seem to have so far is "Well that's how XML behaves - it's *your* problem to decide how to process XML". This isn't good enough.

See "Filling in the Gaps" for the outcome, which included a free-ranging discussion about parser behavior, manifests, catalogs, and a revision to the XML 1.0 standard. Many of the issues raised are still not adequately addressed.

Having warmed up, St. Laurent next turned his attention to compression of XML documents, summarized in "Good Things Come in Small Packages."

I'm starting to get concerned about the volume of complaints I'm getting from readers and folks in Web development forums who are starting to argue that XML's verbosity is a problem, especially for things like transmitting vector graphics information. There are a lot of wasted bits in XML documents - and of course in HTML and other text documents as well.

Michael Champion parried with the best-titled posting of the year, "Pontifications on the Perversity of Pedantry, Punditry, and Purple Prose," again on the clarity of XML specifications and the often obscure terminology they employ.

Is there any chance that future versions of the XML Recommendation and/or the InfoSet will deprecate the weird terminology in favor of the conventional language of mathematics/software engineering? ... e.g., call a tag a "tag" and an "XML element information item" a "node" like God intended? ;~) Will anyone lobby/vote against future Proposed Recommendations until they are written in a language comprehensible to ordinary mortals who have not labored in the mines of SGML/XML for years?

July saw another brilliant play from St. Laurent, this time suggesting that the XML process was moving too fast and perhaps it was time to slow down and take stock.

I've spent the last few years writing for and teaching folks outside of the core community, and I'm starting to wonder if maybe it's time for the core to slow down, take a look around, and figure out why more people aren't using all the tools - even the stable ones - we're providing.

Paul Abrahams then took a turn in the flame game, asking "Why the Infoset?" Again this proved a useful discussion, as it provided some background on the purpose of, and need for, the XML Infoset (see "Investigating the Infoset").

However, unless St. Laurent is able to come up with some hot topics within the next week, Michael Champion may have won this year's race, having started two further debates. The first was part of an ongoing discussion on XML-DEV concerning the rush to standardize and the various standards bodies involved. Champion suggested that realistic proposals should be made to the W3C concerning its activities and its relationship with the developer community.

I think it would be great if this group could help the XML community as a whole sort out what it wants from the W3C, what it can realistically DEMAND of the W3C, and what it must find elsewhere. If some reasonable consensus emerges, perhaps it could be quasi-formally submitted to the W3C in some form.

Champion then graciously secured his victory over St. Laurent by soliciting feedback on the relative merits of procedural and declarative transformations of XML documents.

I don't want to open too big a can of worms, but I'd appreciate any pointers to background information that might help me understand the pros and cons, appropriate use cases, etc. for the alternative approaches to transforming XML (either to a display format or another XML format).

Well done, Champion.

Joking aside, the availability of open forums in which design issues, best practices, and community processes can be discussed is one of the greatest assets of the XML community. Contributors such as St. Laurent and Champion, who are willing to ask the hard questions, are just as valuable. I suspect, though, that St. Laurent's idea of controversy training courses may be going too far.

Eleven Candidate Recommendations

The Candidate Recommendation phase, introduced at the end of last year, has now become firmly embedded into the W3C process. More than eleven specifications are now at Candidate Recommendation status, including XPointer, XLink, XML Signature, MathML, RDF Schema and XML Base.

Two Candidate Recommendations received particular attention this year. The XSL Formatting Objects specification has gone through several revisions, but there are still some concerns over its applicability. The XSL versus CSS debate resurfaced this summer, one year after the original furor.

The Scalable Vector Graphics specification also prompted some discussion. XML-DEV gave the specification a warm welcome, although (as noted above) it did take time to give it a "roasting" just in case. SVG developers who were originally frustrated at the prolonged Last Call phase are no doubt glad of the movement to Candidate Recommendation.

Ten Bullard Soundbites

Of the many markup pundits on XML-DEV, Len Bullard stands out for his quirky postings and sense of humor; the Deviant has had cause to grin on more than one occasion. (Although on as many other occasions the reaction has been "huh!?!"). As a treat, here are some Bullard soundbites collected from a year's worth of XML-DEV postings.

Bullard on HTML:

One might say HTML is a jazz of markup; well-defined but loosely structured and capable of rendering many styles as a result. On the other hand, hard to predict and sometimes bizarre in the final rendering.

Bullard on SVG:

Considering that using markup for graphics used to be a killing offense, we have come a long ways.

Bullard on XML Schemas:

...the extensibility of schemas is superior at the cost of verbosity and complexity. If you can handle those, and once the schema becomes a final recommendation, it is the way to go.

Two years ago I would have bitten my arm off before admitting that.

Bullard on the wonder of 'Internet Time':

The first idiot that mutters "Internet Time", take this to the bank, you get us into the messes.

Bullard on XML's heritage:

...[it's] as if XML burgled the house of SGML, but took the TV and left the diamond pendant on the kitchen counter.

Bullard on SGML-bashing:

I am starting a web site where XML newcomers can purchase a set of ceramic letters "ISO SGML". Before every XML presentation they make, they take a rubber mallet and smash it then tell their audience they want to get past the traditional bashing quickly and get on to the good parts. That has to be funnier than "SGML is just a pain in the ass" as a recent speaker here told his audience.

Bullard on the success of XML:

XML succeeded wildly in the same way a tidal wave suddenly rears up at a shoreline after traveling hundreds of miles with barely a ripple on the surface. When the environment finally narrowed, the power of concepts that had been moving forward for three decades created quite a tall and sudden emergence, but not a surprising one. That it is sweeping a lot of developments away is not unexpected because that is what lexical unification is about: simplify the framework and reduce complexity.

Bullard on learning XML:

Telling people that XML is just a syntax is like telling people they can learn all they need to know about music by understanding one note completely. It is true, but it turns out, there is a lot to understand about one note.

Bullard on the bursting of the XML hype balloon:

Just as the HTMLers spent time kicking the bejeebers out of the SGMLers in the early daze to make a place for themselves at the table of the cybergodz, now the XMLers are getting kicked around too. It's the Web Way. Gotta love this new eCONomy.

Bullard on the Semantic Web:

Semantic web my behind. I just want to order a pizza, not have mozzarella explained to me.

Nine XML Conferences

The other great thing about the XML industry is that there are so many conferences, meaning you can soon accumulate a healthy balance of frequent-flier points.

There have been a number of notable conferences this year, including

As always, community meetings have been a big feature at all conferences, giving developers the chance to finally meet face to face.

Topic Maps had a significant presence at many of these conferences. Rumor has it that many of the conferences will be renamed to reflect this: Topic Maps 2001, Extreme Topics, Topic World, and so on.

Eight Recent Recommendations

As well as the raft of Candidate Recommendations, this year saw the release of a number of W3C Recommendations. The bulk of these were the five DOM Level 2 specifications, reflecting the decision to modularize the increasingly bulky DOM API into manageable chunks.

Notably, the XML 1.0 specification was updated to include the errata and clarifications that had been collected since it was originally published in 1998.

The XHTML 1.0 specification seems to be languishing, with few web developers rushing to adopt the standard (see "Gentrifying the Web").

Seven Mozilla Milestones

Including the release of the Netscape 6 beta, there were seven milestones in the development of Mozilla. Developed as a web browser, but branded by many as an application platform, Mozilla ties together a number of XML technologies, moving us closer to the extensible XML browser developers have been discussing for some time.

As well as strong support for CSS, Mozilla includes an SVG module, an XSLT transformation component (TransforMiiX), and integrated support for RDF, allowing the browser to make full use of RDF data. Among the most interesting innovations in Mozilla is XUL, an XML language for building user interfaces. The ability to construct interface components in this way, and populate them from RDF sources, makes Mozilla stand out from its elder sibling Netscape and its rival Internet Explorer.

Six Apache Projects

It's been a good year for the Apache XML project. Since its inception late last year, the organization has expanded to feature six major projects: Xerces, Xalan, FOP, Cocoon, SOAP, and Batik. Batik is a notable new arrival that adds an SVG toolkit to the Apache arsenal.

All of the projects appear to be progressing well, although some refactoring of Cocoon and Xalan has taken place to mitigate performance problems. The unique combination of code and developers from IBM, Sun, and open source groups has presented some interesting challenges. Apache appears to be further cementing its position as neutral territory between many of the big commercial players, as well as a provider of quality software.

Five U-R-Is

Uniform Resource Identifiers (URIs) were probably the most hotly debated issue of this year. The scene was set after a W3C leak suggested that internal disputes were holding up progress on several specifications. Responding to this leak, W3C Director Tim Berners-Lee set up the public xml-uri mailing list to discuss the issues. XML developers around the world promptly disappeared beneath the weight of 1700 messages posted to the list during May and June.

The initial cause of the debate was the use of relative URIs within namespace identifiers. Disagreements revolved around whether such use was legal and, if not, how to properly deprecate it. The debate soon spread to include namespaces and URIs in general: both fundamental underpinnings for XML and Internet technologies.

The final result was that relative URIs were deprecated within namespace identifiers. Many of the other issues were not adequately addressed and are likely to cause debate for some time to come. This was demonstrated recently when the issue of naming and identifying XML resources resurfaced on XML-DEV.
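To make the disputed construct concrete, here is a minimal Python sketch (not part of the xml-uri discussion itself) that flags namespace declarations whose names are relative URI references, the usage that ended up deprecated. The function name `relative_namespace_uris` is hypothetical, and the sketch assumes that a namespace name with no URI scheme counts as relative; it uses only the standard library's ElementTree and `urllib.parse`.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def relative_namespace_uris(xml_text):
    """Hypothetical helper: return (prefix, uri) pairs whose namespace
    name is a relative URI reference (assumed here to mean: no scheme)."""
    found = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events report each namespace declaration as (prefix, uri)
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        # xmlns="" undeclares the default namespace; it is not a relative URI
        if uri and not urlsplit(uri).scheme:
            found.append((prefix, uri))
    return found
```

Called on `<root xmlns="http://example.org/ns" xmlns:a="../schemas/a"><a:item/></root>`, it returns `[('a', '../schemas/a')]`. Note that the parser itself accepts the document without complaint; spotting the relative namespace name is left to the application.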

Four Corporate Heavies

IBM, Oracle, Microsoft, and Sun all jumped firmly onto the XML bandwagon this year, and they promptly squabbled about who invented which bits, and who should take the reins. The usual story.

That aside, all four corporations have made significant contributions to the XML process in one way or another. IBM has led the way with its commitment to open source development, having donated several technologies to Apache and made others available through its alphaWorks site. Sun has adopted XML as an extension to the Java platform and has produced some interesting innovations, such as its XSLT translets demonstration. Oracle demonstrated the ease with which databases and XML can be integrated with its XSQL technology, as well as providing its own XML toolkit. Microsoft, having contributed to the development of SOAP, has adopted it as a core component in its future .NET platform, focusing on the provision and development of XML-based web services.

The Deviant was lucky enough to attend a vendor panel at XML Europe at which representatives of Microsoft, IBM, and Sun discussed their commitment to XML standards.

Two, No, Three Schema Specs

The wrangling over the W3C XML Schema specifications that took place on XML-DEV last year continued into February. The Deviant reported on the debate that covered the proliferation of alternative projects and the complexity of the specification and syntax; see "Spotlight on Schemas" for a round-up of comments.

The Schema Working Group responded by publishing a third specification, the XML Schema Primer. It provides an introduction to the features covered in the two main specifications. This was followed up in May with the announcement of an open source schema validator during the Last Call period.

XML-DEV returned to the topic of Schema complexity in July, prompted by discussions during the Last Call period. As the year has progressed, additional tutorial material and tools have appeared, and XML-DEV has moved on to discussing the best practices associated with XML Schemas, rather than debating their shortcomings.

While support for W3C XML Schemas is definitely growing, it still isn't a one-horse race (and is unlikely ever to be). Alternatives such as RELAX and Schematron are also gaining ground: RELAX has been submitted to ISO for standardization, and Schematron now has its own SourceForge project and interest group.

Two Metadata Projects

Like XML Schemas, RDF is another specification that's come in from the cold, one of two metadata projects that made good progress this year. The other is Topic Maps; although, to stretch a metaphor, Topic Maps didn't so much come in from the cold as bang on the windows and rattle the door demanding to be let in!

The RDF Interest Group was particularly active this year, having debated the complexity of the RDF syntax and potential replacements (see "Instant RDF?"). It also gave some thought toward improving the description of the RDF data model, separate from its syntax (see "Super Model"). A whole array of RDF-related tools has appeared, including parsers, data converters and extractors, databases, and query languages (see the xmlhack RDF category for pointers).

The prospects of a Semantic Web were given additional attention, with several attempts to describe its facilities and lay out a road map for its development (see "Primed for the Semantic Web").

As well as presenting at every conceivable XML conference, the Topic Map community didn't let the grass grow under its feet, having recently produced its core deliverables. One promising aspect is the potential for convergence between Topic Maps and RDF. Members of both activities have agreed to attend teleconferences to discuss this potential. The first of these has already taken place, and reports suggest that there is support on both sides for convergence.

One Underlying Syntax

XML is the foundation upon which all these efforts are constructed, so it's fitting that we end a review of the year with XML itself. The Deviant has reported previously on concerns over the growing number of XML-related standards with which developers must now grapple (see "XML Reduced"). The XML subset debate, which began a year ago and led to the formation of SML-DEV, still crops up from time to time. Would an XML subset be useful or dangerous? The Deviant summarized the issues in "Profiling and Parsers".

XML is (part of) the foundation for this year's software engineering revolution, and XML-DEV isn't shy about giving it a kick once in a while to make sure it's still stable. The Deviant is looking forward to what the new year will bring. I hope you have a good holiday!