On Folly

December 8, 2004

Edd Dumbill

In this week's column, I'd like to indulge in some gentle fun at the expense of pundits and pronouncers. While XML is as rich a field as any for crackpots and timewasters, we must be careful not to pour cold water on experimentation and innovation. The topics of XML-oriented programming languages and the Semantic Web have been targets of mockery in their time, so this week I'm asking whether the true believers might be right.

XML-Aware Programming Languages

The adaptation or creation of programming languages for the processing of XML has long been a theme of discussion in the XML world. From the SGML days, we have Omnimark. From XML itself, we have XSLT, XPath, and XQuery. In other mainstream languages, we see adaptations to aid XML processing, from XML-aware attributes in C# to tag-processing regular expressions in Perl. In addition to adaptations, we've seen a variety of experimental programming languages more strongly based on XML processing. (See how I cleverly used the word "experimental" to indicate that I, like any other sane person, don't take them seriously!)

Recent discussion on the XML-DEV mailing list focused on such matters, as Daniela Florescu and Mike Champion discussed some negative comments from Tim Bray on the topic of XML-oriented programming languages. The comments are quoted in Florescu's mail, but can also be found in Bray's weblog.

Bray's first objection to XML-oriented programming languages was, "If this hasn't happened after decades in the relational world, why would we expect it to happen in the XML world?" Champion put the complementary case that the main route explored in solving the relational impedance mismatch had been object-oriented database systems, largely considered as failures. He said that "one could at least make a plausible case that the opposite approach of making programming languages more RDBMS-friendly could have worked better."

I also suspect that the widespread industrial use of 4GLs alongside database systems undercuts that objection.

On to the second objection from Bray: that there is in fact no one XML data model to wrap a language around. Setting aside the disproof-by-existence provided by XSLT, Champion pointed out that most XML data models have plenty in common.

Arguably the different flavors of the Infoset/DOM/XPath/XQuery data model have much more in common than not. Their differences tend to come down to:

  • how namespaces are represented (as declaration nodes a la the XML syntax or as element/attribute nodes that "just know" their namespace).
  • how "syntax sugar" is represented (or not); the mismatch between the DOM and XPath data models (e.g. CDATA sections are represented in DOM but not XPath) causes immense pain for implementers who try to support both and monumental frustration for users who wonder what crack we were all smoking when we dreamed this up.
  • how datatypes are represented or not represented.
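To make the first two differences concrete, here is a minimal Python sketch. It is not drawn from the XML-DEV thread: the sample document, the urn:example:x namespace, and the function names are all made up for illustration. Python's minidom stands in for the DOM, and ElementTree stands in for a model that, like XPath's, has no CDATA nodes and lets each element carry its own namespace; it is not an XPath implementation.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document: one namespaced element whose content
# mixes plain text with a CDATA section.
DOC = '<note xmlns:x="urn:example:x"><x:body>a <![CDATA[<b>]]> c</x:body></note>'


def dom_child_types(xml_text):
    """Return the DOM node names of the children of the first child element."""
    doc = minidom.parseString(xml_text)
    body = doc.documentElement.firstChild
    # minidom keeps the CDATA section as its own node, distinct from text.
    return [child.nodeName for child in body.childNodes]


def etree_view(xml_text):
    """Return (tag, text) for the first child element as ElementTree sees it."""
    root = ET.fromstring(xml_text)
    body = root[0]
    # The tag carries the namespace URI; the CDATA markup is gone and
    # its content is merged into ordinary text.
    return body.tag, body.text


if __name__ == "__main__":
    print(dom_child_types(DOC))
    print(etree_view(DOC))
```

Run against the sample, the DOM view reports three children (text, a CDATA section, text), while the ElementTree view reports a single string and an element name that already includes its namespace URI. Code that has to support both views needs to reconcile exactly these differences.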

Florescu is also unconvinced by Bray's objections and said that she didn't understand why Bray thought that XML-oriented programming languages were silly. Champion admitted that Bray may have a point in the short term: "we just have to get used to the fact that programming languages, database systems, and data interchange formats are three different things and learn to work with all three comfortably." In the long term, however, some of today's silly ideas may prove to have a more profound effect upon the programming languages we use.

Champion cited two developments of particular interest. The first is E4X, the addition of native XML capabilities to ECMAScript. An implementation of this in the Mozilla project is currently coming to fruition. The second development is "Comega" (aka "Cw"), an extension of C# including native XML data types. (Editor's Note: Watch XML.com for a forthcoming introduction to Comega from Dare Obasanjo.)

News from the Semantic Web

Although many are still inclined to shy away from the Semantic Web project, there's a marked increase in serious debate and coverage. A few recent happenings have excited my interest.

The unthinkable rapprochement between topic maps and RDF has occurred, signified by the formation of the W3C RDF/Topic Maps Interoperability Task Force. The task force is part of the Semantic Web Best Practices and Deployment Working Group. The last time I was with the majority of the people listed as members of the task force, it was in a very pleasant drinking establishment in Amsterdam. It's nice to think that the bonhomie of that evening has persisted into forming the basis of the task force.

Gandhi's dictum that "first they ignore you, then they laugh at you, then they fight you, then you win" never seemed so appropriate for the builders of the Semantic Web. Past the ignoring and laughing stages, we're seeing serious engagement on the issues of the Semantic Web.

In an InfoWorld article, Jon Udell takes the party line that Tim Berners-Lee is the pursuer of an unattainable vision, and I think he mischaracterizes Berners-Lee's approach as calling for globally agreed ontologies. Nevertheless, he does note an increase in independent software activity concerning ontologies and the Semantic Web.

Udell also puts forward an interesting theory that one motivating factor for people and organizations to publish machine-readable data about themselves on the web might be that if they don't, somebody else will. Better to have the facts from the horse's mouth. While I'm not entirely sure about that as a factor to bootstrap the Semantic Web, I do know that guerrilla attempts at metadata-scraping and enhancement can have a positive effect on publishers.

Clay "Dark Matter of the Internet" Shirky is taking a rather more aggressive approach. In a talk scheduled for next year's O'Reilly Emerging Technology conference, he wants to rescue semantics from the Semantic Web. Ontology is near-dead, apparently: "a 300-year-old hack, now nearing the end of its useful life." Shirky says that the Semantic Web project is predicated upon the ready replication of ontological successes such as the Library of Congress' classification scheme.

Both Shirky and Udell seem pretty much convinced that the Semantic Web requires, from the outset, globally agreed ontologies. It seems to me that they've set up a straw man. I had always envisaged that in the same way user interface and other conventions have emerged from the messy web, so would ontological conventions. Messy, but good enough.

Births, Deaths and Marriages

The latest announcements from the XML-DEV mailing list.

Call for Participation, XTech 2005, 24-27 May, Amsterdam, Netherlands

Submissions of abstracts for papers and tutorials are now being accepted for XTech 2005. The deadline for proposals is January 7, 2005. XTech 2005 is a conference for developers and managers working with XML and web technologies, bringing together the worlds of XML and web development, open standards, open source, and open data.

Your humble correspondent happens to be program chair for the XTech conference and wrote recently about the new tracks being introduced this year.

Nemo--Glues RDF to XSLT

A library for accessing RDF datastores, as well as standalone and embedded RDF/XML models, with XSLT processors. Currently consists of simple adapters between the Saxon XSLT and XQuery Processor and the Jena Semantic Web Framework.

Scrapings

300 messages to XML-DEV last week, Len rating 3.7% (resting) ... ceci n'est pas une pipe ... web services are the new punk culture ... more important than mailing lists: cat photos ... life is messy, so deal with it ... more important than DocBook: cat photos ... for many vendors, XML is the acceptable face of LISP.