XML.com: XML From the Inside Out

XML 2003 Conference Diary
by Eric van der Vlist

Semantic Interoperability

Next on my list was "Towards Semantic Interoperability of XML Vocabularies" by Jacek Ambroziak, which I was eager to see as a bridge between semantics and schemas. The goal of Ambroziak's proposal is to "exploit partial interoperability through declarative annotation". The basic principle is quite simple: each vocabulary is annotated with a set of rules that refer to a common ontology. These rules are triples "(concept, match, extract)", where "concept" is a reference to the ontology, "match" is a condition identifying the XML fragment, and "extract" is an XSLT snippet that extracts the value. Although Ambroziak says he isn't 100% sure that these triples are enough in all cases, he believes that the XSLT transformations needed to convert between two vocabularies can be generated from these rules, and he has already implemented a conceptual model of the mapping. One of the key points of his proposal is that it doesn't assume synonymy between vocabularies: on the contrary, concepts can be linked by relations such as "is_a", "instance_of" or "part_of".
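To make the triples more concrete, here is a minimal Python sketch of the idea as I understood it. It is not Ambroziak's implementation: the `Rule` class, the two vocabularies and the `onto:` concept names are all invented for illustration, and plain ElementTree paths stand in for both the matching condition and the XSLT extraction snippet. It also only pairs up identical concepts and ignores relations such as "is_a" or "part_of".

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    concept: str  # reference into the shared ontology (invented names)
    match: str    # ElementTree path selecting the fragment (stand-in for the match condition)
    extract: str  # path, relative to the match, of the value (stand-in for the XSLT snippet)


def extract_concepts(doc, rules):
    """Return {concept: value} for every rule that matches the document."""
    found = {}
    for rule in rules:
        fragment = doc.find(rule.match)
        if fragment is None:
            continue
        target = fragment.find(rule.extract)
        if target is not None and target.text:
            found[rule.concept] = target.text.strip()
    return found


def shared_concepts(rules_a, rules_b):
    """Concepts that both vocabularies can express (the 'partial' interoperability)."""
    return {r.concept for r in rules_a} & {r.concept for r in rules_b}


# Two invented vocabularies annotated against the same invented ontology.
ORDER_RULES = [
    Rule("onto:PersonName", "buyer", "name"),
    Rule("onto:PostalCode", "buyer/address", "zip"),
]
INVOICE_RULES = [
    Rule("onto:PersonName", "customer", "fullName"),
    Rule("onto:Currency", "total", "currency"),
]

order = ET.fromstring(
    "<order><buyer><name>Ada</name>"
    "<address><zip>75011</zip></address></buyer></order>"
)
values = extract_concepts(order, ORDER_RULES)
# Only the values whose concept is known on both sides can be carried over.
transferable = {
    concept: values[concept]
    for concept in shared_concepts(ORDER_RULES, INVOICE_RULES)
    if concept in values
}
print(transferable)
```

In this toy case only the person's name can travel from an order to an invoice; the postal code has no counterpart on the invoice side, which is exactly the kind of partial overlap the proposal is meant to exploit.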

Practical RDF: the Web Effect?

Thursday evening started with a "Practical RDF" town hall meeting, which could be seen as a continuation of the Topic Maps sessions of the afternoon in that it was mainly dedicated to showing practical applications of the semantic web. The main difference was that while the topic maps success stories of the afternoon described applications in a closed world (a book publisher or the IRS), the town hall meeting described applications of RDF in the open world of the World Wide Web.

I have been following the development of the semantic web for more than three years now (yes, RSS 1.0 is already three years old), and I think that what was presented at this town hall might be, at last, the beginning of a "web effect".

Behind the efforts to develop the semantic web, there is the assumption that a simple enough technology that solves 80% of a huge problem will impose itself. This is what I call the "web effect": HTML and HTTP were in that position (simple technologies solving 80% of a huge problem) when they gave birth to the WWW as we know it now. Everyone was conscious of their limitations, but the technology was nevertheless simple enough to be used by a large community. Tim Berners-Lee is desperately trying to play the same game again with the semantic web, and even though the technology is there (and has been there for most of the past three years), it has not yet been adopted by a significant community.

I find applications of RDF such as RSS 1.0 and, more recently, Friend of a Friend (FOAF) interesting because they remind me of the first pages published on the web: most of them were what we would now call "home pages", and most of these personal pages were about links (that's basically what RSS is about) and personal descriptions (that's what FOAF is about). RSS and FOAF are thus the home pages of the semantic web, and if everything goes well their development could lead to the same effect as the development of home pages at the very beginning of the web.

This feeling is strengthened by the usage of the "rdf:about" attribute in FOAF. This attribute is to the Semantic Web what hyperlinks are to the web: it gives the ability to link any RDF document to other RDF documents in which related information can be found. With this attribute and applications such as RSS and FOAF, everyone can start writing "personal semantic web pages" that link not only to classical web resources but also to other personal semantic web pages.

The Semantic Web might well take off when these two types of applications -- closed (or controlled) world applications and "semantic home pages" -- ultimately meet on the web. What I have seen at XML 2003 suggests that this could happen in the near future.

Modelling and Programming

My first Friday morning session was "XER - Extensible Entity Relationship Modeling" by Arijit Sengupta. Sengupta proposed extending Entity Relationship (ER) modelling, the dominant methodology for modelling relational databases, to cope with the hierarchical structure of XML documents. He justifies the choice of ER over UML because "there are many things in UML which are not needed for XML." The obvious downside of this choice is that while many tools let you edit UML class diagrams and derive XML schemas from these diagrams, XER is exploring new ground and requires new tools. A prototype, based on Visio, has been implemented as a proof of concept. The presentation was clear and did a good job of showing the issues involved in XML modelling (such as the terminology clashes around "attributes" and "entities", which have different meanings in XML and ER), but it didn't convince me that XER is superior to UML for modelling XML documents.

The other aspect of this presentation that I found disturbing is the assumption that XML documents must be described by W3C XML Schemas, and that the modelling technique should be equivalent to the schemas. Back in my RDBMS days, we used to differentiate logical and physical ER models and accept that these two levels couldn't be strictly equivalent but required some human intervention to convert one into the other (otherwise, we wouldn't have needed two levels). I think that we should still have two different levels when we model XML documents, and that if the model is strictly equivalent to the schema, it's nothing more than a graphical representation of the schema. We still need a logical model for our XML documents, and this logical model should remain a logical description of the document rather than trying to describe schemas, which are the physical models. Another benefit of this approach is that it is schema-agnostic and lets you generate your schemas using any schema language.

The choice of the next session was horribly difficult with highly tempting talks by Rick Jelliffe and Uche Ogbuji, and I am very proud to have been able to decipher the title "Imperative Programming with Rectangles, Triangles, and Circles" as describing something worth attending. This talk, by Erik Meijer, was probably the most novel thing I saw at XML 2003.

Erik Meijer is a technical lead at Microsoft and presents himself as "an Haskell developer who has infiltrated the C# community." He introduced what he calls Meijer's law -- "any sufficiently used API or common pattern should evolve into a language feature" -- before showing how hard it is to process XML documents with current APIs, and how his law could be applied to C#. His demonstration of the poor state of XML support in existing languages covered both DOM-level APIs (tedious to use) and binding tools (suffering from a fundamental "impedance mismatch" between XML and objects). His proposal consists of introducing XML datatypes in C# (and eventually in other programming languages such as Java). These datatypes are strongly typed and their declaration uses a compact syntax for W3C XML Schema. The individual nodes of these objects are accessed using an XPath-like syntax where slashes ("/") have been replaced by dots (".") to separate the steps. The result was afterwards described by Erik Meijer as "similar to XQuery with a different data model", and I think that the analogy is fair if you consider that this gives you the same flexibility in creating and reading XML that you have with XQuery. Except, of course, that you are now using your favourite general purpose programming language.

All this isn't science fiction: Meijer has a prototype based on MS Visual Studio to show what he means. To justify the title of his presentation, he did a nice SVG demo where SVG rectangles, triangles and circles were manipulated directly as first class C# objects.

I wasn't the only one in the room to be impressed. Dare Obasanjo, in the front row, seemed to be in a trance and James Clark said: "it's quite amazing, I am still speechless" before asking: "how does that work with schema languages?" The answer was that you can import WXS schemas as declarations; probably what Clark expected, if not what he would have wished. The only thing I don't like that much in this proposal is the fact that XML objects are strongly typed and that their schema must be declared. That's probably inherent to C# (or Java) and I'd like to see similar proposals in languages such as Python, PHP, Perl or Ruby. To me, using dynamically-typed languages with XML fits really well with the principle of loose coupling dear to XML, and these languages are the most appropriate for processing XML documents.
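To give an idea of what dot-separated access could feel like in a dynamically-typed language, here is a rough Python sketch. It has nothing to do with Meijer's prototype, which is strongly typed and built into the language: the `Node` wrapper and its `attr` and `text` helpers are hypothetical names of my own, layered on the standard ElementTree module, and the SVG fragment is simplified by leaving out the SVG namespace.

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical wrapper: attribute access walks child elements by tag name."""

    def __init__(self, elements):
        self._elements = list(elements)

    def __getattr__(self, tag):
        # Only called for names not found normally; treat them as child tags.
        if tag.startswith("_"):
            raise AttributeError(tag)
        return Node(
            child for el in self._elements for child in el.findall(tag)
        )

    def __len__(self):
        return len(self._elements)

    def attr(self, name):
        """Values of one XML attribute across all selected elements."""
        return [el.get(name) for el in self._elements]

    def text(self):
        """Text content of all selected elements."""
        return [el.text for el in self._elements]


# Simplified SVG-like document (no namespace, to keep the sketch short).
drawing = Node([ET.fromstring(
    "<svg><g><rect width='10'/><circle r='3'/></g>"
    "<g><circle r='5'/></g></svg>"
)])

# Roughly the XPath g/circle/@r, written with dots instead of slashes.
print(drawing.g.circle.attr("r"))
```

No schema is declared anywhere: a step that matches nothing simply yields an empty `Node`. The price is that child elements named like the helper methods (`text`, `attr`) cannot be reached this way, which a real design would have to solve.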

My next session was "Making W3C XML Schema's Object Oriented Features in SAX/XSL/DOM" from Matthew Fuchs, and had one of those titles I find scary. Also, I had seen previous presentations by Fuchs where most of the content had been far beyond my level of understanding, and I was wondering if that would still be the case. I was relieved to see that this presentation was more practical and clearer than I had feared. Its goal was to "see how we could get more OO with minor enhancements of these technologies" and Fuchs noted that even though the PSVI "contains oodles of reflective information about the schema", "polymorphic applications generally don't require significant reflection." This statement served to justify the claim that it's enough to translate the PSVI into a minimal amount of annotation added to instance documents, which reminded me of old rants expressed on XML-DEV and even as a W3C TAG request (promptly rejected).

As proof of concept, Matthew Fuchs hacked the Apache parser Xerces-J to add these annotations, and they can then be accessed directly through usual tool sets. For instance, you can select all the elements derived from "foo:bar" using the XPath expression *[contains(@xsi:derivation, 'foo:bar')].
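The point is that once the annotation is an ordinary attribute, any XML toolkit can use it. As an illustration, here is a small Python sketch that applies the same filter with the standard ElementTree module. The sample document, the `urn:example:foo` namespace and the space-separated format of the attribute value are my assumptions, not the output of Fuchs's patched Xerces-J. ElementTree's limited XPath support has no contains() function, so the test is written in plain Python.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
# ElementTree stores namespaced attribute names as "{namespace}local".
DERIVATION = f"{{{XSI}}}derivation"


def derived_from(parent, type_name):
    """Children of `parent` whose derivation annotation mentions `type_name`.

    Mirrors *[contains(@xsi:derivation, 'foo:bar')]: like contains(), this is
    a plain substring test, so 'foo:bar' would also match 'foo:barn'.
    """
    return [
        child for child in parent
        if type_name in child.get(DERIVATION, "")
    ]


# Invented instance document carrying the annotation (value format assumed).
doc = ET.fromstring(
    f"<root xmlns:xsi='{XSI}' xmlns:foo='urn:example:foo'>"
    "<a xsi:derivation='foo:bar foo:baz'/>"
    "<b xsi:derivation='foo:qux'/>"
    "<c/>"
    "</root>"
)

print([el.tag for el in derived_from(doc, "foo:bar")])
```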

I like this idea because it's the least intrusive form that a PSVI can take. If we were able to use this principle, a PSVI could be generated through validation by a WXS schema, but also through a simple XSLT transformation, and probably through validation using other schema languages. I hope it will get more traction than my previous attempts to push it: I see it as a first step toward a schema-agnostic PSVI.

From DTDs to RELAX NG

For my last session, I had decided to follow Bob DuCharme's "Converting DTDs (and DTD Developers) to RELAX NG Schemas". Debbie Lapeyre was in the room in a front row, maybe to convert her NLM DTDs? DuCharme described a very nice way of deriving and developing RELAX NG schemas from complex DTDs while keeping your DTDs alive throughout the transition stage. To do so, he suggests first translating the DTD using James Clark's converter "trang". The next step is to keep the result of the translation as it is, not changing anything. You may ask what the benefit is of using a DTD translated into a RELAX NG schema, and that's where DuCharme's proposal is very clever: don't change what you've generated, but write a RELAX NG schema that redefines, piece by piece, what is in the DTD in order to add constraints that can't be expressed with DTDs. During the transition phase, you can still update your DTD and regenerate new schemas, and when you have redefined all that needs to be redefined you can choose to switch to RELAX NG for good. DuCharme had also included a slide explaining why the same approach wasn't possible with W3C XML Schema.

In short, a nice presentation showing the benefit of the flexibility of RELAX NG. The only time I was disappointed was when, answering a question related to DSDL, Bob DuCharme showed that he hadn't caught the key messages I had tried to pass on in my presentation. If even DuCharme hasn't caught them, I can't have been that clear.... One of the issues with DSDL is that our story, with its ten parts, is too long to explain and difficult to memorize. It will take some time before people get it right!

More on XML 2003

XML 2003 has been a big event, with what felt like several conferences in one conference, and this chronicle is only a personal overview of "my" XML 2003. Here are some links to learn more about XML 2003: