XML.com 
 Published on XML.com http://www.xml.com/pub/a/2002/12/18/xml2002.html
See this if you're having trouble printing code examples

 

Reports from XML 2002
By Eric van der Vlist
December 18, 2002

Eric van der Vlist is the author of XML Schema.

Microsoft Office Embraces XML

For many participants, the most memorable event of XML 2002 will be Jean Paoli's presentation of Office 11, which promises to deliver easier access to XML for hundreds of millions of workstations.

Those of us who had to connect a Windows PC to the Internet in the early 90s remember the difficulty of choosing and installing TCP/IP and the browser software necessary to access and browse the Web. At that time, Microsoft didn't believe in the future of the Internet, TCP/IP wasn't natively supported by Windows, Internet Explorer was a vague project in a laboratory, and many companies had developed market segments left uncovered by Microsoft.

This era was wiped away by a U-turn, which I have always considered a miracle from a company of the size of Microsoft. Within months, it bundled a TCP/IP stack and Internet Explorer with Windows. Internet features have been rapidly added to Office. Users applauded. Companies positioned on these segments either changed their strategy or disappeared, which is believed to have decreased the overall possibility for innovation in some domains.

The impression given in the presentation by Jean Paoli, co-editor of the XML 1.0 recommendation and pilot of the "XMLization" of Office, is that Microsoft is doing with XML what it did with the Internet.

XML was already well-supported by Microsoft BackOffice products from Biztalk to CMS through XML Server, but a key piece was missing: tools to let users manipulate XML on their workstations. The issue of editing XML documents is still as difficult as the connection to the Internet in the early 90s: technically there is a solution and many XML editors are available, but the financial, institutional, and human impact of deploying these tools on a large scale is considered a major obstacle by many organizations.

"XML for the masses" is the target of Office 11, and Paoli's presentation suggested that Microsoft will likely meet the target. Without major innovations (except maybe the linking of XML documents to Excel spreadsheets using XPointer references), Office 11 appears to xmlizing Office using largely standard technologies and enabling the manipulation of arbitrary XML documents using customer-chosen schemas:

Powerpoint is the only piece without an XML update because, according to Paoli, of the lack of time. For the rest of the applications, XML can be used as a channel of information between front and back office applications, but also between front office applications, creating a lot of new possibilities for users, consultants, and integrators who will be able to exploit them.

"XML for the masses", the target of Microsoft, is not the same as the target of the XML editors we know today, and it will probably take some time for Word to become a serious challenger. Nevertheless, many will be tempted to use a tool which is installed on so many workstations. The effect of this may well be to reduce the opportunity for competition, innovation, and value-add in the existing XML editor world, yet also to create new market segments.

Often criticized for its "embrace and extend" strategy, Microsoft has finally decided to continue to play the game with XML, even though the extensibility of XML opens new and unpredictable possibilities. But Microsoft needs to control all the major market segments in fear that XML might give to the masses the possibility of emancipation from its domination.

While the deployment of XML on millions of workstations is good news in the short term, it will certainly modify the landscape of desktop XML. The shape of this new landscape, and the longer-term consequences, are difficult to foresee.

OpenOffice: the XML format for the masses

Jean Paoli for Microsoft and Daniel Vogelheim for OpenOffice both chose the same title "XML for the masses" for their presentations, a commonality which hides two very different approaches from the editors of two competing office productivity suites.

The strategy of OpenOffice is to focus on the XML format natively used to store the documents. In his presentation Daniel Vogelheim gave an overview of the OpenOffice XML format and justified the design decisions taken to meet the following requirements:

The result is a complex format (over 450 elements and more than 1600 attributes), with a good amount of redundancy, but easily readable and easily transformable. To some extent it's an XML format for users more than for developers.

This format has been given as an input to the OpenOffice OASIS Technical Committee, which aims to create an "open, XML-based file format specification for office applications."

For Microsoft, the strategy appears to be to bring XML tools to each desktop, and leave each user free to choose appropriate schemas, rather than promoting an XML office format which will be specific to Microsoft, and for which a licensing model is still unclear.

If the target of Microsoft Office 11 is to deliver XML tools to the masses, the target of OpenOffice is to become the XML office format for the masses.

ISO DSDL on the move

Document Schema Definition Languages (DSDL) is a project of the ISO/IEC JTC 1/SC34/WG1 working group chaired by Charles Goldfarb. After the meetings held in Baltimore during XML 2002, this working group has published a set of recommendations which have been all approved:

DSDL has also advanced the definition of the remaining parts, and several of them should lead to publications during the next meeting at XML Europe 2003. Among the decisions taken on part 1 (Interoperability Framework), are as follows:

Further reading on DSDL and XVIF:

A burst of schemas

For different reasons many XML 2002 presentations proposed the use of multiple validations and transformations for advanced needs, rather than using a single schema considered too complex or even impossible to write and maintain.

Perhaps influenced by my editorship of "DSDL part 1 -- Interoperability Framework", I observed the justification of such a framework returning as a leitmotif in many presentations:

There are few commonalities between these presentations, but all of them show how, confronted with the issue of a complex validation in very different domains, projects have chosen to split the validation of their documents into different, easier to write, elementary steps.

That's also the approach taken by DSDL, the reason why it will be a multi-part standard, and the justification of its part 1, Interoperability Framework, which will define a language to describe the choreography of the elementary steps needed to perform a complex validation.

Impossible, Except For James Clark

The fact that the translation of Relax NG schemas into W3C XML Schema was considered impossible wasn't a good enough reason for James Clark, who presented at XML 2002 the latest progress of Trang, his multi-format schema converter.

While it is not possible to transform any Relax NG schema into W3C XML Schema, Clark considered that in most of the cases a converter smart enough to figure out the intent of the author should be able to write a readable W3C XML schema. This would be as good as a schema written directly using W3C XML Schema and a good approximation of the Relax NG original schema. Clark's presentation showed that this goal is not far from being met.

To achieve his aim, Clark had to make abstraction of the syntaxes beyond the two schema languages and to define a data model which is intermediate between Relax NG and W3C XML Schema. The translation is done in two steps: the Relax NG schema is parsed and feeds the data model, and criteria are applied to this data model to decide which W3C XML Schema should be used to get the best results both in term of approximation and readability.

The results from the examples presented by Clark is outstanding, and he has acknowledged that writing Trang was much more complex than writing a Relax NG implementation.

There appears to be quite a demand behind this tool too, as many developers are attracted by the simplicity of Relax NG, while they need to publish WXS schemas to be used by an increasing number of tools (Office 11 will require WXS schemas). Clark's answer to this request is simple: "don't bother with W3C XML Schema, write your schemas with Relax NG and use Trang to convert them."

Literate Programming in XML

Norm Walsh presented at XML 2002 his implementation ( XWEB) of Literate Programming in XML, available as part of his DocBook stylesheets, which enables the extraction of both documentation and code from common documents.

Introduced in 1984 by the father of TeX, Donald Knuth, Literate Programming is a software development methodology that focuses on documentation by embedding the source code in the documentation. Michael Sperberg-McQueen has proposed its application to SGML back in 1993.

Its principle is to document code fragments (classes, methods, variables or XSLT templates are examples of code fragments), to include the code fragments in their documentation, and to link them into a "web". This web can then be transformed to produce, on one hand, readable documentation in any format and, on the other, the source code.

The XML implementation presented by Walsh is thus based on mature concepts, and its current version was published in March 2002. It supports not only embedding traditional languages such as Perl or Java but also XML languages such as XSLT, schemas, or any other XML "code". Support for languages such as Python would probably require some adaptation since XWEB doesn't appear to care much about indentation.

Literate Programming isn't incompatible with other methodologies such as Extreme Programming; among the extensions which seem most interesting, one could probably easily add the definition of the unit tests within the documentation of the code fragments.

Purists will also note that, following tradition, XWEB is implemented through XWEB documents, a minimal XSLT transformation being provided to "bootstrap" these documents into usable pieces of code, which consist mostly of XSLT transformations.

XML.com Copyright © 1998-2006 O'Reilly Media, Inc.