XML-related Activities at the W3C

January 3, 2001

C.M. Sperberg-McQueen


Anyone contemplating the intensity of XML-related work in the World Wide Web Consortium (W3C) nowadays might be forgiven for finding it hard to believe that XML 1.0 became a W3C Recommendation less than three years ago, in February 1998. W3C Notes, Submissions from Member organizations, and draft specifications in varying stages of completion define applications of XML, rules for using XML in particular contexts, extensions to XML, languages for processing XML documents, languages for declaring XML-based languages, languages for querying collections of XML documents -- all with ever-greater speed and in ever-greater profusion.

This article provides a brief survey of recent work and current efforts on XML-related topics at W3C, as a sort of abbreviated annual report to the XML community of what's going on.

The original goal of the XML Activity in W3C was, from one point of view, merely to add a little more structure to documents on the Web and provide a little more flexibility than HTML, or any single tag set, could provide. From another point of view, the goal was to make SGML (the Standard Generalized Markup Language defined by ISO) usable on the Web by making a lighter-weight version of SGML (defined in 25 pages or less). This lightweight SGML would be easier for software developers to implement and to embed in software whose main task was something other than being an SGML processor. To achieve these goals, the XML Working Group planned to define three languages: XML itself, a linking language called XLink, and a stylesheet language called XSL, roughly analogous to the ISO standards SGML, HyTime, and DSSSL.

The most important development since then is that, with the wide adoption of XML, the set of goals has expanded. XML is used not only for natural-language documents, but for all kinds of information. Database owners (and vendors) want to be able to expose arbitrary relational and object-relational databases on the Web in XML form. Application developers want to use XML for all kinds of information interchange. Both would like better methods of defining specific XML languages for use in applications. For all sorts of reasons it would be useful to run queries against collections of XML-encoded information. If XML is going to be used in electronic commerce or similar applications, digital signatures are essential; to make the signatures more robust, some explicit method of transforming an XML document into a canonical form is desirable. And so on. Every new application of XML leads to new requirements for standardization.
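To see why a canonical form matters for signatures, consider the following minimal sketch. It assumes Python 3.8 or later, whose standard library offers `xml.etree.ElementTree.canonicalize`; that function implements a later canonicalization specification than anything available when this survey was written, and the `order` document is invented for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two serializations of the same information: attribute order, quote style,
# and whitespace inside the start tag differ, so the raw bytes differ too.
doc_a = '<order id="7" currency="USD"><item>book</item></order>'
doc_b = "<order currency='USD'   id='7'><item>book</item></order>"


def digest(text: str) -> str:
    """Return a SHA-256 hex digest of the UTF-8 encoding of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hashing the raw text treats the two documents as different.
raw_digests_match = digest(doc_a) == digest(doc_b)

# ET.canonicalize returns the canonical serialization as a string.
canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)
canonical_digests_match = digest(canon_a) == digest(canon_b)

print(raw_digests_match, canonical_digests_match)
```

A signature computed over the canonical form survives harmless re-serialization of the document, which is the robustness the paragraph above has in mind.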

So it's not surprising that XML-related work in the W3C now occupies not merely one Working Group but several. It's even hard to count the W3C Working Groups involved in XML. Any W3C specification that defines a data format or tag set is likely, nowadays, to be using XML. The most visible example to users is the reformulation of HTML as an application of XML, which began with the release of the XHTML 1.0 specification as a W3C Recommendation in January 2000. That reformulation continues with work on a more modular formulation of (X)HTML, parts of which are now complete (XHTML Basic is now a Recommendation) or nearing completion (a spec on Modularization of XHTML is now a Candidate Recommendation).

Other W3C specs that apply XML to specific application areas include RDF (the Resource Description Framework), SMIL (the Synchronized Multimedia Integration Language), SVG (Scalable Vector Graphics), and P3P (the Platform for Privacy Preferences).

Working Group Reports

W3C Process

A quick word about W3C specifications: they begin as Working Drafts, and then are published for public review as Last Call Working Drafts (think of the "last call" shortly before the pub closes). When last-call issues have been resolved, the spec is published as a Candidate Recommendation, and software developers are invited to implement it. After implementation experience has been gathered, the spec becomes a Proposed Recommendation, and, if all goes well, after a period of review by the membership, a W3C Recommendation. At any point, specs may go back to an earlier stage of the process for further modification and review.

Further information:
W3C Process
W3C and the Web Community

The basic framework within which XML applications can be built is the responsibility of several Working Groups, most but not all of them in the W3C's XML Activity (the W3C organizes technical work into Activities, each of which may involve one or more Working Groups developing specifications). The XML Activity is organizationally the heir of the original single XML Working Group; it now comprises the XML Core, XML Linking, XML Schema, and XML Query Working Groups. Other Working Groups crucially involved with XML are the DOM (Document Object Model), the XSL (Extensible Stylesheet Language), and the XML Protocols Working Groups.

The XML Core Working Group published a Second Edition of the XML 1.0 spec in October 2000, edited by Eve Maler of Sun Microsystems, and every reader will be grateful for its clarifications and corrections. The Core WG also released working drafts of XML Inclusions (XInclude) and the XML information set ("infoset"). The first of these provides a way of using XML element markup to embed objects or portions of a document into a larger context. The second fills a gap left in the original XML 1.0 specification by its failure to specify exactly what information an XML processor is responsible for passing to a downstream application. The XML Information Set specification provides a concrete inventory of so-called information items and their properties in an XML document, thus creating a somewhat cleaner formal description of what counts as information in an XML document and what doesn't. A possible future revision of the XML Namespaces Recommendation and a document proposing a classification scheme for describing XML processors are currently on the Core Working Group's back burner.
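The inclusion mechanism can be sketched with Python's standard `xml.etree.ElementInclude` module. Note the assumptions: that module follows the namespace of the later, final XInclude specification (`http://www.w3.org/2001/XInclude`), not the working draft discussed here, and the file names and document contents below are invented and held in memory.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical documents, keyed by made-up file names so no disk access is needed.
DOCUMENTS = {
    "report.xml": (
        '<report xmlns:xi="http://www.w3.org/2001/XInclude">'
        "<title>Working Group Reports</title>"
        '<xi:include href="core.xml"/>'
        "</report>"
    ),
    "core.xml": "<section><title>XML Core</title></section>",
}


def load_from_memory(href, parse, encoding=None):
    """Loader callback: return the parsed element (or raw text) for href."""
    if parse == "xml":
        return ET.fromstring(DOCUMENTS[href])
    return DOCUMENTS[href]


root = ET.fromstring(DOCUMENTS["report.xml"])
# Replace each xi:include element, in place, with the content it points to.
ElementInclude.include(root, loader=load_from_memory)

merged = ET.tostring(root, encoding="unicode")
print(merged)
```

After the call, the `xi:include` element is gone and the `section` element from the second document sits in its place inside `report`.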

The XML Linking Working Group recently achieved a hat trick by releasing all three of their specifications as Candidate Recommendations at the same time (and two of them, XLink and XML Base, moved to Proposed Recommendation status on 20 December 2000). The XML Linking Language (XLink) was one of the deliverables of the original XML Working Group; it defines standard ways of linking among resources which go well beyond the simple in-place two-ended unidirectional links familiar from HTML. XLink allows the expression of more complex links with arbitrary numbers of link ends and arbitrary locations. These facilities have been part of working hypertext systems since the 1960s; now they can be part of the Web. XML Base generalizes the HTML BASE facility, making it possible to specify a base URI for interpreting relative URIs in XML documents. The XML Pointer language (XPointer) defines a powerful notation for use in linking to XML documents; it's an extension of the XPath notation familiar from XSLT.
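As a rough illustration of how XML Base and simple XLinks interact, the sketch below resolves `xlink:href` values against the `xml:base` in scope. It uses only the Python standard library; the `catalog` vocabulary, the URIs, and the helper name `resolve_links` are assumptions made for the example, and the code covers only simple links, not the extended links XLink also defines.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical document: element names and URIs are invented for illustration.
SOURCE = """
<catalog xmlns:xlink="http://www.w3.org/1999/xlink"
         xml:base="http://example.org/docs/">
  <entry xlink:type="simple" xlink:href="intro.xml"/>
  <figures xml:base="images/">
    <entry xlink:type="simple" xlink:href="logo.png"/>
  </figures>
</catalog>
"""


def resolve_links(element, base=""):
    """Return absolute targets of xlink:href attributes, in document order."""
    # An xml:base attribute is itself resolved against the base in scope.
    base = urljoin(base, element.get(XML_BASE, ""))
    targets = []
    href = element.get(XLINK_HREF)
    if href is not None:
        targets.append(urljoin(base, href))
    for child in element:
        targets.extend(resolve_links(child, base))
    return targets


links = resolve_links(ET.fromstring(SOURCE))
print(links)
```

The inner `xml:base="images/"` is relative, so it narrows the outer base instead of replacing it, much as a nested directory would.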

The XML Schema Working Group is currently dealing with comments received on the Candidate Recommendation draft of the XML Schema language. XML Schema is a metalanguage for defining XML tag sets and applications. It provides functionality similar to that of XML 1.0 document type definitions (DTDs). It adds the ability to assign datatypes (e.g. integer, calendar date) to elements and attributes, explicit support for namespaces, and more powerful content models than DTDs. XML Schema also uses XML, rather than an ad hoc notation, for declarations; this means XML Schema documents, unlike DTDs, can be processed by normal XML software instead of requiring ad hoc tools.
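The point about ordinary XML software can be made concrete with a small sketch: a schema is read with a general-purpose XML parser and its datatype declarations are listed. The schema is hypothetical, and it uses the namespace of the final XML Schema Recommendation, which may differ from the one in the Candidate Recommendation draft. This only inspects the schema; it does not validate anything.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema for an invented <meeting> element.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="meeting">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="date" type="xs:date"/>
        <xs:element name="attendees" type="xs:integer"/>
      </xs:sequence>
      <xs:attribute name="group" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# The schema is itself an XML document, so a generic parser can read it.
schema = ET.fromstring(SCHEMA)

# Collect the declared datatype of every typed element and attribute.
declared_types = {
    node.get("name"): node.get("type")
    for node in schema.iter()
    if node.tag in (XSD + "element", XSD + "attribute") and node.get("type")
}
print(declared_types)
```

Doing the same with a DTD would require a parser for the DTD's own declaration syntax.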

The XML Query Working Group is nearing the end of its systematic assault on the problem of defining a query language for XML documents. Having published a requirements document, specified a formal data model, and then formulated a query algebra on top of the data model, the Working Group will next turn its attention to syntax design for the actual query language. The requirements and data model have been out for some time; the query algebra was published at the end of 2000. Public comment is invited.

The XSL Working Group is responsible for defining a stylesheet language for XML documents. The first major part of their work became available as a Recommendation about a year ago in the form of the XSL Transformations (XSLT) language, which has rapidly become the preferred means of transforming XML documents. The second part, specifying a library of XSL formatting objects, was published as a Candidate Recommendation in November 2000; the comment period runs through February 2001. XSL formatting objects are compatible with those of Cascading Style Sheets but provide a richer set of typographic semantics.

The Document Object Model (DOM) Working Group also passed a major milestone in November 2000, issuing five Recommendations which together define Level 2 of the DOM for XML documents. The level-two Core defines a set of interfaces for creating and manipulating the contents of a document; level-two Views allow software to dynamically manipulate the representation of the document. Other specifications define an event system, access to stylesheets, and ways to define and traverse ranges of content in a document.
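For readers who have not used the DOM, here is a minimal sketch of creating and manipulating a document through Core-style interfaces. It assumes Python's `xml.dom.minidom`, a deliberately minimal DOM implementation that does not cover all of Level 2; the `report` and `group` element names are invented.

```python
from xml.dom.minidom import getDOMImplementation

# Create an empty document whose root element is <report>.
document = getDOMImplementation().createDocument(None, "report", None)
root = document.documentElement

# Build <group name="XML Core">Second Edition of XML 1.0</group> and attach it.
group = document.createElement("group")
group.setAttribute("name", "XML Core")
group.appendChild(document.createTextNode("Second Edition of XML 1.0"))
root.appendChild(group)

# Manipulate the tree afterwards: change the attribute value, then read it back.
group.setAttribute("name", "XML Core WG")
names = [node.getAttribute("name") for node in document.getElementsByTagName("group")]

serialized = root.toxml()
print(serialized)
```

The same method names (`createElement`, `setAttribute`, `appendChild`, `getElementsByTagName`) appear in DOM bindings for other languages, which is much of the point of a language-neutral object model.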

The newest XML-related Working Group in the W3C is the XML Protocols Working Group, chartered to create a simple foundation for program-to-program communication using XML. They are currently engaged in developing a requirements document and in surveying the existing work in the field: SOAP, XML-RPC, WDDX, XMI, Jabber, ebXML, and others.
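To give a flavor of what program-to-program communication using XML looks like, the sketch below uses XML-RPC, one of the existing efforts listed above, through Python's standard `xmlrpc.client` module. The procedure name `wg.countDrafts` and its arguments are hypothetical, no network connection is made, and nothing here reflects what the XML Protocols Working Group will eventually specify.

```python
import xmlrpc.client

# Serialize a call to a hypothetical remote procedure, "wg.countDrafts".
request_xml = xmlrpc.client.dumps(("XML Core", 2000), methodname="wg.countDrafts")

# The receiving program parses the XML back into native values.
params, method_name = xmlrpc.client.loads(request_xml)

print(request_xml)
```

The printed request is an ordinary XML document naming the method and carrying typed parameters, so any program with an XML parser can take part in the exchange.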

This quick survey of XML-related work at the World Wide Web Consortium has scarcely done more than list the various kinds of work going on. For more information, as always, consult the W3C Web site, the W3C XML home page and in particular the W3C Technical Reports page. W3C specifications are published early and often for public review. By participating actively and commenting on drafts, you can have an influence on the future of XML, and help W3C lead the Web to its full potential.