Bleeding-Edge XML: XLink and Apache

February 28, 2000

Edd Dumbill

Tutorial sessions at XTech 2000 are a great place to find out about the bleeding edge of XML development. To give you a taste of the current work on XML specifications and technologies, we attended two sessions: "XLink" and "Apache: XML Publishing Techniques."

XLink represents the cutting edge of W3C specification development, while Apache XML represents the cutting edge of XML web publishing tools. Both sessions demonstrated incredible potential for the future application of XML.

XLink

Eve Maler, co-editor of the XLink spec, presented a tutorial on XML linking technology, covering XLink and XPointer. She was able to report on changes and decisions made just last week by the W3C working group. XLink has just entered "last call" as a working draft, so the next published spec will be stable enough to base implementations on.

So what is the motivation for creating an XML linking language? Why won't HTML links suffice? Maler presented four reasons why XLink is required:

  1. Hard-coded markup and behavior: In HTML, only certain elements (the A and IMG elements and a few others) can have linking behavior. If you want to create a link, it has to be a refinement of these elements, which is unnecessarily restrictive.

  2. Links are inseparable from the document: In HTML, you can't add links to a document you don't own. This limits the capability for facilities such as annotation.

  3. Link target offers little granularity: Commonly the target of an HTML link is an entire document. You may be able to link to an anchor defined in a document, but only if the author has actually added it. There is no facility for linking to an arbitrary part of a document.

  4. Links are one way: HTML only supports linking outbound from your document; there is no way you can create links inbound to your document from external documents.

These deficiencies of HTML links provided the basic motivation for XLink. Originally, XML linking was part of the XML 1.0 activity, but it was split off. Since then it has become a three-part activity in its own right, comprising XLink, XPointer, and XML Base. While XLink is a vocabulary for expressing links, XPointer is an extension of the XPath language (found in XSLT) that allows you to pinpoint a location within a remote resource. XML Base offers facilities in XML similar to those of the HTML element BASE.
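To make the division of labor concrete, here is a minimal Python sketch of how a link target might be resolved: a base URI (the job of XML Base, or of BASE in HTML) turns a relative reference into an absolute one, and the fragment part is where an XPointer expression or a plain anchor name would travel. The base URI, the target path, and the function name `resolve_link` are all hypothetical, and the sketch makes no attempt to evaluate an XPointer expression.

```python
from urllib.parse import urljoin, urldefrag


def resolve_link(base_uri, href):
    """Resolve href against base_uri and split off the fragment part.

    Returns (document_uri, fragment). The fragment is only separated out
    here; interpreting it (as an anchor name or an XPointer) is left to
    the application.
    """
    absolute = urljoin(base_uri, href)
    document_uri, fragment = urldefrag(absolute)
    return document_uri, fragment


# Hypothetical values: a base such as xml:base might declare,
# and a relative link target carrying a fragment identifier.
base = "http://example.org/docs/chapter1/"
target = "../chapter2/intro.xml#section-3"

document_uri, fragment = resolve_link(base, target)
print(document_uri)
print(fragment)
```

The point of the sketch is only that the base, the document address, and the pointer into the document are three separable concerns, which is roughly how the three specifications split the work.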

XLink achieves its ends by specifying attributes that can turn any element into a link. That is, you are free to use any element in your XML DTD as a link by adding the XLink attributes to it. While this may seem confusing to those used to HTML links, it allows maximum flexibility.

Two kinds of link are possible: simple and extended. The simple link offers basically the same kind of functionality available with the HTML A and IMG tags. Here's an example of a simple link:

  <myLink xmlns:xlink=""
  >click here for the next file</myLink>

This link would, when activated in the user agent, replace the current document with the myTarget.xml document. Extended links are more syntactically complex, but offer the ability for links with multiple endpoints. It is the XPointer specification that allows the endpoint of a link to be at a finer granularity than document-level.
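The attribute lines of the example above did not survive in this copy, so the following Python sketch uses a stand-in. It assumes the attribute names and namespace URI of the XLink 1.0 Recommendation (`xlink:type`, `xlink:href`, `xlink:show`, `xlink:actuate`, and `http://www.w3.org/1999/xlink`); the working draft discussed at the tutorial may have differed in detail. The function name `describe_link` is hypothetical. What it illustrates is that the element name is irrelevant: an application recognizes a link purely by the XLink attributes it carries.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the XLink 1.0 Recommendation (an assumption for this sketch).
XLINK_NS = "http://www.w3.org/1999/xlink"

# Stand-in for the article's example; the attribute values are assumptions
# chosen to match the behavior described in the text.
SAMPLE = """
<myLink xmlns:xlink="http://www.w3.org/1999/xlink"
        xlink:type="simple"
        xlink:href="myTarget.xml"
        xlink:show="replace"
        xlink:actuate="onRequest"
  >click here for the next file</myLink>
"""


def describe_link(element):
    """Return the link properties of element, or None if it is not a simple link."""
    def xlink(name):
        # ElementTree spells namespaced attributes as "{uri}localname".
        return element.get(f"{{{XLINK_NS}}}{name}")

    if xlink("type") != "simple":
        return None
    return {
        "href": xlink("href"),
        "show": xlink("show"),
        "actuate": xlink("actuate"),
        "text": (element.text or "").strip(),
    }


link = describe_link(ET.fromstring(SAMPLE.strip()))
print(link)
```

Renaming `myLink` to any other element name leaves the result unchanged, which is the flexibility the specification is after.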

Participants in the XLink tutorial were clearly very excited about the functionality offered, and Maler did a good job of presenting a lucid picture of the XLink specification. While tool support is currently limited, it is expected that the stabilization of the specification will lead to many implementations. In particular, it is possible to experiment with (and implement a reasonable amount of) XLink interpretation with an XSLT style sheet.

Apache: XML Publishing Techniques

An energetic Pierpaolo Fumagalli, a developer with the Apache Cocoon project, gave an interesting overview of the challenges that arise when developing web sites with XML. With two years' experience developing and documenting the Cocoon server (among other projects), Fumagalli has developed his own unique methods and techniques.

Fumagalli shared some of the particular problems encountered with Cocoon 1. Cocoon is a server for parsing, transforming, and styling XML. Typically the output files are HTML, although PDF and graphic formats can also be produced. Many of the lessons learnt by Fumagalli and colleagues apply across the spectrum of XML processing and publication applications.

The first version of Cocoon had two major difficulties that limited its usefulness. Foremost among these was its use of the DOM (a parsed tree model in memory) to pass processed XML data from stage to stage. This led to memory bloat and resultant inefficiency. The second problem was inflexibility due to a one-to-one association between the source XML and its transformation process. For example, this meant it was impossible to render the same XML file as both HTML and PDF.

The DOM problem is being countered by two alternative techniques. The first is a conversion to SAX, a processing model that allows events to stream through the server rather than waiting for the parsing of an XML document to complete. Unfortunately, SAX raised problems for Cocoon where XSLT transformations were required. So, in addition, the developers are working on a special DOM that can be read from as it is being built. Fumagalli demonstrated that careful construction of style sheets can lead to better performance from the server with this method.
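The difference between the two processing models can be sketched in a few lines of Python using the standard library's `xml.sax` and `xml.dom.minidom`. This is not Cocoon code (Cocoon is written in Java); the sample document and the names `ParaCounter`, `count_with_sax`, and `count_with_dom` are hypothetical. The SAX version reacts to each element as the parser reports it and keeps only a counter, while the DOM version must hold the whole tree in memory before it can answer.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document.
DOC = b"<doc><para>one</para><para>two</para><para>three</para></doc>"


class ParaCounter(xml.sax.ContentHandler):
    """Counts <para> elements as start-element events stream past."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "para":
            self.count += 1


def count_with_sax(data):
    # Event-driven: no tree is built, memory use stays flat.
    handler = ParaCounter()
    xml.sax.parseString(data, handler)
    return handler.count


def count_with_dom(data):
    # Tree-based: the full document is parsed into memory first.
    tree = minidom.parseString(data)
    return len(tree.getElementsByTagName("para"))


print(count_with_sax(DOC), count_with_dom(DOC))
```

The streaming version cannot look backward or forward in the document, which hints at why transformations that need random access to the tree are harder to drive from SAX events alone.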

The second problem with the first version of Cocoon was solved by the introduction of a mapping file, which instructs Cocoon how to process and style the source XML in order to produce the target HTML. The technology to do this is called "Stylebook," and forms part of the Apache XML project.

Of perhaps more immediate use to those trying to build web sites with XML today were Fumagalli's experiences with creating the Apache XML web site. He has developed techniques to allow maximum flexibility at every stage, from DTD design through to styling. By applying two XSLT transformations he is able to isolate DTD changes from style sheet changes.

Fumagalli has invented an intermediate DTD, which he calls "Graphic Metalanguage," to sit between the original DTD and the output format (HTML or PDF). One XSLT sheet is applied to transform the source XML into the metalanguage, and another is then applied to transform it into the target format. This means that if the original DTD changes (not an infrequent occurrence in rapid development cycles), only the transform to the metalanguage needs to be altered, rather than the style sheet for every desired output format.
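The shape of this two-stage design can be sketched in Python, with plain functions standing in for the two XSLT style sheets. Everything here is hypothetical: the source vocabulary (`article`, `headline`, `body`), the intermediate vocabulary (`page`, `title`, `block`), the plain-text renderer standing in for a second output format, and the `RENDERERS` table. None of it reflects the actual Graphic Metalanguage or Stylebook formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document in a hypothetical source vocabulary.
SOURCE = "<article><headline>Hello</headline><body>First paragraph.</body></article>"


def to_intermediate(source_root):
    """Stage 1: the only code that knows the source vocabulary."""
    page = ET.Element("page")
    ET.SubElement(page, "title").text = source_root.findtext("headline", default="")
    for body in source_root.findall("body"):
        ET.SubElement(page, "block").text = body.text or ""
    return page


def to_html(page):
    """Stage 2a: intermediate vocabulary to HTML."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = page.findtext("title", default="")
    for block in page.findall("block"):
        ET.SubElement(body, "p").text = block.text
    return ET.tostring(html, encoding="unicode")


def to_text(page):
    """Stage 2b: intermediate vocabulary to plain text."""
    lines = [page.findtext("title", default="").upper()]
    lines += [block.text or "" for block in page.findall("block")]
    return "\n".join(lines)


# One renderer per output format; each sees only the intermediate form.
RENDERERS = {"html": to_html, "text": to_text}


def publish(source_xml, fmt):
    page = to_intermediate(ET.fromstring(source_xml))
    return RENDERERS[fmt](page)


print(publish(SOURCE, "html"))
print(publish(SOURCE, "text"))
```

If the source vocabulary changes, only `to_intermediate` needs editing; the renderers are untouched, and adding an output format means adding one entry to the table rather than one style sheet per source DTD.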

Pierpaolo Fumagalli's presentation was full of the energy that characterizes the Apache XML developers, and it was fascinating to hear the account of problems encountered and how they were solved.
