XSLT UK 2001 Report
by Jeni Tennison

Meaning Definition Language

The advertised speaker for the next slot was Ben Robb of cScape Strategic Internet Services Ltd, but he was unable to speak due to work commitments. Instead, Robert Worden of Charteris spoke on the Meaning Definition Language (MDL) as a means of indicating the semantics of an XML vocabulary. The aim of MDL is to support meaning-level queries, automated XML transformation, a meaning-level API, and the Semantic Web in general. It works by associating particular nodes within an XML vocabulary with a meaning-level description of the domain, such as a UML class model (represented in XMI), RDF or DAML+OIL (which are XML vocabularies that represent ontologies, based on RDF).

Robert discussed the mappings between XML vocabularies and the underlying conceptual model that they represent. Generally, instances are represented as XML elements, although there can be a conditional aspect to the mapping, and not all instances in a particular domain will be represented within an XML document. Similarly, properties are usually represented as attributes, but it's important to identify the object that the attribute is a property of.

Associations between instances are represented in many different ways within XML vocabularies. They can be represented through ID/IDREF pairs, through element nesting, or through what Robert called overloading, where several instances are bundled together within another element that represents the association.
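
To make the difference between these representations concrete, here is a minimal Python sketch that recovers the same author-to-book association from two encodings, one using element nesting and one using an ID/IDREF-style pair. The element and attribute names are hypothetical and are not taken from the talk, and because the standard-library parser does not read DTDs, the `id` and `author` attributes are treated as ID and IDREF purely by convention.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary 1: the association is expressed by nesting.
NESTED = '<library><author name="Ada"><book title="Notes"/></author></library>'

# Hypothetical vocabulary 2: the association is expressed by an ID/IDREF-style pair.
LINKED = ('<library><author id="a1" name="Ada"/>'
          '<book title="Notes" author="a1"/></library>')

def pairs_from_nesting(xml_text):
    """Return (author name, book title) pairs read from parent/child nesting."""
    root = ET.fromstring(xml_text)
    return [(a.get("name"), b.get("title"))
            for a in root.findall("author")
            for b in a.findall("book")]

def pairs_from_idref(xml_text):
    """Return (author name, book title) pairs by resolving the author reference."""
    root = ET.fromstring(xml_text)
    authors = {a.get("id"): a.get("name") for a in root.findall("author")}
    return [(authors[b.get("author")], b.get("title"))
            for b in root.findall("book")]

print(pairs_from_nesting(NESTED))
print(pairs_from_idref(LINKED))
```

Both functions yield the same pairs, which is the point: the association is the same at the meaning level even though the markup differs.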

The MDL encodes these mappings and can be embedded in a schema, such that for each XML vocabulary of a particular domain, there is a mapping between it and a common conceptual model. MDL's benefits arise when users phrase queries in terms of the meaning of the information they wish to retrieve rather than having to know about a particular XML vocabulary. For transformations, it is possible to map between any two vocabularies via the conceptual model rather than having to design separate transformations for each pair of vocabularies.
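
The idea of querying through a shared conceptual model can be sketched in a few lines of Python. This is only an illustration of the principle: the `MAPPINGS` table, the two vocabularies, and the `Person`/`Person.name` labels are all invented for the example and do not reflect real MDL syntax.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that describe the same domain (people with names).
DOC_A = '<people><person name="Ada"/><person name="Grace"/></people>'
DOC_B = '<staff><employee><fullName>Ada</fullName></employee></staff>'

# Hypothetical mappings from the conceptual model (class Person, property name)
# to where each vocabulary keeps that information. Not real MDL.
MAPPINGS = {
    "A": {"Person": ".//person",
          "Person.name": lambda e: e.get("name")},
    "B": {"Person": ".//employee",
          "Person.name": lambda e: e.findtext("fullName")},
}

def query_names(xml_text, vocab):
    """Answer the meaning-level query 'the names of all Persons'."""
    mapping = MAPPINGS[vocab]
    root = ET.fromstring(xml_text)
    return [mapping["Person.name"](e) for e in root.findall(mapping["Person"])]

print(query_names(DOC_A, "A"))
print(query_names(DOC_B, "B"))
```

The caller asks for names of Persons without knowing whether a name is stored as an attribute or as a child element; adding a third vocabulary would mean adding one mapping rather than a transformation for each pair.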

XSLT as a Query Language

Next to take the stand was Evan Lenz of XYZFind Corp, who spoke about the XML Query work and the correspondence between it and XSLT. The XML Query work at W3C has produced a number of documents: requirements, use cases, a data model, an algebra, and a syntax known as XQuery.

Most of the requirements are very similar to those of XSLT. XML Query should be declarative, have closures, have an XML syntax, perform transformations, have a human-readable syntax (which may or may not be the same as the XML syntax), operate on multiple documents, enable references between those documents, and use XML Schema information. Evan argued that it is only this last requirement that is not satisfied by XSLT as it currently stands.

Evan went on to describe some of the differences between XML Query and XSLT. The XML Query data model operates on the post-schema validation infoset (PSVI) and includes both ordered and unordered forests. The algebra is strongly typed, which enables static analysis, optimization and composable queries. XQuery uses XPath, as XSLT does, although it uses a restricted form that only allows the abbreviated syntax, rather than the full flexibility of axes. XQuery has similar constructs for most of the instructions in XSLT, but it doesn't have templates, instead having user-defined functions. Also, everything in XQuery is an expression.

The majority of the talk was spent going through some of the XML Query use cases, examining the XQuery solution and comparing it to the XSLT solution. In the main, these served to underline the question "What does XQuery do that XSLT doesn't do already?", and while Evan may have been preaching to the converted about the power of XSLT, it is interesting to highlight the features introduced by XQuery as these are areas that XSLT may address in its next incarnation:

  • RANGE operator in predicates to get nodes between certain positions
  • dereference operator (->), operating in a similar way to id() but with an implicit name test
  • DEFAULT namespace declarations, which specify a namespace as the default namespace to be used to interpret unprefixed names in XPaths
  • functions such as avg() and distinct()
  • BEFORE and AFTER operators to get those nodes that occur before or after another node
  • SOME and EVERY operators that make explicit whether a comparison needs to be true for just one node in the node set, or for all of them
  • filter() function that copies all nodes aside from those in a second set
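
For readers who think more readily in a general-purpose language, the following Python sketch shows rough analogues of several of the features listed above. It is not XQuery and makes no claim about XQuery's exact semantics; the sample document and variable names are invented, and the 1-based inclusive reading of RANGE is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
DOC = """<bids>
  <bid item="a" amount="10"/>
  <bid item="b" amount="30"/>
  <bid item="a" amount="20"/>
  <bid item="c" amount="40"/>
</bids>"""

bids = ET.fromstring(DOC).findall("bid")
amounts = [int(b.get("amount")) for b in bids]

# RANGE 2 TO 3 (assumed 1-based and inclusive): a slice of the sequence.
second_to_third = amounts[1:3]

# avg() and distinct(): mean of values, and unique values in first-seen order.
average = sum(amounts) / len(amounts)
distinct_items = list(dict.fromkeys(b.get("item") for b in bids))

# SOME and EVERY: explicit existential and universal comparisons.
some_over_30 = any(a > 30 for a in amounts)
every_over_30 = all(a > 30 for a in amounts)

# BEFORE and AFTER: nodes preceding or following a pivot node in document order.
pivot = bids[2]
position = bids.index(pivot)
before = [b.get("item") for b in bids[:position]]
after = [b.get("item") for b in bids[position + 1:]]

print(second_to_third, average, distinct_items)
print(some_over_30, every_over_30, before, after)
```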

Evan raised some issues about whether XSLT could be used as a query language, in particular whether the use of full XPaths made optimization difficult, and the problem of the way that the XSLT built-in templates are set up to dump out the text of a document by default. But he rounded off by countering Steve Muench's assertion that "SQL + XML + XSLT = WOW" with the statement that just XML + XSLT = WOAH BABY!

Explaining XSL Formatting Objects

The final talk of the morning was given by Arved Sandström of e-plicity, the release coordinator of FOP, a formatting-object-to-PDF renderer. Arved talked through the XSL 1.0 Candidate Recommendation and the purpose of formatting objects. The anticipated use of formatting objects is as a final, unchangeable format, produced by an XSLT stylesheet and rendered mainly as PDF but also possibly as Postscript, PCL, MIF, or RTF.

Arved made the distinction between two different types of document: content-driven documents, such as books, and layout-driven documents such as newspapers. Formatting objects are currently limited to a single flow, with simple page masters, which means that certain things such as marginal notes are difficult (or ugly) using XSL-FO.

Using XSL-FO involves taking the result tree from an XSLT transformation, which comprises a number of elements and attributes in the XSL-FO namespace, and objectifying these into a formatting object tree. This tree is then refined to give the layout, which is in turn converted to an area tree which can be rendered. The XML elements have attributes, which are objectified into properties on the formatting objects, and finally traits on the area tree. These traits describe constraints on the layout, such as leeway on hyphenation or line or page breaking, which means that different renderers may render the same set of formatting objects in different ways.

Arved went through the formatting object basics. XSL-FO documents consist of an initial section which describes the page masters, followed by a number of page sequences. There is currently only one kind of page master, a simple page master, which has a central region surrounded by a header, a footer and left and right margins. Within these areas there are block and inline areas. XSL-FO has good support for lists and tables, supports references such as page numbers and citations, doesn't have support for tables of contents or indices, but does have markers. For electronic versions of documents, there are formatting objects that support links and dynamic display. Finally, XSL-FO supports floats and footnotes and allows you to incorporate external graphics or in-stream foreign objects such as snippets of SVG or MathML. Arved mentioned that he was using a left-to-right, top-to-bottom, Western view of documents when describing these terms, and that actually the XSL-FO Recommendation is a lot less Western-oriented.
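
The overall shape described here, page masters first and page sequences after, can be seen in a minimal document. The Python sketch below builds one with the standard library; the `fo` helper is a hypothetical convenience, the page dimensions are arbitrary, and the attribute names follow the final XSL 1.0 Recommendation (for example `master-reference`), which may differ from the Candidate Recommendation discussed in the talk.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO)

def fo(tag, parent=None, **attrs):
    """Hypothetical helper: create an element in the XSL-FO namespace.

    Underscores in keyword names become hyphens (master_name -> master-name).
    """
    name = f"{{{FO}}}{tag}"
    attrs = {key.replace("_", "-"): value for key, value in attrs.items()}
    if parent is None:
        return ET.Element(name, attrs)
    return ET.SubElement(parent, name, attrs)

root = fo("root")

# Initial section: the page masters.
masters = fo("layout-master-set", root)
page = fo("simple-page-master", masters, master_name="A4",
          page_height="29.7cm", page_width="21cm", margin="2cm")
fo("region-body", page)

# Then one or more page sequences, each flowing content into a region.
seq = fo("page-sequence", root, master_reference="A4")
flow = fo("flow", seq, flow_name="xsl-region-body")
block = fo("block", flow, font_size="12pt")
block.text = "Hello, XSL-FO"

print(ET.tostring(root, encoding="unicode"))
```

In practice this tree would be the result of an XSLT transformation rather than being built by hand; the sketch only shows what a formatter expects to receive.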

There is a bit of a barrier to using XSL-FO, in that people need to have information in XML, be able to write XSLT stylesheets to convert it into XSL-FO, and need some expertise in page layouts in order to make best use of the formatting objects. In a web publishing environment, such as when using FOP with Cocoon, rendering large documents can take a long time. Arved talked about the possibility of piping streams of rendering information rather than using a batch process, delivering XML and XSLT to the client, so that the rendering can be carried out where there is relatively more free processing power, and having greater ties with XSLT processors so that information can be piped between them or at least passed as a DOM rather than through a file. He also pointed out that XSL-FO and CSS are very similar, with CSS being more oriented towards web publishing, while XSL-FO is suited to printing; and he speculated that implementations may be able to support both with a fair amount of reuse of code. For XSL 2.0, Arved anticipates general regions and more internationalization support.

The RenderX approach to XSL formatter design

Still on the topic of XSL-FO, David Tolpin from RenderX spoke next about the design of XEP, an XSL-FO renderer. David first talked through FORM, the parser that's used by XEP, which parses the XSL-FO, validates it, retrieves images, expands shorthands, calculates properties, and adjusts the tree structure as required.

The kernel of XEP is FO2PDF, which takes the normalized XSL-FO tree and converts it into PDF. The FO tree is interpreted as a stream of events; certain objects, such as lists, tables, or outlying floats are managed through several separate streams, which are linked together at critical points. There are exceptions that take control away from this main stream, for things such as footnotes, page numbers, or keeps and breaks. Rendering constraints can clash with each other, which means that a renderer often needs to backtrack to attempt to satisfy them. XEP avoids backtracking as much as possible by constantly keeping track of the last point at which a page break was possible.
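
The idea of remembering the last feasible page break, so that an overflow can be resolved without re-laying out the page, can be illustrated with a toy paginator. This is a deliberately simplified sketch and not XEP's actual algorithm: the `paginate` function, the `(height, can_break_after)` item format, and the numbers are all assumptions made for the example.

```python
def paginate(items, page_height):
    """Split items into pages, cutting at the last legal break on overflow.

    items is a list of (height, can_break_after) tuples; the result is a list
    of pages, each a list of item indices. If a page overflows and no legal
    break has been seen, the page is simply allowed to overflow.
    """
    pages = []
    current = []       # indices placed on the page being filled
    used = 0           # height consumed on the current page
    last_break = None  # len(current) at the most recent legal break point

    for index, (height, can_break_after) in enumerate(items):
        if used + height > page_height and last_break:
            # Overflow: close the page at the remembered break point and
            # carry the items after it over to the next page.
            pages.append(current[:last_break])
            current = current[last_break:]
            used = sum(items[i][0] for i in current)
            last_break = None
        current.append(index)
        used += height
        if can_break_after:
            last_break = len(current)

    if current:
        pages.append(current)
    return pages

# Item 1 forbids a break after it, so it must stay with item 2.
items = [(4, True), (4, False), (4, True), (4, True)]
print(paginate(items, page_height=10))  # [[0], [1, 2], [3]]
```

When item 2 overflows the first page, the paginator does not start over; it cuts at the break recorded after item 0 and moves item 1 forward with item 2.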

The result of the layout process is converted into a pragmatic internal vocabulary, which can be saved and then processed later by one of the output producers available with XEP. The XEP output producers include XML, Postscript, and three PDF producers.

David talked a little about the particular problems that the RenderX team had faced in developing XEP, such as managing nested tables and repeated table heads. He discussed their approach to footnotes, where the content of the footnotes is rendered backwards, to tell how much space they use up, and then reversed in the final page. As far as performance is concerned, David pointed out that parsing the XML is the slowest part, and that this could be alleviated by linking directly to the XSLT processor producing the XSL-FO or by reducing the default attribute set that is specified in XSL 1.0. Speed is a big issue for the RenderX team; the features included in XEP are assessed primarily according to the speed hit that they would incur, rather than their conformance to the specification.

Experiments with XSLT With Topic Maps

For the last talk of the conference, Ken Holman appeared again, this time to talk about the use of XSLT with Topic Maps. Ken set himself a number of goals: to render topic maps automatically, to navigate using them, to render different topics in different ways, and to merge topic maps -- all using XSLT.

Topic Maps express navigational meta-information about topics. Ken introduced the idea of Topic Maps by comparing them to glossaries, where each term might reference other terms, and thesauri, which contain synonyms and antonyms, as well as broader and narrower terms.

The results of Ken's experiments are a number of stylesheets based on version 0.2 of the XTM (XML Topic Maps) standard, which is now out of date. Ken constructed a navigation tool wherein different topics have different looks, inherited from further up the topic hierarchy. The associations between a topic and the stylesheet for that topic are represented within a Topic Map. The final set of HTML pages for the Topic Map is generated through a two-step process: the first step creates a set of stylesheets and a batch file, which generates the HTML when run.

Ken went through a number of the lessons that he learned while authoring these stylesheets. This was his first experience of using namespaces within stylesheets, and he was caught by the fact that XPaths are not interpreted using the default namespace. He also found that using xsl:copy rather than literal result elements is a good way to keep namespace declarations under control. As he was authoring extensible stylesheets, he recommended the use of namespaces for named templates. Ken demonstrated how to use the Allouche method to eliminate unwanted whitespace, and how to use xml:space within the stylesheet to preserve the indenting scheme he wanted rather than that used by the XSLT processor.
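
The default-namespace pitfall is not specific to XSLT; the same thing happens with the path expressions in Python's standard library, which makes for a compact demonstration. The namespace URI and element names below are placeholders chosen for the example, not the ones Ken used.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, declared as the default namespace in the document.
NS = "http://example.org/topicmap"
DOC = f'<topicMap xmlns="{NS}"><topic id="t1"/><topic id="t2"/></topicMap>'

root = ET.fromstring(DOC)

# An unprefixed name in the path looks for elements in no namespace,
# so it does not match elements that are in the document's default namespace.
unqualified = root.findall("topic")

# Binding a prefix to the namespace and using it in the path does match.
qualified = root.findall("tm:topic", {"tm": NS})

print(len(unqualified), len(qualified))  # 0 2
```

The fix in a stylesheet is analogous: declare a prefix for the namespace and use that prefix in every XPath that needs to match elements from the source document.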

Two tips that I hadn't seen used before involved using terminating messages to prevent the stylesheets being used in inappropriate ways. In stylesheets that were designed to be imported rather than used as the main stylesheet, Ken included a template matching the root node which gave a message indicating that the stylesheet should not be used in that way (naturally this assumes that the importing stylesheet also has a template matching the root node). He used a similar technique to check that the stylesheet was being used on the correct type of document, having a template matching the (named) document element, and another, more general one, matching any document element and reporting an error. Ken also raised the possibility of using the content of xsl:sort and xsl:key to enable the sort value or key value to be calculated using XSLT rather than limiting it to XPath.
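
The document-type check follows a general pattern: a specific rule for the expected document element and a catch-all that stops processing for anything else. In XSLT the catch-all would use a terminating message; the Python sketch below is only an analogue of that pattern, with hypothetical names (`HANDLERS`, `render_topic_map`, `process`) and an exception standing in for the terminating message.

```python
import xml.etree.ElementTree as ET

def render_topic_map(root):
    """Hypothetical handler for the expected document element."""
    return f"rendered {len(root.findall('topic'))} topic(s)"

# The specific rule: only the expected (named) document element has a handler.
HANDLERS = {"topicMap": render_topic_map}

def process(xml_text):
    root = ET.fromstring(xml_text)
    handler = HANDLERS.get(root.tag)
    if handler is None:
        # The general rule: any other document element stops processing
        # with an error, instead of silently producing unexpected output.
        raise ValueError(f"expected <topicMap>, got <{root.tag}>")
    return handler(root)

print(process("<topicMap><topic/><topic/></topicMap>"))  # rendered 2 topic(s)
```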

Conclusions

The XSLT UK '01 conference was a very enjoyable opportunity to get to know the people behind the names on XSL-List and to be brought up to date with some of the advances and developments in the fields of XSLT and XSL-FO. I'm sure all who attended are looking forward to the next XSLT UK conference, whether it's held in 6 months or in a year.

Many thanks are due to Sebastian Rahtz and Dave Pawson for organizing it. The conference was sponsored by on-IDLE, who kept a modest low profile during the proceedings. 75% of the profits from the conference will be going to local charities in Oxford.