XML.com: XML From the Inside Out


What Is XSL-FO
by G. Ken Holman

1.1.6  Extensible Stylesheet Language (XSL/XSLFO)

XSL (or XSLFO) describes formatting and flow semantics for paginated presentation that can be expressed using an XML vocabulary of elements and attributes:

Paginated formatting and flow semantics vocabulary

This hierarchical vocabulary captures formatting semantics for rendering textual and graphic information in different media in a paginated form. A rendering agent is responsible for interpreting an instance of the vocabulary for a given medium to reify a final result.

This is no different in concept and architecture than using HTML and Cascading Stylesheets (CSS) as a hierarchical vocabulary and formatting properties for rendering a set of information in a web browser. Such user agents are not pagination-oriented and effectively have an infinite page length and variable page width.

Indeed, the printed paged output from a browser of an HTML page is often less than satisfactory. Paginated information includes navigation tools such as page numbers, page number citations, headers, footers, etc. to give the reader methods of finding information or finding their location in a printed document.

In essence, when doing any kind of presentation, we are transforming our XML documents into a final display form by transforming instances of our XML vocabularies into instances of a particular rendering vocabulary that expresses the formatting semantics of our desired result. Our choice of vocabulary must be able to express the nature of the formatting we want accomplished. We can choose to transform our information into a combination of HTML and CSS for web browsers and can choose an alternate transformation of XSLFO for paginated display (be that paginated to a screen, to paper, or perhaps even aurally using sound).

In this way XSLFO can be considered a pagination markup language.
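As a rough illustration of the navigation aids mentioned above, the following is a minimal sketch of an XSLFO instance with a running header, a page-number footer, and a page number citation. The page dimensions, the header text, and the "summary" identifier are assumptions chosen for illustration and do not come from the book.

```xml
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <!-- One page geometry with room reserved for a header and a footer -->
    <fo:simple-page-master master-name="page"
                           page-height="11in" page-width="8.5in">
      <fo:region-body margin-top="1in" margin-bottom="1in"
                      margin-left="1in" margin-right="1in"/>
      <fo:region-before extent="0.5in"/>
      <fo:region-after extent="0.5in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <!-- Header repeated on every page (hypothetical title) -->
    <fo:static-content flow-name="xsl-region-before">
      <fo:block text-align="center">Customer Report</fo:block>
    </fo:static-content>
    <!-- Footer carrying the current page number -->
    <fo:static-content flow-name="xsl-region-after">
      <fo:block text-align="center">Page <fo:page-number/></fo:block>
    </fo:static-content>
    <fo:flow flow-name="xsl-region-body">
      <!-- The citation resolves to the page where id="summary" lands -->
      <fo:block>See the summary on page
        <fo:page-number-citation ref-id="summary"/>.</fo:block>
      <fo:block id="summary" break-before="page">Summary</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

None of these constructs has a direct counterpart in HTML rendered in a browser window, which is the sense in which XSLFO addresses pagination.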

Target of transformation

When using the XSLFO vocabulary as the rendering language, the objective for a stylesheet writer is to convert an XML instance of some arbitrary XML vocabulary into an instance of the formatting semantics vocabulary. This formatting instance is the information rearranged into an expression of the intent of the paginated result as a collection of layout constructs populated with the content to be laid out on the rendered pages.

This result of transformation cannot contain any user-defined vocabulary constructs (e.g., an "address", "customer identifier", or "purchase order number" construct) because the rendering agent would not know what to do with constructs labeled with these foreign, unknown identifiers.

Consider again the two examples: HTML for rendering on a single page of infinite length in a web browser window, and XSLFO for rendering on multiple separated pages on a screen, on paper, or audibly. In both cases, the rendering agents only understand the vocabulary expressing their respective formatting semantics and wouldn't know what to do with alien element types defined by the user.

Just as with HTML, a stylesheet writer utilizing XSLFO for pagination must transform each and every user construct into a rendering construct to direct the rendering agent to produce the desired result. By learning and understanding the semantics behind the constructs of XSLFO, the stylesheet writer can create an instance of the formatting vocabulary expressing the desired layout of the final result (e.g. area geometry, spacing, font metrics, etc.), with each piece of information in the result coming from either the source data or the stylesheet itself.

Consider once more the customer information in Example 1-1. An XSLFO rendering agent doesn't know how to render a marked up construct named <customer>. The XSLFO vocabulary used to render the customer information could be as follows:

<fo:block space-before.optimum="20pt" font-size="20pt">From:
<fo:inline font-style="italic">(Customer Reference)
<fo:inline font-weight="bold">cust123</fo:inline>
</fo:inline>
</fo:block>
Example 1-7: XSLFO rendering semantics markup for example

The rendering result when using the Portable Document Format (PDF) would then be as in Figure 1.2, with an intermediate PDF generation step interpreting the XSLFO markup for italics and boldface presentation semantics.

Figure 1.2: XSLFO rendering for example

The figure again illustrates the two distinctive styling steps: transforming the instance of the XML vocabulary into a new instance according to a vocabulary of rendering semantics; and formatting the instance of the rendering vocabulary in the user agent.
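The first of these two steps could be sketched as an XSLT template along the following lines. Example 1-1 is not reproduced in this excerpt, so the assumption here is that the customer reference is carried in a "db" attribute of the <customer> element; adjust the select expression to match the actual source. The template is only a fragment: a complete stylesheet would also need to wrap the result in fo:root with a page layout.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Assumption: the source looks like <customer db="cust123"/> -->
  <xsl:template match="customer">
    <fo:block space-before.optimum="20pt" font-size="20pt">
      <!-- Literal text supplied by the stylesheet -->
      <xsl:text>From: </xsl:text>
      <fo:inline font-style="italic">
        <xsl:text>(Customer Reference) </xsl:text>
        <!-- Content supplied by the source data -->
        <fo:inline font-weight="bold">
          <xsl:value-of select="@db"/>
        </fo:inline>
      </fo:inline>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

The user-defined <customer> element does not survive into the result; only the formatting objects and the text they contain do.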

The formatting semantics of the XSLFO vocabulary are described for both visual and aural targets, so we can use one set of constructs regardless of the rendering medium. It is the rendering agent's responsibility to interpret these constructs accordingly. In this way, the XSLFO semantics can be interpreted for print, display, audio, or other presentations. There are, indeed, some specialized semantics we can use to influence rendering on particular media, though these are just icing on the cake. Dynamic behaviors can be specified for a highly interactive electronic display that would not function at all, obviously, in the paper form.

1.1.7  Transforming and rendering XML information using XSLFO

When the result tree in an XSLT process is specified to utilize the XSLFO pagination vocabulary, the normative behavior of an XSLFO processor incorporating an XSLT processor is to interpret the result tree. This interpretation reifies the semantics expressed in the constructs of the result tree to some medium, for example pixels on a screen, dots on paper, sound through a synthesis device (see Figure 1.3).

Figure 1.3: Transformation from XML to XSL Formatting Semantics

The stylesheets used in this scenario contain the transformation vocabulary and any custom extensions, as well as the desired result XSLFO formatting vocabulary and any foreign object vocabularies. No other element types from our XML vocabularies are in the result. If there were, rendering processors would not inherently know what to do with an element of type custnbr representing a customer number; it is the stylesheet writer's responsibility to transform the information into information recognized by the rendering agent.
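A minimal sketch of such a stylesheet is shown here, mixing the XSLT transformation vocabulary with the XSLFO result vocabulary. The page geometry and the master name "page" are hypothetical, and the sketch assumes the source contains at least one custnbr element, since fo:flow requires at least one block-level child.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- The root template builds the page layout and the flow -->
  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page"
                               page-height="297mm" page-width="210mm">
          <fo:region-body margin-top="20mm" margin-bottom="20mm"
                          margin-left="20mm" margin-right="20mm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <!-- Only elements with an explicit template are pulled in -->
          <xsl:apply-templates select="//custnbr"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- The user element custnbr becomes a formatting object -->
  <xsl:template match="custnbr">
    <fo:block font-weight="bold">
      <xsl:value-of select="."/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

Every element in the result tree is in the XSLFO namespace; the custnbr element type appears only in the match pattern and the select expression, never in the output.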

There is no obligation for the rendering processor to serialize the result tree created during transformation. The feature of serializing the result tree to XML markup is, however, quite useful as a diagnostic tool, revealing to us what we really asked to be rendered instead of what we thought we were asking to be rendered when we saw incorrect results. There may also be performance considerations of taking the reified result tree in XML markup and rendering it in other media without incurring the overhead of performing the transformation repeatedly.

1.1.8  Interpreting XSLFO instances directly 

The XSLFO and foreign object vocabularies can also be used in a standalone XML instance, perhaps as the result of an XSLT transformation using an outboard XSLT processor. The XSLT processor serializes a physical entity from the transformation result tree, and that XML file of XSLFO vocabulary is then interpreted by a standalone XSLFO processor.

Figure 1.4: Creating standalone XML instances of XSL vocabulary

This diagram delineates three distinct phases of the process; the same phases occur when the XSLT and XSLFO processors are combined into a single application. The transformation phase creates the XSLFO expressing our intent for formatting the source XML. The XSLFO processor first interprets our intent into the information that is to be rendered on the device, then effects the rendering to reify the result.

1.1.9  Generating FO instances 

XSLFO need not be generated by XSLT in order to be useful. Consider that when we learned HTML as the rendering vocabulary for a web user agent, we either coded it by hand or we wrote applications that generated the HTML from our information. This information may have come from some source, such as a database.

Learning XSLT, we can express our information in XML and then either transform the XML into HTML to send to the user agent, or send the XML directly to an XSLT process in the user agent.

The typical generation of XSLFO would be from our XML using an XSLT stylesheet, though this need not be the case at all. We may have situations where our applications need to express information in a paginated form, and these applications could generate instances of the XSLFO vocabulary directly to be interpreted for the output medium.

Figure 1.5: Generating XML instances of XSL vocabulary

We need to remember that XSLFO is just another vocabulary, able to be expressed as an XML instance, requiring an application to interpret our intent for formatting in order to effect the result. This is no different than the use of the HTML vocabulary for a web browser.

The sole requirement is that the namespace of the vocabulary in the instance be "http://www.w3.org/1999/XSL/Format" for the labeled information in the instance to be recognized as expressing the semantics described by the XSLFO Recommendation.

Note 3:

The default namespace may be used for the XSLFO vocabulary, just as is true with any vocabulary. Personally, I don't use the popular "fo:" prefix in my stylesheets, as it is my habit to use the default namespace and not prefix my XSLFO names in any way.

This practice reinforces for me that this is just as simple as HTML, where I don't use any namespace at all in my own stylesheets.

There are processors that interpret standalone XSLFO instances interactively on the screen in a GUI environment. To learn many of the nuances of XSLFO, I often hand-author XSLFO instances, experimenting with various objects and properties in elements and attributes, tweaking values repeatedly and examining the results interactively with the formatting tool. Having hand-authored HTML, using the default namespace for XSLFO is very natural and saves on the amount of typing as well.
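A hand-authored standalone instance using the default namespace might look like the following sketch. It reuses the customer markup from Example 1-7 without the "fo:" prefix; the page geometry and the master name "page" are assumptions added only to make the instance complete.

```xml
<root xmlns="http://www.w3.org/1999/XSL/Format">
  <layout-master-set>
    <!-- Hypothetical page geometry -->
    <simple-page-master master-name="page"
                        page-height="11in" page-width="8.5in">
      <region-body margin-top="1in" margin-bottom="1in"
                   margin-left="1in" margin-right="1in"/>
    </simple-page-master>
  </layout-master-set>
  <page-sequence master-reference="page">
    <flow flow-name="xsl-region-body">
      <!-- Same formatting objects as Example 1-7, unprefixed -->
      <block space-before.optimum="20pt" font-size="20pt">From:
        <inline font-style="italic">(Customer Reference)
          <inline font-weight="bold">cust123</inline>
        </inline>
      </block>
    </flow>
  </page-sequence>
</root>
```

Because the namespace declaration on the root element is what identifies the vocabulary, a conforming processor should treat this instance the same as one written with the "fo:" prefix.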

This is a prose version of an excerpt from an edited version of the book "Practical Formatting Using XSLFO" (First Edition ISBN 1-894049-07-1 at the time of this writing) published by Crane Softwrights Ltd., written by G. Ken Holman.
Copyright © Crane Softwrights Ltd.
