XML.com: XML From the Inside Out

Printing from XML: An Introduction to XSL-FO

October 09, 2002

Dave Pawson is the author of XSL-FO: Making XML Look Good in Print

One of the issues many users face when they start producing print from XML is page layout. Without getting the page layout right, it's unlikely that much progress will be made. As an introduction to the W3C XSL Formatting Objects Recommendation, I want to present a simplified approach that will let a new user gain a foothold with page layout.

The aim of this article is to produce that first page of output (call it the "Hello World" program) with enough information to let a user move on to more useful things. I'll introduce the most straightforward page layout in XSL-FO, using as few elements as possible while still obtaining reasonable output.

One of the problems is that, unlike producing an HTML document from an XML source with XSLT, processing the children of the root element is not simply a matter of calling xsl:apply-templates from within the root template. Much more initial output is required before the formatter can generate any pages.
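For comparison, the HTML case can be as small as the sketch below: the root template writes an html wrapper and hands everything else to xsl:apply-templates. This is a minimal illustration rather than anything from the article; it relies on XSLT's built-in template rules to copy the source text into the body.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="html"/>

  <!-- For HTML, the root template needs only a thin wrapper -->
  <xsl:template match="/">
    <html>
      <body>
        <!-- Children are handled by other templates, or by the
             built-in rules, which write out their text -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

The XSL-FO equivalent must additionally declare the page layout and a page sequence before any content can appear.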

Let's look at the processing needed to get from your XML document to a printable PDF document. First, the XML is fed to an XSLT processor with an appropriate stylesheet (developed below) to produce another XML document, which uses the XSL-FO namespace and is intended for an XSL-FO formatter. Second, the output of the first stage is fed to the XSL-FO formatter, which produces the end product: a printable document, styled for visual presentation.


XML       -->  XSLT    -->  XSL-FO    -->  XSL-FO     -->  printable
document       engine       document       formatter       document
                 ^
                 |
          XSLT stylesheet

This approach has the advantage that the XML source document remains format-neutral and can be used with other XSLT stylesheets to produce output for other media.
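To make this concrete, a hypothetical source document such as the one below carries structure but no presentation: nothing in it mentions page size, fonts or margins. The element names doc, title and para are illustrative assumptions, not part of any standard.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical source document: structure only, no formatting -->
<doc>
  <title>Hello World</title>
  <para>This is my first page of XSL-FO output.</para>
</doc>
```

The same file could be run through one stylesheet that generates XSL-FO for print and another that generates HTML for the Web.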

The XSL-FO Document

We first need to understand the initial target of the XSLT transformation: the XSL-FO document. The document you produce, which is fed to the XSL-FO formatter, contains only a small number of elements:


<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
 <fo:layout-master-set>                      [1]
  <fo:simple-page-master  
    master-name="simple" >                   [2]
   <fo:region-body/>
   </fo:simple-page-master>
 </fo:layout-master-set> 
 <fo:page-sequence 
               master-reference="simple">    [3]
   <fo:flow 
              flow-name="xsl-region-body">   [4]
        content                              [5]
   </fo:flow>
 </fo:page-sequence>
</fo:root>

Let's look at each of the identified elements in turn.

[1] In order to lay out content on a page, the formatter needs to know what sizes it has to deal with. The layout-master-set contains the [2] simple-page-master, which holds this information: for example, whether you are using a European A4 page or an American US Letter page. The simple-page-master also contains the region-body element, which may be seen as the main body of the page layout.
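As a sketch of where those sizes go, the layout-master-set below defines two page masters, one for A4 (210mm by 297mm) and one for US Letter (8.5in by 11in), using the page-width, page-height and margin properties of simple-page-master. The master names and margin values are arbitrary choices for illustration; the fragment would sit inside fo:root in place of the single "simple" master.

```xml
<fo:layout-master-set xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- European A4 page, portrait (names and margins are illustrative) -->
  <fo:simple-page-master master-name="A4"
      page-width="210mm" page-height="297mm"
      margin-top="20mm" margin-bottom="20mm"
      margin-left="25mm" margin-right="25mm">
    <!-- The main body area, inside the page margins -->
    <fo:region-body/>
  </fo:simple-page-master>

  <!-- American US Letter page, portrait -->
  <fo:simple-page-master master-name="letter"
      page-width="8.5in" page-height="11in"
      margin-top="0.75in" margin-bottom="0.75in"
      margin-left="1in" margin-right="1in">
    <fo:region-body/>
  </fo:simple-page-master>
</fo:layout-master-set>
```

A page-sequence would then pick one of these through its master-reference attribute, for example master-reference="A4".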

[3] The page-sequence element exists to support complex pagination. For a simple page layout it needs very little content beyond a master-reference attribute that refers back to a particular page definition (here, the simple-page-master named "simple").

Also within the page-sequence element is a flow element [4]. The idea of a flow may or may not be familiar to you. I came across it in desktop publishing packages, where I poured text into page areas to build up the columns of a college magazine; the content flowed into the page areas.
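The flow idea can be sketched as a page-sequence whose flow holds a series of blocks. The formatter pours them into the body region in document order and, if the region on the current page fills up, starts a new page from the same master and carries on. The paragraph text is placeholder content; the master name "simple" matches the skeleton above.

```xml
<fo:page-sequence xmlns:fo="http://www.w3.org/1999/XSL/Format"
                  master-reference="simple">
  <!-- Everything in this flow is poured into the region-body area -->
  <fo:flow flow-name="xsl-region-body">
    <fo:block>First paragraph of the article.</fo:block>
    <fo:block>Second paragraph, placed below the first.</fo:block>
    <!-- ...further blocks; once the body region on the current page
         is full, the formatter continues on a fresh page -->
    <fo:block>Last paragraph, possibly on a later page.</fo:block>
  </fo:flow>
</fo:page-sequence>
```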

Identifying which region of the page to pour the text into is the job of the flow-name value xsl-region-body, which is the default name of the region defined by fo:region-body. This differentiates the body of the page from the outer areas (margins, header, footer, etc.) of the page. Finally comes the content [5], which is a child of the main flow. Plain text cannot be inserted here, since the formatter would have to guess what you wanted to do with it. Instead, the real content of the flow takes the form of <fo:block>content</fo:block>, which defines a block of text (rectangular in shape, as big as you like, and taking defaults for every property) that will be placed as the first item on the page.
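Putting the pieces together, a complete "Hello World" XSL-FO document needs little more than the skeleton plus one block. The A4 dimensions and 20mm margins are illustrative assumptions; without them, the page size is left to the formatter.

```xml
<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <!-- One page definition: A4 with a 20mm margin all round -->
    <fo:simple-page-master master-name="simple"
        page-width="210mm" page-height="297mm"
        margin-top="20mm" margin-bottom="20mm"
        margin-left="20mm" margin-right="20mm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <!-- Use the "simple" master for every page in this sequence -->
  <fo:page-sequence master-reference="simple">
    <!-- xsl-region-body is the default name of the region-body region -->
    <fo:flow flow-name="xsl-region-body">
      <!-- Text is wrapped in a block, never placed directly in the flow -->
      <fo:block>Hello World</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

Fed to a conforming formatter, this should produce a single page with "Hello World" at the top of the body area.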

To get a better grasp of all this, let's sketch, minimally, how it might fit into a stylesheet whose task is to take a simple XML document and produce another XML document, which is then fed to an XSL-FO formatter.

A basic XSLT stylesheet to produce XSL-FO is shown below.


<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"   [1]
  xmlns:fo="http://www.w3.org/1999/XSL/Format"       [2]
  version="1.0">                 
  <xsl:output method="xml"/>                         [3]

  <xsl:template match="/">
     ....                                            [4]
  </xsl:template>

  <!-- Other templates go here. -->                  [5]
</xsl:stylesheet>

[1] and [2] declare the namespaces of the XSLT and FO content respectively. The namespace is what distinguishes transformation instructions from output content.

If the XSLT engine sees content in the FO namespace, it simply writes it to the output, which is exactly what we want. [3] tells the engine to serialize the result as XML; an XSL-FO document is, after all, just an XML document. [4] is the root template, which matches the document root and therefore fires first, so this is the point at which we add the essential outline content mentioned above.
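To see the namespace split at work, here is a sketch of the kind of template that could sit at [5], assuming a hypothetical source element named para. The fo:block element is in the FO namespace, so the XSLT engine copies it to the output as it stands, while xsl:apply-templates is in the XSLT namespace and is executed as an instruction.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fo="http://www.w3.org/1999/XSL/Format"
  version="1.0">
  <xsl:output method="xml"/>

  <!-- Hypothetical source element: each para becomes one FO block -->
  <xsl:template match="para">
    <!-- FO namespace: written to the output unchanged -->
    <fo:block>
      <!-- XSLT namespace: executed, processing the para's children -->
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>
</xsl:stylesheet>
```

On its own this produces only loose fo:block elements; they still need the outline from the root template around them before a formatter will accept the result.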

Finally, at [5], we can start to add useful processing. Combining the two snippets above gives us something that works: what we have below is a complete XSLT stylesheet, which the XSLT engine uses to produce a valid XSL-FO document.
