
Printing from XML: An Introduction to XSL-FO

October 9, 2002

Dave Pawson

Dave Pawson is the author of XSL-FO: Making XML Look Good in Print

One of the issues many users face when introduced to the production of print from XML is that of page layout. Without getting the page layout right, it's unlikely that much progress will be made. By way of introducing the W3C XSL Formatting Objects recommendation, I want to present a simplified approach that will enable a new user to gain a foothold with page layout.

The aim of this article is to produce that first page of output -- call it the "Hello World" program -- with enough information to allow a user to move on to more useful things. I'll introduce the most straightforward of page layouts for XSL-FO, using as few of the elements needed as I can to obtain reasonable output.

One of the problems is that, unlike producing an HTML document from an XML source using XSLT, processing the children of the root element is not a simple xsl:apply-templates from within a root element. Much more initial output is required before the formatter can generate the pages.

Let's look at the processing necessary to get from your XML document to a PDF printable document. First, the XML must be fed to an XSLT processor with an appropriate stylesheet (developed below) in order to produce another XML document which uses the XSL-FO namespace and is intended for an XSL-FO formatter. The second stage is to feed the output of the first stage to the XSL-FO formatter, which can then produce the end product: a printable document, styled for visual presentation.

XML      ->   XSLT          XSL-FO   ->    XSL-FO  printable 
document      engine        document     formatter document
               ^
               |
           XSLT stylesheet

This approach has the advantage that the XML source document is still format neutral and may be used with other XSLT stylesheets to produce other media.

The XSL-FO Document

We need to be aware of the initial target of the XSLT transformation, the XSL-FO document. The document you are producing, which is fed to the XSL-FO formatter, contains a small number of elements:

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
 <fo:layout-master-set>                      [1]
  <fo:simple-page-master  
    master-name="simple" >                   [2]
   <fo:region-body/>
   </fo:simple-page-master>
 </fo:layout-master-set> 
 <fo:page-sequence 
               master-reference="simple">    [3]
   <fo:flow 
              flow-name="xsl-region-body">   [4]
        content                              [5]
   </fo:flow>
 </fo:page-sequence>
</fo:root>

Let's look at each of the identified elements in turn.

[1] In order to lay out content on a page, the formatter needs to know what sizes it has to deal with. The layout-master-set contains the simple-page-master [2], which holds this information, e.g. whether you use a European A4 page size or an American US-letter size. It also contains the region-body element, which may be seen as the main body of the page layout.

[3] In order to support complex pagination, the page-sequence element is used. For a simple page layout, very little content is required here, other than to refer back to a particular page definition (the simple-page-master).

Also within the page-sequence element is a flow element [4]. The idea of a flow may or may not be familiar to you. I came across it using desktop publishing packages, where I poured text into page areas to build up columns for a college magazine, hence the content flowed into page areas.

The flow-name value xsl-region-body identifies which region of the page to pour the text into. This differentiates the body of the page from the outer areas (margins, header, footer etc.) of the page. Finally, we have some content [5], which is a child of the main flow. Simple text cannot be inserted here, since the formatter would have to guess what you wanted to do with it. The real content for the flow takes the form of <fo:block>content</fo:block>, which defines a block of text (rectangular in shape, as big as you like, taking defaults for every property) that will be placed as the first item on the page.
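Putting the pieces described above together, a complete "Hello World" XSL-FO document might look like the following sketch. The page size and margin values are assumptions chosen for illustration (an A4 page); everything else follows the outline shown earlier, with the placeholder content replaced by a single fo:block.

```xml
<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <!-- Page geometry: the sizes below are illustrative assumptions (A4) -->
    <fo:simple-page-master master-name="simple"
                           page-height="29.7cm"
                           page-width="21cm"
                           margin-left="2.5cm"
                           margin-right="2.5cm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- master-reference must match the master-name declared above -->
  <fo:page-sequence master-reference="simple">
    <fo:flow flow-name="xsl-region-body">
      <!-- Text must be wrapped in a block; it cannot sit directly in the flow -->
      <fo:block>Hello World</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

A file like this can be handed straight to an XSL-FO formatter, skipping the XSLT stage, which makes it a handy first test of an installation.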

In order to get a better grasp of all this, let's fill out, minimally, how it might fit into a stylesheet whose task is to take a simple XML document and produce another XML document, which is then fed to an XSL-FO formatter.

A basic XSLT stylesheet to produce XSL-FO is shown below.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"   [1]
  xmlns:fo="http://www.w3.org/1999/XSL/Format"       [2]
  version="1.0">
  <xsl:output method="xml"/>                         [3]

  <xsl:template match="/">
     ....                                            [4]
  </xsl:template>

  <!-- Other templates go here. -->                  [5]
</xsl:stylesheet>

In [1] and [2] we see the namespaces, respectively, of the XSLT and FO content in this document, which differentiates transformation requests from output content.

If the XSLT engine sees an element in the FO namespace inside a template, it simply writes it to the output, which is exactly what we want. [3] says that we want the output serialized as XML, which is just what an XSL-FO document is: an XML document. [4] is the root template, which fires first, hence this is the point at which we add the essential outline content mentioned above.

Finally, at [5], we can start to add useful processing. We can now combine the two snippets above to do something useful. What we have below is a complete XSLT stylesheet, which is used by the XSLT engine to produce a valid XSL-FO document.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fo="http://www.w3.org/1999/XSL/Format"
  version="1.0">
  <xsl:output method="xml"/>

  <xsl:template match="/">
    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
      <fo:layout-master-set>
        <fo:simple-page-master
            master-name="simple"
            page-height="29.7cm"               [1]
            page-width="21cm"
            margin-left="2.5cm"
            margin-right="2.5cm">
          <fo:region-body margin-top="3cm"/>   [2]
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence
          master-reference="simple">           [3]
        <fo:flow
            flow-name="xsl-region-body">       [4]
          <xsl:apply-templates/>               [5]
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <xsl:template match="document">              [6]
   <fo:block>
    <xsl:apply-templates/>
   </fo:block>
  </xsl:template>


  <xsl:template match="head">                  [7]
   <fo:block>
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <xsl:template match="para">                  [8]
    <fo:block>
     <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <xsl:template match="em">                    [9]
   <fo:inline font-style="italic">
     <xsl:apply-templates/>
   </fo:inline>
  </xsl:template>


  <xsl:template match="*">                     [10]
   <fo:block background-color="red">
     <xsl:apply-templates/>
   </fo:block>
  </xsl:template>

</xsl:stylesheet>

The Source Document

Before explaining the stylesheet, I should describe the source document for which it is designed. I'm assuming a feed from a document class which has four elements, with the structure shown below. I've kept it simple because it represents the vast majority of XML content meant for an XSL-FO document. It contains only two block items (head and para) and a single inline item (em).

Our document consists of an outer document element containing a mix of head and para elements, which may in turn contain some emphasis:

<document>
   <head>My very first xsl-fo document</head>
   <para>has an <em>important</em> paragraph inside it</para>
</document>

A page size is specified at [1], using European sizes. Change these to your local paper size if it's different. I've added margins since content which extends to the edges of the page is unsightly.
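As a sketch of that substitution, a US-letter version of the page master might look like this. The page dimensions are the standard US-letter size; the margin values are assumptions picked for illustration, not figures from the article.

```xml
<!-- Fragment: replaces the fo:simple-page-master inside fo:layout-master-set -->
<fo:simple-page-master master-name="simple"
                       page-height="11in"
                       page-width="8.5in"
                       margin-left="1in"
                       margin-right="1in">
  <!-- Top margin of the body region; the value is an illustrative assumption -->
  <fo:region-body margin-top="1.25in"/>
</fo:simple-page-master>
```

Because the page-sequence refers to the page master only by its master-name, nothing else in the stylesheet needs to change.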

At [2] I've added a top margin to the main region of the page. [3] and [4] are as before. At [5] we have a crucial difference: at this point, where previously I simply said "content", I now use the facilities of XSLT to instruct the XSLT engine to process the input document. At [6] the XSLT engine processes the document element of the input XML file by outputting an fo:block element, inside which all remaining content is placed. Since blocks can be nested quite happily in XSL-FO this isn't a problem. What it does do is ensure that any content which leaks -- that is, isn't handled explicitly by the stylesheet -- is still in a block.

At [7], [8], and [9] I'm back in the normal world of XML and XSLT: matching a source document element and outputting an appropriate element from the XSL-FO vocabulary. The first two are identical and just need decorating; the last is slightly different in that it is an inline formatting object and produces italic output.
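To make the mapping concrete, here is roughly what the content of the fo:flow should look like after the XSLT stage has processed the sample document. This is a hand-written sketch: whitespace has been tidied for readability, and a real processor's serialization (indentation, placement of namespace declarations) may differ.

```xml
<fo:flow flow-name="xsl-region-body"
         xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- from the document template [6] -->
  <fo:block>
    <!-- from the head template [7] -->
    <fo:block>My very first xsl-fo document</fo:block>
    <!-- from the para template [8], with the em template [9] inside it -->
    <fo:block>has an <fo:inline font-style="italic">important</fo:inline> paragraph inside it</fo:block>
  </fo:block>
</fo:flow>
```

Inspecting this intermediate document is often the quickest way to tell whether a problem lies in the stylesheet or in the formatter.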

[10] is a catch-all to show (in the output) which elements, if any, are not styled. Once styling is applied to all elements nothing will be processed by this template. It's good as a debugging option during development.
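As an illustration of the catch-all, suppose the source gained an element that the stylesheet does not know about. The note element below is hypothetical and is not part of the document class described in this article.

```xml
<document>
   <head>My very first xsl-fo document</head>
   <!-- hypothetical element with no template of its own -->
   <note>This element has no template of its own</note>
</document>
```

Since no template names note explicitly, the lower-priority match="*" template handles it, and the corresponding output should be a block along these lines:

```xml
<fo:block background-color="red"
          xmlns:fo="http://www.w3.org/1999/XSL/Format">This element has no template of its own</fo:block>
```

On the printed page the text appears on a red background, which is hard to miss during proofreading.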

This stylesheet introduces two new elements. The first is the fo:block element, used for many elements in the stylesheet. This is the basic layout element which is used to wrap content; think of it as a p element in HTML.

The second is the fo:inline element, a container for inline content in XSL-FO. Each of these two elements has a whole range of properties, expressed syntactically as attributes, which are used to decorate the content that they wrap.
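A short sketch of what such decoration can look like follows. The properties used (font-family, font-size, text-align, space-after, color) are standard XSL-FO properties, but the particular values are assumptions chosen purely for illustration.

```xml
<!-- Fragments: these templates would replace the para and em templates
     inside the xsl:stylesheet element, where the xsl and fo prefixes
     are already declared. -->
<xsl:template match="para">
  <fo:block font-family="serif"
            font-size="11pt"
            text-align="justify"
            space-after="6pt">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>

<xsl:template match="em">
  <fo:inline font-style="italic" color="maroon">
    <xsl:apply-templates/>
  </fo:inline>
</xsl:template>
```

Many properties, font-family and font-size among them, are inherited, so setting them once on an outer block is usually enough.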

Starting New Pages

Let's extend the source document structure to include a section which should have a new page start point. So now the document might look like this:

<document>
  <section>
   <head>My very first xsl-fo document</head>
   <para>has an <em>important</em> paragraph inside it</para>
  </section>
  <section>
    <head>The second section, starting on a new page </head>
   <para>Some content in the second section</para>
  </section>
</document>

Now I need to style this addition, using one of the available properties of a block.

<xsl:template match="section">
  <fo:block break-before="page">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>

This tells the XSL-FO formatter to start a new page when it hits a section. All the content of that section is processed within that block. To make the head element stand out, I'll also improve its appearance by choosing a larger, bold font and by adding a little space after the content.

<xsl:template match="head">
  <fo:block font-size="14pt"
            font-weight="bold"
            space-after="1cm"
            space-after.conditionality="retain">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
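With the section and head templates in place, the XSLT stage should produce something like the following for the second section of the extended document. This is a hand-written sketch with whitespace tidied; actual processor output may be laid out differently.

```xml
<!-- from the section template: forces a page break before this block -->
<fo:block break-before="page"
          xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- from the restyled head template -->
  <fo:block font-size="14pt"
            font-weight="bold"
            space-after="1cm"
            space-after.conditionality="retain">The second section, starting on a new page </fo:block>
  <!-- from the unchanged para template -->
  <fo:block>Some content in the second section</fo:block>
</fo:block>
```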

That's it. To review: at its simplest, processing is a two-stage process. Give your source document and the above XSLT stylesheet to an XSLT processor, and the output should be a valid XSL-FO document. This can then be fed to an XSL-FO formatter: RenderX or Antenna House (both commercial, with trial options), or PassiveTeX or FOP (non-commercial offerings).

You can download the files developed in this article here: xsl-fo-assets.zip.
