Using XSL Formatting Objects
January 17, 2001
The World Wide Web Consortium's specification for Extensible Stylesheet Language (XSL) comes in two parts:
- XSLT, a language for transforming XML documents, and
- XSL Formatting Objects (XSL FO), an XML vocabulary for specifying formatting semantics.
XSLT is easy to learn and use. With only a modest investment of time, developers can convert an XML file to an HTML file that users can display in their browsers. This explains why developers are greeting XSLT with great enthusiasm. XML.com's column "Transforming XML" is a great place to get started working with XSLT.
XSL Formatting Objects is itself an XML-based markup language that lets you specify in great detail the pagination, layout, and styling information that will be applied to your content. The XSL FO markup is quite complex. It is also verbose; virtually the only practical way to produce an XSL FO file is to use XSLT to generate it from a source document. Finally, once you have this XSL FO file, you need some way to render it to an output medium. There are few tools available to do this final step. For these reasons, XSL FO has not caught on as quickly as XSLT.
Rather than explain XSL FO in its entirety, this article will give you enough information to use the major features of XSL FO. Our case study will be a short review handbook of Spanish that will be printed as an insert for a Spanish language learning CD-ROM. We'll use the Apache Software Foundation's FOP tool to convert the FO file to a PDF file.
Since the XSL FO file will be an XML document, it begins with the standard XML declaration, followed by the FO root element, which must declare the fo namespace.
<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
The structure of the remainder of the document is:
- The layout master set, which consists of
- Descriptions of the kinds of pages that can occur in the document.
- Sequences in which those page formats can occur.
- The pages and their content.
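The outline above can be sketched as a minimal, complete FO file. This is an illustration only, not part of the handbook: the master name page and the placeholder text are assumptions, and the attribute names follow the XSL 1.0 Recommendation (master-name where a master is defined, master-reference where it is used), which older formatters may not accept.

```xml
<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- 1. The layout master set: the kinds of pages that can occur -->
  <fo:layout-master-set>
    <!-- "page" is a hypothetical master name used only in this sketch -->
    <fo:simple-page-master master-name="page"
        page-height="12cm" page-width="12cm">
      <fo:region-body/>
    </fo:simple-page-master>
    <!-- optional fo:page-sequence-master elements would go here -->
  </fo:layout-master-set>

  <!-- 2. The pages and their content -->
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Placeholder content</fo:block>
    </fo:flow>
  </fo:page-sequence>

</fo:root>
```

Everything else in this article fills in one of these two parts: page masters and sequences go inside the layout master set, and content goes inside page sequences.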
After the FO document's beginning <fo:root> tag, we have to describe what kinds of pages our document can have. Our document will have three kinds of pages shown in the diagram below. To accommodate the stapling area, the cover page and right-hand pages will have more margin space at the left. The content pages will also have a region for a header and footer.
Let's start out by specifying the page widths and heights and margins. The units below are all in centimeters, but you may use any of the CSS units, such as px (pixel), pt (point), em, in, mm, etc. Each of these specifications is called a simple-page-master and must be given a master-name so you can refer to it later.
<fo:layout-master-set>
  <fo:simple-page-master master-name="cover"
    page-height="12cm" page-width="12cm"
    margin-top="0.5cm" margin-bottom="0.5cm"
    margin-left="1cm" margin-right="0.5cm">
  </fo:simple-page-master>

  <fo:simple-page-master master-name="leftPage"
    page-height="12cm" page-width="12cm"
    margin-left="0.5cm" margin-right="1cm"
    margin-top="0.5cm" margin-bottom="0.5cm">
  </fo:simple-page-master>

  <fo:simple-page-master master-name="rightPage"
    page-height="12cm" page-width="12cm"
    margin-left="1cm" margin-right="0.5cm"
    margin-top="0.5cm" margin-bottom="0.5cm">
  </fo:simple-page-master>

  <!-- more info will go here -->
</fo:layout-master-set>
The margins are areas which will not contain any printed output.
All of the printing occurs within the dotted lines in the diagram above. This is the page content area (officially called the page-reference-area), which can be divided into five regions as shown below.
Before continuing, we have to take a side trip to explain some terminology. When we set margins, we use words like top, bottom, left, and right, because everyone agrees which edge of a piece of paper is the top edge, left edge, etc. We will use different words when we talk about the content area, because not all languages are written left-to-right, top-to-bottom.
FO considers a page to be made up of two classes of elements: block elements (such as paragraphs) which begin on a new line, and inline elements (such as bold, italic) which don't. You can think of FO's block-progress-direction as the order in which paragraphs are placed on a page. The before-edge precedes a paragraph, the after-edge follows it.
The inline-progress-direction is the order in which characters are placed within a line. The start-edge precedes a line, and the end-edge follows it.
For Hebrew, as shown below, the start- and end- edges are the opposite of those used for English. (Arabic is written similarly.)
Japanese is sometimes written as shown below. The picture is from the XSL specification.
The advantage of using this new vocabulary is that it is language-independent. If you want a heading to be at the opposite side of the page from normal text, you set its text-align="end" so it appears like
An interesting heading
Headings set like the one above are unusual, and thus more likely to catch a reader's attention.
If the document is later translated to Arabic or Japanese, you will be assured that the heading will still appear at the corresponding “opposite side” of the text. There will be no need to go through your document reversing left and right or switching them with top and bottom.
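As a rough sketch of this idea, the fragment below sets the same heading twice, once in the default left-to-right mode and once in a right-to-left mode. The fo:block-container wrappers and the writing-mode values lr-tb and rl-tb come from the XSL specification rather than from this article, and support for writing-mode varies between formatters, so treat this as an illustration rather than something every tool will render.

```xml
<fo:flow xmlns:fo="http://www.w3.org/1999/XSL/Format"
    flow-name="xsl-region-body">

  <!-- Left-to-right text: the end edge is on the right -->
  <fo:block-container writing-mode="lr-tb">
    <fo:block text-align="end">An interesting heading</fo:block>
  </fo:block-container>

  <!-- Right-to-left text: the same text-align="end" now means the left -->
  <fo:block-container writing-mode="rl-tb">
    <fo:block text-align="end">An interesting heading</fo:block>
  </fo:block-container>

</fo:flow>
```

The heading markup is identical in both containers; only the writing mode of the enclosing area changes which physical side "end" refers to.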
The cover page doesn't need a header or footer, so we need only specify information for the region-body by adding the information shown in bold below.
<fo:simple-page-master master-name="cover"
  page-height="12cm" page-width="12cm"
  margin-top="0.5cm" margin-bottom="0.5cm"
  margin-left="1cm" margin-right="0.5cm">
  <fo:region-body margin-top="3cm" />
</fo:simple-page-master>
The left and right pages will have a header and footer, so we must specify the extent of the region-before and region-after.
<fo:simple-page-master master-name="leftPage"
  page-height="12cm" page-width="12cm"
  margin-left="0.5cm" margin-right="1cm"
  margin-top="0.5cm" margin-bottom="0.5cm">
  <fo:region-before extent="1cm"/>
  <fo:region-after extent="1cm"/>
  <fo:region-body margin-top="1.1cm" margin-bottom="1.1cm" />
</fo:simple-page-master>

<fo:simple-page-master master-name="rightPage"
  page-height="12cm" page-width="12cm"
  margin-left="1cm" margin-right="0.5cm"
  margin-top="0.5cm" margin-bottom="0.5cm">
  <fo:region-before extent="1cm"/>
  <fo:region-after extent="1cm"/>
  <fo:region-body margin-top="1.1cm" margin-bottom="1.1cm" />
</fo:simple-page-master>
Important: The margins you set for the region-body must be greater than or equal to the extents of the region-before and region-after (and of the region-start and region-end if you use them; FOP does not currently support them). Otherwise the body will overlap the header and footer. If you do something like this:
<fo:region-before extent="1cm"/>
<fo:region-after extent="1cm"/>
<fo:region-body margin-top="0.20cm" margin-bottom="0.20cm" />
you can expect results like
Now that the page masters are defined, you may specify the order in which a given set of these page masters will be used when it's time to generate a sequence of pages.
The document we're building consists of a cover followed by the contents. That is, there are two sequences of pages: the cover page (which happens to be a sequence of exactly one page), followed by the "contents pages", which is a sequence of alternating left and right pages.
While it's possible to define a page sequence that consists of the page master for the cover alone, we don't gain anything by doing so. (If we had several pages of front matter, as many books do, it would definitely be worth the effort.) Instead, we will concentrate on defining the sequence of master pages for the contents of the book. In plain English, the contents of the book consist of even-numbered left-hand pages followed by odd-numbered right-hand pages. This means that the inside front cover will be page two. The specification is shown below, with line numbers added for reference.
 1  <fo:page-sequence-master master-name="contents">
 2    <fo:repeatable-page-master-alternatives>
 3      <fo:conditional-page-master-reference
 4        master-reference="leftPage"
 5        odd-or-even="even"/>
 6      <fo:conditional-page-master-reference
 7        master-reference="rightPage"
 8        odd-or-even="odd"/>
 9    </fo:repeatable-page-master-alternatives>
10  </fo:page-sequence-master>
- Line 1
- Define and name this page sequence master.
- Line 2
- This sequence consists of page masters that should be chosen repeatedly according to the specified conditions as pages are generated.
- Lines 3-5
- Choose the page master named leftPage if the page being generated has an even page number.
- Lines 6-8
- Choose the page master named rightPage if the page being generated has an odd page number.
While this is probably the most common page sequence, others are possible. If you had a single-sided document where all the pages looked like a right-hand page, but you wanted to set a maximum number of pages, you would use a page-sequence master as follows:
<fo:page-sequence-master master-name="example">
  <fo:repeatable-page-master-reference
    maximum-repeats="10" master-reference="rightPage"/>
</fo:page-sequence-master>
The maximum-repeats attribute limits the number of pages that this sequence can generate. It can also be applied to repeatable-page-master-alternatives.
Other conditions that you may use in a conditional-page-master-reference are
- page-position
- Use this page master depending upon where the page occurs in the page-sequence. Valid values are first, last, rest (i.e., neither the first nor the last page), or any.
- blank-or-not-blank
- Use this page master depending upon whether the page is blank or not. Valid values are blank and not-blank. The blank value is used to maintain parity; for example, to generate a blank page to ensure that a chapter always ends on an odd page number.
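To illustrate how these conditions combine, here is a hedged sketch of a page-sequence master for a chapter. The names chapter, firstPage, and blankPage are hypothetical; the two page masters would have to be defined as simple-page-masters alongside leftPage and rightPage. The attribute names follow the XSL 1.0 Recommendation. The alternatives are tested in document order and the first one whose conditions hold is used, so the more specific conditions come first.

```xml
<fo:page-sequence-master
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    master-name="chapter">
  <fo:repeatable-page-master-alternatives>
    <!-- A page generated only to preserve parity: use an empty layout -->
    <fo:conditional-page-master-reference
        master-reference="blankPage"
        blank-or-not-blank="blank"/>
    <!-- The first page of the sequence gets its own layout -->
    <fo:conditional-page-master-reference
        master-reference="firstPage"
        page-position="first"/>
    <!-- Every other page alternates by page number -->
    <fo:conditional-page-master-reference
        master-reference="leftPage"
        odd-or-even="even"/>
    <fo:conditional-page-master-reference
        master-reference="rightPage"
        odd-or-even="odd"/>
  </fo:repeatable-page-master-alternatives>
</fo:page-sequence-master>
```

Whether blank pages are actually produced depends on the formatter and on properties not covered in this article, so the blankPage alternative may never be selected in a simple document.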
Now that the page masters and sequences are established, you can start putting content into those pages. This is done by specifying which page master or page-sequence master to use, and which region the information should flow into. Here's the beginning of the cover page. We use the numeric character reference &#169; for the copyright symbol.
 1  <fo:page-sequence master-reference="cover">
 2    <fo:flow flow-name="xsl-region-body">
 3      <fo:block font-family="Helvetica" font-size="18pt"
 4        text-align="end">
 5        Spanish Review Handbook
 6      </fo:block>
 7      <fo:block font-family="Helvetica" font-size="12pt"
 8        text-align="end" space-after="36pt">
 9        Copyright &#169; 2001 J. David Eisenberg
10      </fo:block>
11      <fo:block text-align="end">
12        A Catcode Production
13      </fo:block>
14    </fo:flow>
15  </fo:page-sequence>
- Line 1
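- Specifies the page master (or page-sequence master) that the pages in this sequence will use. Note: it's easy to confuse this element with <fo:page-sequence-master>; here the word master is part of the attribute name, not part of the element name! (The XSL 1.0 Recommendation calls this attribute master-reference; earlier drafts, and the FOP releases based on them, called it master-name.)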
- Line 2
- The following content goes into the xsl-region-body area of the page.
- Lines 3-6
- This content (Spanish Review Handbook) should begin on a new line (<fo:block>) with the specified font family and size. Note that text-align places the text at the end edge of the line.
- Lines 7-10
- Another block for the copyright message, using a different font size. The space-after attribute puts some empty space after this block once it is placed into the flow.
- Lines 11-13
- Another block with publisher information.
- Lines 14-15
- That's the end of the content for this page.
Now that we have some content, we can render this page to print. If you'd like to try this yourself, download the Apache Software Foundation's FOP tool and install it according to the instructions you find there. You will need
- Java 1.1.x or later;
- an XML parser which supports SAX and DOM;
- for later articles in this series, you'll also need an XSLT processor (if you download Xalan, you'll get both Xerces, the XML parser, and Xalan, the XSLT processor); and,
- an SVG library, which is in the w3c.jar file that comes with FOP.
For example, on a Linux system, you could put all the .jar files in a convenient directory and create a script named fop.sh that looks like
#!/bin/sh
# Directory that holds the FOP, parser, and XSLT .jar files
JARS=/usr/local/xml-jar
java -cp "$JARS/fop.jar:$JARS/w3c.jar:$JARS/xml.jar:$JARS/xerces.jar:$JARS/xalan.jar:$JARS/bsf.jar" \
  org.apache.fop.apps.CommandLine "$1" "$2"
Invoking the script by typing fop.sh spanish1.fo spanish1.pdf produces a PDF file. To view the file, you need a PDF viewer; Adobe Acrobat Reader works on Linux, Macintosh, and Windows. Linux users may also use xpdf, an X-Window PDF viewer. The output from the document so far is shown below in a reduced view.
This obviously cries out for a graphic to make it look better. The graphic is added as an external-graphic whose src attribute is a valid URI for the image. The additional elements are shown in bold below.
<fo:block font-family="Helvetica" font-size="12pt"
  text-align="end" space-after="36pt">
  Copyright &#169; 2001 J. David Eisenberg
</fo:block>
<fo:block text-align="end">
  <fo:external-graphic src="file:images/catcode_logo.jpg"
    width="99px" height="109px"/>
</fo:block>
<fo:block>
  A Catcode Production
</fo:block>
There. That's much nicer, isn't it?
Before we leave this article, we'll start the content pages. In this case, we have to put information into the xsl-region-before and xsl-region-after as well as the xsl-region-body.
 1  <fo:page-sequence master-reference="contents" initial-page-number="2">
 2    <fo:static-content flow-name="xsl-region-before">
 3      <fo:block font-family="Helvetica" font-size="10pt"
 4        text-align="center">
 5        Spanish Review Handbook
 6      </fo:block>
 7    </fo:static-content>
 8
 9    <fo:static-content flow-name="xsl-region-after">
10      <fo:block font-family="Helvetica" font-size="10pt"
11        text-align="center">
12        P&#225;gina <fo:page-number />
13      </fo:block>
14    </fo:static-content>
15
16    <fo:flow flow-name="xsl-region-body">
17      <fo:block font-size="14pt">
18        Watch this space!
19      </fo:block>
20    </fo:flow>
21  </fo:page-sequence>
- Line 1
- Start a new page sequence using the sequence defined by the contents master name. Start page numbers at 2.
- Lines 2-7
- As currently configured, the FO to PDF converter requires the content of the header area to be the same on all pages; thus you must specify <fo:static-content> rather than a variable <fo:flow> to fill the xsl-region-before.
- Lines 9-14
- Footer areas must also have <fo:static-content>. NOTE: Line 12 shows how to insert the current page number with <fo:page-number/>. The numeric character reference &#225; represents á.
- Lines 16-20
- Specify the content to fill in the xsl-region-body in this page sequence.
Here's the result:
In the next article, we'll show you how to use XSLT to make it much easier to create the FO elements. You'll also learn how to put lists and tables into your documents.