What Is XSL-FO
by G. Ken Holman
2. Basic concepts of XSLFO
Here we review basic aspects of the XSLFO semantics and vocabulary, to gain a better understanding of how the technology works and how to use the specification itself.
Layout-based vs. content-based formatting
Two very different approaches to the formatting of information are contrasted. Layout-based formatting respects the constraints of the target medium, where limitations or capacities of the target may constrain the content or appearance of the information on a page. Content-based formatting respects the quantity and identity of the information, generating as much of the target medium as is necessary to accommodate the information being formatted.
Formatting is different than rendering
The distinction between formatting and rendering is reviewed, comparing how to express what you want formatted with how that formatting is to be accomplished on the target device. This contrast is similar to the difference between declarative-style and imperative-style programming methods, or the difference between XSLT's "transformation by example" paradigm and other algorithmic transformation approaches using programming languages.
Formatting model and vocabulary extends what is currently available for web presentation
The XSLFO semantics and vocabulary address requirements different from those of infinite-length web user agent windows, meeting the needs of arbitrary page boundaries imposed on the presentation of information. These new semantics are inspired by the Document Style Semantics and Specification Language (DSSSL) International Standard ISO/IEC 10179, but in practice diverge from DSSSL towards Cascading Style Sheets 2 (CSS2) for compatibility with web-based processing.
The semantics are classified based on their relationship to similar CSS properties:
- CSS properties by copy (unchanged CSS2 semantics)
- CSS properties with extended values
- CSS properties "broken apart" to a finer granularity
- XSLFO-specific properties
Multiple writing directions and reference orientation, both supported by XSLFO, are important concepts inherited from DSSSL that are not present in CSS2.
Differing processing model concepts are expressed using unambiguous terminology
The XSLFO specification, and this book as well, take care to use precise terminology wherever a concept could be confused with a similar construct at another stage of processing. For example, an XSLFO instance contains elements and their attributes. This is similar to the corresponding formatting object tree with objects and their properties, which is in turn similar to the corresponding refined formatting object tree with objects and their area traits. Finally, this is similar to the corresponding area tree with areas and their traits.
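Only the first link of that chain is visible in the syntax itself. As a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module and a made-up fragment, the code below reads each element of an XSLFO instance as a formatting object and each of its attributes as a property. The refinement into traits and the construction of the area tree happen inside a formatter and are not modelled here; the function name `objects_and_properties` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the XSL formatting vocabulary.
FO_NS = "http://www.w3.org/1999/XSL/Format"

# A tiny, made-up fragment of an XSLFO instance.
FRAGMENT = f"""<fo:block xmlns:fo="{FO_NS}" font-weight="bold" space-after="6pt">
  Heading text<fo:inline font-style="italic">emphasized</fo:inline>
</fo:block>"""


def objects_and_properties(xml_text):
    """Read elements as formatting objects and attributes as properties.

    Returns (object name, properties) pairs in document order.
    """
    pairs = []
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree spells tags "{namespace}local"; keep only the local name.
        name = elem.tag.split("}", 1)[1] if elem.tag.startswith("{") else elem.tag
        # Namespace declarations are not reported as attributes by ElementTree.
        pairs.append((name, dict(elem.attrib)))
    return pairs


for obj, props in objects_and_properties(FRAGMENT):
    print(obj, props)
```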
XSLFO objects related to basic issues
The XSLFO objects addressing functionality in this area are summarized as:
- <root> (6.4.2) - the document element of the XSLFO instance
- <layout-master-set> (6.4.6) - the collection of definitions of page geometries and page selection patterns
- <page-sequence> (6.4.5) - the definition of information for a sequence of pages with common static information
- <flow> (6.4.18) - the content that is flowed to as many pages as required
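To see how these four objects nest, here is a minimal XSLFO instance wrapped in a short Python check. It is a sketch under stated assumptions. The `fo:simple-page-master` and `fo:region-body` objects, which define the page geometry, are needed for a working instance but are not in the summary above. The page dimensions are illustrative. The helper `check_skeleton` is hypothetical and looks only at simple page masters and the first page sequence.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"

# A minimal XSLFO instance: one page geometry, one page sequence, one flow.
MINIMAL_FO = """<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <!-- page geometry: US-letter page with 1in margins (illustrative) -->
    <fo:simple-page-master master-name="plain"
        page-width="8.5in" page-height="11in" margin="1in">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- pages in this sequence use the geometry named "plain" -->
  <fo:page-sequence master-reference="plain">
    <!-- content flowed into the body region of as many pages as needed -->
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Hello, world.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>"""


def q(local):
    """Return the ElementTree tag for an element in the FO namespace."""
    return f"{{{FO_NS}}}{local}"


def check_skeleton(fo_text):
    """Check the basic structure and return the name of the first flow.

    Hypothetical helper: considers simple page masters and the first page
    sequence only; page-sequence masters are not handled.
    """
    root = ET.fromstring(fo_text)
    if root.tag != q("root"):
        raise ValueError("document element must be fo:root")
    masters = root.find(q("layout-master-set"))
    sequence = root.find(q("page-sequence"))
    if masters is None or sequence is None:
        raise ValueError("fo:layout-master-set and fo:page-sequence are required")
    # The page sequence must refer to a page geometry defined in the master set.
    names = {m.get("master-name") for m in masters.iter(q("simple-page-master"))}
    if sequence.get("master-reference") not in names:
        raise ValueError("page sequence refers to an undefined page master")
    flow = sequence.find(q("flow"))
    if flow is None:
        raise ValueError("page sequence has no fo:flow")
    return flow.get("flow-name")


print(check_skeleton(MINIMAL_FO))  # xsl-region-body
```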
2.1 Basic concepts
2.1.1 Layout-based vs. content-based formatting
Layout-based formatting accommodates the medium being used to present information. The constraints of the medium, or the layout design of the graphic artist, often demand absolute positioning, column location specification, or page number specification. Consider that a magazine may need a particular columnist's article to appear on the right-hand edge of page 7, while the three lead stories must be headlined within the first four pages.
This focus on layout places more emphasis on the appearance and location of information than the information itself, dictating the quantity and presentation of the content. Such layout is typically unstructured in both the authoring and the formatting processes, as typified by desktop publishing, journalism, etc.
Content-based formatting accommodates the information being presented with the available medium. The constraints of layout are expressed as rules associated with the information dictating how given information is to be positioned or presented. Consider that a single aircraft maintenance manual cannot have each of its 40,000 to 60,000 pages individually formatted.
This focus on information places more emphasis on the content and rules of layout, rather than on the medium, dictating the automatic layout and presentation of constructs found in the information stream. Such layout is typically highly structured in both the authoring and the formatting processes, as typified by technical publications found in pharmaceutical, aerospace, automotive, or other industries where either vast amounts of information are presented, or the information must be interchanged in a neutral form with other players.
XSLFO is more oriented to content-based formatting than layout-based formatting, though there do exist certain controls for the positioning, cropping, and flowing of information to particular areas of pages in page sequences. XSLFO can express the repetition of page geometries, mechanically accommodating the content as flowed by a transformation of the information into the formatting vocabulary. There is only limited support for ordering specific page sequences, and high-caliber copy-fitting requirements often cannot be met with mechanical unattended transformations.
Note that while XSLFO is not oriented to loose-leaf publishing, that does not necessarily prevent a vendor from using it to express the content of pages being maintained in a loose-leaf-based environment. A loose-leaf environment supports "change pages" (a.k.a. "A pages") through a database of effective pages and page contents.
XSLFO has no inherent maintenance facilities for past versions of individual pages, and no inherent support of lists of effective pages. Such facilities could be provided outside the scope of individual page presentations. XSLFO is more oriented to the unrestricted flowing of information to as much of the target medium as is required to accommodate the content.
2.1.2 Formatting vs. Rendering
When creating XML we should be designing the structures around our business processes responsible for maintaining the information, instead of the structures used for presentation. An XSLFO instance describes the intent of how that stream of information is to be formatted in the target medium in a paginated fashion. This instance is typically generated by a stylesheet acting on the instance of XML information, rearranging and restructuring the information into the order and presentation desired.
This reordering takes the #PCDATA content and attribute content of the instance, repackaging it according to our intent based on our understanding of the semantics of the XSLFO vocabulary. We can reify this reordering as an intermediate file of syntax we can use for diagnostic purposes. We could also take the opportunity to store this reordering as an XML instance for "store and forward" strategies where the formatting takes place later or remotely from where the transformation takes place.
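In practice this repackaging is written as an XSLT stylesheet. As a stand-in that keeps the example self-contained, the sketch below uses Python's `xml.etree.ElementTree` to repackage a hypothetical `<memo>` vocabulary into XSLFO, moving the title ahead of the paragraphs, and returns the result as a string that could be stored as the intermediate file. The source vocabulary, the A4 page size, and the helpers `fo` and `memo_to_fo` are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialize with the conventional fo: prefix


def fo(tag, parent=None, **props):
    """Create an FO element; underscores in keyword names become hyphens."""
    name = f"{{{FO_NS}}}{tag}"
    attrib = {key.replace("_", "-"): value for key, value in props.items()}
    if parent is None:
        return ET.Element(name, attrib)
    return ET.SubElement(parent, name, attrib)


def memo_to_fo(source_xml):
    """Repackage a hypothetical <memo> document into the XSLFO vocabulary."""
    memo = ET.fromstring(source_xml)
    root = fo("root")
    masters = fo("layout-master-set", root)
    page = fo("simple-page-master", masters, master_name="page",
              page_width="210mm", page_height="297mm", margin="20mm")
    fo("region-body", page)
    sequence = fo("page-sequence", root, master_reference="page")
    flow = fo("flow", sequence, flow_name="xsl-region-body")
    # Reorder: the title comes first, wherever it sits in the source.
    title = fo("block", flow, font_size="16pt", font_weight="bold")
    title.text = memo.findtext("title", default="")
    # Each source paragraph becomes a block, in source order.
    for para in memo.iter("para"):
        fo("block", flow, space_before="6pt").text = para.text
    # The serialized string is the intermediate file that could be stored.
    return ET.tostring(root, encoding="unicode")


source = ("<memo><para>First point.</para><title>Status</title>"
          "<para>Second point.</para></memo>")
print(memo_to_fo(source))
```

Because the result is ordinary XML, it can be inspected for diagnostics, or stored and formatted later, as described above.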
Unlike interactive formatting tools such as desktop publishing products, XSLFO provides no feedback loop from the formatter to the stylesheet creating the XSLFO vocabulary. Therefore, the XSLFO information must be complete with respect to all desired behaviors of the formatter. Any special formatting cases or conditions can be accommodated through contingencies expressed in the XSLFO semantics.
The information arranged in the elements and attributes of our source vocabularies is repackaged into the elements and attributes of the XSLFO formatting vocabulary, which express the formatting objects of the XSLFO semantics and their properties. Each formatting object specifies an aspect of either layout, appearance and impartation, or pagination and flow.
The layout semantics express the intent of locating information positioned on the target medium. Areas of content are specified as located and nested within other areas, in a hierarchical tree of rectangles on each page.
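The nesting of areas can be pictured with a toy model. The sketch below, assuming points as the unit and a US-letter page with one-inch margins, represents each area as a rectangle and checks that every child lies within its parent. It deliberately ignores writing direction, reference orientation, and the border and padding rectangles the Recommendation defines; `Area` and `well_nested` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Area:
    """A toy rectangular area: origin (x, y), width and height in points."""
    name: str
    x: float
    y: float
    width: float
    height: float
    children: list = field(default_factory=list)

    def contains(self, other):
        """True if `other` lies entirely within this rectangle."""
        return (self.x <= other.x and self.y <= other.y
                and other.x + other.width <= self.x + self.width
                and other.y + other.height <= self.y + self.height)


def well_nested(area):
    """Check recursively that every child area sits inside its parent."""
    return all(area.contains(c) and well_nested(c) for c in area.children)


# A US-letter page (612 x 792 pt) whose body region (72pt margins) holds two blocks.
body = Area("region-body", 72, 72, 468, 648, [
    Area("block", 72, 72, 468, 20),
    Area("block", 72, 98, 468, 40),
])
page = Area("page", 0, 0, 612, 792, [body])
print(well_nested(page))  # True
```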
The appearance and impartation semantics express the intent of how the information is to be conveyed to the reader. For visual media, this conveyance includes font, size, color, weight, etc. For aural synthesis, this conveyance includes voice, volume, azimuth, pitch, etc.
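Both kinds of conveyance are expressed as properties on the same formatting objects. As a sketch, assuming the CSS2 aural property names (which XSL reuses) and illustrative values, the code below splits a block's properties into those a visual renderer would act on and those an aural renderer would act on. Treating every non-aural property as visual is a simplification, and `split_by_medium` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"

# The aural property names defined by CSS2 ("cue" and "pause" are shorthands).
AURAL = {"azimuth", "cue", "cue-after", "cue-before", "elevation", "pause",
         "pause-after", "pause-before", "pitch", "pitch-range", "play-during",
         "richness", "speak", "speak-header", "speak-numeral",
         "speak-punctuation", "speech-rate", "stress", "voice-family", "volume"}

# One block carrying both visual and aural properties (illustrative values).
BLOCK = f"""<fo:block xmlns:fo="{FO_NS}"
    font-family="serif" font-size="12pt" font-weight="bold" color="navy"
    voice-family="female" volume="loud" azimuth="center" pitch="medium">
  Warning: disconnect power before servicing.
</fo:block>"""


def split_by_medium(fo_text):
    """Split a block's properties into (visual, aural) dictionaries.

    Simplification: every property that is not aural is treated as visual.
    """
    props = ET.fromstring(fo_text).attrib
    aural = {k: v for k, v in props.items() if k in AURAL}
    visual = {k: v for k, v in props.items() if k not in AURAL}
    return visual, aural


visual, aural = split_by_medium(BLOCK)
print(sorted(visual))  # ['color', 'font-family', 'font-size', 'font-weight']
print(sorted(aural))   # ['azimuth', 'pitch', 'voice-family', 'volume']
```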
The pagination and flow semantics express the intent of how the stream of information being presented is to be parceled within the layout areas. The final pagination is the result of accommodating the amount of flow being presented within the areas that have been defined.
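A toy model makes the point that the page count follows from the amount of content rather than being fixed in advance. The sketch below, assuming each page body holds a fixed number of lines, parcels a stream of lines into pages. A real formatter breaks paragraphs into lines, honours keeps and breaks, and may change page geometry, none of which is modelled here; `paginate` is a hypothetical name.

```python
def paginate(lines, lines_per_page):
    """Parcel a stream of lines into pages of fixed capacity (toy model)."""
    if lines_per_page < 1:
        raise ValueError("a page must hold at least one line")
    # Slice the flow into consecutive chunks; the last page may be partly full.
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]


flow = [f"line {n}" for n in range(1, 251)]  # 250 lines of content
pages = paginate(flow, 46)
print(len(pages), len(pages[-1]))  # 6 20
```

Adding content, or shrinking the page body, simply produces more pages; nothing in the flow itself names a page.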
Each of the formatting objects is expressed in an XSLFO instance as an element. It is not necessary to know all formatting objects to get effective formatted results.
An XSLFO formatter is responsible for interpreting the intent to be rendered, as expressed in the XSLFO semantics corresponding to the elements and attributes in the instance created by the stylesheet. Following the Recommendation, the formatter determines what is to be rendered where by interpreting the interaction between formatting objects. How the formatter does this interpretation is defined in excruciating detail in the W3C Recommendation, as this document is written more for implementers than for stylesheet writers.
The properties expressed for each of the objects influence or are included in the structure of the resulting areas. Some of these properties are targeted at specific media and are ignored by media to which they do not apply.
The Recommendation does not describe in detail the semantics of rendering. Any device-specific rendition is interpreted based on the semantics of the formatting objects that create the trees of areas and the traits found in those areas that are derived from the properties. How the rendering agent actually accomplishes the task of effecting the result of formatting to the target medium is entirely up to the agent, as long as it produces the same result as the intent described by the Recommendation.
The rendering itself may be a multiple-step process, producing the final form in stages on a given medium. For example, the rendering may require production of another intermediate formatting language such as TeX. Alternatively, rendering may directly produce a final-form page description language such as the Portable Document Format (PDF) or the Standard Page Description Language (International Standard ISO/IEC 10180). The physical final form would then be produced from the intermediate form or final page representation. Indeed, there could be many steps to obtain a final result, e.g.: XML to XSLFO to TeX to PDF to paper.
This is a prose version of an excerpt from an edited version of the book "Practical Formatting Using XSLFO" (First Edition ISBN 1-894049-07-1 at the time of this writing) published by Crane Softwrights Ltd., written by G. Ken Holman.
Copyright © Crane Softwrights Ltd.