What Is XSL-FO
by G. Ken Holman
1. The context of XSLFO
This chapter reviews the roles of the following Recommendations in the XML family and an International Standard in the SGML family, and gives an overview of the contexts in which XSLFO is used.
Extensible Markup Language (XML)
We use XML to express information hierarchically in a sequence of characters according to a vocabulary of element types and their attributes. Using various Recommendations and other industry standards, we can formally describe the makeup and constraints of this vocabulary in different ways to validate the content against our desired document model.
Cascading Stylesheets (CSS)
Initially created for rendering HTML documents in browsers, CSS formatting properties can ornament the document tree described by a sequence of markup following that specific SGML vocabulary. CSS was later revised to describe the ornamentation of XML documents so that CSS-aware browsers can render the information found in a decorated document tree described by any XML vocabulary. Browsers recognizing these properties can render the contents of the tree according to the semantics of the formatting model governing the property interpretation.
Document Style Semantics and Specification Language (DSSSL)
The International Organization for Standardization (ISO) standardized a collection of style semantics in DSSSL for formatting paginated information. DSSSL also includes a specification language for the transformation of Standard Generalized Markup Language (SGML) documents of any vocabulary, and implementations have since been modified to support the styling of XML documents of any vocabulary. This introduced the concept of a flow object tree comprising objects and properties reflecting the internationalized semantics of paginated output.
Extensible Stylesheet Language Family (XSLT/XSL/XSLFO)
Two vocabularies specified in separate W3C Recommendations provide for the two distinct styling processes of transforming and rendering XML instances.
The Extensible Stylesheet Language Transformations (XSLT) is a templating markup language used to express how a processor creates a transformed result from an instance of XML information.
The Extensible Stylesheet Language Formatting Objects (XSLFO) is a pagination markup language describing a rendering vocabulary that captures the semantics of formatting information for paginated presentation. Formally named Extensible Stylesheet Language (XSL), this Recommendation normatively incorporates the entire XSLT Recommendation by reference; historically, the two were defined together in a single W3C draft Recommendation.
While XSLT is designed primarily for the kinds of transformation required for using XSL, it can also be used for arbitrary transformation requirements.
1.1.4 Styling structured information
Styling is transforming and formatting information
Styling is the rendering of information into a form suitable for consumption by a target audience. Because the audience can change for a given set of information, we often need to apply different styling for that information to obtain dissimilar renderings to meet the needs of each audience. Perhaps some information needs to be rearranged to make more sense for the reader. Perhaps some information needs to be highlighted differently to bring focus to key content.
It is important when we think about styling information to remember that two distinct processes are involved, not just one. First, we must transform the information from the organization used when it was created into the organization needed for consumption. Second, when rendering we must express the aspects of the appearance of the reorganized information, whatever the target medium.
Consider the flow of information as a streaming process where information is created upstream and processed or consumed downstream. Upstream, in the early stages, we should be expressing the information abstractly, thus preventing any early binding of concrete or final-form concepts. Midstream, or even downstream, we can exploit the information as long as it remains flexible and abstract. Late binding of the information to a final form can be based on the target use of the final product; by delaying this binding until late in the process, we preserve the original information for exploitation for other purposes along the way.
It is a common but misdirected practice to model information based on how you plan to use it downstream. It does not matter if your target is a presentation-oriented structure, for example, or a structure that is appropriate for another markup-based system. Modeling practice should focus on both the business reasons and inherent relationships existing in the semantics behind the information being described (as such the vocabularies are then content-oriented). For example, emphasized text is often confused with a particular format in which it is rendered. Where we could model information using a <b> element type for eventual rendering in a bold face, we would be better off modeling the information using an <emph> element type. In this way we capture the reason for marking up information (that it is emphasized from surrounding information), and we do not lock the downstream targets into only using a bold face for rendering.
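As a minimal sketch of this late binding, assume a source vocabulary that uses an `<emph>` element and an HTML target downstream. An XSLT template rule can then decide, at styling time only, that emphasis is rendered in a bold face. The choice of `<b>` here is purely illustrative; a different stylesheet could map the same `<emph>` to italics or to another vocabulary entirely.

```xml
<?xml version="1.0"?>
<!-- Illustrative stylesheet: binds the abstract <emph> element to a
     bold face only at styling time, for an HTML target. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Emphasized content becomes bold in this particular rendering;
       the source document itself never mentions a typeface. -->
  <xsl:template match="emph">
    <b><xsl:apply-templates/></b>
  </xsl:template>

</xsl:stylesheet>
```

Because the source records only that the text is emphasized, changing the rendering means changing this one template rule and leaving the authored information alone.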
Many times the midstream or downstream processes need only rearrange, re-label or synthesize the information for a target purpose and never apply any semantics of style for rendering purposes. Transformation tasks stand alone in such cases, meeting the processing needs without introducing rendering issues.
One caveat regarding modeling content-oriented information is that there are applications where the content-orientation is, indeed, presentation-oriented. Consider book publishing where the abstract content is based on presentational semantics. This is meaningful because there is no abstraction beyond the appearance or presentation of the content.
Consider the customer information in Example 1-1. A web user agent doesn't know how to render an element named <customer>. The HTML vocabulary used to render the customer information could be as follows:
<p>From: <i>(Customer Reference) <b>cust123</b></i>
</p>
The rendering result would then be as in Figure 1.1, with the rendering user agent interpreting the markup for italics and boldface presentation semantics:
[Figure 1.1]
The figure illustrates these two distinct styling steps: transforming the instance of the XML vocabulary into a new instance according to a vocabulary of rendering semantics; and formatting the instance of the rendering vocabulary in the user agent.
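The transformation step can be sketched with a small XSLT stylesheet. Example 1-1 is not reproduced in this excerpt, so the sketch assumes the customer is recorded as an empty `<customer>` element whose reference is carried in an attribute named `db`; both the attribute name and the shape of the source are assumptions made for illustration. The template produces the HTML paragraph shown earlier.

```xml
<?xml version="1.0"?>
<!-- Assumed source instance (hypothetical; Example 1-1 is not shown here):
       <customer db="cust123"/>
     The attribute name "db" is an assumption for illustration. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Serialize the result using HTML conventions. -->
  <xsl:output method="html"/>

  <!-- Transform the content-oriented element into HTML markup
       that a web user agent knows how to render. -->
  <xsl:template match="customer">
    <p>From: <i>(Customer Reference) <b><xsl:value-of select="@db"/></b></i></p>
  </xsl:template>

</xsl:stylesheet>
```

The stylesheet performs only the first of the two steps. Formatting the resulting `<i>` and `<b>` elements as italics and boldface is left to the user agent.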
Two W3C Recommendations
To address these two distinct processes in a detached (yet related) fashion, the W3C Working Group responsible for the Extensible Stylesheet Language (XSL) split the original drafts of their work into two separate Recommendations: one for transforming information and the other for paginating information.
The XSL Transformations (XSLT) Recommendation describes a vocabulary recognized by an XSLT processor to transform information from an organization in the source file into a different organization suitable for continued downstream processing.
The Extensible Stylesheet Language (XSL) Recommendation describes a vocabulary (often called XSLFO for "Formatting/flow Objects") reflecting the semantics of paginating a stream of information into individual pages. The XSLFO Recommendation normatively includes XSLT and historically both Recommendations were expressed in a single document.
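To give a feel for the pagination vocabulary, the following is a minimal sketch of an XSLFO instance that reuses the customer reference text from the HTML example. The page dimensions, margins, and the master name `page` are arbitrary values chosen for illustration, not values taken from the book.

```xml
<?xml version="1.0"?>
<!-- Minimal XSL-FO instance; page size, margins, and the master name
     "page" are arbitrary values chosen for illustration. -->
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Describe the geometry of the pages to be produced. -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
                           page-width="210mm" page-height="297mm"
                           margin-top="20mm" margin-bottom="20mm"
                           margin-left="20mm" margin-right="20mm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <!-- Flow the content into as many pages as it needs. -->
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <!-- White space between the lines below collapses by default. -->
      <fo:block>
        From:
        <fo:inline font-style="italic">
          (Customer Reference)
          <fo:inline font-weight="bold">cust123</fo:inline>
        </fo:inline>
      </fo:block>
    </fo:flow>
  </fo:page-sequence>

</fo:root>
```

In practice an instance like this is rarely written by hand. It is typically the result of an XSLT transformation and is then handed to an XSLFO formatter for pagination.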
Both XSLT and XSLFO are endorsed by members of WSSSL, an association of researchers and developers passionate about the application of markup technologies in today's information technology infrastructure.
This is a prose version of an excerpt from an edited version of the book "Practical Formatting Using XSLFO" (First Edition ISBN 1-894049-07-1 at the time of this writing) published by Crane Softwrights Ltd., written by G. Ken Holman.
Copyright © Crane Softwrights Ltd.
