March 26, 2003
It's been a slow month on the O'Reilly Network XML Forum: it's not easy to write a column answering questions when none have been asked. So, it's time to dig into the archives, pulling out -- this time -- a couple of queries about processing XML.
XML for <SCRIPT>
Here's what the XML for <SCRIPT> site has to say about it:
The benefits of this architecture are many.
- Server side code intermixed with HTML code can be reduced to almost nothing.
- Client side code is simplified by having all form initialization in one place.
- Applications are now free to maintain their own data, reducing annoying round-trips to the server.
- Server side processing is simplified by having all relevant form data be submitted in XML.
In effect, XML for <SCRIPT> allows n-tier client side application development to become a reality.
ESPX, in Jandia's words, is "an ECMAScript Parser for (almost) XML, with namespaces". The "almost" refers to the fact that ESPX doesn't support DTDs (either internal or external subset), let alone XML Schema. This may or may not be a fatal limitation; for instance, if you need to recognize ID-type attributes, as such, or to use declared entity references, you're out of luck. On the other hand, ESPX does support quite a few of HTML 4.0's built-in entity references. As Jandia's summary implies, it also fully supports the W3C's Namespaces in XML Recommendation.
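To make concrete what a DTD-less parser gives up, here is a minimal sketch that uses a swapped-in parser: Python's expat-based xml.etree.ElementTree, not ESPX. That parser honors the internal DTD subset, so it expands a declared entity, and it resolves a namespace prefix to its URI. The sample document and the urn:example:catalog namespace are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: an entity declared in the internal DTD subset,
# used inside a namespace-prefixed vocabulary.
DOC = """<!DOCTYPE cat:catalog [
  <!ENTITY vendor "Acme Widgets">
]>
<cat:catalog xmlns:cat="urn:example:catalog">
  <cat:name>&vendor;</cat:name>
</cat:catalog>"""

NS = {"cat": "urn:example:catalog"}

root = ET.fromstring(DOC)
# ElementTree reports namespaced names in {uri}local form.
print(root.tag)                        # {urn:example:catalog}catalog
# The declared entity reference has been replaced by its value.
print(root.find("cat:name", NS).text)  # Acme Widgets

# Without a declaration, an HTML-style entity is an error to a strict XML parser.
try:
    ET.fromstring("<p>caf&eacute;</p>")
except ET.ParseError as err:
    print("not well-formed:", err)
```

A parser without DTD support has no way to learn what &vendor; stands for. Conversely, ESPX's built-in knowledge of HTML 4.0 entities lets it accept references such as &eacute; that a strict XML parser rejects unless they're declared.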
Importantly for cross-platform applications, ESPX has been tested on the three main browsers (Microsoft Internet Explorer, Netscape/Mozilla, and Opera) not only at their current levels -- which (to varying degrees) already "know" XML -- but also in "down-level" versions "without built-in XML support".
Neither of the above two projects has seen any really recent updates. XML for <SCRIPT> was last updated about a year ago. The current ESPX/TinyXSL version is date-stamped March of 2001.
Remember, whichever you select (XML for <SCRIPT>, ESPX/TinyXSL, or probably any other such tool), XML itself isn't particularly "programmable". There's no such thing as a built-in <script> element, for one obvious example; even if a particular vocabulary does include such an element, what it means depends entirely on the vocabulary's purpose. (For instance, vocabularies intended for use in marking up dramatic works and in handwriting analysis might both include a script element. It probably would be used in neither case to hold programming code.)
Q: What are the processing steps an XSL-FO engine follows?
I have read the XSL-FO specification. There they have said that XSL-FO formatting includes three steps:
- Objectifying
- Refinement
- Generating area tree
I am unable to understand these things clearly. Can you please explain them, with an example?
A: Congratulations on having read the XSL-FO Recommendation. Just embarking on that task had to require an act of almost unimaginable willpower! There's no real mystery to the three concepts you've singled out for your question. Let's look at them one at a time.
Objectifying
The first step is similar to what a DOM-based XML parser does: it converts a stream of XML data into an in-memory tree. Specifically, it constructs what's called a formatting object tree -- essentially a hierarchy of boxes or containers within which the document's actual content appears.
For instance, the skeleton of a simple XSL-FO document might look something like this:
<fo:root [attributes]>
  <fo:layout-master-set [attributes]>
    <fo:simple-page-master [attributes]>
      <fo:region-body [attributes]>...</fo:region-body>
      <fo:region-before [attributes]>...</fo:region-before>
      <fo:region-after [attributes]>...</fo:region-after>
      <fo:region-start [attributes]>...</fo:region-start>
      <fo:region-end [attributes]>...</fo:region-end>
    </fo:simple-page-master>
    <fo:page-sequence-master [attributes]>...</fo:page-sequence-master>
  </fo:layout-master-set>
  <fo:page-sequence [attributes]>
    <fo:title [attributes]>...</fo:title>
    <fo:static-content [attributes]>...</fo:static-content>
    <fo:flow [attributes]>...</fo:flow>
  </fo:page-sequence>
</fo:root>
To objectify this stream of XML, the formatter converts it to a tree of objects -- of formatting objects, as shown below:
Note that at this point, all that exists is a rough in-memory metaphor (as it were) for how the final document will appear.
(The various elements' attributes and text content are also included in this tree inside the corresponding box, although not shown above.)
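The objectifying step can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any particular formatter is built: the FormattingObject class and the objectify() and outline() helpers are hypothetical, the input is a minimal, filled-in version of the skeleton shown earlier, and only each element's attributes and leading text are kept (mixed content is ignored).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical minimal XSL-FO document, using real FO element and attribute names.
FO_DOC = """<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page" page-height="11in" page-width="8.5in">
      <fo:region-body margin="1in"/>
      <fo:region-before extent="0.5in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <fo:static-content flow-name="xsl-region-before">
      <fo:block text-align="center">Header</fo:block>
    </fo:static-content>
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-family="serif">Hello, XSL-FO</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>"""

@dataclass
class FormattingObject:
    """One box in the FO tree: its name, specified attributes, and content."""
    name: str
    attributes: dict
    text: str = ""
    children: list = field(default_factory=list)

def objectify(elem):
    """Turn a parsed element (and its descendants) into FormattingObjects."""
    # Assumes every element is in the FO namespace, so "fo:" is re-attached as a label.
    local = elem.tag.split("}", 1)[1] if elem.tag.startswith("{") else elem.tag
    fo = FormattingObject("fo:" + local, dict(elem.attrib), (elem.text or "").strip())
    fo.children = [objectify(child) for child in elem]
    return fo

def outline(fo, depth=0):
    """List the tree's boxes, indented two spaces per level."""
    lines = ["  " * depth + fo.name]
    for child in fo.children:
        lines.extend(outline(child, depth + 1))
    return lines

tree = objectify(ET.fromstring(FO_DOC))
print("\n".join(outline(tree)))
```

Printing the outline gives a text rendering of the hierarchy of boxes, with fo:root at the top and the two fo:block objects at the deepest level. The attributes and text travel along inside each node, which is what the refinement step works from next.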
Refinement
The idea behind the refinement step is that when the final document is produced, each formatting object (FO) will have traits which instruct the rendering agent exactly how and where to display that FO. For instance, a block of text in a top margin (which corresponds to the fo:region-before FO) might be rendered in a particular font face, centered horizontally between the margins. These traits are often specified explicitly in the attributes of a given FO's corresponding element, and direct mapping of attributes to traits is one part of refinement.
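Direct mapping can be sketched as copying each specified attribute into a table of traits. As an added assumption for illustration, the hypothetical map_attributes() below also resolves absolute lengths (pt, pc, in, cm, mm) to points, since a renderer ultimately needs numbers. Relative values such as em or percentages depend on context, so they're left as strings for a later stage to compute.

```python
import re

# Points per absolute unit: 1in = 72pt = 2.54cm = 25.4mm; 1pc = 12pt.
POINTS_PER_UNIT = {"pt": 1.0, "pc": 12.0, "in": 72.0,
                   "cm": 72.0 / 2.54, "mm": 72.0 / 25.4}
LENGTH = re.compile(r"^(-?\d+(?:\.\d+)?)(pt|pc|in|cm|mm)$")

def map_attributes(attributes):
    """Copy specified attributes to traits, converting absolute lengths to points."""
    traits = {}
    for name, value in attributes.items():
        match = LENGTH.match(value.strip())
        if match:
            number, unit = match.groups()
            traits[name] = float(number) * POINTS_PER_UNIT[unit]
        else:
            # Keywords, font names, relative lengths, etc. pass through unchanged.
            traits[name] = value
    return traits

print(map_attributes({"font-family": "serif", "font-size": "12pt",
                      "space-before": "0.5in"}))
```

Keeping the mapping this mechanical is deliberate: anything that needs context (inheritance, expressions, shorthands) belongs to the indirect part of refinement rather than to this one-to-one copy.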
(Aside: the XSL-FO Recommendation uses the terms "trait" and "property" more or less interchangeably. Perhaps there's some distinction between the terms in the spec's authors' minds, but for all practical purposes you can consider them synonymous.)
Traits can be implied as well as expressed explicitly, however. For example, many traits are inherited by lower-level FOs from their higher-level ancestors. Some traits must be calculated based on evaluating expressions. And some traits (such as a simple border trait) are shorthand expressions of various specific traits (such as border-style). Deriving traits from these indirect sources is another (very important) facet of the refinement step.
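Two of those indirect sources, inheritance and shorthand expansion, can be sketched together. This is a simplified illustration, not the Recommendation's full algorithm: refine() and expand_border() are hypothetical; INHERITED is a small hand-picked subset of XSL's inherited properties; and the real border shorthand sets per-side traits (border-top-style and so on) rather than the single border-width, border-style, and border-color traits used here. Expression evaluation is omitted.

```python
# A small subset of properties that XSL 1.0 marks as inherited.
INHERITED = {"font-family", "font-size", "text-align", "color"}
BORDER_STYLES = {"none", "hidden", "dotted", "dashed", "solid", "double",
                 "groove", "ridge", "inset", "outset"}

def expand_border(value):
    """Split a 'border' shorthand into width/style/color traits (simplified)."""
    traits = {}
    for token in value.split():
        if token in BORDER_STYLES:
            traits["border-style"] = token
        elif token in {"thin", "medium", "thick"} or token[0].isdigit() or token[0] == ".":
            traits["border-width"] = token
        else:
            # Anything else is assumed to be a color.
            traits["border-color"] = token
    return traits

def refine(specified, parent_traits):
    """Start from the parent's inheritable traits, then apply this FO's own."""
    traits = {k: v for k, v in parent_traits.items() if k in INHERITED}
    for name, value in specified.items():
        if name == "border":
            traits.update(expand_border(value))
        else:
            traits[name] = value
    return traits

page = refine({"font-family": "serif", "font-size": "12pt"}, {})
block = refine({"border": "1pt solid black", "text-align": "center"}, page)
inner = refine({}, block)
print(block)
print(inner)
```

The block picks up the page's font traits by inheritance and gets three specific border traits from one shorthand. The inner FO inherits the font and alignment traits but not the border, because border properties aren't inherited.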
Generating the area tree
The final step in XSL-FO processing is the one which produces the result you're really after when using XSL-FO in the first place: it assigns each block of content a geometric area on a page, according to the specifications laid out in the fully refined tree of FOs. It turns the abstract, metaphoric expression of the document's appearance into something which is actually usable by the target medium, be it printed page, computer monitor, WAP-enabled cell phone, or whatever.
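A drastically simplified picture of area-tree generation: each block gets a rectangle on a page, and when the body region is full, a new page starts. The Area class and layout() function are hypothetical. The sketch assumes fixed-width characters (so textwrap can stand in for real line breaking), never splits a block across pages, and ignores keeps, widows, orphans, and the other constraints a real formatter honors.

```python
import textwrap
from dataclasses import dataclass

@dataclass
class Area:
    """A rectangle, in points, measured from the body region's top-left corner."""
    page: int
    x: float
    y: float
    width: float
    height: float
    lines: list

def layout(blocks, body_width, body_height, line_height, chars_per_line):
    """Stack block areas top to bottom, starting a new page when a block won't fit."""
    areas = []
    page, y = 1, 0.0
    for text in blocks:
        # Crude line breaking: assumes every character has the same width.
        lines = textwrap.wrap(text, width=chars_per_line)
        height = len(lines) * line_height
        # Blocks are never split; one taller than the page simply overflows it.
        if y > 0 and y + height > body_height:
            page, y = page + 1, 0.0
        areas.append(Area(page, 0.0, y, body_width, height, lines))
        y += height
    return areas

areas = layout(["one two", "three four five six seven", "eight"],
               body_width=468.0, body_height=36.0, line_height=12.0, chars_per_line=10)
for area in areas:
    print(area.page, area.y, area.height, area.lines)
```

With room for only three 12-point lines per page, the three-line middle block can't follow the first block on page 1, so it starts page 2, and the last block is pushed to page 3.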
If you're interested in learning more about XSL-FO -- a big but (I think) important topic -- I encourage you to consult more full-length treatments such as Dave Pawson's XSL-FO or my own Just XSL. (Note that the latter includes full coverage of XSLT as well as XSL-FO.)