
Transforming Experiences
Q: Can I convert XSL Formatting Objects (XSL-FO) to HTML?
I have an XSL-FO file that renders a PDF using FOP. Can I use the same XSL-FO file to render to XHTML? If yes, what is the tool to use?
A: Let's tackle the second question first: one good place to start is the fo2html XSLT stylesheet developed by RenderX. Given an XSL-FO file as input, the stylesheet converts XSL formatting objects to more or less equivalent XHTML, using CSS1 for formatting. Pay particular attention to the limitations of this approach, which RenderX describes on its page for the stylesheet.
RenderX is the developer of the XEP XSL-FO engine, nominally a competitor to Apache's FOP, and this stylesheet is "an add-on" to that product. But the stylesheet should work equally well with XSL-FO input from any source. Also, although the RenderX page above focuses on using the stylesheet directly within an XSLT-aware Web browser, I can't think of any reason why it shouldn't be easily adaptable to use by any XSLT engine, including Saxon, Cocoon, or what-have-you.
Now let's linger over the longer answer to the question of what's involved in converting XSL-FO to XHTML in the first place.
XSL-FO is an XML vocabulary whose elements and attributes describe the structure of a printed document. The requirements for this kind of output differ in just about every substantive way from those for output intended for Web browsing, i.e. XHTML. A Web site's content is typically scattered across multiple discrete documents, and there's no regard in most cases for printing requirements:
- there's no way to require that the Web page be printed on a certain size paper;
- a given section heading might appear at the bottom of a page, while the corresponding content begins at the top of the next;
- font families and sizes and page margins are ultimately under the user's control;
- images can be suppressed at will;
- page headers and footers are defaulted by the browser and may be overridden or suppressed altogether, and how those headers and footers look is entirely outside of the page designer's hands;
- supporting material, such as footnotes and indexes, can be constructed -- but only awkwardly;
- if a table crosses a printed page's boundary, onto the next page, its header cells won't be repeated....
The list of differences, in short, is enormous.
Even so, there are some similarities. The fo:block formatting
object, for instance, is analogous to XHTML's div. The
fo:table formatting object behaves like XHTML's table
element. XHTML's ul and li unordered-listing elements
have counterparts in XSL-FO's fo:list-block and
fo:list-item. And so on.
The general process of automatically converting XSL formatting objects to their XHTML equivalents depends on making point-for-point transformations from one XML vocabulary (XSL-FO) to another (XHTML). Thus, it's a natural task for XSLT; hence RenderX's fo2html stylesheet solution. Things become dicey, though, when it comes to handling those printed-page features with no precise Web-page counterparts or, worse, no counterparts at all. You can turn to CSS for help with some of them. But you'll never get an entirely one-to-one correspondence between XSL-FO, or any other fully-featured page description language, and XHTML itself, even with CSS.
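To make the idea of point-for-point transformation concrete, here is a minimal sketch of the kind of mapping described above. It is not RenderX's fo2html stylesheet, just an illustration that assumes a simple XSL-FO document with a single flow. The page title and the decision to discard list-item labels and all page geometry are my own simplifications.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="fo">

  <!-- Wrap the flow content in a bare XHTML skeleton.
       Page masters, margins, and static content are simply ignored. -->
  <xsl:template match="/fo:root">
    <html>
      <head><title>Converted XSL-FO</title></head>
      <body>
        <xsl:apply-templates select="fo:page-sequence/fo:flow"/>
      </body>
    </html>
  </xsl:template>

  <!-- fo:block is roughly a div. -->
  <xsl:template match="fo:block">
    <div><xsl:apply-templates/></div>
  </xsl:template>

  <!-- fo:list-block is roughly a ul. -->
  <xsl:template match="fo:list-block">
    <ul><xsl:apply-templates select="fo:list-item"/></ul>
  </xsl:template>

  <!-- fo:list-item is roughly an li. Only the item body is kept;
       the browser supplies its own bullet in place of the label. -->
  <xsl:template match="fo:list-item">
    <li><xsl:apply-templates select="fo:list-item-body/node()"/></li>
  </xsl:template>

</xsl:stylesheet>
```

Even this toy version shows where the trouble starts: everything in the source that describes the printed page has nowhere to go in the result.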
Note, by the way, one perhaps overly fussy drawback to RenderX's solution:
the CSS style instructions which it generates appear as style
attributes assigned to div elements. For better control and
consistency, one would prefer a solution which generated two outputs from the
single XSL-FO document: an XHTML document and a separate, optional CSS
stylesheet. That RenderX hasn't done so isn't because it lacks technical
acumen. Rather, it's because
- their experiment with fo2html is directed at using a browser to view the XSL-FO document directly, transformed on the fly into XHTML; and
- an XSLT 1.0 stylesheet can't, in any case, generate more than one result
tree (that is, "output" document) at a time. (XSLT 2.0, when finalized, will probably
permit the creation of one or more "secondary result trees," using an
instruction called
xsl:result-document.)
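For the curious, here is a rough sketch of how the two-output approach might look with xsl:result-document, assuming an XSLT 2.0 processor. The file name fo-styles.css, the class name block, and the single CSS rule are all hypothetical; a real conversion would have to derive the rules from the formatting properties in the XSL-FO source.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="fo">

  <xsl:template match="/fo:root">
    <!-- Secondary result: a plain-text CSS file (hypothetical name). -->
    <xsl:result-document href="fo-styles.css" method="text">
      <xsl:text>div.block { margin: 0.5em 0; }&#10;</xsl:text>
    </xsl:result-document>

    <!-- Primary result: the XHTML document, which links to that file. -->
    <html>
      <head>
        <title>Converted XSL-FO</title>
        <link rel="stylesheet" type="text/css" href="fo-styles.css"/>
      </head>
      <body>
        <xsl:apply-templates select="fo:page-sequence/fo:flow"/>
      </body>
    </html>
  </xsl:template>

  <!-- Blocks carry a class name instead of an inline style attribute. -->
  <xsl:template match="fo:block">
    <div class="block"><xsl:apply-templates/></div>
  </xsl:template>

</xsl:stylesheet>
```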
Q: How do I use XSLT to put a "tag" inside an attribute?
In my XML document there is a tag called filmid. As an
example:
<filmid>1</filmid>
In the XSLT stylesheet I want to turn this into a filename for an
<img src> tag, like this:
<img src="images/<xsl:value-of select="filmid"/>.gif" />
This doesn't work, though.
A: You're probably not thinking of it in these terms, but the problem isn't a limitation of XSLT. The problem is a limitation in how you're thinking about XSLT. Don't be embarrassed; you've got a lot of company at this particular roadblock.
Here's the heart of the matter: XSLT does not write or create tags in an output file; if anything, it creates elements (and attributes, PIs, and so on) in a result tree. That is, a stylesheet does not consist of a series of instructions such as:
- Write the start of an img tag.
- If you (the XSLT processor) find an XSLT instruction (like xsl:value-of), no matter what else you're doing, stop writing the tag and do what the instruction says.
- Write the end of the img tag.
Instead, what the stylesheet consists of is a series of instructions to an XSLT engine along these lines:
- Instantiate (create an instance of) an img element.
- Instantiate a src attribute for that element.
Note that there's no mention of the word "tag". There's also no reference to writing or creating the "start" or "end" of an element, as a tag or as anything else. In fact, exactly what the XSLT engine produces from these instructions needn't resemble markup at all. It's not that hard to imagine an XSLT-aware package which uses the stylesheet's instructions to generate a graphical tree diagram of a document's structure, either as an SVG document or as a JPEG, PostScript, TIF, or PNG.
Also, remember that even before the stylesheet gets submitted to the XSLT engine itself, it has to get past the gatekeeper, an XML parser. Thus, regardless of how commonsensical a stylesheet full of "instructions" might be, if the stylesheet itself is not at least well-formed XML, it won't make any difference whether or not the instructions are legitimate XSLT.
For the record, the fragment of XSLT code you've provided will fail any
self-respecting parser's admission requirements on at least one and almost
certainly two counts. First, the parser will attempt and fail to understand the
value of the src attribute as a string, enclosed in double
quotation marks, which itself contains double quotation marks. To rectify this,
you could change the double quotation marks surrounding "filmid" to single ones,
like this:
<img src="images/<xsl:value-of select='filmid'/>.gif" />
Alternatively, you could leave the double quotes around "filmid" and replace the ones around the attribute as a whole with singles.
The parser should also choke on this "solution," though, because an
attribute's value may not contain an unescaped less-than (<)
character. But let's imagine you're using a very forgiving (and non-compliant)
parser, and it blithely accepts what you've passed it as the value of the
src attribute. What are you going to get in the stylesheet's result tree?
You're going to get an img element with a src
attribute whose value is literally everything within the double quotation
marks. The XSLT engine, that is, wouldn't realize you'd tried to feed it an
xsl:value-of instruction. It would simply decide that the value of
this src attribute is a string consisting of an i,
followed by an a, and so on through the first slash, followed by a
less-than character, followed by an x, followed by an
s and an l and a colon, and so on.
In fact, that's how an XSLT processor will behave if you escape the less-than
with an entity reference, like &lt;.
There are two ways around the problem. The simplest, at least in this case,
is to forget the xsl:value-of element and use a so-called attribute
value template (commonly abbreviated AVT) instead. An AVT consists of an XPath
expression enclosed in "curly braces" ({ and }
characters). If the expression is a relative location path, it's relative to
whatever the context at that point is. For instance,
<img src="images/{filmid}.gif" />
An XSLT engine will respond to the presence of the curly braces as
meaning, "insert the string-value of the context node's filmid
child element at this point".
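Here is a small but complete stylesheet built around that AVT, as a sketch. It assumes each filmid element sits inside a parent element named film, which is a guess on my part because the question doesn't show the rest of the document. The alt attribute is my own addition, to show that one attribute value can hold several AVT expressions alongside literal text.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- The context node here is a (hypothetical) film element, so the
       relative location path filmid selects its filmid child. -->
  <xsl:template match="film">
    <img src="images/{filmid}.gif" alt="Film {filmid}"/>
  </xsl:template>

</xsl:stylesheet>
```

For a film whose filmid is 1, the src attribute in the result comes out as images/1.gif. If you ever need a literal curly brace in such an attribute, double it ({{ or }}) so the processor doesn't treat it as the start or end of an expression.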
The other alternative is a bit wordier but is, in the opinion of many XSLT
programmers, much clearer and not quite so much like arcane programmerese. (It
also works in some cases where an AVT can't be used for one reason or another.)
This alternative uses the xsl:attribute element to add an
attribute to the result tree. The string-value of this element, possibly
including an xsl:value-of, becomes the value of the attribute in
question. Thus:
<img><xsl:attribute name="src">images/<xsl:value-of select='filmid'/>.gif</xsl:attribute></img>
The attribute is added to the result-tree element that contains the
xsl:attribute instruction (img, in this case), and it has to be added before
that element receives any child nodes. The name of the attribute is assigned
by the name attribute, obviously. And, as you can see, the
attribute's value can consist of a mixture of literal text and XSLT
instructions.
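One case where an AVT can't help is an attribute that should appear only under certain conditions. The sketch below assumes, hypothetically, that filmid lives inside a film element which may or may not also have a title child; neither name comes from the question. The alt attribute is created only when a title is present.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="film">
    <img>
      <!-- Always present: literal text mixed with an instruction.
           Kept on one line so no stray whitespace ends up in the value. -->
      <xsl:attribute name="src">images/<xsl:value-of select="filmid"/>.gif</xsl:attribute>
      <!-- Present only when the (hypothetical) title child exists. -->
      <xsl:if test="title">
        <xsl:attribute name="alt"><xsl:value-of select="title"/></xsl:attribute>
      </xsl:if>
    </img>
  </xsl:template>

</xsl:stylesheet>
```

With an AVT you could vary the value of alt, but you couldn't leave the attribute out altogether.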
Whether you use an AVT or an xsl:attribute element, the
corresponding portion of the result looks like either of the following:
<img src="images/1.gif" />
<img src="images/1.gif"></img>
Either of these results is a perfectly well-formed way to express an empty
img element.