
Printing XML: Why CSS Is Better than XSL

by Håkon Wium Lie, Michael Day
January 19, 2005

Longtime readers of XML.com will remember the battles between XSL and CSS that took place in these columns in 1999 and that were memorialized in XSL and CSS: One Year Later. Since then, the two languages have coexisted in relative peace: CSS is now used to style most web sites, XSLT (the transformation part of XSL) is widely used on the server side, and XSL-FO (the formatting part of XSL) has found a niche in the printing industry.

A recent entry in the blog of a web luminary may signal the start of a second round of hostilities. Norman Walsh, a member of the W3C's Technical Architecture Group and co-author of the W3C's Web Architecture document (WebArch), recently blogged:

... web browsers suck at printing. ... And CSS is never going to fix it. Did you hear me? CSS is never going to fix it.

It's unclear if this statement is a prediction or a threat. Or just blogging on a bad day. Anyway, the pronouncement of CSS's printing ineptness gives us a splendid opportunity to explain why CSS is a better language than XSL for most printing needs. As we have just used CSS to style a 400-page book (Cascading Style Sheets: Designing for the Web by Håkon Lie and Bert Bos, 3rd ed., forthcoming from Addison-Wesley later this year), this is not purely an academic exercise in stylesheet linguistics. So, would-be authors should continue reading.

The Problem

Both camps agree that a printed document is, in many ways, more difficult to format than on-screen presentation. A printed document must be split into numbered pages, with added headers and footers. Page margins must be specified, and they may be different on left and right pages. References that appear as hyperlinks on-screen often include page numbers on paper.

The disagreement starts with how best to express all this. Walsh's solution is to write a 1000-line XSL transformation that generates XSL-FO, which is subsequently turned into PDF. We will argue that it's much easier for most authors to express styling in CSS; in the case of the WebArch document, one can reuse the existing CSS stylesheets (200 lines or so) and add some print-specific lines. And, although browsers tend to focus on dynamic screens rather than on printing, products like Prince happily combine CSS with XML and produce beautiful PDF documents.

(Some disclosure at this point is appropriate. We, the authors, have been actively involved in shaping CSS and are now working hard to build software--Opera and Prince--that supports CSS.)

The Flavors

Before going into the print-specific features, let's compare the basic flavors of XSL and CSS. Consider this fragment from Walsh's XSL transform:


<xsl:template 
  match="html:p[@class='copyright' and ancestor::html:div[@class='head']]" 
  priority="100">
  <fo:block space-before="8pt"
            space-after="8pt"
            font-size="75%">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>

The purpose of this code is to select certain elements (specified in the match attribute) and to set certain formatting properties on these elements (e.g., font-size).

Using CSS, this can be written:


div.head p.copyright {
   margin-top: 8pt;
   margin-bottom: 8pt;
   font-size: 75%
}

Compare the two fragments. Which do you find more readable? Which language would be easier to learn?

Explaining this XSL snippet to a non-programmer would also be awkward:


<xsl:template match="html:ol/html:li">
  <fo:list-item>
    <xsl:if test="not(preceding-sibling::html:li)">
      <xsl:attribute name="keep-with-next">always</xsl:attribute>
    </xsl:if>

The CSS equivalent, however, is more intuitive:

 ol > li:first-of-type { page-break-after: avoid } 

Printing with CSS

As we all know, simple tools cannot always perform advanced tasks. Even if CSS were able to simplify some fragments, it wouldn't do much good if the language had inherent limitations that made it impossible to describe advanced features. The question becomes, then, whether there are any inherent limitations in CSS that could make it unfit for producing printed documents.

The answer is no. CSS2, which became a W3C Recommendation in 1998, introduced the concept of pages in CSS. By using it, one can set page breaks (even Internet Explorer supports this) and page margins. More recently, a W3C Candidate Recommendation (called CSS3 Paged Media Module) added functionality to describe headers, footers, and more. Let's start with a simple example:

 @page { size: A4 portrait; } 

This simple statement tells the formatter that the resulting PDF document should be of size A4 (which is common outside North America), and that the orientation should be portrait. To change the size of the generated PDF document, one simply changes A4 into another size. Peeking inside the XSL sheet again, we find two 40-line switch statements to enable similar functionality. One of the statements is reprinted in full below for entertainment purposes:


<xsl:param name="page.height.portrait">
  <xsl:choose>
    <xsl:when test="$paper.type = 'A4landscape'">210mm</xsl:when>
    <xsl:when test="$paper.type = 'USletter'">11in</xsl:when>
    <xsl:when test="$paper.type = 'USlandscape'">8.5in</xsl:when>
    <xsl:when test="$paper.type = '4A0'">2378mm</xsl:when>
    <xsl:when test="$paper.type = '2A0'">1682mm</xsl:when>
    <xsl:when test="$paper.type = 'A0'">1189mm</xsl:when>
    <xsl:when test="$paper.type = 'A1'">841mm</xsl:when>
    <xsl:when test="$paper.type = 'A2'">594mm</xsl:when>
    <xsl:when test="$paper.type = 'A3'">420mm</xsl:when>
    <xsl:when test="$paper.type = 'A4'">297mm</xsl:when>
    <xsl:when test="$paper.type = 'A5'">210mm</xsl:when>
    <xsl:when test="$paper.type = 'A6'">148mm</xsl:when>
    <xsl:when test="$paper.type = 'A7'">105mm</xsl:when>
    <xsl:when test="$paper.type = 'A8'">74mm</xsl:when>
    <xsl:when test="$paper.type = 'A9'">52mm</xsl:when>
    <xsl:when test="$paper.type = 'A10'">37mm</xsl:when>
    <xsl:when test="$paper.type = 'B0'">1414mm</xsl:when>
    <xsl:when test="$paper.type = 'B1'">1000mm</xsl:when>
    <xsl:when test="$paper.type = 'B2'">707mm</xsl:when>
    <xsl:when test="$paper.type = 'B3'">500mm</xsl:when>
    <xsl:when test="$paper.type = 'B4'">353mm</xsl:when>
    <xsl:when test="$paper.type = 'B5'">250mm</xsl:when>
    <xsl:when test="$paper.type = 'B6'">176mm</xsl:when>
    <xsl:when test="$paper.type = 'B7'">125mm</xsl:when>
    <xsl:when test="$paper.type = 'B8'">88mm</xsl:when>
    <xsl:when test="$paper.type = 'B9'">62mm</xsl:when>
    <xsl:when test="$paper.type = 'B10'">44mm</xsl:when>
    <xsl:when test="$paper.type = 'C0'">1297mm</xsl:when>
    <xsl:when test="$paper.type = 'C1'">917mm</xsl:when>
    <xsl:when test="$paper.type = 'C2'">648mm</xsl:when>
    <xsl:when test="$paper.type = 'C3'">458mm</xsl:when>
    <xsl:when test="$paper.type = 'C4'">324mm</xsl:when>
    <xsl:when test="$paper.type = 'C5'">229mm</xsl:when>
    <xsl:when test="$paper.type = 'C6'">162mm</xsl:when>
    <xsl:when test="$paper.type = 'C7'">114mm</xsl:when>
    <xsl:when test="$paper.type = 'C8'">81mm</xsl:when>
    <xsl:when test="$paper.type = 'C9'">57mm</xsl:when>
    <xsl:when test="$paper.type = 'C10'">40mm</xsl:when>
    <xsl:otherwise>11in</xsl:otherwise>
  </xsl:choose>
</xsl:param>

As the alert reader will already have inferred, the statement lists the heights of many different paper sizes. As such, it is interesting reading. However, we do not understand why this list belongs in a stylesheet. CSS provides a simple and elegant alternative by naming the different sizes in the specification rather than in each stylesheet.
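
To illustrate, switching to US Letter and giving facing pages mirrored margins might look like the following. This is a minimal sketch: the margin values are assumptions chosen for illustration, not taken from the WebArch stylesheet.

@page {
  size: letter portrait;   /* named size; replaces A4 */
  margin: 25mm 20mm;       /* top/bottom, then left/right (assumed values) */
}
@page :left {
  margin-left: 20mm;
  margin-right: 30mm;      /* extra space toward the spine */
}
@page :right {
  margin-left: 30mm;       /* extra space toward the spine */
  margin-right: 20mm;
}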

Another example that shows the elegant simplicity of CSS is that of page numbering. Page numbers are commonly printed on the outside of a page so that they are easily visible when flipping through a book. So, on a right page the page number should be on the right side, and on a left page it should be on the left side. On the first page, there should be no page number. In CSS, you can express this with:


@page :left {
  @bottom-left {
    content: counter(page);
  }
}
@page :right {
  @bottom-right {
    content: counter(page);
  }
}
@page :first {
  @bottom-right {
    content: normal;
  }
}

The statements, while not pure English prose, are easily understandable for anyone who has read this far, and it would be a simple exercise for the reader to move the page number from the bottom of each page to the top.
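
For instance, one way to do that exercise is to swap the bottom margin boxes for their top counterparts. This is a sketch that assumes the @top-left and @top-right margin boxes from the same Paged Media draft:

@page :left {
  @top-left {
    content: counter(page);
  }
}
@page :right {
  @top-right {
    content: counter(page);
  }
}
@page :first {
  @top-right {
    content: normal;   /* still no page number on the first page */
  }
}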

Because of size constraints, we're not going to show you how page numbers are expressed in XSL. We challenge you to find it and then try explaining it to the first person you meet.

Reuse and Cascading

One reason why the web took off in the early 1990s was the manner in which HTML is authored. By looking at the source code of other documents, web authors could easily get started in web publishing. In a sense, HTML is the most successful open source movement. CSS also encourages reuse of code and has formalized how it works through the cascading rules. For authors, this means they can take an existing stylesheet and add their own rules to it instead of writing a new one themselves.

One case in point is how to express page breaks for printed documents. Typically, you want to avoid page breaks after headings, and this can be expressed by adding a simple rule:

 h1, h2, h3, h4, h5, h6 { page-break-after: avoid; } 

Here, the selector lists the elements to which the declaration applies. As a result, the formatter will avoid page breaks after these elements. XSL has no concept of cascading and cannot easily express the above example. Instead of grouping elements, one has to add a rule to each element's template. Here is what the template for h1 elements looks like:


<xsl:template match="html:h1">
  <fo:block space-before="0.25in"
        color="#00599C"
        font-size="16pt"
        font-family="{$title.font.family}"
        keep-with-next="always"
        id="{generate-id()}">

(XSL has chosen another name for the property, i.e., keep-with-next instead of page-break-after.)

Likewise, it is easy in CSS to remove text decorations (e.g. underlining) on all elements:

 * { text-decoration: none } 
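
As a sketch of this kind of reuse, a print stylesheet could import an existing screen stylesheet and append only the print-specific rules. The file name screen.css is a placeholder for illustration, not the actual name of the WebArch stylesheet:

@import url("screen.css");   /* hypothetical existing stylesheet, reused as-is */

@media print {
  @page { size: A4 portrait; }
  h1, h2, h3, h4, h5, h6 { page-break-after: avoid; }
}

The imported rules stay in force. The added rules only supply what is missing for paper, and where one of them conflicts with an imported rule of equal specificity, the later rule wins.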

Table of Contents

Many documents start with a table of contents (TOC). On-screen, the TOC is clickable and takes the user to the requested section. Paper, being more static in nature, needs references that can be followed manually. A TOC on paper, therefore, lists the number of the page where the section can be found.

Expressing this in CSS results in a slightly more complex rule than the examples you have seen so far. Consider this:

 ul.toc a:after { 
    content: target-counter(attr(href), page); }

In English, the rule would read as follows: inside ul elements of class toc, all a elements should be trailed (:after) by some generated content. The generated content is the page number where the target of the link is found. The link is expressed in the href attribute of the a element.

One reason for the added complexity is that CSS, contrary to a common misconception, has been designed to work with generic XML as well as HTML. In HTML, links are expressed in href attributes on a elements. In generic XML, however, links can be anywhere, and their position must be specified.
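
For example, in a hypothetical XML vocabulary where a toc element contains entry elements that carry the link in a ref attribute (these names are invented for illustration), only the selector and the attribute name would change. Like the rule above, this relies on draft syntax:

toc entry:after {
  content: target-counter(attr(ref), page);   /* ref plays the role of href */
}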

Another common feature of TOCs on paper is a dotted line between section titles and the respective page numbers. This is called a leader in typesetting terminology and can be expressed in CSS as follows:

 ul.toc a:after { 
  content: leader('.') target-counter(attr(href), page); }

Compared with this three-line CSS solution, expressing TOCs in the WebArch XSL stylesheet takes more than 50 lines. In fairness, the XSL code also expresses other properties for TOCs (for example, that page breaks should be avoided). The CSS syntax in the above examples is still at the draft stage.
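
The page-break behavior can be stated in CSS as well. A minimal sketch, assuming the same ul.toc markup as in the rules above and not checked against the output of the WebArch XSL:

ul.toc li { page-break-inside: avoid; }   /* keep each TOC entry on one page */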

By combining the print-specific CSS stylesheet described above with the WebArch document, a nicely formatted PDF document can be created.

Multi-Column Layouts

On paper, content is often laid out in multiple columns. Stylesheets must be able to express this. Using CSS, one can easily create multi-column layouts:

body { column-count: 2; column-gap: 8mm; }

The content of the body element will now be poured into two columns, between which there is an 8mm gap. Multi-column layouts are also available in XSL, but the obligatory verbosity/complexity warnings apply.
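
A slightly fuller sketch, using properties from the CSS multi-column drafts, might read as follows. The rule color and the heading selector are choices made for illustration, and formatter support for the individual properties varies:

@media print {
  body {
    column-count: 2;
    column-gap: 8mm;
    column-rule: thin solid #999;   /* hairline between the columns */
  }
  h1 { column-span: all; }          /* let main headings span both columns */
}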

Conclusions

So can CSS do everything better than XSL? Not quite. XSL is a Turing-complete language which, in principle, can be used for all programming tasks and is particularly suited for document transformations. Styling documents is only one of many things XSL can do. CSS, on the other hand, has been developed with only one task in mind: styling documents.

On the web, CSS is the style sheet language of choice. However, the usefulness of CSS is not limited to screens. If you want to transfer web content--be it XML or HTML--onto paper, there are good reasons to use CSS. The language is radically simpler than that of XSL, and it is suitable both on-screen and on paper. This means that you probably don't have to write a stylesheet at all but can reuse an existing one.

Finally, by using CSS you can preserve the semantics of your content all the way to the printer. That, however, is a different discussion.


Comment on this Article

  • nice topic
    2008-03-25 07:51:05 prashantnt

    The article is quite helpful. I tried everything, but I was not able to remove the header and footer that are set by default. I want to remove the page count, document URL, and date/time from the document that is getting printed.


    Kindly help me out on this.
    you can also mail me on prashant@ceiltechnologies.com

  • latex
    2006-12-26 10:44:15 kettle

    The central issue in this article seems to be 'how best to format documents for printing'. XSLT and CSS are not the only available options; what about LaTeX? Why not just use LaTeX?

    • latex
      2006-12-26 15:43:43 mikeday

      Well, this is XML.com, so the assumption is that your source document is written in XML. You could potentially transform that document into LaTeX for formatting, which would require XSLT or scripting, but that can be a fiddly job due to the impedance mismatch between the two formats; for example, LaTeX is quite sensitive to whitespace, while many XML vocabularies are not, and collapse multiple spaces into one. I think that for most documents, using CSS for styling/printing would be easier than transforming to LaTeX.

  • But can CSS support the finer points of typography?
    2005-08-24 18:16:02 Hedley Finger

    The examples of CSS's simplicity v. XSL-FO's complexity only partly swing me to the CSS side. Sure, you can specify leader lines, etc. in a TOC entry. But those dots hard up against both the heading and the page number look pretty crappy. Can CSS do this:


    * Chapter headings in the font/size/colour/etc. of one's choice
    * Subsidiary headings in other stylings
    * An em quad inserted after the heading text to space the beginning leader dots
    * Leader dots at alternating 12pt and 3 pt spacings to get an .. .. .. effect
    * All leader dots vertically align
    * Leader dots end an en quad short of the longest possible page number, i.e. like a right tab
    * All leader dots have their own font/size/colour/etc. different from that of the heading text
    * All page numbers have their own font/size/colour/etc. different from that of the heading text OR leader dots


    This is a piece of cake in, say, FrameMaker but I suspect might be just a little beyond CSS as shown in those simple examples. In fact, you would have to build a TOC schema with quite a complex structure, enclose each of the items mentioned above in its own element (possibly a small subtree just for the leader dots), and apply CSS rules to each tiny fragment that makes up a complete TOC entry.

  • Still cumbersome
    2005-04-02 09:47:51 nok4

    My opinion:
    For structured data (such as data from a database):
    Let people first design a PDF template, then fill real data into it and regenerate all pages by running a program. This way we reduce the number of tools people need to learn: neither CSS nor XSL-FO, just an advanced designer tool like OpenOffice. Just reuse templates.


    For semi-structured data requiring many output formats (like DocBook files):
    We still need XSL transformation tools; CSS can only apply style to XML/HTML output. On a character terminal device we still need Texinfo or plain text, and CSS just can't help.


    One more problem is the implementation differences in support for the CSS specs (IE vs. Opera vs. Mozilla), let alone CJK font support problems.
    XSL and XSL-FO suffer less from such problems.


    WYSIWYG tools like PageMaker can easily produce a nice PDF file, so why not use such a tool for producing template files or Adobe's FDF?

  • further discussion
    2005-02-04 12:21:45 stpeter1

    This is all intriguing, but it seems that there are still many details to be worked out, stories to be swapped, code to be compared, source to be viewed, and so on. Is there a good discussion list where people are hashing out the practical (not theoretical) issues involved in making XML+CSS a reality? This seems a bit off-topic for lists such as css-discuss...

  • why it's not better, specifically when doing TOCs
    2005-01-29 04:04:12 julian.reschke@gmx.de

    Generating a Table of Contents is, as a matter of fact, a very good example of why CSS on its own is not better.


    The example relies on the TOC being already present in the input file (in this case, the WebArch document's HTML version) -- thus, it doesn't generate it, but merely styles it. This may be acceptable for many (X)HTML documents and some custom XML formats that already contain a TOC, but in general when XML is used for document markup, people will expect that the TOC is automatically generated from the document itself (see, for instance, RFC2629's XML format).


    That being said, the test version of Prince is really useful for people who want to better support paged media in their CSS (I just fixed some aspects of rfc2629.xslt).


    Julian

    • why it's not better, specifically when doing TOCs
      2005-02-01 14:48:05 howcom

      True, CSS cannot generate the TOC. (Some proposals for how to express this have been floating around, but none are supported by common tools.) So, if you want to automatically generate a TOC you must use another tool. For example, W3C has published a tool called "multitoc" as part of the html-xml-utils package. Or, you can use XSLT or Perl to generate a TOC. We believe in using different tools for different tasks. Making a TOC is one task and styling it is another. The paper tries to explain why CSS is a better style sheet language.

  • very simple case and unanswered questions
    2005-01-20 12:52:39 smckenzie

    Your PDF output looks fine but there are some problems I didn't see addressed. First, just *how* do you get your output to PDF? I've seen some tools that will create PDF out of HTML that will support CSS, but they are rare and mostly broken. They also tend to be standalone Windows apps and not scriptable.


    Second, how would you publish multiple XML documents to a single book? Is there a way to automatically generate a TOC or an index? Is there a way to organize content based on alphabetical sorting? Most FO processors have extensions to automate PDF bookmarks; how would you handle that? Can you do callouts, keep a section title on the same page as the related content, format alternating margins for even and odd pages so there is extra space on the spine, etc.?


    I think you might want to qualify the article to say that CSS is superior for printing a single web page to a printer from a browser, or maybe show some real typesetting features other than page size.

    • use your browser
      2005-01-29 10:43:43 alexkli

      You can try to use your favorite CSS-enabled browser to print it and pass it to a pdf writer (Adobe Distiller or FreePDF or whatever).


      I don't know how it works for all kinds of XML files, but the Mozilla ones actually support print-CSS.

    • very simple case and unanswered questions
      2005-01-20 13:08:10 howcom

      We have used Prince to generate the PDF document. You can download an alpha version of Prince which has the WebArch document included in it from www.yeslogic.com


      Prince reads a single file, but you can use some tricks (e.g. entities) to include other content. When Bert and I wrote the CSS book, we had one file per chapter and concatenated them into one before formatting.


      The size of the article limits the number of typesetting features we can describe -- what are you missing?

  • the alert reader
    2005-01-20 10:30:13 bryan rasmussen

    'As the alert reader will already have inferred, the statement lists the heights of many different paper sizes. As such, it is interesting reading. However, we do not understand why this list belongs in a stylesheet. CSS provides a simple and elegant alternative by naming the different sizes in the specification rather than in each stylesheet.'


    From that statement the alert reader has to infer that either you know nothing about the abomination of XSL-FO, or you're being less than truthful. I agree that CSS is far superior for styling print to XSL-FO's solution of putting all the attributes in the element. But that's bad design on XSL-FO's part, and you don't seem to be willing to address that.

  • I'd prefer to use CSS but often the power of XSLT is required
    2005-01-20 10:06:09 dmini

    Very interesting article, thank you. CSS syntax is indeed very concise and clear, but as the author has noted CSS' only purpose is formatting, and to me it means that input XML document has to be structured similar to desired output. XSLT, on the other hand can be used to format XML documents of arbitrary structure.
    If this limitation can be dismissed because input is in fact suitable for CSS formatting, I'd definitely choose CSS over XSLT, provided there are tools that can produce output in desired format (in many cases, PDF).

    • I'd prefer to use CSS but often the power of XSLT is required
      2005-01-20 11:08:24 BruceE

      Regarding the power of using XSL to change the structure (that CSS can't address). Why wouldn't you then use XSL to change the structure then use CSS to transform for printing? Best of both worlds, orthogonal uses.


      The power that is described in this article with CSS is that a lot of the formatting issues with printing have been factored into the available constructs in CSS. This moves the complexity from the CSS sheet into the definition of the available CSS keywords (and thus into the rendering engines). That is good because that is common to all (most) printing tasks.


      There is no reason why that couldn't have been done more with XSL-FO (e.g. make page sizes predefined selections instead of actual sizes in length units). Further, the problem appears to be that XSL-FO is too low-level; an additional layer is perhaps needed. It would be analogous to adding the LaTeX macros to TeX. Then the XSL for a document might be much smaller and not far off from the CSS examples.

  • Not enough tools
    2005-01-20 08:19:44 netwizard

    My company uses XSLT every day to transform XML documents into PDF via XSL-FO. While I do like the approach provided by the authors via CSS, nevertheless there aren't any tools out there that can take advantage of it. I tried running the stylesheet in Opera and Firefox, and the results weren't too nice.


    Aside from Prince which is commercial, there aren't any other tools. My company uses Apache's Xalan and FOP tools which are free. We see no incentive to switch to something commercial. So my question is: are there any open source tools which do this?

    • Not enough tools
      2005-01-20 23:38:35 PeterRing

      Some more tools:


      http://www.re.be/css2xslfo/


      "CSSToXSLFO is a utility which can convert an XML document, together with a CSS2 style sheet, into an XSLFO document, which can then be converted into PDF, PostScript, etc. with an XSLFO-processor. It has special support for the XHTML vocabulary, because that is the most obvious language it would be used for. The tool has a number of page-related extensions. It also comes with an API in the form of an XML filter."


      http://www.turnkey.com.au/tksweb/products/topleaf.html
      http://www.turnkey.com.au/tksweb/papers/gjnop2003.pdf


      "A great deal of work has gone into creating such standards as CSS, XSLT and XSL-FO, which allow human readable XML documents to be rendered on-screen, as hard copy or PDF. CSS is easy to work with using GUIs even non-experts can produce quite complex stylesheets but lacks key facilities for producing publishable quality pagination. XSLT/FO has virtually unlimited potential for the manipulation and display of source material. However the design of the required formatting objects, not to mention the XSLT transforms which create those objects, is well beyond the capabilities of non-specialists. The latest version of Turn-Key's TopLeaf rendering system is an attempt to provide the facilities of professional quality typesetting, but using a simple intuitive interface."

    • Not enough tools
      2005-01-20 13:14:43 howcom

      It's true that browsers so far have focused on screen use and don't print well -- with or without CSS. I see signs of a renewed interest in printing and expect the next generation of browsers to support more print-specific features. Prince and other similar products (which I'm sure will appear) will be used on the server side and for batch processing.