XML.com: XML From the Inside Out


The Extensible Style Language - XSL

January 19, 1999

Styling XML Documents

Reprinted from Web Techniques
January 1999

From the earliest days of the Web, we've been using essentially the same set of tags in our documents. Web pages written in HTML use HTML tags and the meaning of those tags is well understood: <H1> makes a heading, <IMG> loads a graphic, <OL> starts an ordered list, and so on. The number of tags has slowly grown, and there have been numerous browser-compatibility issues, but the basic tag set is still the same.

There's a significant benefit to a fixed tag set with fixed semantics: portability. A Web page that uses the standard tags can be viewed by just about any browser, anywhere in the world. However, HTML is very confining; Web designers want more control over presentation and many processes would benefit from more descriptive tagging.

Enter XML. With XML, we can use any tags we want. We can write documents using our own tag names, names that are meaningful in the context of our subject matter and that offer the possibility of far greater control over presentation. But this freedom comes at a price: XML tag names have no predefined semantics. An <H1> might just as legitimately identify a tall hedge as a first-level heading. Is <IMG> an image, or an imaginary number? Who knows?

The style sheet knows. From the very beginning of the XML effort, it was recognized that in order to successfully send XML documents over the Web, it would be necessary to have a standard mechanism for describing how they were to be presented. That's why we need style sheets.

The Extensible Style Language (XSL) is the style language for XML. At the time of this writing (October 1998), XSL is under active development by the W3C. On August 18, 1998, the XSL Working Group (WG) released its first Working Draft. This article introduces XSL as described in that document. (Visit www.w3.org/TR/WD-xsl to view the Working Draft for yourself.)

By the time this article is published, a second Working Draft may be available. It doesn't seem likely that any of the topics covered here will change substantially between the first and second Working Drafts, but it's always possible.

What Does a Style Sheet Do?

In simplest terms, a style sheet contains instructions that tell a processor (such as a Web browser, print composition engine, or document reader) how to translate the logical structure of a source document into a presentational structure.

Style sheets typically contain instructions like these:

  • Display hypertext links in blue.
  • Start chapters on a new, left-hand page.
  • Number figures sequentially throughout the document.
  • Speak emphasized text in a slightly louder voice.

Many style-sheet languages augment the presentation of elements that have a built-in semantic meaning. For example, a Microsoft Word paragraph style can change the presentation of a paragraph, but even without the style, Word knows that the object in question is a paragraph.

The challenge for XSL is slightly greater. XML elements have no underlying semantics for a style sheet to augment, so XSL must specify not only how each element should be presented but also what the element is. For this reason, XSL defines both a language for expressing style sheets and a vocabulary of "formatting objects" that supply the necessary base semantics.

For the purpose of this article, we're going to consider a simple XML document, shown in Example 1:

Example 1: A simple XML document.

<?xml version='1.0'?>
<doc><title>My Document</title>
<para>This is a <em>short</em> document.</para>
<para>It only exists to <em>demonstrate a <em>simple</em>
XML document</em>.</para>
<figure><title>My Figure</title>
<graphic fileref="myfig.gif"/>
</figure>
</doc>

This document contains only a few elements:

  • doc is the document (root) element;
  • title defines titles;
  • para defines paragraphs;
  • em indicates emphasis;
  • figure and graphic define external graphics.

How Does XSL Work?

Before discussing XSL in more detail, it's necessary to consider the XSL processing model. An XSL processor begins with a style sheet and a "source tree." The source tree is the tree representation of the parsed XML source document. All XML documents can be represented as trees.
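The tree view can be made concrete with a short sketch. It uses Python's standard xml.etree.ElementTree parser as a stand-in (it has nothing to do with XSL itself) to parse Example 1 and list its element nodes as an indented outline. Text nodes are left out for brevity, and the helper name outline is purely illustrative.

```python
import xml.etree.ElementTree as ET

# Example 1, without the XML declaration.
EXAMPLE_1 = """<doc><title>My Document</title>
<para>This is a <em>short</em> document.</para>
<para>It only exists to <em>demonstrate a <em>simple</em>
XML document</em>.</para>
<figure><title>My Figure</title>
<graphic fileref="myfig.gif"/>
</figure>
</doc>"""


def outline(node, depth=0):
    """Return the element nodes under `node`, one per line, indented by depth."""
    lines = ["  " * depth + node.tag]
    for child in node:  # iterating an Element yields its child elements only
        lines.extend(outline(child, depth + 1))
    return lines


tree = ET.fromstring(EXAMPLE_1)
print("\n".join(outline(tree)))
```

The nested em inside the second para shows up as a grandchild of that para, exactly as the markup nests it.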

Conceptually, the XSL processor begins at the root node in the source tree and processes it by finding the template in the style sheet that describes how that node should be displayed. Each node is then processed in turn until there are no more nodes left to be processed. (In fact, it's a little more complicated than this because each template can specify which nodes to process, so some nodes may be processed more than once and some may not be processed at all. We'll examine this later.)
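The following Python sketch is a toy model of that traversal, not an XSL processor. Templates are plain functions keyed by element name (a hypothetical simplification of XSL patterns), and a node with no template simply has its children processed, which is an assumption made for the sketch. The hypothetical figure_template chooses its own nodes: it processes the figure's title twice and never touches the graphic.

```python
import xml.etree.ElementTree as ET

EXAMPLE_1 = """<doc><title>My Document</title>
<para>This is a <em>short</em> document.</para>
<para>It only exists to <em>demonstrate a <em>simple</em>
XML document</em>.</para>
<figure><title>My Figure</title>
<graphic fileref="myfig.gif"/>
</figure>
</doc>"""


def process(node, templates, visited):
    """Record the node, then apply its template or, failing that, process its children."""
    visited.append(node.tag)
    template = templates.get(node.tag)
    if template is not None:
        template(node, templates, visited)
    else:
        # Assumed default for this sketch: no template, so visit every child in order.
        for child in node:
            process(child, templates, visited)


def figure_template(node, templates, visited):
    """Hypothetical template that picks its own nodes to process."""
    title = node.find("title")
    process(title, templates, visited)  # e.g. once for a list of figures...
    process(title, templates, visited)  # ...and once in place
    # The graphic child is never processed.


visited = []
process(ET.fromstring(EXAMPLE_1), {"figure": figure_template}, visited)
print(visited)
```

The visited list ends with 'figure', 'title', 'title' and never contains 'graphic': one node processed twice, another not at all.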

The product of all this processing is a "result tree." If the result tree is composed of XSL formatting objects, then it describes how to present the source document. It's a feature of XSL that the result tree doesn't have to be composed of XSL formatting objects; it can be composed of any elements. One common alternative to XSL formatting objects will be HTML element names. When HTML is used in the result tree, XSL will transform an XML source document into an XML document that looks very much like HTML. It's important to realize, however, that the result is XML, not HTML. In particular, empty elements will use the XML empty-element syntax, and it's impossible to produce documents that are not well-formed XML.
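A quick way to see why the result is XML rather than HTML is to serialize a small tree. This sketch builds an HTML-flavoured result tree by hand with Python's xml.etree.ElementTree (the element choices are illustrative, not output from any XSL processor) and writes it out.

```python
import xml.etree.ElementTree as ET

# A hand-built, HTML-flavoured result tree: a hypothetical rendering of the
# figure in Example 1.
body = ET.Element("BODY")
caption = ET.SubElement(body, "H2")
caption.text = "My Figure"
ET.SubElement(body, "IMG", SRC="myfig.gif")  # no children: an empty element

# Serializing a tree can only ever produce balanced, well-formed markup.
output = ET.tostring(body, encoding="unicode")
print(output)
```

The empty IMG element comes out as <IMG SRC="myfig.gif" />, XML's empty-element syntax rather than HTML's bare <IMG SRC="myfig.gif">, and because the output is written from a tree, every start tag is necessarily matched by an end tag.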

What Does XSL Look Like?

XSL style sheets are XML documents. A short XSL style sheet can be seen in Example 2. This style sheet transforms source documents like the XML document in Example 1 into HTML. A style sheet is contained within an xsl:stylesheet element and contains xsl:template elements. (Style sheets can contain a small handful of other elements in addition to templates, but most style sheets consist mostly of templates.)

Example 2: A simple XSL style sheet that generates HTML from XML.

<xsl:stylesheet
 xmlns:xsl="http://www.w3.org/TR/WD-xsl">
 
<xsl:template pattern="doc">
	<HTML>
	<HEAD>
	  <TITLE>A Document</TITLE>
	</HEAD>
	<BODY>
		<xsl:process-children/>
	</BODY>
	</HTML>
</xsl:template>

<xsl:template pattern="title">
	<H1>
		<xsl:process-children/>
	</H1>
</xsl:template>

<!-- this stylesheet handles only a 
	subset of the sample document -->
	
</xsl:stylesheet>

Don't worry if this looks a little confusing at first. There's a lot going on. We'll revisit this style sheet in the "Understanding XSL" section.
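In the meantime, here is a rough Python emulation of what Example 2 might do to Example 1. It is only a sketch under stated assumptions: each xsl:template becomes a branch in the hypothetical process function, process_children stands in for <xsl:process-children/>, and elements with no matching template (para, em, figure, graphic) are assumed to have their children processed with text copied through. That fallback is modelled on the built-in rules XSLT 1.0 later standardized, not on anything in the 1998 draft.

```python
import xml.etree.ElementTree as ET

EXAMPLE_1 = """<doc><title>My Document</title>
<para>This is a <em>short</em> document.</para>
<para>It only exists to <em>demonstrate a <em>simple</em>
XML document</em>.</para>
<figure><title>My Figure</title>
<graphic fileref="myfig.gif"/>
</figure>
</doc>"""


def append_text(out, text):
    """Add text to the result element `out`, after any children it already has."""
    if not text:
        return
    if len(out):
        out[-1].tail = (out[-1].tail or "") + text
    else:
        out.text = (out.text or "") + text


def process_children(node, out):
    """Rough analogue of <xsl:process-children/>: handle text and child elements in order."""
    append_text(out, node.text)
    for child in node:
        process(child, out)
        append_text(out, child.tail)


def process(node, out):
    """Apply the two templates from Example 2, with an assumed fallback rule."""
    if node.tag == "doc":  # <xsl:template pattern="doc">
        html = ET.SubElement(out, "HTML")
        head = ET.SubElement(html, "HEAD")
        ET.SubElement(head, "TITLE").text = "A Document"
        process_children(node, ET.SubElement(html, "BODY"))
    elif node.tag == "title":  # <xsl:template pattern="title">
        process_children(node, ET.SubElement(out, "H1"))
    else:
        # Assumption: no matching template, so just process the children
        # (their text is copied through by append_text).
        process_children(node, out)


def transform(source):
    """Run the emulation and return the root of the result tree."""
    holder = ET.Element("holder")  # temporary parent for the result tree
    process(ET.fromstring(source), holder)
    return holder[0]


html = transform(EXAMPLE_1)
print(ET.tostring(html, encoding="unicode"))
```

Under these assumptions, both titles end up in H1 elements, including the figure's "My Figure", because the title template matches every title element. The paragraph text passes through as plain text, and the graphic, which has no content, simply vanishes.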

One thing that stands out in an XSL style sheet is the use of namespaces (covered in two articles in this issue of XML.com). Namespaces are what all the colon-delimited prefixes are about.

In XSL, there can be no reserved element names, so it's necessary to use some other mechanism to distinguish between elements that have XSL semantics and other elements. This is the problem that namespaces were designed to solve.

If you're not familiar with namespaces, here are some simple guidelines:

The prefix is significant when comparing element names; therefore xsl:template and template are different.

The prefix string is arbitrary. What's important is the association of a prefix string with a URI. That's the function of the "xmlns:" attribute on the stylesheet.

The attribute

xmlns:xsl="http://www.w3.org/TR/WD-xsl"

associates the namespace prefix "xsl" with the URI that follows it ("http://www.w3.org/TR/WD-xsl"). If it were instead

xmlns:xyzzy="http://www.w3.org/TR/WD-xsl"

then xyzzy: would appear everywhere xsl: appears in the example, and the style sheet would mean exactly the same thing.

From the preceding points, it follows that xsl:template and xyz:template are different (unless the two namespace prefixes are associated with the same URI).
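These rules can be checked with any namespace-aware parser. In this Python sketch, xml.etree.ElementTree reports each element name as {URI}local-name, so the prefix drops out of the comparison. The second URI (http://example.com/not-xsl) is hypothetical and is used only for contrast.

```python
import xml.etree.ElementTree as ET

WD_XSL = "http://www.w3.org/TR/WD-xsl"
OTHER = "http://example.com/not-xsl"  # hypothetical URI for comparison

# ElementTree reports each element name as "{namespace-URI}local-name",
# so the prefix itself never appears in the parsed tag.
with_xsl = ET.fromstring(f'<xsl:stylesheet xmlns:xsl="{WD_XSL}"><xsl:template/></xsl:stylesheet>')
with_xyzzy = ET.fromstring(f'<xyzzy:stylesheet xmlns:xyzzy="{WD_XSL}"><xyzzy:template/></xyzzy:stylesheet>')
with_xyz = ET.fromstring(f'<xyz:stylesheet xmlns:xyz="{OTHER}"><xyz:template/></xyz:stylesheet>')
no_prefix = ET.fromstring("<stylesheet><template/></stylesheet>")

print(with_xsl[0].tag)                       # {http://www.w3.org/TR/WD-xsl}template
print(with_xsl[0].tag == with_xyzzy[0].tag)  # True: same URI, different prefix
print(with_xsl[0].tag == with_xyz[0].tag)    # False: different URI
print(with_xsl[0].tag == no_prefix[0].tag)   # False: no namespace at all
```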
