XML.com: XML From the Inside Out

Five XSLT Basics

November 26, 2003

Michael Fitzgerald is the author of Learning XSLT.

I know what you're up against. You've just inherited a new project at work that requires you to learn XSLT, but you don't have a clue where to start. If that's your problem, this article should give you a leg up over the wall. It will quickly cover five basics of XSLT found in the first chapter of Learning XSLT, O'Reilly's new hands-on guide to get you using XSLT with XPath by close of business today.

XSLT Basic #1: What Is XSLT?

Extensible Stylesheet Language Transformations, or XSLT, is a language that allows you to transform XML documents into XML, HTML, XHTML, or plain text documents. It relies on a companion technology called XPath. XPath helps XSLT identify and find nodes in XML documents; nodes are things like elements, attributes, and other objects in XML. With XSLT and XPath, you can do things like transform an XML document into HTML or XHTML so it will easily display in a web browser; convert from one XML markup vocabulary to another, such as from DocBook to XHTML (see www.docbook.org); extract plain text out of an XML document for use in some other application, like a text editor; or build a new Spanish-language document by pulling and repurposing all the Spanish text from a multilingual XML document. This is only a start on what you can do with XSLT. Now that you know what it is, it's time to learn how it works.

XSLT Basic #2: How Does XSLT Work?

The quickest way to get you acquainted with how XSLT works is through a simple example. Consider this ridiculously brief XML document contained in a file I'll call msg.xml:

<msg/>

There isn't much to this document, but it's legal, well-formed XML: just a single, empty element tag with no content (that is, nothing between a pair of tags). For our purposes, it's the source document for the XSLT processing we'll do in a minute. Now you can use the very simple XSLT stylesheet msg.xsl to transform msg.xml:

<stylesheet version="1.0" 
xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>

 <template match="msg">Found it!</template>

</stylesheet>

You'll notice that XSLT is written in XML. This allows you to use some of the same tools to process XSLT stylesheets that you would use to process other XML documents. Nice.

The first element (start tag, really) in msg.xsl is

<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform">

This is the start tag of stylesheet, the document element of msg.xsl and one of two possible document elements in XSLT. The other possible document element is transform, which is actually just a synonym for stylesheet. You can use one or the other. The version attribute in stylesheet is required, along with its value of 1.0. (We're only dealing with version 1.0 of XSLT here.)
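
To see that the two names really are interchangeable, here is a sketch of the same stylesheet with transform as the document element. The file name msg-transform.xsl is my own invention for illustration; nothing else changes:

```xml
<!-- msg-transform.xsl (hypothetical file name): same as msg.xsl,
     but using transform, the synonym for stylesheet -->
<transform version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>

 <template match="msg">Found it!</template>

</transform>
```

Applied to msg.xml, this should give the same result as msg.xsl.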

The attribute xmlns on stylesheet is a special attribute for declaring a namespace. Its value is http://www.w3.org/1999/XSL/Transform, which is the official namespace for XSLT. An XSLT stylesheet must always have such a namespace declaration in order for it to work. (XSLT stylesheets usually use the xsl prefix, as in xsl:stylesheet, but I am setting the prefix aside for simplicity at the moment. You'll want to use xsl when your stylesheets get only slightly more complex.)
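
For comparison, here is a sketch of what msg.xsl might look like with the conventional xsl prefix. The file name msg-prefixed.xsl is hypothetical; the content is otherwise the same stylesheet:

```xml
<!-- msg-prefixed.xsl (hypothetical file name): msg.xsl rewritten
     with the xsl prefix bound to the XSLT namespace -->
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

 <xsl:template match="msg">Found it!</xsl:template>

</xsl:stylesheet>
```

Notice that the namespace declaration becomes xmlns:xsl and that every XSLT element name carries the prefix, while attribute names such as version and match, and the literal text Found it!, stay as they were.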

The stylesheet element is followed by the output element, which is optional. The value text for the method attribute signals that you want the output of the stylesheet to be just plain text:

<output method="text"/>

Two other possible values for method in XSLT 1.0 are xml and html. (The output element actually has ten attributes, all of which are optional.)
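
As a rough illustration of one of those other values, the sketch below switches method to html and wraps the text in a p element. The file name msg-html.xsl and the p element are my own additions, not part of the original example. It uses the xsl prefix because, with the XSLT namespace declared as the default, an unprefixed p would be taken for an unknown XSLT element rather than for output:

```xml
<!-- msg-html.xsl (hypothetical file name): HTML output instead of text -->
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>

 <!-- p is in no namespace, so it is copied to the result as-is -->
 <xsl:template match="msg">
  <p>Found it!</p>
 </xsl:template>

</xsl:stylesheet>
```

Applied to msg.xml, this should produce a p element containing Found it! rather than the bare text.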

The next element in msg.xsl is the template element. This element is at the heart of what XSLT does. A template rule consists of two parts: a pattern, such as an XML element in the source document that you're trying to match, and a sequence of instructions. The match attribute of template contains a pattern, a location path in XPath. The pattern in this example is the name of the msg element:

<template match="msg">Found it!</template>

XPath syntax always appears in attribute values, as in the value of match. The sequence of instructions (sometimes called a sequence constructor) contains only the literal text Found it!. The sequence of instructions tells an XSLT processor what you want to have happen when the pattern is found in the source. Using this stylesheet, when msg is found in the source by an XSLT processor, it will output the text Found it!. When a template executes its instructions, that template is said to be instantiated. To make this happen, you need an XSLT processor.
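
Literal text is the simplest thing a template can contain. As a small sketch of a template that mixes literal text with an actual instruction, the following variation uses value-of, an XSLT instruction not otherwise covered in this article, together with the XPath function name(). The file name msg-name.xsl is hypothetical:

```xml
<!-- msg-name.xsl (hypothetical file name) -->
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>

 <!-- Literal text plus one instruction: value-of writes the string
      value of the XPath expression in its select attribute -->
 <template match="msg">Found <value-of select="name()"/>!</template>

</stylesheet>
```

Once you have a processor set up, applying this to msg.xml should print Found msg!, because name() returns the name of the element the template matched. Note that the XPath expression again lives in an attribute value, this time in select.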

XSLT Basic #3: How Do I Get XSLT to Work?

An XSLT processor processes a source document with an XSLT stylesheet, producing an output or result. There are lots of free XSLT processors available for download on the web. I'll mention a couple.

Instant Saxon

Michael Kay's free Instant Saxon (saxon.exe) runs on the Windows command line. Download it from prdownloads.sourceforge.net/saxon/instant_saxon6_5_3.zip. (If the link fails, just try saxon.sourceforge.net.) Unzip the file into some directory on your Windows box. Assuming that you have created the files msg.xml and msg.xsl discussed earlier and saved them in the same directory where you unzipped saxon.exe, you can run Instant Saxon from the Windows command line like this:

saxon msg.xml msg.xsl

This command will process msg.xml against the stylesheet msg.xsl and produce the simple result:

Found it!

xRay2

If you prefer a graphical application, Architag offers a free, graphical XML editor with XSLT processing capability called xRay2. It is available for download from www.architag.com/xray. Like Instant Saxon, xRay2 runs only on the Windows platform. Assuming that you have successfully downloaded and installed it, launch xRay2 and open the file msg.xml and then open the file msg.xsl. Now select New XSLT Transform from the File menu. In the XML Document pull-down menu, select msg.xml, and in the XSLT Program pull-down menu, select msg.xsl (if it is not already checked, check Auto-update). The result of the transformation should appear in the transform window of the application.

Xalan

If you are using the Linux operating system or some other Unix flavor, you can run Apache's XSLT processor Xalan C++ (works on Windows, too). In order to run Xalan, you also need the C++ version of Xerces, Apache's XML parser. You can find both Xalan C++ and Xerces C++ on xml.apache.org. After downloading and installing them (follow instructions on the Apache site), you need to make sure that Xalan and Xerces are in your execution path. Now type the following line in a Unix shell window or at a Windows command prompt:

xalan msg.xml msg.xsl

If successful, the following results should be printed on your screen:

Found it!

XSLT Basic #4: How Do I Get XSLT to Work in a Browser?

An XSLT processor is probably readily available to you on your computer desktop in the form of a web browser: Microsoft Internet Explorer (IE) Version 6, Netscape Navigator (Netscape) Version 7.1, Mozilla Version 1.4, or Mozilla Firebird 0.7. Each of these browsers has client-side XSLT processing ability already built in.

The way to apply an XSLT stylesheet like msg.xsl to the document msg.xml in a browser is by using a processing instruction. A processing instruction (PI) allows you to include instructions for an application in an XML document.

You can see a processing instruction in a slightly altered version of msg.xml, which I call msg-pi.xml:

<?xml-stylesheet href="msg.xsl" type="text/xsl"?>
<msg/>

The XML stylesheet PI should always come before the first element in the document (the document element msg in msg-pi.xml). The purpose of this PI is similar to one of the purposes of the link tag in HTML, that is, to associate a stylesheet with the document. Save msg-pi.xml as a text file in the same directory as the other files. If you open msg-pi.xml in one of the browsers I mentioned, the built-in XSLT processor in the browser will write the string Found it! on the browser's canvas or rendering space.
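
If your document starts with an XML declaration, which msg-pi.xml leaves out, the declaration has to come first, followed by the stylesheet PI and then the document element. A minimal sketch of that ordering, using the hypothetical file name msg-pi-decl.xml:

```xml
<?xml version="1.0"?>
<!-- msg-pi-decl.xml (hypothetical file name) -->
<?xml-stylesheet href="msg.xsl" type="text/xsl"?>
<msg/>
```

The href value here is a relative reference, so this assumes msg.xsl sits in the same directory as the XML file.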

XSLT Basic #5: Beware the Built-in Templates

XSLT has a hobgoblin of sorts. It's a feature known as built-in templates. Built-in templates automatically match nodes that are not specifically matched by a template rule, so you can sometimes get results from an XSLT stylesheet that you're not expecting. These built-in templates automatically find text (among other things) in the XML source when no explicit template matches that text. This can rattle your nerves at first, but you'll get comfortable with them soon enough. I'll illustrate an instance where the built-in template matches text in an XML document. The file hobgoblin.xml contains a bit of text in the element msg:

<msg>Spooky!</msg>

To trigger the built-in template for text, the dull-witted stylesheet hobgoblin.xsl will do the trick:

<stylesheet version="1.0" 
xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>
</stylesheet>

Apply hobgoblin.xsl to hobgoblin.xml with Instant Saxon using this command:

saxon hobgoblin.xml hobgoblin.xsl

And you will get the following result:

Spooky!

Even though hobgoblin.xsl does not contain a template rule, Instant Saxon found the text Spooky! in the msg element by default using a built-in template rule.
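
To take some of the mystery out of the hobgoblin, here is a sketch of hobgoblin.xsl with the main built-in template rules written out by hand, following their definitions in section 5.8 of the XSLT 1.0 spec. The file name hobgoblin-explicit.xsl is hypothetical, the sketch leaves out the built-in rules for modes and namespace nodes, and a real processor supplies all of these rules implicitly, so you never have to type them:

```xml
<!-- hobgoblin-explicit.xsl (hypothetical file name) -->
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>

 <!-- Root node and elements: keep walking down to the children -->
 <template match="*|/">
  <apply-templates/>
 </template>

 <!-- Text and attribute nodes: copy their text to the result -->
 <template match="text()|@*">
  <value-of select="."/>
 </template>

 <!-- Processing instructions and comments: do nothing -->
 <template match="processing-instruction()|comment()"/>

</stylesheet>
```

Applied to hobgoblin.xml, this should also print Spooky!. Emptying the second template, so that it has a match attribute but no content, is one way to quiet the hobgoblin, because a template rule you write yourself takes precedence over a built-in one.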

Summary

That covers five basics of XSLT 1.0. This article is only a starting point to get you rolling. There is much, much more to learn about XSLT. Of course, Learning XSLT can help you out there. For resources and news for XSLT from the W3C, go to www.w3.org/Style. If you're brave enough to read the specs, go to www.w3.org/TR/xslt/ and www.w3.org/TR/xpath/ to learn more about XSLT 1.0 and XPath 1.0. (Version 2.0 of each of these specs is in the last stages of development; you can find them at www.w3.org/TR/xslt20/ and www.w3.org/TR/xpath20/.) You can search the archives of XSL-List (an XSLT mailing list hosted by Mulberry Technologies, Inc.) at www.biglist.com/lists/xsl-list/archives/ or join the list at www.mulberrytech.com/xsl/xsl-list/index.html#subscribing. Wherever you go with XSLT, or wherever it takes you, best of luck.


O'Reilly & Associates recently released (November 2003) Learning XSLT.