Five XSLT Basics
Michael Fitzgerald is the author of Learning XSLT.
I know what you're up against. You've just inherited a new project at work that requires you to learn XSLT, but you don't have a clue where to start. If that's your problem, this article should give you a leg up over the wall. It will quickly cover five basics of XSLT found in the first chapter of Learning XSLT, O'Reilly's new hands-on guide to get you using XSLT with XPath by close of business today.
XSLT Basic #1: What Is XSLT?
Extensible Stylesheet Language Transformations, or XSLT, is a language that allows you to transform XML documents into XML, HTML, XHTML, or plain text documents. It relies on a companion technology called XPath. XPath helps XSLT identify and find nodes in XML documents; nodes are things like elements, attributes, and other objects in XML. With XSLT and XPath, you can do things like transform an XML document into HTML or XHTML so it will display easily in a web browser; convert from one XML markup vocabulary to another, such as from DocBook to XHTML (see www.docbook.org); extract plain text out of an XML document for use in some other application, like a text editor; or build a new Spanish-language document by pulling and repurposing all the Spanish text from a multilingual XML document. This is only the start of what you can do with XSLT. Now that you know what it is, it's time to learn how it works.
XSLT Basic #2: How Does XSLT Work?
The quickest way to get you acquainted with how XSLT works is through a simple example. Consider this ridiculously brief XML document contained in a file I'll call msg.xml:
<msg/>
There isn't much to this document, but it's legal, well-formed XML: just a single, empty element tag with no content (that is, nothing between a pair of tags). For our purposes, it's the source document for the XSLT processing we'll do in a minute. Now you can use the very simple XSLT stylesheet msg.xsl to transform msg.xml:
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
  <output method="text"/>
  <template match="msg">Found it!</template>
</stylesheet>
You'll notice that XSLT is written in XML. This allows you to use some of the same tools to process XSLT stylesheets that you would use to process other XML documents. Nice.
The first element (start tag, really) in msg.xsl is
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
This is the stylesheet document element, one of two possible document elements in XSLT. The other possible document element is transform, which is actually just a synonym for stylesheet. You can use one or the other. The version attribute on stylesheet is required, along with its value of 1.0. (We're only dealing with version 1.0 of XSLT here.) The xmlns attribute on stylesheet is a special attribute for declaring a namespace. Its value is http://www.w3.org/1999/XSL/Transform, which is the official namespace for XSLT. An XSLT stylesheet must always have such a namespace declaration in order for it to work. (XSLT stylesheets usually use the xsl prefix, as in xsl:stylesheet, but I am setting the prefix aside for simplicity at the moment. You'll want to use xsl when your stylesheets get only slightly more complex.)
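For comparison, here is a sketch of how msg.xsl might look with the conventional xsl prefix. The file name msg-prefixed.xsl is only an illustrative assumption; the element names, attributes, and namespace URI are the same ones used in msg.xsl, now bound to a prefix instead of the default namespace.

```xml
<!-- msg-prefixed.xsl (hypothetical file name): msg.xsl rewritten with the xsl prefix -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Every XSLT element now carries the xsl: prefix -->
  <xsl:output method="text"/>
  <!-- The match pattern is unchanged: msg is in no namespace in the source -->
  <xsl:template match="msg">Found it!</xsl:template>
</xsl:stylesheet>
```

Applied to msg.xml, this version should behave the same as the unprefixed stylesheet. Replacing xsl:stylesheet with xsl:transform in both the start and end tags should make no difference either, since the two element names are synonyms.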
The stylesheet element is followed by the output element, which is optional. The value text for the method attribute signals that you want the output of the stylesheet to just be plain text:
<output method="text"/>
Two other possible values for method in XSLT 1.0 are xml and html. (The output element actually has ten attributes, all of which are optional.)
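As a rough illustration of a different output method, the sketch below asks for HTML instead of plain text. It is not from the original article: the file name msg-html.xsl and the HTML markup inside the template are assumptions made for the example. It also uses the xsl prefix, because once a template contains result elements such as html or p, the XSLT elements need to be told apart from them.

```xml
<!-- msg-html.xsl (hypothetical file name): same match as msg.xsl, HTML result -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- method="html" asks the processor to serialize the result as HTML -->
  <xsl:output method="html"/>
  <xsl:template match="msg">
    <!-- Unprefixed elements are literal result elements copied to the output -->
    <html>
      <body>
        <p>Found it!</p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Run against msg.xml, this should produce a small HTML document whose single paragraph reads Found it!. The exact whitespace and any extra markup in the result depend on the processor's HTML serializer.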
The next element in msg.xsl is the
template element. This element is at the heart of what
XSLT does. A template rule consists of two parts: a pattern, such as an XML element in
the source document that you're trying to match, and a sequence of instructions. The
match attribute of
template contains a pattern, a location path in XPath. The pattern
in this example is the name of the msg element:
<template match="msg">Found it!</template>
XPath syntax always appears in attribute values, as in the value of match. The sequence of instructions (sometimes called a sequence constructor) contains only the literal text Found it!. The sequence of instructions tells an XSLT processor what you want to have happen when the pattern is found in the source. Using this stylesheet, when msg is found in the source by an XSLT processor, it will output the text Found it!. When a template executes its instructions, that template is said to be instantiated. To make this happen, you need an XSLT processor.
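To make the two parts of a template rule a little more concrete, here is a minimal sketch that goes one small step beyond the article's example. The pattern is still msg, but the sequence of instructions mixes literal text with an instruction that is evaluated against the matched node. The file name name-it.xsl is hypothetical, and the xsl:text and xsl:value-of elements are standard XSLT 1.0 instructions that the article itself does not cover.

```xml
<!-- name-it.xsl (hypothetical file name) -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Part 1, the pattern: match any msg element in the source -->
  <xsl:template match="msg">
    <!-- Part 2, the instructions: run when the template is instantiated -->
    <xsl:text>Found </xsl:text>
    <!-- name() is an XPath function returning the name of the matched node -->
    <xsl:value-of select="name()"/>
    <xsl:text>!</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Applied to msg.xml, this stylesheet should output Found msg!. The XPath expression again sits inside an attribute value, this time in select rather than match.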
XSLT Basic #3: How Do I Get XSLT to Work?
An XSLT processor processes a source document with an XSLT stylesheet, producing an output or result. There are lots of free XSLT processors available for download on the web. I'll mention a couple.
Michael Kay's free Instant Saxon (saxon.exe) runs on the Windows command line. Download it from prdownloads.sourceforge.net/saxon/instant_saxon6_5_3.zip. (If the link fails, just try saxon.sourceforge.net.) Unzip the file in some directory on your Windows box. Assuming that you have created the files msg.xml and msg.xsl discussed earlier and saved them in the same directory where you unzipped saxon.exe, you can run Instant Saxon from the Windows command line like this:
saxon msg.xml msg.xsl
This command will process msg.xml against the stylesheet msg.xsl and produce the simple result:
Found it!
If you prefer a graphical application, Architag offers a free, graphical XML editor with XSLT processing capability called xRay2. It is available for download from www.architag.com/xray. Like Instant Saxon, xRay2 runs only on the Windows platform. Assuming that you have successfully downloaded and installed it, launch xRay2 and open the file msg.xml and then open the file msg.xsl. Now select New XSLT Transform from the File menu. In the XML Document pull-down menu, select msg.xml, and in the XSLT Program pull-down menu, select msg.xsl (if it is not already checked, check Auto-update). The result of the transformation should appear in the transform window of the application.
If you are using the Linux operating system or some other Unix flavor, you can run Apache's XSLT processor Xalan C++ (works on Windows, too). In order to run Xalan, you also need the C++ version of Xerces, Apache's XML parser. You can find both Xalan C++ and Xerces C++ on xml.apache.org. After downloading and installing them (follow instructions on the Apache site), you need to make sure that Xalan and Xerces are in your execution path. Now type the following line in a Unix shell window or at a Windows command prompt:
xalan msg.xml msg.xsl
If successful, the following result should be printed on your screen:
Found it!
XSLT Basic #4: How Do I Get XSLT to Work in a Browser?
An XSLT processor is probably readily available to you on your computer desktop in the form of a web browser: Microsoft Internet Explorer (IE) Version 6, Netscape Navigator (Netscape) Version 7.1, Mozilla Version 1.4, or Mozilla Firebird 0.7. Each of these browsers has client-side XSLT processing built in.
The way to apply an XSLT stylesheet like msg.xsl to the document msg.xml in a browser is by using a processing instruction. A processing instruction (PI) allows you to include instructions for an application in an XML document.
You can see a processing instruction in a slightly altered version of msg.xml, which I call msg-pi.xml:
<?xml-stylesheet href="msg.xsl" type="text/xsl"?>
<msg/>
The XML stylesheet PI should always come before the first element in the document (the
msg in msg-pi.xml). The purpose of this PI is similar to one of the
purposes of the
link tag in HTML, that is, to associate a stylesheet with the document.
Save msg-pi.xml in a text file with the other files. If you open msg-pi.xml in one of the
browsers I mentioned, the built-in XSLT processor in the browser will write the string
Found it! on the browser's canvas or rendering space.
XSLT Basic #5: Beware the Built-in Templates
XSLT has a hobgoblin of sorts. It's a feature known as built-in templates. Built-in
templates automatically match nodes that are not specifically matched by a
template rule, so you can sometimes get results from an XSLT stylesheet that you're not
expecting. These built-in templates automatically find text (among other things) in the
XML source when no explicit template matches that text. This can rattle your nerves at
first, but you'll get comfortable with them soon enough. I'll illustrate an instance where
the built-in template matches text in an XML document. The file hobgoblin.xml contains
a bit of text in the element msg:
<msg>Spooky!</msg>
To trigger the built-in template for text, the dull-witted stylesheet hobgoblin.xsl will do the trick:
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
  <output method="text"/>
</stylesheet>
Apply hobgoblin.xsl to hobgoblin.xml with Instant Saxon using this command:
saxon hobgoblin.xml hobgoblin.xsl
And you will get the following result:
Spooky!
Even though hobgoblin.xsl does not contain a template rule, Instant Saxon found the text
Spooky! in the
msg element by default using a built-in template rule.
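To see what is going on behind the scenes, it can help to write the built-in rules out by hand. The sketch below spells out, as explicit templates, the built-in rules that the XSLT 1.0 specification defines for the root node, elements, text, and attributes. The file name hobgoblin-explicit.xsl is hypothetical, and the stylesheet uses the xsl prefix.

```xml
<!-- hobgoblin-explicit.xsl (hypothetical file name) -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Same effect as the built-in rule for the root node and elements:
       output nothing, but keep processing the children -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Same effect as the built-in rule for text and attribute nodes:
       copy the node's string value to the result -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to hobgoblin.xml, this should give the same Spooky! result as the empty hobgoblin.xsl. If the stray text is unwelcome, one common approach is to replace the second template with an empty rule, xsl:template with match="text()" and no content, which overrides the built-in behavior for text nodes so that nothing is written out.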
That covers five basics of XSLT 1.0. This article is only a starting point to get you rolling. There is much, much more to learn about XSLT. Of course, Learning XSLT can help you out there. For resources and news for XSLT from the W3C, go to www.w3.org/Style. If you're brave enough to read the specs, go to www.w3.org/TR/xslt/ and www.w3.org/TR/xpath/ to learn more about XSLT 1.0 and XPath 1.0. (Versions 2.0 of these specs are in the last stages of development and are found at www.w3.org/TR/xslt20/ and www.w3.org/TR/xpath20/.) You can search the archives of XSL-List (an XSLT mailing list hosted by Mulberry Technologies, Inc.) at www.biglist.com/lists/xsl-list/archives/ or join the list at www.mulberrytech.com/xsl/xsl-list/index.html#subscribing. Wherever you go with XSLT, or wherever it takes you, best of luck.
O'Reilly & Associates recently released (November 2003) Learning XSLT.
Sample Chapter 2, Building New Documents with XSLT, is available free online.