XML.com: XML From the Inside Out

Copying, Deleting, and Renaming Elements

June 07, 2000

Welcome to "Transforming XML." Each column will explain how to handle two or three basic document manipulation tasks using the W3C Recommendation that was spun off from the Extensible Stylesheet Language (XSL): the XSL Transformations Language, or XSLT. In this first column, we'll start with the basics -- the use of style sheets, the role of the xsl:stylesheet element, and how to copy, delete, and rename elements. (For other material on XSLT that's appeared in XML.com and elsewhere, see the XML.com Resource Guide.)

XSL Style Sheets

XSLT, according to the W3C Recommendation that specifies it, is "a language for transforming XML documents into other XML documents." As XML becomes more popular, and the dreams of shared DTDs often prove unrealistic, a quick and easy way to convert documents that conform to your DTD into documents that conform to my DTD becomes very valuable. This is especially so if you and I want to do business together without going to the trouble of authoring a DTD that we can both agree on.

An XSLT style sheet is an XML document that uses specialized element types from the http://www.w3.org/1999/XSL/Transform namespace to specify how to transform a set of elements. Technically, it transforms a source tree into a result tree rather than elements into elements. This is good news: because an XSLT processor reads the whole document into a tree structure in memory before carrying out the style sheet's transformations, it can use information from anywhere in the tree when transforming a particular element (or rather, a particular tree node).

An XSLT processor is a program that applies an XSLT style sheet to a tree representation of an input document, and creates a result tree based upon the style sheet's instructions. Most processors read an XML document into the input tree first, and output the result tree as another document after finishing the transformation, with a net effect of converting one document into another.

Currently, the most popular implementations are James Clark's XT, the Apache XML Project's Xalan, and Michael Kay's SAXON. (A recent XSL-List posting from Clark about having no plans for further XT development is bound to hurt its long-term popularity.) Internet Explorer also implements some of XSLT, but its support of the W3C XSLT standard is still a bit idiosyncratic; see Microsoft's XSL Developer's Guide for details. Check each processor's documentation for information on how to tell it to "use this XSL style sheet to turn this XML input document into this output document."

The document (root) element of an XSLT style sheet is usually an xsl:stylesheet element, but it doesn't have to be that exact element:

  • A style sheet can use xsl:transform as a synonym for xsl:stylesheet.

  • You don't have to use xsl as the namespace prefix to point to the namespace mentioned above, but it is a common convention.

  • There are ways to incorporate XSLT instructions directly into a document that doesn't use or refer to an xsl:stylesheet or xsl:transform element, but a serious transformation usually uses one of these in its own file.
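The first two points can be sketched as follows. This is a minimal skeleton rather than a complete transformation: the prefix t is an arbitrary choice made for illustration, and the style sheet contains no template rules yet.

```xml
<?xml version="1.0"?>
<!-- t:transform is a synonym for xsl:stylesheet. The prefix "t" is an
     arbitrary, illustrative choice; what matters is the namespace URI
     that the prefix is bound to. -->
<t:transform xmlns:t="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Template rules and other top-level elements go here. -->

</t:transform>
```

An XSLT processor should treat this exactly as it would the same style sheet written with xsl:stylesheet and the xsl prefix.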

XSLT offers various element types as potential children of this xsl:stylesheet element, each providing different style sheet instructions to the XSLT processor. The most important is xsl:template, which specifies a template rule.

Copying Elements to the Output

A template rule essentially says "when you find an input tree node that corresponds to the value of my match attribute, output text with the structure described by the template in my contents." The value of the match attribute can be a simple element type name, or a more complex pattern describing the element, attribute, comment, or processing instruction nodes that the template applies to.
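To give a feel for the range of match patterns, here is a small sketch. The element and attribute names other than title (chapter and status) are hypothetical, and the templates are left empty so that the patterns themselves stand out.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- A simple element type name: every title element -->
  <xsl:template match="title">
    <!-- template contents go here -->
  </xsl:template>

  <!-- Only title elements that are children of a chapter element -->
  <xsl:template match="chapter/title">
    <!-- template contents go here -->
  </xsl:template>

  <!-- Attribute nodes named status, on any element -->
  <xsl:template match="@status">
    <!-- template contents go here -->
  </xsl:template>

  <!-- Comment nodes and processing instruction nodes -->
  <xsl:template match="comment()|processing-instruction()">
    <!-- template contents go here -->
  </xsl:template>

</xsl:stylesheet>
```

When more than one pattern matches the same node, as the first two do for a title inside a chapter, the processor picks the more specific rule.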

Two popular XSLT elements to include in a template rule's contents are xsl:copy, which copies the current node, and xsl:apply-templates, which processes the children of the current node. For example, the single template in the following style sheet will copy the start-tags, end-tags, and contents of all title elements to the output. (Because of XSLT's built-in template rules, the text content of other elements will also be output, without their tags.)

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match="title">
  <xsl:copy>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

Note that this template rule only acts on nodes representing title elements. Any attributes of title elements have their own nodes in the input tree and require their own template rule or rules if the XSLT processor is supposed to copy them to the output.
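One way to carry the attributes along, sketched here as an illustration rather than the only approach, is to have the title template ask for its attribute nodes as well as its children, and to add a second rule that copies each attribute.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Copy the title element itself, then process its attributes
       (@*) and its children (node()). -->
  <xsl:template match="title">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Copy each attribute of a title element to the result tree. -->
  <xsl:template match="title/@*">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

Because attribute nodes are processed before child nodes, each attribute is added to the copied title element before any of its content, which is the order XSLT requires.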

The xsl:copy-of element, on the other hand, can copy the entire subtree of each node that the template selects. This includes attributes, if the xsl:copy-of element's select attribute has the appropriate value. In the following example, the template copies title element nodes and all of their descendant nodes -- in other words, the complete title elements, including their tags, subelements, and attributes:

<xsl:template match="title">
  <xsl:copy-of select="."/>
</xsl:template>
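As a hedged illustration of copying a whole subtree, the sketch below wraps such a template in a complete style sheet and uses a select value of "." (the current node). The sample input in the comment is hypothetical.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical input:
         <title id="t1">Transforming <emph>XML</emph></title>
       select="." selects the title node itself, so xsl:copy-of copies
       the element together with its id attribute, its text, and the
       emph child element. -->
  <xsl:template match="title">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

By contrast, a select value of "*" would select only the child elements of the title (here, the emph element), leaving out the title's own tags, its attribute, and its text.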

Deleting Elements

If a template rule says "output my contents when you find an input tree node that corresponds to the value of my match attribute," what happens if there is no content, as with the following two templates?

<xsl:template match="nickname">
</xsl:template>

<xsl:template match="project[@status='canceled']">
</xsl:template>

They'll output nothing, essentially deleting the matched nodes from the output. Because neither template includes an xsl:apply-templates element, the children of the matched elements are never processed, so their contents disappear along with them. The first template rule says "when you find a nickname element, output nothing." The second takes advantage of the flexibility allowed in the patterns that are legal values for the template element's match attribute. While a match value of "project" would delete all the project elements from the output, the match value shown will only delete project elements whose status attributes have the string "canceled" as their value.
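To see the deletions in context, the two empty templates can be combined with a rule that copies everything else unchanged. The copy-everything rule below (often called an identity template) is an addition for illustration and is not part of the examples above.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Copy every node and attribute as-is, unless a more specific
       template rule matches it. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty templates: matched elements and their contents are dropped. -->
  <xsl:template match="nickname"/>
  <xsl:template match="project[@status='canceled']"/>

</xsl:stylesheet>
```

The two empty rules take precedence because their patterns are more specific than the general one, so the result should be a copy of the input minus the nickname elements and the canceled projects.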

Changing Element Names

We saw above that xsl:apply-templates processes only the children of the current node. For an element, this means everything between the tags, but nothing in the tags themselves.

If your template outputs an input element's content but not its tags, you can surround that content with anything you want, as long as it doesn't prevent the output document from being well-formed. For example, the following template rule tells an XSLT processor to take any article element fed to it as input, and output its contents surrounded by html tags.

<xsl:template match="article">
  <html>
    <xsl:apply-templates/>
  </html>
</xsl:template>

The html tags add an actual html element to the style sheet. Because that element has no xsl: prefix (more precisely, because it isn't in the XSLT namespace), it is known in XSLT as a "literal result element." The element isn't some special XSLT instruction, so an XSLT processor will leave it alone and pass its tags along to the output looking just like they do in the style sheet.

Instead of enclosing the article template rule's xsl:apply-templates element with html tags, another way to convert article elements to html elements would be to enclose the xsl:apply-templates element with an xsl:element element that had "html" specified as the value for its name attribute. In this particular case, that would have been overkill -- the markup shown above is much simpler and gets the job done -- but the xsl:element element's ability to provide the element type name in an attribute value lets you use expressions that are more complex than a simple string like "html" as that element type name. This makes it possible to dynamically create the element name by concatenating strings, calling functions, or by retrieving element content or attribute values from elsewhere in the document to use in the element name. We'll learn more about these tricks in future "Transforming XML" columns.
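As a rough sketch of both uses of xsl:element: the first template below is the xsl:element equivalent of the article rule, and the second computes an element name from an attribute value. The heading element and its level attribute are hypothetical names chosen for illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Same effect as wrapping xsl:apply-templates in literal html tags. -->
  <xsl:template match="article">
    <xsl:element name="html">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <!-- Hypothetical input: <heading level="2">Setup</heading>
       The curly braces mark an expression to evaluate: concat() joins
       "h" and the level attribute's value, giving an h2 element here. -->
  <xsl:template match="heading">
    <xsl:element name="{concat('h', @level)}">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

The second template is the kind of case where a literal result element won't do, because the element name isn't known until the input is read.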