
Controlling Whitespace, Part 1

November 7, 2001

Bob DuCharme

XML considers four characters to be whitespace: the carriage return, the linefeed, the tab, and the space. Microsoft operating systems put both a carriage return and a linefeed at the end of each line of a text file, and people usually refer to the combination as the "carriage return". XSLT stylesheet developers often get frustrated over the whitespace that shows up in their result documents -- sometimes there's more than they wanted, sometimes there's less, and sometimes it's in the wrong place. Over the next few columns, we'll discuss how XML and XSLT treat whitespace to gain a better understanding of what can happen, and we'll look at some techniques for controlling how an XSLT processor adds whitespace to the result document.

Before we start, however, it's important to remember two things if you get frustrated over a lack of control:

  • XSLT is an XML application that was originally designed to convert XML documents into XML documents.

  • XML applications often seem to take a cavalier attitude toward whitespace, because the rules about where whitespace doesn't matter in an XML document sometimes give these applications free rein to add or remove it in those places.

The moral of the story is that when you're using XSLT to create XML documents, you shouldn't worry too much about whitespace. When you're using it to create text documents whose whitespace isn't coming out the way you want, remember that XSLT is a transformation language, not a formatting language, and some other tool may be necessary to give you the control you need. Extension functions may also provide relief; string manipulation is one of the most popular reasons for writing these functions. See the September column "XSLT Extensions" for more detail.

xsl:strip-space and xsl:preserve-space

The xsl:strip-space instruction lets you specify source tree elements that should have whitespace text nodes (that is, text nodes composed entirely of whitespace characters) stripped.

Let's look at how this element can affect the following sample source document.

<colors>

<color>red</color>

<color>    yellow    </color>

<color>
blue
</color>

<!-- 
  Next color element has whitespace content. 
-->
<color>     </color>

</colors>

To establish a baseline, this first stylesheet has no xsl:strip-space element. It's just an identity stylesheet that copies the source tree document to the result tree.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="@*|node()">
   <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

The result looks just like the source:

<colors>

<color>red</color>

<color>    yellow    </color>

<color>
blue
</color>

<!-- 
  Next color element has whitespace content. 
-->
<color>     </color>

</colors>

Now we add an xsl:strip-space element to have the stylesheet strip whitespace text nodes from the color elements.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:strip-space elements="color"/>

  <xsl:template match="@*|node()">
   <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

When applied to the same source tree document, the result looks the same, except that the last color element is now an empty element. In the source tree, its only content was a text node of whitespace characters, and this node got stripped. While the yellow color element has plenty of whitespace, it's in a text node along with the string "yellow", so xsl:strip-space, which only affects nodes that are pure whitespace, leaves it alone.

<colors>

<color>red</color>

<color>    yellow    </color>

<color>
blue
</color>

<!-- 
  Next color element has whitespace content. 
-->
<color/>

</colors>
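
Because whitespace-only text nodes are invisible on the page, it can help to count them instead of eyeballing the output. The following stylesheet is a minimal diagnostic sketch, not part of the original examples: it assumes a conforming XSLT 1.0 processor whose XML parser hands every whitespace text node to the stylesheet, and it relies on the fact that normalize-space() returns an empty string for a node made up entirely of whitespace.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <!-- Diagnostic sketch: report a number instead of copying the document. -->
  <xsl:output method="text"/>

  <!-- Remove or edit this line to compare different stripping rules. -->
  <xsl:strip-space elements="color"/>

  <!-- Count the text nodes left in the source tree that hold nothing but
       whitespace. Stripping happens before templates run, so stripped
       nodes are not counted. -->
  <xsl:template match="/">
    <xsl:value-of select="count(//text()[normalize-space(.) = ''])"/>
  </xsl:template>

</xsl:stylesheet>
```

Run against the sample colors document, this should report 6: the six whitespace nodes that sit directly inside the colors element. With the xsl:strip-space line removed it should report 7, the extra one being the whitespace inside the last color element.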

Now let's tell the XSLT processor to strip the whitespace nodes from the parent colors element instead of the color elements.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:strip-space elements="colors"/>

  <xsl:template match="@*|node()">
   <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

This has a more drastic effect, because the colors element has many more whitespace-only text nodes -- all those carriage returns between the color elements. The only carriage returns in the whole document that make it to the result document are the ones that were either inside a color element (before and after "blue") or inside the comment.

<colors><color>red</color><color>    yellow    </color><color>
blue
</color><!-- 
  Next color element has whitespace content. 
--><color>     </color></colors>

You can list more than one element type name in the xsl:strip-space instruction's elements attribute, as long as their names are separated by whitespace. You can also use an asterisk as this attribute's value to tell the XSLT processor to strip whitespace text nodes from all the elements in the source tree.
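
As an illustration of the whitespace-separated list, here is a sketch of the same identity stylesheet with both element names from the sample document in the elements attribute. For this particular source document, elements="*" should have the same effect, because colors and color are the only element types it contains.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Two element type names, separated by a space. -->
  <xsl:strip-space elements="colors color"/>

  <!-- Identity template: copy everything that survives stripping. -->
  <xsl:template match="@*|node()">
   <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The result should combine the two effects shown earlier: the carriage returns between the children of colors disappear, and the last color element comes out empty. The whitespace around "yellow" and "blue" stays, because those text nodes are not pure whitespace.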

    

The xsl:preserve-space instruction does the opposite of the xsl:strip-space instruction: for all elements listed in its elements attribute, the XSLT processor will leave whitespace text nodes alone. By default, the XSLT processor treats all elements as xsl:preserve-space elements, so you only need it to override an xsl:strip-space instruction. For example, if your source document has twenty different element types and you want to strip whitespace nodes in all of them except the codeListing and sampleOutput elements, you don't have to list the other eighteen in an xsl:strip-space element's elements attribute. Instead, use an asterisk for the xsl:strip-space element's elements attribute value and list the two exceptions as the xsl:preserve-space element's elements attribute value.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:strip-space elements="*"/>
  <xsl:preserve-space elements="codeListing sampleOutput"/>

  <xsl:template match="@*|node()">
   <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
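
To see why the exceptions matter, consider a small source document to run through this stylesheet. The document is hypothetical: apart from codeListing, the element names (doc, para, b, i, kw, and var) are invented for this sketch.

```xml
<doc>
  <!-- The single space between </b> and <i> is a whitespace-only
       text node whose parent is para. -->
  <para>Some <b>bold</b> <i>italic</i> text.</para>
  <!-- The single space between </kw> and <var> is a whitespace-only
       text node whose parent is codeListing. -->
  <codeListing><kw>if</kw> <var>x</var></codeListing>
</doc>
```

With the stylesheet above, the space between the b and i elements should be stripped, so "bold" and "italic" run together in the result, while the space between the kw and var elements survives because codeListing is named in xsl:preserve-space. The text nodes "Some " and " text." are untouched in either case, since they contain more than whitespace. This is one reason to think twice before using the asterisk on documents with mixed content.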