XML.com: XML From the Inside Out

XML considers four characters to be whitespace: the carriage return, the linefeed, the tab, and the spacebar space. Microsoft operating systems put both a carriage return and a linefeed at the end of each line of a text file, and people usually refer to the combination as the "carriage return". XSLT stylesheet developers often get frustrated over the whitespace that shows up in their result documents -- sometimes there's more than they wanted, sometimes there's less, and sometimes it's in the wrong place. Over the next few columns, we'll discuss how XML and XSLT treat whitespace to gain a better understanding of what can happen, and we'll look at some techniques for controlling how an XSLT processor adds whitespace to the result document.

Before we start, however, it's important to remember two things if you get frustrated over a lack of control:

  • XSLT is an XML application that was originally designed to convert XML documents into XML documents.

  • XML applications often seem to take a cavalier attitude toward whitespace because the rules about the places in an XML document where whitespace doesn't matter sometimes give these applications free rein to add or remove whitespace in certain places.

The moral of the story is that when you're using XSLT to create XML documents, you shouldn't worry too much about whitespace. When using it to create text documents whose whitespace isn't coming out the way you want, remember that XSLT is a transformation language, not a formatting language, and some other tool may be necessary to give you the control you need. Extension functions may also provide relief; string manipulation is one of the most popular reasons for writing these functions. See the September column "XSLT Extensions" for more detail.

xsl:strip-space and xsl:preserve-space

The xsl:strip-space instruction lets you specify source tree elements that should have whitespace text nodes (that is, text nodes composed entirely of whitespace characters) stripped.

Let's look at how this element can affect the following sample source document.

<colors>

<color>red</color>

<color>    yellow    </color>

<color>
blue
</color>

<!-- 
  Next color element has whitespace content. 
-->
<color>     </color>

</colors>
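
To check which text nodes in a document like this one are whitespace-only, and are therefore candidates for stripping, a small diagnostic stylesheet can help. The following is only a sketch under the assumption that a plain-text report is wanted; the report wording is made up for illustration. It selects every text node whose normalized value is empty and prints the name of its parent element along with the node's length.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- normalize-space() returns an empty string only when the
         text node contains nothing but whitespace characters. -->
    <xsl:for-each select="//text()[normalize-space(.) = '']">
      <!-- Name of the element that holds this whitespace node. -->
      <xsl:value-of select="name(..)"/>
      <xsl:text>: </xsl:text>
      <!-- Number of whitespace characters in the node. -->
      <xsl:value-of select="string-length(.)"/>
      <xsl:text> whitespace character(s)&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Text nodes that mix whitespace with other characters, such as the one in the yellow color element, are not selected, because their normalized value is not empty.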

To establish a baseline, this first stylesheet has no xsl:strip-space element. It's just an identity stylesheet that copies the source tree to the result tree.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="@*|node()">
   <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

The result looks just like the source:

<colors>

<color>red</color>

<color>    yellow    </color>

<color>
blue
</color>

<!-- 
  Next color element has whitespace content. 
-->
<color>     </color>

</colors>

Now we add an xsl:strip-space element to have the stylesheet strip whitespace text nodes from the color elements.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:strip-space elements="color"/>

  <xsl:template match="@*|node()">
   <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

When applied to the same source tree document, the result looks the same, except that the last color element is now an empty element. In the source tree, its only content was a text node of whitespace characters, and this node got stripped. While the yellow color element has plenty of whitespace, it's in a text node along with the string "yellow", so xsl:strip-space, which only affects nodes that are pure whitespace, leaves it alone.

<colors>

<color>red</color>

<color>    yellow    </color>

<color>
blue
</color>

<!-- 
  Next color element has whitespace content. 
-->
<color/>

</colors>

Now let's tell the XSLT processor to strip the whitespace nodes from the parent colors element instead of the color elements.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:strip-space elements="colors"/>

  <xsl:template match="@*|node()">
   <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

This has a more drastic effect, because the colors element had many more whitespace-only text nodes -- all those carriage returns between the color elements. The only carriage returns in the whole document that made it to the result document are the ones that were either inside a color element (before and after "blue") or inside of the comment.

<colors><color>red</color><color>    yellow    </color><color>
blue
</color><!-- 
  Next color element has whitespace content. 
--><color>     </color></colors>

You can list more than one element type name in the xsl:strip-space instruction's elements attribute, as long as their names are separated by whitespace. You can also use an asterisk as this attribute's value to tell the XSLT processor to strip whitespace text nodes from all the elements in the source tree.
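
As a sketch of the multiple-name form, the identity stylesheet below lists both element names from the sample document in a single elements attribute. It reuses the earlier stylesheets unchanged apart from that attribute value.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Two names separated by a space: strip whitespace-only text
       nodes from both the colors element and its color children. -->
  <xsl:strip-space elements="colors color"/>

  <!-- For this sample document, elements="*" would have the same
       effect, because colors and color are its only element types. -->

  <xsl:template match="@*|node()">
   <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the sample document, this should combine the two earlier results: the line breaks between the color elements disappear, and the last color element comes out empty.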

    

The xsl:preserve-space instruction does the opposite of the xsl:strip-space instruction: for all elements listed in its elements attribute, the XSLT processor will leave whitespace text nodes alone. By default, the XSLT processor treats all elements as xsl:preserve-space elements, so you only need it to override an xsl:strip-space instruction. For example, if your source document has twenty different element types and you want to strip whitespace nodes in all of them except the codeListing and sampleOutput elements, you don't have to list the other eighteen in an xsl:strip-space element's elements attribute. Instead, use an asterisk for the xsl:strip-space element's elements attribute value and list the two exceptions as the xsl:preserve-space element's elements attribute value.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:strip-space elements="*"/>
  <xsl:preserve-space elements="codeListing sampleOutput"/>

  <xsl:template match="@*|node()">
   <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
  </xsl:template>

</xsl:stylesheet>


1 to 13 of 13
  1. Unintended demonstration of the need?
    2009-04-22 02:58:58 
    My browser renders author Bob


    Ducharme's


    name as shown above (with the "Bob" two lines up from and to the right of "Ducharme").


    Was this intentional (a graphic demonstration of what happens when whitespace isn't resolved as intended in a cross-platform app), or just serendipity?

  2. please suggest a solution...problem with white space in node
    2007-08-08 19:08:13 
    input XML




    ACDAF89127349812
    itemSWHFC1982743982


    Honda motors
    Honda


    13123123-14381283
    123213123-124758789


    honda.com.au
    honda.com


    com.au
    com


    eth0
    eth1


    eth1
    eth2


    192.168.3.3
    192.123.3.3


    12.12.12.12
    12.12.12.11


    12.12.12.10
    12.12.12.13


    12.12.12.9
    12.12.12.14


    12.12.12.8
    12.12.12.15


    itemC
    item


    12.12.12.255
    12.12.12.254


    12.12.12.2
    12.12.12.122


    12.12.12.1
    12.12.12.19


    COMPLETE
    INPROC


    honda.com.au
    honda.com



    G_ERd2342kf



    russian



    Andy



    Titeren



    Titeren



    1





    ----------------------------------------------
    XSLT using








    <

    >

    </

    >



    <

    >


    </

    >




    ___________________________________________



    i am getting the output XML as




    ACDAF89127349812
    Honda motors
    13123123-14381283
    honda.com.au
    com.au
    eth0
    eth1
    192.168.3.3
    12.12.12.12
    12.12.12.10
    12.12.12.9
    12.12.12.8
    itemC
    12.12.12.255
    12.12.12.2
    12.12.12.1
    COMPLETE
    honda.com.au



    G_ERd2342kf
    russian Andy
    Titeren
    Titeren
    1




    -------------------------------------------------
    you can see the xml output is not a valid one as there is space inside the same node


    like





    and so on
    but as in xml space not allowed inside nodes
    it should be






    how to change the XSLT to get the output like above.


    i tried to solve the issue with
    or


    thanks in advance

    • please suggest a solution...problem with white space in node
      2007-08-12 10:58:42 Bob DuCharme
      (For questions like this, the XSL-list at http://www.mulberrytech.com/xsl/xsl-list/ is generally the place where XSLT questions will get the quickest answers.) 


      You're trying to create element names from content, and space isn't allowed in element names. The translate function (see http://www.w3.org/TR/1999/REC-xpath-19991116#function-translate) will help you take them out; the following stylesheet (which can use any stylesheet as input) demonstrates how:


      version="1.0">


      this is a test






      Just about any use of disable-output-escaping is a kludge. You'd be better off creating your elements with the xsl:element element. Then you could have something like
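
      A minimal sketch of that approach, combining translate with xsl:element. The item, label, and value element names are hypothetical, chosen only for illustration, and are not taken from the question's actual input.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Hypothetical input:
       <item><label>Honda motors</label><value>x</value></item> -->
  <xsl:template match="item">
    <!-- translate() deletes every space in the label, because the
         third argument supplies no replacement character for it.
         The curly braces make the name an attribute value template. -->
    <xsl:element name="{translate(label, ' ', '')}">
      <xsl:value-of select="value"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

      With the hypothetical input shown in the comment, this should create an element named Hondamotors. Any other character that isn't legal in an element name would still need to be handled separately.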
      Bob


  3. Insert a
    in HTML

    2007-01-29 04:32:38 
    Hi, I'm new to XSLT and I'm trying to transform an XML into HTML using XSL.
    The problem is I want to preserve my linebreaks as typed in the XML, as the previous comment in this thread.


    Here's a sample of the stylesheet:












    ]>






    Untitled Document










    And this is the XML



    Lorem ipsum dolor sit amet, consectetuer adipiscing elit.


    Morbi vestibulum, magna vel rutrum malesuada, nunc odio cursus nibh.



    The result is:


    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Morbi vestibulum, magna vel rutrum malesuada, nunc odio cursus nibh.



    I've tried many ways, including your examples, but it doesn't work. Any ideas?


    Thanks in advance.

    • Insert a
      in HTML

      2007-01-29 08:22:00 Bob DuCharme
      I can't think of anything to tell you outside of what's in the article. (I won't have a chance to try your example for a while.)


      Bob

  4. Preserve linefeed
    2006-05-22 00:47:43 
    Hello,
    I have a requirement where I transform an XML document into another and then store it in the database. To handle extraction/insertion of nodes, I use DOM before and after transformation.
    I need to preserve the linefeeds as they were. So "\r\n" should be "\r\n" in the output document and so on. But right now, my linefeed gets converted to "\n" while creating a String back from the DOM Document. I have tried using XMLSerializer and JDOM. Any input will be appreciated.
    Thanks in advance
    • Preserve linefeed
      2006-05-22 21:02:14 Bob DuCharme
      I have very little experience with the DOM, but I do know that XML parsers don't care about the difference between \r\n and \n, and trying to get them to care is like trying to get them to care about attribute order. There are good reasons that they don't care (see the XML Recommendation), and to control things like this you need to use something like a perl script that treats the text as a string of bytes and not as XML.


      Bob

      • Preserve linefeed
        2008-06-16 05:58:01 
        The XML parser treats newline sequences in a uniform way, replacing each one with a single linefeed. This ensures cross-platform compatibility: in Windows, for example, a line end is represented by a carriage return plus a linefeed, while in Unix it is represented by a single linefeed character.
        
  5. Need to preserve carriage return and line feeds
    2005-08-01 10:12:33 
    I am trying to import an XML document in SQL Server using OPENXML. I need to retain my carriage returns and line feeds but they seem to get lost. I have tried using xml:space="preserve" but it didn't work. Does anyone have any clue what needs to be done?


    Thanks.


    • Need to preserve carriage return and line feeds
      2005-08-01 12:04:14 Bob DuCharme
      I would try seeking out some people more familiar with SQL Server and/or OpenXML (e.g. a Microsoft mailing list or discussion forum) because it sounds like an implementation issue.


      Bob

  6. Need to keep my carriage returns
    2005-05-19 05:38:39 
    I am trying to maintain my carriage returns but they are always getting lost. I've tried using xml:space="preserve" and xsl:preserve-space, but neither works, or I am using them wrong.


    The data is inside a tag as follows:

    line1
    line2
    line3


    I want to keep the formatting of the data (the 3 lines) as it is but it goes like this:
    line1 li
    ne2 line3


    I am using an xsl stylesheet but I believe the carriage returns are removed by the time that becomes involved.


    The information between the tags is written by a unix script and so i cannot simply add
    tags or change anything between the tags.


    Please help,


    Ben.

    • Need to keep my carriage returns
      2005-05-19 06:34:10 Bob DuCharme
      I would have to see the whole stylesheet. That new line break in the middle of "line2" is particularly odd, so there could be some culprit outside of your XSLT processor responsible. 


      I would post the smallest possible complete stylesheet that reproduces the problem, along with a sample source file and the details about what XSLT processor you're using (and for that matter, where the data comes from and where it goes to before you see it, in case there's a chance that some other program is causing the problem) to the XSL-List mentioned below.


      Bob

  7. XML formatting
    2005-01-11 08:22:44 
    I have an application that writes to an XML file like this:


    thursday


    as you can see, everything is on one line,
    however when i read it from the xml, i need to parse through it as if it was like this:




    thursday


    is there some way i can do this?


    thanks in advance

    • XML formatting
      2005-01-11 08:51:22 Bob DuCharme
      It sounds like your second application, the one reading the XML, isn't an XML parser, because an XML parser wouldn't care whether that white space is there or not. 


      Have your first application (the one writing the xml) insert carriage returns with xsl:text elements that have a single carriage return as their content, like this:
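
      A minimal sketch of such an xsl:text element in context, assuming the writing application is an XSLT stylesheet and using a hypothetical element name, day, in place of whatever the real data uses. The line break is written as the character reference &#10; so that it stays visible in the stylesheet; a literal line break inside xsl:text would be expected to work the same way.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Hypothetical element name; substitute the real one. -->
  <xsl:template match="day">
    <xsl:copy-of select="."/>
    <!-- The content of xsl:text is never stripped, so this line
         break reaches the result tree after each copied element. -->
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```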




      Bob

  8. stripping out tabs and cr
    2004-12-06 07:28:13 Boris Rousseau
    I am using the same xsl as you are. Nevertheless I still have some trouble getting rid of the tabs and cr. In fact, when I display it into a tree, I get some empty leafs. The only difference I can see is that I am using "*" instead of the root element name.


    Do you have any idea of what could be the reason?

    • stripping out tabs and cr
      2004-12-06 07:59:34 Bob DuCharme
      Which stylesheet do you mean, what do you mean by "empty leafs," and which XSLT processor are you using? 


      Bob

      • stripping out tabs and cr
        2004-12-07 05:03:57 Boris Rousseau
        I am using this code:
        version="1.0">










        Nevertheless, I am still getting some whitespaces (due to carriage returns and/or tabs). And I am using the Java 2 javax.xml.transform class as the XSLT processor.

        • stripping out tabs and cr
          2004-12-07 05:49:53 Bob DuCharme
          First, check that your xmlns:xsl attribute value has a closing quote, because it didn't in the one that you included in your message.


          Second, try it with Saxon and Xalan. I tried it with Saxon, Xalan Java, and libxslt after fixing the quote problem described above, and it worked properly with all of them. If this is the case with you, then the problem is in your use of the javax.xml.transform class and not in the stylesheet.


          Bob

          • stripping out tabs and cr
            2004-12-07 06:40:31 Boris Rousseau
            I have solved the problem in my java code while parsing the document instead of using xslt.


            Thanks for your help.

  9. Whitespace
    2004-11-02 14:33:34 
    survival & Empathy;)
    It's the glue that holds it all together.
    James.
  10. whitespace
    2004-08-11 03:53:59 
    When I do an xslt transformation my carriage returns are no longer in the output. The carriage returns are within a text node so by default they should be left alone?
    
    • whitespace
      2004-08-11 06:34:20 Bob DuCharme
      It depends on what else is in the stylesheet. I would post the whole thing to the XSL mailing list; see http://www.mulberrytech.com/xsl/xsl-list/ for more.
      
  11. Well formed printing
    2002-03-11 08:57:49 Antonio Murro
    Hi,
    I would like to print an XML document through an XSL stylesheet with a page format. Do you have any suggestions? How can I force a page break in the XML printout through the XSL?
    Thanks
    Antonio
    • Well formed printing
      2004-01-10 05:46:36 Shobhit Deep
      Hi,
      I think you can use the following style on the paragraph after which you want to break the page. When printing, this will make the output come out on separate pages.
      PAGE-BREAK-AFTER: always;
  12. Controlling Whitespace
    2001-11-12 13:41:41 Bill Benge
    Good article, good examples, if I can understand it, any one can.
    Thanks
  13. preserving whitespace
    2001-11-08 19:31:28 Jeffrey Langdon
    Nice Article.  Very helpful Bob.


    Jeff Langdon
