XML's White Space Policy

XML has an incredibly simple rule about how to handle white space, and it fits in one sentence: "If it ain't markup, it's data." Under no circumstances will an XML processor discard some white space because, in the processor's opinion, it is not "significant".

Let's look at our white space example again:

<p>Little boys, ingredients for:
  <ol>
    <li>Snips,</li>
    <li>snails,</li>
    <li>puppy dogs' tails.</li>
  </ol>
</p>

An XML processor will pass the application not just the title and the ingredients, but also all the white space characters you can see before the <ol> and <li> tags, and the line-end characters you can't see; in this case, 7 of them. (But note that an XML processor will clean up the line-ends as described in the next section, so while apps are going to have to wrestle with white space, they won't have to deal with CR-LF on Windows, CR on Mac, and LF on Unix.)
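To make this concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module (my choice of tool for illustration, not something this note depends on). It parses the example as a standalone document and prints the character data the application receives around each tag. The second, shorter input is made up to show the line-end clean-up.

```python
import xml.etree.ElementTree as ET

# The example from above, with its line-ends written out explicitly.
doc = (
    "<p>Little boys, ingredients for:\n"
    "  <ol>\n"
    "    <li>Snips,</li>\n"
    "    <li>snails,</li>\n"
    "    <li>puppy dogs' tails.</li>\n"
    "  </ol>\n"
    "</p>"
)

root = ET.fromstring(doc)
ol = root.find("ol")
items = list(ol)

# ElementTree keeps the character data before the first child in .text
# and the character data after an element's end-tag in .tail.
print(repr(root.text))      # 'Little boys, ingredients for:\n  '
print(repr(ol.text))        # '\n    '
print(repr(items[0].tail))  # '\n    '
print(repr(items[2].tail))  # '\n  '
print(repr(ol.tail))        # '\n'

# Line-ends are cleaned up before the application sees them.
# This input is invented for the illustration.
mixed = ET.fromstring("<p>one\r\ntwo\rthree</p>")
print(repr(mixed.text))     # 'one\ntwo\nthree'
```

None of the indentation has vanished; it has simply been handed over as data, and the only thing the parser has tidied up is the flavor of line-end.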

This behavior is going to cause some surprises and problems for XML users and programmers, because we've come to expect (as a result of working with SGML and HTML) "insignificant" white space to auto-magically vanish.

On the other hand, those who've actually worked with real SGML tools will generally approve of XML's behavior, because it has an important virtue, namely that the rule is simple and anyone can understand it: all white space gets passed through, always.
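Since everything gets passed through, an application that wants the old "vanishing" behavior has to do the discarding itself. The following is only a sketch of one way to do that in Python with xml.etree.ElementTree; the helper name strip_blank_text is hypothetical, and the policy it applies (drop any text that consists solely of space, tab, CR, or LF) is an assumption that will be wrong for some document types.

```python
import xml.etree.ElementTree as ET

XML_WHITE_SPACE = " \t\r\n"  # the four characters XML treats as white space


def strip_blank_text(elem):
    """Hypothetical helper: drop character data that is only white space.

    This is an application-level decision, made after parsing; the XML
    processor itself has already passed everything through.
    """
    for node in elem.iter():
        if node.text is not None and not node.text.strip(XML_WHITE_SPACE):
            node.text = None
        if node.tail is not None and not node.tail.strip(XML_WHITE_SPACE):
            node.tail = None


root = ET.fromstring("<ol>\n  <li>Snips,</li>\n  <li>snails,</li>\n</ol>")
strip_blank_text(root)
result = ET.tostring(root, encoding="unicode")
print(result)  # <ol><li>Snips,</li><li>snails,</li></ol>
```

The point of the sketch is where the decision lives: in the application, which knows what the data means, rather than in the processor, which doesn't.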

Copyright © 1998, Tim Bray. All rights reserved.