White Space that Isn't Really There

The kind of white space the spec is talking about is the kind used to "pretty-print" XML documents; you've probably seen it in displays of HTML source code. For example:

<p>Little boys, ingredients for:
  <ol>
    <li>Snips,</li>
    <li>snails,</li>
    <li>puppy dogs' tails.</li>
  </ol>
</p>

Pretty clearly, the leading white space used to indent the middle lines is really not "part of" the document.
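To see that this white space really does reach the application, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The choice of parser is an assumption made for illustration; nothing in the spec calls for it. It parses the example above and prints the character data found around the list items.

```python
import xml.etree.ElementTree as ET

# The pretty-printed example from above, unchanged.
doc = """<p>Little boys, ingredients for:
  <ol>
    <li>Snips,</li>
    <li>snails,</li>
    <li>puppy dogs' tails.</li>
  </ol>
</p>"""

root = ET.fromstring(doc)
ol = root.find("ol")

# ElementTree keeps character data in .text (before the first child)
# and .tail (after an element's end tag).
print(repr(root.text))   # 'Little boys, ingredients for:\n  '
print(repr(ol.text))     # '\n    '  (indentation before the first <li>)
print(repr(ol[0].tail))  # '\n    '  (indentation before the second <li>)
print(repr(ol[2].tail))  # '\n  '    (indentation before </ol>)
print(repr(ol.tail))     # '\n'      (line break before </p>)
```

Every line break and run of indentation shows up as character data, even though no human reader would call it part of the document.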

In SGML, there are a whole lot of rules that are designed to allow you to do this kind of pretty-printing without having the white space leak into the document. The rule the SGML-ers use is "White space caused by markup is not significant." This makes sense on the face of it, but when you try to write down precise rules to describe what "caused by markup" means, it turns out that you get into a horrible rat's-nest of complexity (and this is really being unkind to rats).
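One way to get a feel for the difficulty is to try the most obvious rule and watch it fail. The sketch below is not the SGML rule or anything proposed by the XML WG. It uses a hypothetical helper, strip_whitespace_only, that simply discards any character data consisting entirely of white space, and applies it to a made-up fragment where a single space between two inline elements matters.

```python
import xml.etree.ElementTree as ET

def strip_whitespace_only(elem):
    """Naive, hypothetical rule: drop any text or tail that is all white space."""
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    if elem.tail is not None and not elem.tail.strip():
        elem.tail = None
    for child in elem:
        strip_whitespace_only(child)

# Illustrative input (not from the spec): the space between </b> and <i>
# separates two words and is clearly part of the document.
mixed = ET.fromstring("<p><b>Snips</b> <i>snails</i></p>")
strip_whitespace_only(mixed)

flattened = "".join(mixed.itertext())
print(flattened)  # Snipssnails
```

The rule that cleans up the indented list also glues two words together here, which is the sort of case a precise definition of "caused by markup" would have to get right.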

We in the XML WG decided that this rule was a good idea, and that the SGML guys just hadn't been smart enough to write it down simply and clearly. We discovered that we were wrong, and that it's an impossibly hairy problem; one that would be nice to solve, but nobody has cracked the nut yet.

Copyright © 1998, Tim Bray. All rights reserved.