XML.com: XML From the Inside Out

XSLT as Pretty Printer
by Hew Wolff

Indentation

The test now points out that the first child element needs indenting. Probably that means that each element has an associated depth, and the element template gets this depth as a parameter. I'll indent with three spaces.

   <xsl:template match="*">
      <xsl:param name="depth">0</xsl:param>
      <!-- New line with indenting. -->
      <xsl:if test="$depth > 0">
         <xsl:text>   </xsl:text>
      </xsl:if>
      <xsl:text>&#xA;</xsl:text>
      <xsl:element name="{name(.)}">
         <xsl:for-each select="@*">
            <xsl:attribute name="{name(.)}"><xsl:value-of select="."/></xsl:attribute>
         </xsl:for-each>

         <xsl:apply-templates>
            <xsl:with-param name="depth" select="$depth + 1"/>
         </xsl:apply-templates>
      </xsl:element>
   </xsl:template>

Hmm, indenting isn't happening. Add some debugging code.

      <!-- New line with indenting. -->
<xsl:value-of select="concat('depth: ', $depth)"/>
      <xsl:if test="$depth > 0">

Oh, right, I have to indent after the new line.

      <!-- New line with indenting. -->
      <xsl:text>&#xA;</xsl:text>
      <xsl:if test="$depth > 0">
         <xsl:text>   </xsl:text>
      </xsl:if>

OK, the first child tag is indented now, but there's a blank line separating it from its parent, which looks bad. I'll start by gaining as much control of the whitespace as possible. That means no automatic indentation by the XSLT processor, and whitespace in the input preserved inside xsl:text elements but stripped everywhere else.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <xsl:output method="xml" indent="no" encoding="UTF-8" omit-xml-declaration="yes"/>
   <xsl:strip-space elements="*"/>
   <xsl:preserve-space elements="xsl:text"/>

Now I need to figure out exactly when I want a blank line above an element. This will turn out to be the trickiest part of the whole operation: trying to capture my personal intuition about what spacing looks good. A reasonable rule for now is to insert a blank line before an element that has child elements, unless it is the first of its siblings.

      <xsl:text>&#xA;</xsl:text>
      <xsl:if test="position() > 1 and count(./*) > 0">
         <xsl:value-of select="'&#xA;'"/>
      </xsl:if>

The test now says:

13,16c12,13
<       <xsl:comment>
<          <xsl:value-of select="."/>
<       </xsl:comment>
<    </xsl:template>
---
>    <xsl:comment>
>    <xsl:value-of select="."/></xsl:comment></xsl:template>

That reminds me that deeply nested elements have to be indented more. That suggests an indentation template, which would also take a depth parameter. XSLT 1.0 has no counted loop (xsl:for-each iterates over nodes, not over numbers), so I use recursion instead.

      <!-- Set off a large element with a blank line. -->
      <xsl:if test="position() > 1 and count(./*) > 0">
         <xsl:text>&#xA;</xsl:text>
      </xsl:if>
      <xsl:call-template name="indent">
         <xsl:with-param name="depth" select="$depth"/>
      </xsl:call-template>
      ...

   <xsl:template name="indent">
      <xsl:param name="depth"/>

      <xsl:if test="$depth > 0">
         <xsl:text>   </xsl:text>
         <xsl:call-template name="indent">
            <xsl:with-param name="depth" select="$depth - 1"/>
         </xsl:call-template>
      </xsl:if>
   </xsl:template>
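Recursion is the general solution, but when the nesting depth has a known upper bound there is also a non-recursive trick: take a prefix of a fixed string of spaces. This is a hypothetical alternative, not the article's code; the template name indentFlat and the ten-level limit are my own assumptions.

```xml
<!-- Hypothetical non-recursive alternative to the indent template.
     Takes the first 3 * $depth characters of a 30-space string, so it
     handles up to ten levels; deeper nesting is silently capped. -->
<xsl:template name="indentFlat">
   <xsl:param name="depth" select="0"/>
   <xsl:value-of select="substring('                              ', 1, 3 * $depth)"/>
</xsl:template>
```

It would be called exactly like indent, with a depth parameter. The recursive version has no such limit, which is a reasonable argument for keeping it.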

Closing tags require a newline and indentation too. But only when there are child elements: a simple one-line element, maybe with some text in it, looks OK.

      <xsl:element name="{name(.)}">
         <xsl:for-each select="@*">
            <xsl:attribute name="{name(.)}"><xsl:value-of select="."/></xsl:attribute>
         </xsl:for-each>

         <xsl:apply-templates>
            <xsl:with-param name="depth" select="$depth + 1"/>
         </xsl:apply-templates>

         <xsl:if test="count(./*) > 0">
            <xsl:text>&#xA;</xsl:text>
            <xsl:call-template name="indent">
               <xsl:with-param name="depth" select="$depth"/>
            </xsl:call-template>
         </xsl:if>
      </xsl:element>

Much better.
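For reference, here is one way the fragments so far might fit together into a complete, runnable stylesheet. The assembly is my own reconstruction (in particular, where the first new line goes, since the fragment above elides the surrounding code with "..."). At this stage it still uses xsl:element, so comments in the input are dropped by the built-in comment template, and prefixed element names only work for prefixes the stylesheet itself declares (here, just xsl:).

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <xsl:output method="xml" indent="no" encoding="UTF-8" omit-xml-declaration="yes"/>
   <xsl:strip-space elements="*"/>
   <xsl:preserve-space elements="xsl:text"/>

   <xsl:template match="*">
      <xsl:param name="depth">0</xsl:param>
      <!-- Start a new line for the opening tag. -->
      <xsl:text>&#xA;</xsl:text>
      <!-- Set off a large element with a blank line, unless it comes first. -->
      <xsl:if test="position() > 1 and count(./*) > 0">
         <xsl:text>&#xA;</xsl:text>
      </xsl:if>
      <xsl:call-template name="indent">
         <xsl:with-param name="depth" select="$depth"/>
      </xsl:call-template>
      <xsl:element name="{name(.)}">
         <xsl:for-each select="@*">
            <xsl:attribute name="{name(.)}"><xsl:value-of select="."/></xsl:attribute>
         </xsl:for-each>
         <!-- Children are indented one level deeper. -->
         <xsl:apply-templates>
            <xsl:with-param name="depth" select="$depth + 1"/>
         </xsl:apply-templates>
         <!-- Put the closing tag on its own line only if there are child elements. -->
         <xsl:if test="count(./*) > 0">
            <xsl:text>&#xA;</xsl:text>
            <xsl:call-template name="indent">
               <xsl:with-param name="depth" select="$depth"/>
            </xsl:call-template>
         </xsl:if>
      </xsl:element>
   </xsl:template>

   <!-- Write three spaces per level of depth, recursively. -->
   <xsl:template name="indent">
      <xsl:param name="depth"/>
      <xsl:if test="$depth > 0">
         <xsl:text>   </xsl:text>
         <xsl:call-template name="indent">
            <xsl:with-param name="depth" select="$depth - 1"/>
         </xsl:call-template>
      </xsl:if>
   </xsl:template>
</xsl:stylesheet>
```

Text nodes go through the built-in template, which copies them unchanged. Note that the output still begins with a new line before the root element.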

Nailing Down the Test

I'll summarize the remaining steps more briefly. For reference, you can skip ahead to the complete code at the end.

There's a gratifying amount of refactoring to be done in the handling of elements and comments. First of all, comments need to be indented pretty much the same as elements. Since they share a lot of code, I made them separate cases in the same template, using xsl:choose. Then a colleague pointed out that, rather than explicitly creating the output node with xsl:element or xsl:comment, it's simpler to use xsl:copy. This instruction makes a shallow copy of the current input node (the node itself, without its attributes or children), so unlike xsl:copy-of it lets me supply the children myself, including the whitespace. Also, there's no need to iterate explicitly through an element's attributes: xsl:copy-of with the expression @* copies all of them at once. This leads to the nice code below.


   <xsl:template match="*|comment()">
      <xsl:param name="depth">0</xsl:param>
      ...
      <xsl:copy>
         <xsl:if test="self::*">
            <xsl:copy-of select="@*"/>

            <xsl:apply-templates>
               <xsl:with-param name="depth" select="$depth + 1"/>
            </xsl:apply-templates>
            ...
         </xsl:if>
      </xsl:copy>
      ...
   </xsl:template>
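For comparison, the standard XSLT 1.0 identity transform relies on the same behavior of xsl:copy. It is not part of the pretty printer; it just shows that xsl:copy produces only the node itself, and the template body has to supply the attributes and children. Per the XSLT 1.0 specification, that body is instantiated only for element and root nodes, so a comment simply comes out as a copy of itself.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <!-- Copy every node and attribute; for elements, recurse into the
        attributes and children so the whole tree is reproduced. -->
   <xsl:template match="@*|node()">
      <xsl:copy>
         <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
   </xsl:template>
</xsl:stylesheet>
```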

By the way, the first time I tried this, I couldn't get it working because I left out the self:: axis prefix. There's a parallel between the template match pattern and the test expression, but the parallel is deceptive. In the test, a context node has already been established, and the default axis is child::. So test="*" means "the current node has at least one child element," but I want self::*, which means "the current node is itself an element."
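To see the difference, here is a small demo stylesheet (not part of the pretty printer) that prints how both tests evaluate for every element in the input.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <xsl:output method="text"/>
   <xsl:template match="*">
      <xsl:value-of select="name()"/>
      <!-- "*" in an expression: does this node have an element child? -->
      <xsl:text>: *=</xsl:text>
      <xsl:value-of select="boolean(*)"/>
      <!-- "self::*": is this node itself an element? -->
      <xsl:text>, self::*=</xsl:text>
      <xsl:value-of select="boolean(self::*)"/>
      <xsl:text>&#xA;</xsl:text>
      <xsl:apply-templates select="*"/>
   </xsl:template>
</xsl:stylesheet>
```

For the input `<a><b>text</b></a>` it prints `a: *=true, self::*=true` and then `b: *=false, self::*=true`. So with test="*", an element like b that holds only text would lose its attributes and content.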

I kept getting tripped up by further ambiguities like the attribute order mentioned above. For example, the processor keeps taking my &#xA; (the character reference for a new line character) and writing it out as a literal new line. This is correct XML, but it messes up the formatting. It turns out that this is another case where the output text is not guaranteed: the processor may choose whether or not to escape characters. XSLT does provide some control over output escaping (the disable-output-escaping attribute of xsl:text and xsl:value-of), so I added a template to restore the new line character references in text nodes.


   <xsl:template match="text()">
      <xsl:call-template name="escapeNewlines">
         <xsl:with-param name="text">
            <xsl:value-of select="."/>
         </xsl:with-param>
      </xsl:call-template>
   </xsl:template>
   ...
   <xsl:template name="escapeNewlines">
      ...
   </xsl:template>
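The body of escapeNewlines is elided here. One possible body, which is my own hypothetical sketch rather than the author's code, walks the string recursively and writes each new line as the character reference &#xA; using disable-output-escaping. Keep in mind that XSLT 1.0 processors are not required to support disable-output-escaping, so this depends on the processor.

```xml
<!-- Hypothetical body for escapeNewlines (not the author's code).
     Copies $text, replacing every new line character with the
     character reference in the serialized output. -->
<xsl:template name="escapeNewlines">
   <xsl:param name="text"/>
   <xsl:choose>
      <xsl:when test="contains($text, '&#xA;')">
         <!-- Text before the first new line goes out with normal escaping. -->
         <xsl:value-of select="substring-before($text, '&#xA;')"/>
         <!-- Write the reference raw, so the ampersand is not escaped again. -->
         <xsl:text disable-output-escaping="yes">&amp;#xA;</xsl:text>
         <!-- Recurse on the rest of the string. -->
         <xsl:call-template name="escapeNewlines">
            <xsl:with-param name="text" select="substring-after($text, '&#xA;')"/>
         </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
         <!-- No new lines left: copy the remainder as is. -->
         <xsl:value-of select="$text"/>
      </xsl:otherwise>
   </xsl:choose>
</xsl:template>
```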

Similarly, I would like to write a literal > in my XPath expressions, but the processor prefers to escape it as &gt;, and it has the right to. (A literal < is never allowed in an attribute value, so it always has to be written as &lt;.) Here, for the sake of the test, I just followed the processor's preference:


      <xsl:if test="$depth &gt; 0">

The XSLT processor would also be within its rights to insert extra whitespace between attributes of an element. Fortunately, my processor's behavior, inserting just one space, is also what I want to enforce.

Another approach to these escaping problems would be to tell the processor (with xsl:output method="text") that I'm writing plain text instead of XML. This would allow finer control, at the cost of more complex code: I would write out every tag and attribute as strings, character by character, rather than describing the output as a tree of XML nodes. This works, but I decided it wasn't worth the complexity.
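A rough, hypothetical sketch of what that plain-text approach might look like (indentation, comments, and escaping omitted): every piece of markup is spelled out as a string. Under the text output method nothing is escaped, so a > in an attribute value comes out literally; by the same token, &, < and quotes in text and attribute values would have to be escaped by hand, which this sketch does not do.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <xsl:output method="text" encoding="UTF-8"/>
   <xsl:strip-space elements="*"/>

   <xsl:template match="*">
      <!-- Start tag: "<", the name, then each attribute as name="value". -->
      <xsl:text>&lt;</xsl:text>
      <xsl:value-of select="name()"/>
      <xsl:for-each select="@*">
         <xsl:text> </xsl:text>
         <xsl:value-of select="name()"/>
         <xsl:text>="</xsl:text>
         <!-- Written raw: special characters are NOT escaped here. -->
         <xsl:value-of select="."/>
         <xsl:text>"</xsl:text>
      </xsl:for-each>
      <xsl:text>&gt;</xsl:text>
      <!-- Children; text nodes are copied raw by the built-in template. -->
      <xsl:apply-templates/>
      <!-- End tag (empty elements come out as <x></x>, not <x/>). -->
      <xsl:text>&lt;/</xsl:text>
      <xsl:value-of select="name()"/>
      <xsl:text>&gt;</xsl:text>
   </xsl:template>
</xsl:stylesheet>
```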

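The test has one last nitpick: the last line in the file should be terminated with a new line. The node that ends the file is the last child of the root node: count(../..) = 0 holds when the current node's parent is the root node, and position() = last() picks out the final node in that list.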


      <xsl:variable name="isLastNode" select="count(../..) = 0 and position() = last()"/>

      <xsl:if test="$isLastNode">
         <xsl:text>&#xA;</xsl:text>
      </xsl:if>
