XSLT as Pretty Printer
Introduction
Recently I was wading through some hard-to-read XML files. Art & Logic, the company I work for, was helping a client to build an Ajax-style Web interface that used XML to talk to the backend and client-side XSLT to produce the HTML. I found myself reformatting the XML by hand to make things easier and finally wondering as I hit the spacebar yet again: couldn't an XSLT style sheet do this formatting for me? I had done something similar before, so I decided to try writing that style sheet, using a test-driven approach. Some hours later I had a handy utility, and a new appreciation for some of the wrinkles of XML. Here's a cleaned-up account of what I did.
So what will the test be? Since XSLT is itself a dialect of XML, the style sheet (call it indent.xsl) will be an XML document, so why not use the code as its own test? If I make sure my code looks good as I write it, then indent.xsl should transform itself to itself. So I write a shell script like this:
# Use my local XSLT processor...
~/runXslt indent.xsl indent.xsl out.xml
diff indent.xsl out.xml
First Steps
Inspired by Extreme Programming, I start with The Stupidest Thing That Could Possibly Work: an empty style sheet with a hopeful comment. I specify that the output is generic XML, and include the usual XSLT namespace so that the XSLT processor knows that xsl:... elements are XSLT instructions and not just data. I'll stick with XSLT version 1.0 since that has solid support (such as the Saxon library that I'm using).
<!--
Used for formatting XML into a reasonable style.
-->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8" omit-xml-declaration="yes"/>
</xsl:stylesheet>
The output from the diff test is
1,6c1,2
< <!--
< Used for formatting XML into a reasonable style.
< -->
< <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
< <xsl:output method="xml" encoding="UTF-8" omit-xml-declaration="yes"/>
< </xsl:stylesheet>
---
>
>
Well darn, that didn't work. It produces nothing but a couple of blank lines. Looking at the test results, the first problem is that it ignores the comment. Inserting a simple template should take care of that: when it sees a comment, it should just copy it through.
<xsl:template match="comment()">
<xsl:comment>
<xsl:value-of select="."/>
</xsl:comment>
</xsl:template>
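As background on why the empty style sheet dropped the comment in the first place: XSLT 1.0 defines built-in template rules that apply whenever no template of your own matches a node. Written out as ordinary templates, they are roughly equivalent to the sketch below (modes are left out, and a processor applies these rules implicitly, so you would not normally type them into a style sheet).

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Root and elements: emit nothing themselves, just recurse into children. -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: copy the string value to the output. -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Comments and processing instructions: produce nothing. -->
  <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>
```

That would account for the first test result: the only text nodes in indent.xsl are the line breaks between elements, so those pass through while the comment and all the markup disappear.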
OK, but now it doesn't produce the style sheet element. How about another template to copy each input element to an output element with the same name?
<xsl:template match="*">
<xsl:element name="{name(.)}"/>
</xsl:template>
The output looks like this:
<!--
Used for formatting XML into a reasonable style.
--><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>
...
Getting better. But I also need to start a new line between the comment and the element.
<xsl:template match="*">
<xsl:text>
</xsl:text>
<xsl:element name="{name(.)}"/>
</xsl:template>
And I want the element's attributes too, so add them to the element by hand.
<xsl:element name="{name(.)}">
<xsl:for-each select="@*">
<xsl:attribute name="{name(.)}"><xsl:value-of select="."/></xsl:attribute>
</xsl:for-each>
</xsl:element>
Actually, as you can see above, I already got one attribute for free, namely the xmlns:xsl attribute for the XSLT namespace. But this is not a normal attribute. It's there because of the XSLT/XPath data model, in which the tree structure of an XML document contains not only the familiar hierarchy of elements, attributes, and text, but also namespace nodes. The namespace nodes attached to an element tell an XML application how to interpret the names inside that element. When xsl:element creates an output element named xsl:stylesheet, the processor resolves the xsl prefix against the namespace declarations in the style sheet. The new element's name therefore belongs to the XSLT namespace, and the output has to declare that namespace, so that's where that free attribute came from.
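One way to look at those namespace nodes directly is XPath's namespace axis. The following is a small stand-alone sketch, separate from indent.xsl: for every element it prints the element's name, followed by the prefix and URI of each namespace node in scope on it. The plain-text report format is only an illustration, and it assumes a processor that supports the namespace axis.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- For every element, report the namespace nodes attached to it. -->
  <xsl:template match="*">
    <xsl:value-of select="name(.)"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:for-each select="namespace::*">
      <!-- name(.) is the prefix (empty for a default namespace);
           the node's string value is the namespace URI. -->
      <xsl:text>  </xsl:text>
      <xsl:value-of select="name(.)"/>
      <xsl:text> = </xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <!-- Visit child elements only, so text nodes are not echoed. -->
    <xsl:apply-templates select="*"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against indent.xsl itself, every element should report the xsl prefix, along with the xml prefix that the data model puts in scope on every element.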
Annoyingly, the XSLT processor really wants to put the version attribute after xmlns:xsl, but I think they look nicer the other way around. I might be able to fix that, but studying the spec shows that I can't expect to preserve attribute order in general: XSLT makes no guarantees about the relative order of attributes in an element. A style sheet, in general, does not even know the order of the attributes in the input document. XML doesn't care, but diff does. So I'll just accept this as a limitation of my test, and in my code I'll follow the order preferred by my XSLT processor.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
The next problem revealed by the diff is that the style sheet element's children are missing in the output, so I should be applying the element template recursively.
<xsl:for-each select="@*">
<xsl:attribute name="{name(.)}"><xsl:value-of select="."/></xsl:attribute>
</xsl:for-each>
<xsl:apply-templates/>
This weird testing process is beginning to work—the test shows that the matching output is creeping forward slowly, although there's a long way to go.
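Putting the fragments so far together gives something like the sketch below. The indentation and the extra comments are added here for readability and are not meant to reproduce the exact layout that the diff test compares against; for the same reason the line break is written as a character reference (&#10;) so that indenting the xsl:text element does not add stray spaces to the output.

```xml
<!--
Used for formatting XML into a reasonable style.
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" encoding="UTF-8" omit-xml-declaration="yes"/>

  <!-- Copy comments through unchanged. -->
  <xsl:template match="comment()">
    <xsl:comment>
      <xsl:value-of select="."/>
    </xsl:comment>
  </xsl:template>

  <!-- Copy each element: start a new line, keep the name and
       attributes, then recurse into the children. -->
  <xsl:template match="*">
    <xsl:text>&#10;</xsl:text>
    <xsl:element name="{name(.)}">
      <xsl:for-each select="@*">
        <xsl:attribute name="{name(.)}"><xsl:value-of select="."/></xsl:attribute>
      </xsl:for-each>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

Note that the attributes have to be added before the call to xsl:apply-templates: XSLT does not allow an attribute to be attached to an element once child nodes have been written to it.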