Utility Stylesheets, Part Two
Last month we looked at some short utility stylesheets, each dedicated to a specific task that may be necessary with a wide variety of XML documents: stripping empty paragraphs, converting mixed content to element content, and adding ID values to elements. Stylesheets like these can serve as building blocks in the creation of a large, complex workflow composed of pipelined modular processes. This month, we'll look at several more such stylesheets.
XML namespaces play an important role in XML
applications; they help to keep track of which elements and attributes
come from where, but to be honest, they're such a pain sometimes. The
following stylesheet copies all source tree nodes to the result tree,
and it uses XPath 1.0's local-name() function to make sure
that the elements and attributes on the result tree have no namespace
prefix. (It must be useful -- when I suggested last month that
readers send in their own short utility stylesheets, one sent me his
own version of this stylesheet without knowing that I had planned to
include one just like it.)
<!-- Copy document, stripping namespaces, i.e. for elements
     and attributes only copy the local part of their names. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

  <xsl:template match="processing-instruction()|comment()">
    <xsl:copy>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
In XSLT, xsl:copy elements and literal result
elements are popular ways to add elements and attributes to result
trees, but this stylesheet demonstrates a key advantage of
using xsl:element and xsl:attribute elements
instead: they offer more control over the names of those
elements and attributes. The name attributes in these
elements call the local-name() function to convert the
original names to ones with no namespace prefixes; using other
function calls (or combinations of functions) can let you be even more
creative in how you name your result elements and attributes.
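As a minimal sketch of that idea, the following variation combines local-name() with XPath 1.0's translate() function so that result names come out in uppercase as well as prefix-free. The "lower" and "upper" variables are names made up for this illustration, and the mapping covers only the 26 unaccented ASCII letters; names using other characters pass through with those characters unchanged.

```xml
<!-- Sketch: strip namespace prefixes and uppercase element and
     attribute names. Only ASCII letters are mapped. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Hypothetical helper variables for the translate() calls. -->
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:template match="*">
    <xsl:element name="{translate(local-name(), $lower, $upper)}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:attribute name="{translate(local-name(), $lower, $upper)}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

  <!-- Comments and processing instructions are copied as they are. -->
  <xsl:template match="processing-instruction()|comment()">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

With this version, a source element such as sn:chapter would presumably appear in the result as CHAPTER. As with the original stylesheet, two attributes on one element that differ only in their prefix (or, here, only in case) would end up with the same result name, so it is worth checking your source documents for that before relying on either version.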
The use of qualified
names (names that include a namespace prefix) in attribute values
is generally considered a Bad Idea in XML design. After all, a
namespace prefix is only standing in for the full URI of the namespace
it represents, and while XML parsers track the prefix/URI relationship
for a document's element and attribute names, they don't do this for
attribute values. See Kendall Clark's February 2002
XML Deviant column for a fuller discussion, which points out that
XSLT 1.0 itself uses qnames in attribute values. For example, if you
declare that xmlns:h="http://www.w3.org/1999/xhtml", you can then set
your xsl:template element's match attribute to
"h:h1" or "h:p" to define a template rule for h1 or p
elements from the http://www.w3.org/1999/xhtml namespace.
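To make the XSLT case concrete, here is a small sketch of a stylesheet whose match attribute holds a qname. It assumes an XHTML source document and simply lists the text of each h1 element, one per line; the choice of text output and the empty text() template are illustration choices, not something from the column being discussed.

```xml
<!-- Sketch: the "h:" in the match attribute value is a qname whose
     prefix is resolved against the xmlns:h declaration below. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:h="http://www.w3.org/1999/xhtml"
                version="1.0">

  <xsl:output method="text"/>

  <!-- Matches h1 elements in the XHTML namespace only. -->
  <xsl:template match="h:h1">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress all other text that the built-in rules would output. -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

The prefix itself is arbitrary: the source document can use a different prefix, or the default namespace, for http://www.w3.org/1999/xhtml and the template rule will still match, because the XSLT processor compares namespace URIs rather than prefixes.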
When I read in a W3C IRC log that
"XSLT 1.0 can't deal well with qnames," however, I took it as a
challenge -- it can't deal well with qnames if you don't use the
(little-used) namespace:: axis. With a bit of help from David
Carlisle, I came up with a stylesheet that converts a namespace prefix
in an attribute value to the corresponding URI:
<!-- qname2uri.xsl: convert namespace prefixes in attribute values to
     their associated URIs. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:template match="@*[contains(.,':')]">
    <!-- For any attributes that have a colon in their value... -->

    <xsl:variable name="nsprefix">
      <xsl:value-of select="substring-before(.,':')"/>
    </xsl:variable>

    <xsl:variable name="nsURI">
      <!-- URI that the prefix maps to: namespace node of parent
           whose name() = the namespace prefix. -->
      <xsl:variable name="nsNode"
           select="parent::*/namespace::*[name() = $nsprefix]"/>
      <xsl:choose>
        <xsl:when test="$nsNode">
          <xsl:value-of select="$nsNode"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- Uncomment the following xsl:text element
               to flag prefixes that weren't declared. -->
          <!-- <xsl:text>NO-URI-DECLARED-FOR-PREFIX:</xsl:text>-->
          <xsl:value-of select="$nsprefix"/>
          <xsl:text>:</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <!-- Add attribute to result tree, substituting URI for prefix. -->
    <xsl:attribute name="{name()}">
      <xsl:value-of select="$nsURI"/>
      <xsl:value-of select="substring-after(.,':')"/>
    </xsl:attribute>

  </xsl:template>

  <!-- Copy anything not covered by that first template rule. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
The stylesheet has two template rules: the first
handles attributes with colons in their values, and the second copies
any other source tree node to the result tree unchanged. The first
defines two variables to make its logic more modular: the "nsprefix"
variable stores the namespace prefix, and the "nsURI" variable stores
the URI that corresponds to that namespace prefix. If the source document
declares no URI for that prefix in the scope of the attribute's parent
element, "nsURI" just stores the prefix and its colon, so the attribute
value passes through unchanged;
uncommenting the xsl:text element with the value of
"NO-URI-DECLARED-FOR-PREFIX:" adds that string to flag the lack of a
properly declared URI for that prefix. You can easily change that to a
proper URI or to any string you want.
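Because the namespace axis is so rarely used, it may help to see it on its own. The following diagnostic sketch is not part of qname2uri.xsl; it assumes a processor that implements the namespace axis, and the "elName" variable is a name chosen for this illustration. For each element in the source document it lists the prefix and URI of every namespace node in scope, which is the same information that the "nsNode" variable searches.

```xml
<!-- Sketch: list the in-scope namespace declarations of each element
     as lines of the form "element: prefix = URI". -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="text"/>

  <xsl:template match="*">
    <xsl:variable name="elName" select="name()"/>
    <!-- name() of a namespace node is its prefix (empty for the
         default namespace); its string value is the URI. The
         implicit "xml" prefix is skipped. -->
    <xsl:for-each select="namespace::*[name() != 'xml']">
      <xsl:value-of select="$elName"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="name()"/>
      <xsl:text> = </xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <xsl:apply-templates select="*"/>
  </xsl:template>

</xsl:stylesheet>
```

Every element carries namespace nodes for all declarations in scope, not just the ones declared on that element itself, which is why looking at the attribute's parent element is enough to resolve a prefix that was declared on a more distant ancestor.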
To test this stylesheet, I used the following document as a source document:
<a xmlns:sn="http://www.snee.com/ns/whatever#">
  <b>this is a test</b>
  <b attr1="sn:blah">Second b element.</b>
  <b attr1="xx:blah">Third b element.</b> <!-- No declaration for xx. -->
  <c xmlns:sn="http://www.example.com/"> <!-- Redeclared prefix. -->
    <d color="red" direction="north"> <!-- No colons in these values. -->
      <x attr2="sn:foo">nested namespace</x>
    </d>
  </c>
</a>
The three commented lines attempt to trip up a conversion program that doesn't handle the URI-prefix mapping properly. Although it's not a very extensive test, it shows that the stylesheet works pretty well, creating this result from it:
<?xml version="1.0" encoding="utf-8"?><a xmlns:sn="http://www.snee.com/ns/whatever#">
  <b>this is a test</b>
  <b attr1="http://www.snee.com/ns/whatever#blah">Second b element.</b>
  <b attr1="xx:blah">Third b element.</b> <!-- No declaration for xx. -->
  <c xmlns:sn="http://www.example.com/"> <!-- Redeclared prefix. -->
    <d color="red" direction="north"> <!-- No colons in these values. -->
      <x attr2="http://www.example.com/foo">nested namespace</x>
    </d>
  </c>
</a>
The second b element's prefix was mapped to
the snee.com URI, and the third b element's prefix was left
alone because it had no corresponding URI. The d element's
attribute values were left alone, and the x element's
namespace prefix, which was the same as the one on the
second b element, was mapped to a different URI: the one that
the "sn" prefix was mapped to in the c element that contains
the d element, thereby showing that the scoping of the
declarations was respected.
There are several utilities available that can convert a file's encoding, but if you need to convert the encoding of an XML document, an XSLT processor and an eight-line stylesheet (OK, a little longer with blank lines added for readability) are all you need.
The following stylesheet has only one template rule: the same one we've seen in most of the utility stylesheets, which copies everything passed to it verbatim.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output encoding="utf-16"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
The stylesheet also has an xsl:output
element. This element has many useful attributes, and
the encoding one is particularly valuable: tell it what
encoding to use when writing the result document, and your stylesheet
is ready to convert some documents. If your XSLT processor can't
handle the output encoding you've asked for, it should either tell you
or fall back to UTF-8 or UTF-16, as the XSLT 1.0 specification allows.
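As a sketch of how the same approach adapts to another target, the variation below asks for ISO-8859-1 output and spells out a few of the other xsl:output attributes. The choice of ISO-8859-1 is only an assumption for the example; substitute whatever encoding name your processor supports.

```xml
<!-- Sketch: identity transformation that writes its result in
     ISO-8859-1 (an encoding chosen only for illustration). -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- method, omit-xml-declaration, and indent are shown with explicit
       values to make the serialization choices visible. -->
  <xsl:output method="xml" encoding="iso-8859-1"
              omit-xml-declaration="no" indent="no"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

When the target encoding covers fewer characters than the source, as ISO-8859-1 does compared with UTF-8, the XSLT 1.0 specification says that characters in text and attribute values that the encoding cannot represent should be written as numeric character references, so a conforming processor should not silently lose them. Note also that the encoding attribute is not an attribute value template in XSLT 1.0, so the encoding name cannot be passed in as a stylesheet parameter; it has to be written into the stylesheet or set through a processor-specific option.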
The choice of encodings that your XSLT processor can read and write isn't entirely up to the processor. The XML parser that it uses determines which encodings it can read, and for a Java-based XSLT processor, the JVM in use may limit the number of supported output encodings. Check your processor's documentation -- for example, the "Character encodings supported" section of Saxon 6.5.3's Standards Conformance page lists four input encodings recognized by the built-in AElfred parser that it uses by default, and nine encodings that it supports for output, if your JVM supports them.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.