XML.com: XML From the Inside Out


Controlling the DOCTYPE and XML Declaration


September 04, 2002

XSLT processors usually create result documents that are well-formed XML with a simple XML declaration at the top. They don't have to add that XML declaration, though; it's easy to suppress it. It's also easy to add one and control exactly what it shows, such as an encoding declaration or a declaration of the version of XML being used. Your result document can also include a document type declaration that specifies the DTD to which it conforms, which is necessary for your result document to be a valid XML document. This month we'll see how to add these.

XML Declarations

The XML declaration at the beginning of an XML document is not necessary, but it's the best way to say "this is definitely an XML document and here's the release of XML it conforms to." The following is typical:

<?xml version="1.0"?>
Note: Despite its opening and closing question marks, an XML declaration is not a processing instruction; it's a separate kind of markup. In fact, the XML specification explicitly prohibits the processing instruction target (the name right after a processing instruction's opening question mark) from being "xml" in any combination of upper- and lowercase letters, in order to prevent a processing instruction from being confused with an XML declaration.

An XSLT processor's default behavior is to add an XML declaration to the beginning of an XML document that it creates in the result tree. If your stylesheet includes an xsl:output element with a method value of "text" or "html", the XSLT processor doesn't consider the result tree's document to be XML, so it won't add an XML declaration. If method is "xml" or the stylesheet has no xsl:output element (in which case the default value of "xml" is assumed, unless the result's document element is an html element in no namespace, which makes "html" the default), the result is considered an XML document. To show the simplest case, we'll apply the simplest possible stylesheet

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0"/>

to this little document:

<test>Dagon his Name, Sea Monster</test>

The result, thanks to XSLT's built-in template rules, shows the element's character data with the XML declaration preceding it:

<?xml version="1.0" encoding="utf-8" ?>Dagon his Name, Sea Monster
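For contrast, here is a minimal sketch of the "text" case described above. It is the same empty stylesheet with a single xsl:output element added; the result noted in the comment assumes the same little input document.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <!-- method="text": the result is not treated as XML, so the
       processor writes no XML declaration. Applied to the sample
       document, the output is just the character data:
       Dagon his Name, Sea Monster -->
  <xsl:output method="text"/>
</xsl:stylesheet>
```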

Although an XML declaration is optional, when it is included it must have the version information. (As I write this, XML 1.1 is in Last Call status, so we'll soon have to start worrying about whether XML processors are aware of 1.1's new features.) In the example above, after the version information, the XML declaration includes an encoding declaration to tell us how the characters in the document are encoded. While the XML specification considers an encoding declaration to be optional if the document is encoded as UTF-8 or UTF-16, the XSLT specification says that the XML declaration written by an XSLT processor should include one, and that the processor should use UTF-8 or UTF-16 if no other encoding value is specified.

You can specify an encoding yourself or change the version value by adding encoding and version attributes to an xsl:output element in your stylesheet. The encoding attribute actually does more than add an encoding declaration to the result document; it tells the XSLT processor to write out the result using that encoding. If you specify an encoding that the processor doesn't support, the XSLT specification lets it either signal an error or fall back to UTF-8 or UTF-16.

The following stylesheet adds an encoding declaration and version information to the result document.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <xsl:output method="xml" version="1.1" encoding="utf-16"/>
</xsl:stylesheet>

This produces the following using the same input as the previous example (although it may not look right in text editors that can't handle UTF-16):

<?xml version="1.1" encoding="utf-16" ?>Dagon his Name, Sea Monster

That's just a toy example. The following slightly longer program is actually useful. It copies an XML document without changing anything, except that it writes out the result as a UTF-16 document:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <xsl:output encoding="utf-16"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

By changing the value of its encoding attribute, you can create a general-purpose stylesheet to copy an XML document with the copy being in any encoding that you want, as long as your XSLT processor supports that encoding.
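One consequence of the processor actually writing the result in the requested encoding is worth a sketch. The variation below assumes a processor that supports "us-ascii" and a hypothetical input document containing non-ASCII text, such as <test>naïve café</test>; neither appears in the examples above.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <!-- Ask for an encoding that cannot represent every character
       in the (hypothetical) input document. -->
  <xsl:output method="xml" encoding="us-ascii"/>

  <!-- Same copying template as above: copy every attribute and node. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because ï and é can't be written in ASCII, the XSLT specification says the processor should write them as character references (for example, caf&#233; or the hexadecimal form caf&#xE9;, depending on the processor), so the copy still means the same thing to an XML parser.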

What if you don't want an XML declaration in the result of your transformation? For example, I rarely show them in the result of my examples because I want the examples to be as concise as possible. I suppress them by adding an omit-xml-declaration attribute to most of the sample stylesheets' xsl:output elements, like this:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
</xsl:stylesheet>

The output of this stylesheet applied to the earlier XML document is identical to the output created with the first stylesheet, minus the XML declaration:

Dagon his Name, Sea Monster
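The same attribute works in any stylesheet, not just an empty one. As a sketch, the following combines it with the copying template shown earlier; applied to the same input document, it should produce <test>Dagon his Name, Sea Monster</test> with no XML declaration in front of it.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <!-- Suppress the XML declaration but keep the xml output method. -->
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Copy every attribute and node unchanged, so the markup
       survives instead of only the character data. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```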
