Controlling the DOCTYPE and XML Declaration
by Bob DuCharme
Valid XML Output: Including DOCTYPE Declarations
A valid XML document is one that has a document type (or "DOCTYPE") declaration and conforms to the DTD in that document type declaration. (Remember, an XML document with no DOCTYPE declaration isn't valid, but it can still be a legal XML document as long as it's well-formed. "Valid" is a technical term referring to the presence of and conformance to a DOCTYPE declaration.)
A DOCTYPE declaration can include DTD declarations as an internal DTD subset between square brackets, like this:
<!DOCTYPE chapter [
<!ELEMENT chapter (title,para+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>
]>
or it can point to DTD declarations stored in a separate file, like this:
<!DOCTYPE chapter SYSTEM "../dtds/chapter.dtd">
The SYSTEM identifier tells the XML parser where to find the DTD file on the system. An optional PUBLIC identifier can specify another string for the parser to use when locating a DTD file. A PUBLIC identifier usually looks like the string in the following example, which avoids any system-specific information to make the document more portable across different systems:
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML//EN"
"../dtds/chapter.dtd">
The XML parser should look up this PUBLIC identifier somewhere to find the exact location of the local copy of the DTD file. (There have been proposals for the format and location of the lookup table, but none has caught on enough to be a widespread standard in the XML world, so that "somewhere" has never been completely resolved. In fact, people are using PUBLIC identifiers less and less anyway.) If the parser can't resolve the PUBLIC identifier, it falls back to the SYSTEM identifier that follows it. In the example above, the SYSTEM identifier doesn't need the keyword "SYSTEM": because a system identifier is required after a PUBLIC identifier, the XML parser knows what the second string is.
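To see the two identifiers the way a parser reports them, here is a minimal sketch in Python using the standard library's xml.dom.minidom. It is only an illustration and has nothing to do with XSLT itself; the helper name describe_doctype is hypothetical. minidom neither fetches the DTD nor validates against it, so it simply exposes the strings found in the declaration.

```python
from xml.dom import minidom


def describe_doctype(xml_text):
    """Return (name, public_id, system_id) from a document's DOCTYPE.

    Hypothetical helper for illustration; returns None when the
    document has no DOCTYPE declaration at all.
    """
    doc = minidom.parseString(xml_text)
    doctype = doc.doctype
    if doctype is None:
        return None
    return (doctype.name, doctype.publicId, doctype.systemId)


# The same declarations shown in the article, attached to a tiny document.
with_public = (
    '<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML//EN" '
    '"../dtds/chapter.dtd">'
    "<chapter><title>Chapter 1</title><para>text</para></chapter>"
)
system_only = (
    '<!DOCTYPE chapter SYSTEM "../dtds/chapter.dtd">'
    "<chapter><title>Chapter 1</title><para>text</para></chapter>"
)

print(describe_doctype(with_public))
print(describe_doctype(system_only))
```

In the first document both identifiers are present; in the second, only the SYSTEM identifier is filled in.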
To create valid XML documents using XSLT, a stylesheet must add a DOCTYPE declaration to the result tree. Because a DOCTYPE declaration isn't an element or a processing instruction, standard methods for adding those to your result tree won't accomplish this. Instead, an XSLT processor knows that it must create a DOCTYPE declaration in your result document when it sees certain specialized attributes in an xsl:output element.
Two more xsl:output attributes let you add SYSTEM and PUBLIC identifiers to a DOCTYPE declaration in your result. If your xsl:output element has a doctype-system attribute, the XSLT processor adds a DOCTYPE declaration to the result with that attribute's value as its SYSTEM identifier. If it also has a doctype-public attribute, the processor adds this attribute's value to the result's DOCTYPE declaration as a PUBLIC identifier. (With the xml output method, an XSLT processor ignores a doctype-public attribute without an accompanying doctype-system attribute, because an XML document can't have a PUBLIC identifier without a SYSTEM identifier.)
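The rules in the preceding paragraph can be sketched as a small Python function. This is not how any particular XSLT processor is implemented; the function name doctype_declaration and its parameters are hypothetical, it covers XML output only, and real processors differ in where they put whitespace and line breaks.

```python
def doctype_declaration(root_name, doctype_system=None, doctype_public=None):
    """Build the DOCTYPE declaration for XML output, or '' if none applies.

    Hypothetical helper that mirrors the rules described above:
    - no doctype-system: no DOCTYPE at all (doctype-public is ignored)
    - doctype-system only: a SYSTEM identifier
    - both: PUBLIC identifier followed by the system identifier
    """
    if doctype_system is None:
        return ""
    if doctype_public is not None:
        return f'<!DOCTYPE {root_name} PUBLIC "{doctype_public}" "{doctype_system}">'
    return f'<!DOCTYPE {root_name} SYSTEM "{doctype_system}">'


print(doctype_declaration("chapter", "../dtds/docbookx.dtd",
                          "-//OASIS//DTD DocBook XML//EN"))
print(doctype_declaration("chapter", "../dtds/docbookx.dtd"))
# A PUBLIC identifier alone produces no declaration: prints an empty line.
print(doctype_declaration("chapter", doctype_public="-//OASIS//DTD DocBook XML//EN"))
```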
The following example source document conforms to the DocBook DTD.
<chapter><title>Chapter 1</title>
<para>More unexpert, I boast not: them let those</para>
<para>Contrive who need, or when they need, not now.</para>
<para>For while they sit contriving, shall the rest,</para>
<para>Millions that stand in Arms, and longing wait</para>
</chapter>
The following stylesheet just copies it to the result tree. Because its xsl:output element includes both doctype-system and doctype-public attribute specifications, the result will include a DOCTYPE declaration with both of these identifiers.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="xml" doctype-system="../dtds/docbookx.dtd"
              doctype-public="-//OASIS//DTD DocBook XML//EN"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
The stylesheet could have had different instructions after that xsl:output element to rearrange, rename, or delete the elements, or to perform any of the other XSLT tricks possible on the source tree's nodes as they're copied to the result tree. The DOCTYPE declaration added to the result tree would still look like the one produced by the stylesheet and input document above, as shown here:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE chapter
PUBLIC "-//OASIS//DTD DocBook XML//EN" "../dtds/docbookx.dtd">
<chapter><title>Chapter 1</title>
<para>More unexpert, I boast not: them let those</para>
<para>Contrive who need, or when they need, not now.</para>
<para>For while they sit contriving, shall the rest,</para>
<para>Millions that stand in Arms, and longing wait</para>
</chapter>
How does the XSLT processor know what to put for the document type (the "chapter" part in "DOCTYPE chapter")? It knows the root element of the document it's creating in the result tree, and that's what an XML document type is: the element that serves as the document's root element.
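As a quick check of that relationship, the following Python sketch parses a result document with the standard library's xml.dom.minidom and compares the name in the DOCTYPE declaration with the root element's name. The helper name doctype_matches_root is hypothetical, and minidom does not validate against the DTD; it only reads the declaration.

```python
from xml.dom import minidom


def doctype_matches_root(xml_text):
    """Return True if the DOCTYPE name equals the root element's name.

    Hypothetical helper for illustration; returns False when the
    document has no DOCTYPE declaration.
    """
    doc = minidom.parseString(xml_text)
    if doc.doctype is None:
        return False
    return doc.doctype.name == doc.documentElement.tagName


# An abbreviated version of the result document shown above.
result = (
    "<!DOCTYPE chapter\n"
    '  PUBLIC "-//OASIS//DTD DocBook XML//EN" "../dtds/docbookx.dtd">\n'
    "<chapter><title>Chapter 1</title>\n"
    "<para>More unexpert, I boast not: them let those</para>\n"
    "</chapter>"
)
# Well-formed, but the DOCTYPE names a different root element.
mismatched = '<!DOCTYPE book SYSTEM "../dtds/docbookx.dtd"><chapter/>'

print(doctype_matches_root(result))
print(doctype_matches_root(mismatched))
```

A mismatch like the second document still parses because it is well-formed, but it could never be valid.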
If the method attribute of the stylesheet's xsl:output element has a value of "text", then a DOCTYPE declaration for the result wouldn't make any sense, because a non-XML text file has no use for one. If method has a value of "html", a DOCTYPE declaration might make sense; some Web pages, especially XHTML documents, actually do conform to a DTD, so specifying doctype-system and doctype-public attribute values on an xsl:output element with that method value can be useful.
The DOCTYPE declarations added this way can only point to external DTD files. XSLT offers no way to create a result tree DOCTYPE declaration with an internal DTD subset (that is, with DTD declarations between the square brackets, as shown in the first example earlier). The DTD named in your doctype-system attribute must have all the declarations that your document needs.
This column has mentioned five different attributes of the xsl:output element, and that's only half of them. The others are definitely worth exploring as you learn more ways to fine-tune your result documents.
