Controlling the DOCTYPE and XML Declaration
by Bob DuCharme

Valid XML Output: Including DOCTYPE Declarations

A valid XML document is one that has a document type (or "DOCTYPE") declaration and conforms to the DTD in that document type declaration. (Remember, an XML document with no DOCTYPE declaration isn't valid, but it can still be a legal XML document as long as it's well-formed. "Valid" is a technical term referring to the presence of and conformance to a DOCTYPE declaration.)

A DOCTYPE declaration can include DTD declarations as an internal DTD subset between square brackets, like this,

<!DOCTYPE chapter [
<!ELEMENT chapter (title,para+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>
]>

or it can point to DTD declarations stored in a separate file, like this:

<!DOCTYPE chapter SYSTEM "../dtds/chapter.dtd">

The SYSTEM identifier tells the XML parser where to find the DTD file on the system. An optional PUBLIC identifier can specify another string for the parser to use when locating a DTD file. These usually use a string similar to the following, which avoids any system-specific information to make the document more portable across different systems:

<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML//EN"
          "../dtds/chapter.dtd">

The XML parser should look up this PUBLIC identifier somewhere to find the exact location of the local copy of the DTD file. (There have been proposals for the format and location of the lookup table, but none has caught on enough to be a widespread standard in the XML world, so that "somewhere" has never been completely resolved. In fact, people are using PUBLIC identifiers less and less anyway.) If it can't find it, the parser uses the SYSTEM identifier following the PUBLIC identifier. In the example above, the SYSTEM identifier doesn't need the word "SYSTEM" -- because it's a required parameter, the XML parser knows what it is.

To create valid XML documents using XSLT, a stylesheet must add a DOCTYPE declaration to the result tree. Because a DOCTYPE declaration isn't an element or a processing instruction, standard methods for adding those to your result tree won't accomplish this. Instead, an XSLT processor knows that it must create a DOCTYPE declaration in your result document when it sees certain specialized attributes in an xsl:output element.

Two more xsl:output attributes let you add SYSTEM and PUBLIC identifiers to a DOCTYPE declaration in your result. If your xsl:output element has a doctype-system attribute, the XSLT processor adds a DOCTYPE declaration to the result tree with that attribute's value as its SYSTEM identifier. If it also has a doctype-public attribute, the processor adds this attribute's value to the result's DOCTYPE declaration as a PUBLIC identifier. (With the xml output method, an XSLT processor ignores a doctype-public attribute without an accompanying doctype-system attribute, because an XML document can't have a PUBLIC identifier without a SYSTEM identifier.)
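
For the simpler case, here is a minimal sketch of a stylesheet whose xsl:output element has only a doctype-system attribute. The identity template and the reuse of the ../dtds/chapter.dtd path from the earlier example are illustrative assumptions, not part of the original column.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

<!-- Only doctype-system is given, so the result's DOCTYPE
     declaration should carry a SYSTEM identifier and no PUBLIC one. -->
<xsl:output method="xml" doctype-system="../dtds/chapter.dtd"/>

<!-- Identity template: copy every node and attribute unchanged. -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Given a source document whose root element is chapter, the result should begin with a declaration along the lines of <!DOCTYPE chapter SYSTEM "../dtds/chapter.dtd">, although the exact whitespace and line breaks vary from processor to processor.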

The following example source document conforms to the DocBook DTD.

<chapter><title>Chapter 1</title>
  <para>More unexpert, I boast not: them let those</para>
  <para>Contrive who need, or when they need, not now.</para>
  <para>For while they sit contriving, shall the rest,</para>
  <para>Millions that stand in Arms, and longing wait</para>
</chapter>

The following stylesheet just copies it to the result tree. Because its xsl:output instruction includes both doctype-system and doctype-public attribute specifications, the result will include a DOCTYPE declaration with both of these identifiers.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

<xsl:output method="xml" doctype-system="../dtds/docbookx.dtd" 
     doctype-public="-//OASIS//DTD DocBook XML//EN"/> 

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

The stylesheet could have had different instructions after that xsl:output element to rearrange, rename, or delete the elements, or to perform any of the other XSLT tricks possible on the source tree's nodes as they're copied to the result tree. The DOCTYPE declaration added to the result tree would still look like the one produced by the stylesheet and input document above, as shown here:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE chapter
  PUBLIC "-//OASIS//DTD DocBook XML//EN" "../dtds/docbookx.dtd">
<chapter><title>Chapter 1</title>
  <para>More unexpert, I boast not: them let those</para>
  <para>Contrive who need, or when they need, not now.</para>
  <para>For while they sit contriving, shall the rest,</para>
  <para>Millions that stand in Arms, and longing wait</para>
</chapter>

How does the XSLT processor know what to put for the document type (the "chapter" part in "DOCTYPE chapter")? It knows the root element of the document it's creating in the result tree, and that's what an XML document type is: the element that serves as the document's root element.
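
As a hedged illustration of that point, the following sketch is a variation on the stylesheet above that renames the root chapter element to section. The renaming is an assumption chosen only to show the effect on the DOCTYPE declaration; it is not part of the original column.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

<xsl:output method="xml" doctype-system="../dtds/docbookx.dtd"
     doctype-public="-//OASIS//DTD DocBook XML//EN"/>

<!-- Illustrative assumption: rename the root element from
     chapter to section. This template has a higher default
     priority than the identity template below. -->
<xsl:template match="/chapter">
  <section>
    <xsl:apply-templates select="@*|node()"/>
  </section>
</xsl:template>

<!-- Identity template: copy everything else unchanged. -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Because the root element of the result is now section, the processor should write "DOCTYPE section" rather than "DOCTYPE chapter" in front of the same PUBLIC and SYSTEM identifiers.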

If the method attribute of the stylesheet's xsl:output element has a value of "text", a DOCTYPE declaration for the result wouldn't make any sense, because a non-XML text file has no use for one. If method has a value of "html", a DOCTYPE declaration might make sense; some Web pages, especially XHTML documents, actually do conform to a DTD, so specifying doctype-system and doctype-public attribute values on such an xsl:output element can be useful.
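
A minimal sketch of the "html" case follows. It assumes the chapter source document shown earlier and uses the HTML 4.01 Strict identifiers published by the W3C; the mapping of title and para elements to h1 and p elements is a hypothetical choice made for illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

<!-- With method="html", these two attributes ask the processor
     to write an HTML DOCTYPE declaration before the first element. -->
<xsl:output method="html"
     doctype-public="-//W3C//DTD HTML 4.01//EN"
     doctype-system="http://www.w3.org/TR/html4/strict.dtd"/>

<!-- Hypothetical mapping: chapter title becomes the page title
     and heading, and each para becomes a p element. -->
<xsl:template match="/chapter">
  <html>
    <head><title><xsl:value-of select="title"/></title></head>
    <body>
      <h1><xsl:value-of select="title"/></h1>
      <xsl:for-each select="para">
        <p><xsl:value-of select="."/></p>
      </xsl:for-each>
    </body>
  </html>
</xsl:template>

</xsl:stylesheet>
```

The result should start with a DOCTYPE declaration naming html (or HTML, depending on the processor) as the document type, followed by the PUBLIC and SYSTEM identifiers given above.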

    

The DOCTYPE declarations added this way can only point to external DTD files. XSLT offers no way to create a result tree DOCTYPE declaration with an internal DTD subset (that is, with DTD declarations between the square brackets, as shown in the first example earlier). The DTD named in your doctype-system attribute must have all the declarations that your document needs.

This column has mentioned five different attributes of the xsl:output element, and that's only half of them. The others are definitely worth exploring as you learn more ways to fine-tune your result documents.


Comment on this Article

  • how to extract the DOCTYPE via xslt in xml to xml transform ?
    2005-05-13 12:16:59 DJAY

    I am transforming an XML file that has a DOCTYPE declaration into another XML file.
    Can anybody suggest a way of extracting the DOCTYPE declaration from the source XML file so that it can be passed to the destination XML file?
    I am aware of But don't know how to extract it from the source XML.
    Thanks in advance.

    • how to extract the DOCTYPE via xslt in xml to xml transform ?
      2005-05-13 14:21:05 Bob DuCharme

      An XSLT processor can't do this, because it doesn't know what the source document's DOCTYPE declaration was. Remember, an XML parser parses the source document, validates it against a DTD if necessary, resolves entity references, and so on before handing the result to the XSLT processor, and the XSLT processor has no way of knowing what was handed to the XML parser. To copy the DOCTYPE declaration from one file to another, you'd have to use some tool that could look at the source document as straight text and not as an XML document to be parsed, such as Perl or Python.


      Bob


  • encoding = Cp1252 to encoding = UTF-8
    2003-04-11 04:54:02 raj sekhar

    When I create an XML file, it is created with encoding="Cp1252", and opening this file in IE6 gives an error:


    The XML page cannot be displayed
    System does not support the specified encoding. Error processing resource 'file:///C:/test.xml'. Line 1, Position 40


    <?xml version="1.0" encoding="Cp1252"?>


    When I change the encoding to
    encoding="UTF-8" (by typing it into the .xml file),
    I can see the perfect output in IE6.


    1. What should be done when creating the XML file so that it does not use encoding="Cp1252"?


    2. What should be done to make IE6 support the Cp1252 format?





    • encoding = Cp1252 to encoding = UTF-8
      2003-04-11 07:14:51 Bob DuCharme

      Use the xsl:output element's encoding attribute to set the output encoding.


      What kind of program uses cp1252 as a default output encoding? In other words, what program are you using to create the original XML file?


      Bob


  • Misleading statements...
    2002-09-06 21:24:04 Dare Obasanjo

    The article starts with


    "The XML declaration at the beginning of an XML document is not necessary, but it's the best way to say 'this is definitely an XML document and here's the release of XML it conforms to.'"


    then not even two paragraphs later shows us


    <?xml version="1.0" encoding="utf-8" ?>Dagon his Name, Sea Monster


    which is definitely NOT an XML document regardless of how many XML declarations it begins with.


    A better statement would have been to point out that XSLT allows one to precede output with an XML declaration REGARDLESS of whether or not it is well formed or valid XML.