Q: Can I un-CDATA my CDATA section?
I have some HTML tags embedded in a CDATA section. (I didn't write
the source document!) When my XSLT translates the document into HTML
for a browser, the tags in the CDATA section marked as
<i> are delivered to the browser as &lt;i&gt;.
Is there anything I can do to prevent this translation?
A: You didn't provide a sample document fragment, but I assume from your description that you have to deal with something like the following in your source document:
<true_xmlwrapper>
<![CDATA[
<html>
  <head><title>Weird Embedded Markup</title></head>
  <body>
    <h1>Someone thought he was being clever...!</h1>
    <p><em>[etc.]</em></p>
  </body>
</html>
]]>
</true_xmlwrapper>
I assume further that what you'd hope to transform the above into -- the result tree -- would be something like:
<html>
  <head><title>Weird Embedded Markup</title></head>
  <body>
    <h1>Someone thought he was being clever...!</h1>
    <p><em>[etc.]</em></p>
  </body>
</html>
And, if these assumptions are correct, you've probably got an XSLT stylesheet with a template rule such as:
<xsl:template match="true_xmlwrapper">
  <xsl:value-of select="."/>
</xsl:template>
As you've probably discovered, this solves one problem -- it strips the opening <![CDATA[ and closing ]]> delimiters. What it writes out to the result tree, though, isn't the desired nice and neat HTML code but rather its escaped equivalent:
&lt;html&gt;
  &lt;head&gt;&lt;title&gt;Weird Embedded Markup&lt;/title&gt;&lt;/head&gt;
  &lt;body&gt;
    &lt;h1&gt;Someone thought he was being clever...!&lt;/h1&gt;
    &lt;p&gt;&lt;em&gt;[etc.]&lt;/em&gt;&lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;
There is a strange kind of correspondence between the desired and actual results. What the actual result tree is saying might be translated as "The angle brackets in the following lines are not to be treated as markup delimiters, but as literal characters." And guess what? That's exactly how the CDATA section in this (or any other) source document suggests markup-significant characters should be treated. Whoever created that document evidently imagined him or herself to be doing the downstream application a favor -- as though by shrouding the embedded HTML markup in a CDATA section it was protected from tampering by alien forces (like one of those blasted XSLT processors). In fact, what wrapping in CDATA did was to announce to any markup-aware application, "This looks like markup but really isn't -- it's not even HTML." Under the circumstances, the assumptions made by the XSLT processor are quite reasonable.
All that said, here's something for you to try. (It's worked for me with both the MSXML and Saxon XSLT processors.) In your XSLT stylesheet, include this top-level element:
<xsl:output method="text"/>
This approach may seem counterintuitive, even weird. After all, if the problem resides in the input side of the transformation, what good would specifying the output's characteristics do?
But in the absence of any xsl:output element at all, the XSLT processor attempts to figure out the stylesheet's intentions by examining the result tree from the transformation. This figuring-out uses a series of tests whose purpose is to determine whether the result tree is HTML (and by default, the version is HTML 4.0, not XHTML); if not, the result tree is assumed to be a well-formed XML general parsed entity. (Such an entity may or may not be a well-formed document. For instance, the root node may contain two child elements.) The four tests of an HTML result tree (and all must be true) are:
- the result tree's root node has an element child (that is, it has a root element);
- the local name of the root element (discounting any namespace prefix) is "html";
- the root html element has no namespace URI associated with it; and
- the only text nodes preceding the result tree's root element are whitespace-only text nodes.
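As a rough illustration of these tests, here is a minimal stylesheet sketch with no xsl:output element. Its template and content are hypothetical, not taken from the question. Because the result tree's root element is an html element in no namespace, with nothing but whitespace before it, a conforming XSLT 1.0 processor should choose the HTML output method on its own.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Deliberately no xsl:output: the processor must pick the method itself. -->

  <!-- Hypothetical template: builds a result tree that passes all four tests. -->
  <xsl:template match="/">
    <html>
      <head><title>Defaulted to HTML</title></head>
      <body><p>Line one<br/>Line two</p></body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

If the html element were instead declared in a namespace (for example, with xmlns="http://www.w3.org/1999/xhtml"), the third test would fail and the processor would fall back to the XML output method.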
In the case of a document like the one you describe, these tests are almost immaterial: no matter how much it looks like it contains markup, a CDATA section by definition contains only literal text. So by default, there is no "root element" in the above result tree, html or otherwise. There's just a string of literal characters which happens to start with a literal < character. Since the result tree fails the HTML test, the processor guesses the result tree is simply a well-formed general parsed entity -- consisting, in this case, of a single text node.
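But by specifying method="text", you short-circuit the processor's default behaviors, instructing it not to make any assumptions at all about the nature of the result. With the text output method, the processor writes out the string value of each text node in the result tree without any escaping, so the < and > characters reach the output as-is.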
(There are two dangers in using this little trick, by the way. First, it's global: you can't apply it selectively to some sections of the source/result trees but not to others. Second, and more importantly, if the "markup" within the CDATA section isn't well-formed, it will simply be passed without complaint to the result tree. If the downstream application meant to consume this result tree is XML- or HTML-aware, you may be faced with disastrous downstream complications.)
Q: I keep losing a trailing space inside my empty-element tags.
To keep my XHTML compatible with older browsers (like Netscape 4.77), my XSLT transformation includes a space before the trailing slash on empty XHTML elements, like this:
<xsl:template match="model/name">
  <em>Model Name: </em>
  <xsl:apply-templates/><br />
  <!-- Note space          ^ -->
</xsl:template>
However, the output of the transformation ends up looking something like
<em>Model Name: </em> Nimbus<br/>
<!-- No space                  ^ -->
That's fine for newer browsers, but older browsers don't recognize <br/> as a <br> tag, and hence ignore it, which is just no good. I've looked at a number of techniques for controlling whitespace in XML (Bob DuCharme's series, for instance), but all of these techniques focus on the content of elements, not the element tags themselves. I recognize that XML has its reasons for handling whitespace the way it does, and that from an XML perspective trying to control whitespace within a tag is a little batty. But does anyone know of a workaround, short of fixing it with, say, a Perl script after the transformation?
A: A Perl script? After the transformation? <shudder/> I mean, I love Perl, but still....
There are a couple of approaches to resolve this issue.
First, remember that an empty element can be represented by a contiguous start tag/end tag pair, like:
<br></br>
So you may be able to put this into the result tree instead of the <br/> (with or without the space before the slash). One problem with this solution is that some versions of older browsers may interpret this as two br elements in sequence.
A better solution is a variation of the answer to the first question in this month's column. As I described above, the XSLT processor makes an educated guess about the result tree. I don't know why this educated guess is failing to recognize your result tree as HTML 4.0 (which is readable by both older and newer browsers). But you can force the interpretation with this top-level element:
<xsl:output method="html"/>
In this case, for instance, when your stylesheet includes a <br/> tag (again, with or without the space), a compliant processor will output it in the HTML-compliant form, <br>.
I realize this may introduce an unwanted wrinkle to your problem; it forces the result tree to be not XHTML, just plain old dumb HTML 4.0. Unfortunately we're at a transitional stage in both browser and XHTML development. If I were you, I'd leverage the still-forgiving nature of the newer browsers rather than coding to XHTML strict standards and hoping that older browsers will somehow function as expected. (They often didn't comply with standards in place at the time the browsers were built; it's no wonder they adhere to newer standards even less rigorously.)
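A minimal sketch of the forced-HTML approach, reusing the model/name template from the question, might look like this. The surrounding stylesheet is an assumption, since the rest of the original was not shown.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Force the HTML output method instead of relying on the processor's guess. -->
  <xsl:output method="html"/>

  <xsl:template match="model/name">
    <em>Model Name: </em>
    <xsl:apply-templates/>
    <!-- Written as an empty element here; the HTML output method
         emits it without an end tag or trailing slash. -->
    <br/>
  </xsl:template>

</xsl:stylesheet>
```

How the br element is spelled in the stylesheet (with a space, without one, or as a start tag/end tag pair) makes no difference to the result tree, so the serialized form is decided entirely by the output method.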