Hidden Whitespace, Hidden Meaning

January 30, 2002

John E. Simpson

Q: Too many newlines?

Every time I update an XML file via an in-house content management system, the processed XML file contains masses of empty lines between the tags. Each time I edit and save the file the gaps get bigger.

Here is the code that I think might contain the problem (XSL):

<xsl:output method="xml" indent="no" encoding="iso-8859-1" />

<xsl:template match="/blocks">

<dl>
<dt>

<font size="2"><xsl:value-of select="welcometitle"/></font>

</dt>
<dt>

<font size="2"><xsl:value-of select="blocktitle"/></font>

</dt>

<dt>
<font size="2">

<xsl:value-of select="acttitle"/></font>

</dt>
<dt>

<font size="2"><xsl:value-of select="resourcetitle"/></font>

</dt>
</dl>

</xsl:template>

A: On first reading, I thought the answer was pretty straightforward -- that the extra newlines in your output are there because your XSLT style sheet says to include them. According to this theory, placing newlines in your style sheet's xsl:template elements causes them to be passed, unchanged, to the result tree.

Unfortunately for the theory, that's not the case. Section 3.4 of the XSLT 1.0 Recommendation addresses how an XSLT processor is to treat whitespace-only text nodes in the style sheet. As this section says, a text node in the style sheet is preserved in the result tree only if at least one of the following conditions is true:

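  • the text node's parent element is in the set of whitespace-preserving element names (for a style sheet, that set contains only xsl:text; xsl:preserve-space adds names for source documents, not for the style sheet); or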
  • the text node contains at least one non-whitespace character (that is, any character other than a space, tab, carriage return, or newline); or
  • some ancestor of the text node includes an xml:space="preserve" attribute, and no closer ancestor has an xml:space="default" attribute.

Since none of those conditions are true for the whitespace-only text nodes which you've supplied in your code fragment, none of that extra whitespace should be passed to the result tree. You didn't say which XSLT processor you're using. I tested the style sheet against the following source tree:

   <blocks>
      <welcometitle>Welcome Title</welcometitle>
      <blocktitle>Block Title</blocktitle>
      <acttitle>Act Title</acttitle>
      <resourcetitle>Resource Title</resourcetitle>
   </blocks>

Both the Saxon and XT XSLT 1.0 processors indeed strip all the extraneous whitespace from the style sheet's xsl:template element. From one perspective -- the fact that your result tree is XHTML -- the question is academic, since the effect as browsed is identical, whether the newlines are present or not. From a strict perspective, though, something indeed seems off-kilter with the result you're experiencing. My only advice would be to try a different XSLT processor (assuming that's an option).
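
If you do want a particular newline in the result, the reliable way to get it is to say so explicitly. Here's a minimal sketch, cut down from your template to two of the dt elements; it assumes the same blocks source tree as above. Under the Section 3.4 rules, the indentation I've typed inside the template should be stripped, while the newline wrapped in xsl:text should survive:

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="no" encoding="iso-8859-1"/>

  <xsl:template match="/blocks">
    <dl>
      <!-- The line breaks and spaces between these elements are
           whitespace-only text nodes, so the processor strips them. -->
      <dt><font size="2"><xsl:value-of select="welcometitle"/></font></dt>
      <!-- Whitespace inside xsl:text is kept: this emits one newline. -->
      <xsl:text>&#10;</xsl:text>
      <dt><font size="2"><xsl:value-of select="blocktitle"/></font></dt>
    </dl>
  </xsl:template>

</xsl:stylesheet>
```

With a conforming processor I'd expect the two dt elements to come out separated by a single newline, with no other added whitespace. If you see more than that, the extra newlines are probably being added by the processor or by some later step in your content management system, not by the style sheet.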

One other note: you may believe that, by specifying indent="no" in your xsl:output element, you've instructed the XSLT processor to suppress all extra newlines and whitespace. Not so! First, the indent attribute is used (when its value is "yes") to direct the processor to supply extra whitespace in the result, for "pretty-printing" or similar purposes. For the xml output method, the default value is "no," so all you've done here is to affirm the default.

But there's another, maybe more important consideration as well. The XSLT spec sorts its features into two categories: required and optional. For an XSLT processor to be considered compliant with the standard, it must support all required features... and may support whatever optional ones it likes. As it happens, the indent attribute to the xsl:output element is an optional feature. So even if you do want to use the indent attribute, you must be sure to use a processor which supports it.
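
For comparison, here's a sketch of the opposite setting, again assuming the blocks source tree from earlier. It simply asks for indentation; exactly where a processor adds whitespace, or whether it adds any at all, is up to the processor:

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- indent="yes" requests pretty-printed output; a processor
       may honor the request or ignore it. -->
  <xsl:output method="xml" indent="yes" encoding="iso-8859-1"/>

  <xsl:template match="/blocks">
    <dl>
      <dt><xsl:value-of select="welcometitle"/></dt>
      <dt><xsl:value-of select="blocktitle"/></dt>
    </dl>
  </xsl:template>

</xsl:stylesheet>
```

Running the same source through this version and through one with indent="no" is a quick way to find out whether your processor supports the attribute.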

[Thanks to Jason Diamond for his input on this question!]

Q: What does this XML mean?

What are "qualifiers" which I see in some of our XML documents and how can we relate it to an Oracle table column? For example, I've seen this:

<OPERAMT qualifier="UNIT" type="T">
<OPERAMT qualifier="EXTENDED" type="T">

Even that type="T" doesn't make sense....

A: Actually, your question isn't about XML as such. XML itself defines no "qualifiers," and no OPERAMTs either; it's just a general-purpose set of rules for defining special-purpose markup languages.

So no, your question is really about a specific XML vocabulary. I don't usually tackle vocabulary-specific questions in this space, but I did do a little research on this one. Apparently the XML vocabulary you're working with is (a variant of?) Electronic Business XML, or ebXML, for exchanging e-business-related XML-based messages such as sales orders. There's almost too much information about ebXML in general (consult the master site to get a sense of what I mean), but I did locate one on-line resource (101KB PDF; requires Adobe Acrobat Reader) which might be helpful for the immediate question. Once in that document, do a search on OPERAMT to learn the functions of that element and its two attributes.

So there you'll have the answer, sort of, to your question. But I'm afraid it won't be the answer you're looking for, which is how to map it to an Oracle table's columns. To answer that, you've first got to know which database table (which will depend on your specific application, of course). And how to get the data from one form (XML or Oracle) to the other has little to do with what the qualifier or type attributes "mean."

Oracle is a member of the industry consortium supporting ebXML, so you might want to start by consulting the Oracle Web site. For more general information about relating XML documents to database tables, check Ron Bourret's excellent "XML and databases" site.

Q: E-mail links in XML format?

I have been building a document to handle support calls from various companies. I have created the document that allows the call to be routed to the IT rep within the respective companies. I now wish to create a link for the rep to e-mail the call to the parent company's IT department. How would I create this link in an XML format?

A: On the face of it, this question and the one preceding it are unrelated. But they both stem from a similar misunderstanding -- that data in an XML document "means" anything outside of the context of an application designed to process it. Let me answer the question simply first, and then come back to look at some deeper implications.

Establishing an e-mail link in XML assumes that you're using some kind of software, like a Web browser or e-mail reader, which is smart enough to recognize an e-mail address as an e-mail address. Given that condition, you can choose to put the e-mail address in either an element or an attribute value. For instance (respectively):

<parent_email dept="IT">it@example.com</parent_email>

or:

<parent_co dept="IT" email="it@example.com" />

As you can see, there's no real magic here, no technical wizardry; both of those examples contain an e-mail address in an XML format. The important thing is this: There's no such thing as an "e-mail link" in an XML document. Until and unless you identify a target application which "knows" e-mail, what are to our eyes obviously e-mail addresses remain plain old dumb data. The meaning of a bit of XML code is not inherent in the code; it comes about only by way of human or software interpretation of the code.

Assume your target application is a Web browser. Now you can process the first code fragment above with an XSLT style sheet to do something like the following:

<xsl:template match="parent_email">
   E-mail the <a href="mailto:{.}"><xsl:value-of select="@dept"/> Department</a>
</xsl:template>

Or, for processing the second code fragment:

<xsl:template match="parent_co">
   E-mail the <a href="mailto:{@email}"><xsl:value-of select="@dept"/> Department</a>
</xsl:template>

For each source-tree element that matches the xsl:template element's match attribute, this creates an a element with an href attribute in the XHTML result tree. For the sample above, the corresponding portion of the result tree will look like this:

E-mail the <a href="mailto:it@example.com">IT Department</a>

That, of course, displays (and otherwise behaves) just fine in a Web browser.
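
To try this end to end, the template needs a complete style sheet around it. The following is a minimal sketch, not a finished design: the support_call root element and the surrounding html, body, and p elements are my own assumptions, not something from your document. It handles the first (element-based) form of the address:

```xml
<?xml version="1.0"?>
<!-- Assumed sample input (support_call is a hypothetical root element):
     <support_call>
       <parent_email dept="IT">it@example.com</parent_email>
     </support_call>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Wrap the link in a minimal page so a browser can display it. -->
  <xsl:template match="/support_call">
    <html>
      <body>
        <p><xsl:apply-templates select="parent_email"/></p>
      </body>
    </html>
  </xsl:template>

  <!-- The element's text content becomes the mailto: target;
       its dept attribute becomes part of the link text. -->
  <xsl:template match="parent_email">
    <xsl:text>E-mail the </xsl:text>
    <a href="mailto:{.}"><xsl:value-of select="@dept"/> Department</a>
  </xsl:template>

</xsl:stylesheet>
```

I've wrapped the leading text in xsl:text here. Because "E-mail the" contains non-whitespace characters, the newline and indentation typed in front of it in the shorter templates above are kept in the result. That's harmless in a browser, but worth knowing about if stray whitespace matters to you.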

Again, though, the point is that to put an e-mail address (or a lobster bisque recipe, or a Valentine's-Day love letter, or anything else) in an "XML format" doesn't do anything on its own. The data will lie there, inert, until a fellow human being -- or a software application -- comes along and recognizes it.