XSLT Surgery

April 25, 2001

John E. Simpson

Q: How do I transform an XML document that I can't edit?

I have a source XML file which I don't control. I can't edit it. But I want to display that XML file on my site using XSLT. I know I need to add an xml-stylesheet PI to a document to associate it with an XSLT stylesheet, like

<?xml-stylesheet type="text/xsl" href="test.xsl"?>

But since this XML file is maintained by somebody else, how do I link my stylesheet to it?

A: First -- this has nothing to do with XML per se and everything to do with common courtesy -- get permission to use someone else's content verbatim. If courtesy doesn't appeal to you as a motive, then think of liability. Like it or not, the Web nowadays is a place where you can get in trouble simply for linking to someone else's content, let alone cribbing it. I'd be only half surprised to learn that someone was preparing to sue someone else for just looking at the plaintiff's Web content.

Once you've gotten permission, here's one approach to solving your problem. You want to create an XML document which simply includes the targeted content. In addition, this XML document will contain your xml-stylesheet PI. So it would look something like

<?xml version="1.0"?>
<!DOCTYPE wrapper [
<!ENTITY incl_content SYSTEM "uri of included content">
]>
<?xml-stylesheet type="text/xsl" href="text.xsl"?>
<wrapper>&incl_content;</wrapper>

Replace uri of included content with the included document's URI.

The wrapper element is necessary because parsers will reject any document that doesn't have its own root element. (You can call this element type by some other name if you want, although wrapper seems descriptive enough.) Just remember that the stylesheet may need to take into account the fact that the document it's processing has such an element as its root element, within which appears all the content of the included document.

This will work, as long as the following conditions are true:

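  1. The document to be included must, at a minimum, be well-formed XML. Because it is pulled in as an external parsed entity, it also can't carry a DOCTYPE declaration of its own.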
  2. If the document to be included comes with its own xml-stylesheet PI, it may override your "text.xsl" stylesheet (depending on your XSLT processor).
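You can check what the wrapper actually hands a stylesheet before wiring up an XSLT processor. A parser that expands external entities will show it. The following is a minimal Python sketch using only the standard library's SAX parser. The URI http://example.com/their-content.xml and the article/title/para content are hypothetical stand-ins. A custom entity resolver serves that content from memory rather than fetching it.

```python
from io import BytesIO
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical stand-ins for the document you are not allowed to edit.
THEIR_URI = "http://example.com/their-content.xml"
THEIR_CONTENT = b"<article><title>Their title</title><para>Their text</para></article>"

# The wrapper: an external entity pulls in THEIR_URI, and the
# xml-stylesheet PI lives here instead of in their file.
WRAPPER = f"""<?xml version="1.0"?>
<!DOCTYPE wrapper [
<!ENTITY incl_content SYSTEM "{THEIR_URI}">
]>
<?xml-stylesheet type="text/xsl" href="text.xsl"?>
<wrapper>&incl_content;</wrapper>
"""


class InMemoryResolver(EntityResolver):
    """Serves THEIR_CONTENT from memory instead of fetching the URI."""

    def resolveEntity(self, publicId, systemId):
        source = InputSource()
        source.setByteStream(BytesIO(THEIR_CONTENT))
        return source


class StructureRecorder(ContentHandler):
    """Records processing instructions and element names in document order."""

    def __init__(self):
        super().__init__()
        self.pis = []
        self.elements = []

    def processingInstruction(self, target, data):
        self.pis.append((target, data))

    def startElement(self, name, attrs):
        self.elements.append(name)


def parse_wrapper(text=WRAPPER):
    parser = xml.sax.make_parser()
    # Python 3.7.1+ leaves external general entities unexpanded by default.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(InMemoryResolver())
    recorder = StructureRecorder()
    parser.setContentHandler(recorder)
    parser.parse(BytesIO(text.encode("utf-8")))
    return recorder


recorder = parse_wrapper()
print(recorder.pis)       # [('xml-stylesheet', 'type="text/xsl" href="text.xsl"')]
print(recorder.elements)  # ['wrapper', 'article', 'title', 'para']
```

The output shows two things. The xml-stylesheet PI comes from your wrapper. The included document's elements arrive nested inside wrapper, which is why the stylesheet has to allow for that extra root element. If you drop the custom resolver, the parser should try to fetch the real URI itself.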

As a final note, remember to include in "text.xsl" a template which instantiates a credit to the included page's author. Something like

<xsl:template match="/">
<!-- ...other bits of templates, calls to other template rules, etc.... -->
<h5>Above material included by permission from
its author, Joe Blow.
Copyright 2001 by Joe Blow.</h5>
</xsl:template>

Q: How can I use two different XSLT stylesheets for the same XML document?

I have an XML file with data both in Portuguese and in English, and I want a link to the English version in the index.xml file.

A: This is a variation of the first question. The trick is not to associate that Portuguese/English XML document itself with any stylesheet. Instead, relegate that association to XML documents which include the Portuguese/English document and link to the appropriate stylesheet. So your index.xml file might look something like the following (of course, substitute the correct element and attribute names for the bogus ones I'm using here):

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE index_root [
<!ENTITY atilde "&#227;" >
]>
<index_root>
<link lang_page="portuguese.xml">Vers&atilde;o portuguese</link>
<link lang_page="english.xml">English version</link>
</index_root>

When a user opens index.xml, they'll see two links. Selecting the one which displays as "Versão portuguese" will open a document named portuguese.xml; the one displaying "English version," a document named english.xml. (I'm greatly oversimplifying the potential problem of how to do linking in XML. If you're using XHTML in the index.xml document, replace the link elements above with XHTML's a (anchor) elements, and the lang_page attributes with href attributes.)

The portuguese.xml and english.xml documents would be nearly identical, differing only in their xml-stylesheet PIs:

<?xml version="1.0"?>
<!DOCTYPE wrapper [
<!ENTITY incl_content SYSTEM "uri of Portuguese/English document">
]>
<?xml-stylesheet type="text/xsl" href="uri of language-specific stylesheet"?>
<wrapper>&incl_content;</wrapper>

As before, replace uri of Portuguese/English document with your Portuguese/English data document's URI. In portuguese.xml, replace uri of language-specific stylesheet with the URI of the stylesheet for processing Portuguese-language data; in english.xml, with that for processing English-language data. As in the last question, too, don't forget that the stylesheets may need to take into account the presence of the artificial wrapper element.
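The two wrapper documents differ only in their xml-stylesheet PI. So rather than maintaining them by hand, you might generate them. Here is a small Python sketch of that idea. The file names bilingual-data.xml, portuguese.xsl, and english.xsl, and the helper names wrapper_document and write_pages, are all hypothetical.

```python
import os


def wrapper_document(data_uri: str, stylesheet_uri: str) -> str:
    """Build a wrapper that includes data_uri as an external entity and
    associates it with stylesheet_uri. Assumes neither URI contains '"'."""
    return (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE wrapper [\n"
        f'<!ENTITY incl_content SYSTEM "{data_uri}">\n'
        "]>\n"
        f'<?xml-stylesheet type="text/xsl" href="{stylesheet_uri}"?>\n'
        "<wrapper>&incl_content;</wrapper>\n"
    )


# Hypothetical names: one shared data file, one stylesheet per language.
DATA_URI = "bilingual-data.xml"
PAGES = {
    "portuguese.xml": "portuguese.xsl",
    "english.xml": "english.xsl",
}


def write_pages(directory: str = ".") -> list:
    """Write one wrapper per language into directory and return the paths."""
    paths = []
    for page, stylesheet in PAGES.items():
        path = os.path.join(directory, page)
        with open(path, "w", encoding="utf-8") as f:
            f.write(wrapper_document(DATA_URI, stylesheet))
        paths.append(path)
    return paths
```

With this approach, adding a third language means one more entry in PAGES plus its stylesheet. The data document itself stays untouched.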

Q: How do I easily define many values for variables in a multi-language lexicon?

I'm implementing a four-language lexicon with XSLT variables. For example,

<td><xsl:value-of select="$account-number"/></td>

where the value of the account-number variable depends on some other variables, as follows:

<xsl:variable name="account-number">
<xsl:choose>
<xsl:when test="$is-en">Account Number</xsl:when>
<xsl:when test="$is-fr">Numero de Compte</xsl:when>
<xsl:when test="$is-he">Mispar Heshbon</xsl:when>
<xsl:when test="$is-ru">Choter</xsl:when>
</xsl:choose>
</xsl:variable>

As you can see, referencing the variable is convenient, but defining it is verbose. Is there an easier way?

A: If verbosity is undesirable, XSLT is definitely not the language for you. Seriously, though, you're probably stuck with what you've already settled on.

The possible good news -- depending on exactly what your source tree looks like and how much freedom you have with it -- is that the obstacle may be surmountable with a simple change to your source tree's structure. Specifically, the obstacle seems to me to be all those secondary variables with a value of true or false: is-en, is-fr, is-he, and is-ru.

They bother me for two reasons. First, presumably they're mutually exclusive values; only one will be true at any given time. And second, if that first assumption is correct, this seems like an ideal situation in which to use the built-in xml:lang attribute. (It's "built-in" in the sense that XML 1.0 defines it; and, since all other XML-related specs devolve from that one, xml:lang should be respected by any compliant software.)

You can't use just any values for the xml:lang attribute. The spec refers to a number of other standards for information about allowable values, especially IETF RFC 1766, ISO 639, and ISO 3166. The allowable values are built from internationally accepted codes: a language code, optionally followed by a hyphen and a subcode such as a two-letter country code (en-US, for instance). You're already well on your way to using this standard, since "en", "fr", "he", and "ru" -- which map to the names of your variables -- are all legitimate language codes and, hence, allowable values for the xml:lang attribute.

(As a general rule, country codes are uppercase; language codes, lowercase.)
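Case and subcodes matter when you compare these values. An XPath = comparison is an exact, case-sensitive string match. XPath 1.0's lang() function, by contrast, ignores case and also accepts sublanguages, so lang('en') is true for xml:lang="en-US". Below is a rough Python rendering of that matching rule for a single attribute value. The real lang() also looks up the nearest ancestor's xml:lang, which this sketch skips, and lang_matches is a hypothetical name.

```python
def lang_matches(xml_lang: str, wanted: str) -> bool:
    """Sketch of XPath 1.0 lang() matching for one xml:lang value:
    equal ignoring case, or a sublanguage (the value continues with '-')."""
    value = xml_lang.lower()
    wanted = wanted.lower()
    return value == wanted or value.startswith(wanted + "-")


for value, wanted in [("he", "HE"), ("en-US", "en"), ("en", "en-US"), ("fr", "en")]:
    print(value, wanted, lang_matches(value, wanted))
# he HE True
# en-US en True
# en en-US False
# fr en False
```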

So you might have a source tree structure like this, using xml:lang instead of multiple two-valued attributes (one per language):

<term id="num">
<term-lang xml:lang="en">Account Number</term-lang>
<term-lang xml:lang="fr">Numero de Compte</term-lang>
<term-lang xml:lang="he">Mispar Heshbon</term-lang>
<term-lang xml:lang="ru">Choter</term-lang>
</term>
...etc. - other terms as needed...

Now you can assign the value of the account-number variable without using an xsl:choose block at all, like

<xsl:variable name="account-number"
select="//term-lang[@xml:lang=$page-lang and ../@id='num']"/>

Then your example result tree would look like this, given a value of "he" for the page-lang variable (the = comparison is case-sensitive, so "HE" would not match):

<td>Mispar Heshbon</td>

(Of course, you could then proceed to turn the searched-for id value into a variable as well, so you wouldn't need to hard-code it.)
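You can confirm outside XSLT that a single lookup replaces the whole xsl:choose block. The following Python sketch uses the standard library's ElementTree to mirror the select expression above. The lexicon root element is an assumption, since the source tree above shows only term elements. The helper name lookup is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical lexicon using the term/term-lang layout above,
# wrapped in an assumed <lexicon> root element.
LEXICON = """<lexicon>
<term id="num">
<term-lang xml:lang="en">Account Number</term-lang>
<term-lang xml:lang="fr">Numero de Compte</term-lang>
<term-lang xml:lang="he">Mispar Heshbon</term-lang>
<term-lang xml:lang="ru">Choter</term-lang>
</term>
</lexicon>"""


def lookup(root, term_id, page_lang):
    """Mirror //term-lang[@xml:lang=$page-lang and ../@id=$term_id].

    Like XPath's = operator, the language comparison is case-sensitive.
    Returns the first match, or None where XSLT would yield an empty string.
    """
    for term in root.iter("term"):
        if term.get("id") != term_id:
            continue
        for entry in term.findall("term-lang"):
            if entry.get(XML_LANG) == page_lang:
                return entry.text
    return None


root = ET.fromstring(LEXICON)
print(lookup(root, "num", "he"))  # Mispar Heshbon
print(lookup(root, "num", "HE"))  # None, because "HE" != "he"
```

If page-lang values might arrive in a different case inside the stylesheet itself, a predicate built on XPath's lang() function avoids the exact-match problem. One example is //term[@id='num']/term-lang[lang($page-lang)].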

Q: I'm searching for software which automatically generates XSL. Do you know of any?

A: Assuming by "XSL" you mean XSLT, I recently looked at a product which does just that. Sort of.

It's called XSLWiz and is published by EBProvider. The general idea is that you feed it descriptions of the source document (input) and the destination (output), and then connect a point from the former to a corresponding point on the latter, using a simple drag-and-drop method. The descriptions you provide to XSLWiz are in the form of DTDs or XML Schemas; if you have neither, you can supply a document instance in its place, and XSLWiz will infer the schema from it.

(XSLWiz actually works against XML Schema files only. If you don't provide your descriptions in that form, then XSLWiz builds them from what you do provide. One implication: You can use XSLWiz just to create schema files from DTDs or document instances, without ever generating a line of XSLT.)

After going through the product tutorial to get an idea of how it worked, I tested XSLWiz using my own FlixML vocabulary as the source and the XHTML 1.0 Transitional DTD as the destination. This didn't work; even when I mapped only a few connections between the former and the latter, XSLWiz complained that there were too many.

So I scaled down my expectations a bit. I stripped both vocabularies to about a dozen elements and attributes apiece. This worked fine. So somewhere between the two extremes is where XSLWiz's limits lie. (I couldn't find a mention of any such limit on the EBProvider site.)

The XSLT code that XSLWiz generated was rather idiosyncratic. For instance, the entire result tree was instantiated by a single template rule (xsl:template element). I'm reasonably certain that you'd need to hand-tweak the generated XSLT, if for no other reason than performance.

Although apparently written in Java, XSLWiz runs only on Microsoft Windows platforms. The product's stated requirements include neither the Internet Explorer browser nor the MSXML XML/XSLT processor, so I'm not sure what the Windows dependencies are. A free, seven-day timed evaluation version is available for download; the purchase price was $995 per license, rather hefty given the product's apparent limitations.
