Making Links, Breaking Entities

February 27, 2002

John E. Simpson

Q: Making a link

I'm trying to transform this XML

   <link value="business.html" anchor="#top">Businesses - Click Here</link>

into the obvious HTML:

   <a href="business.html#top">Click Here</a>

I can get the anchor wrapped around the "Click Here" text well enough and create the href attribute with empty double quotes, using the xsl:attribute-set element. But I don't know how to transform the XML attributes to create the URL from the source XML without simply hard-coding it over and over again.

A: As you probably suspect, you're pretty close to the answer already. Here's one solution, using the xsl:attribute element:

   <xsl:template match="link">
      <!-- @anchor already starts with "#", so no literal "#" is added -->
      <a><xsl:attribute name="href">
         <xsl:value-of select="@value"/><xsl:value-of select="@anchor"/>
      </xsl:attribute>Click Here</a>
   </xsl:template>

And here's another, using attribute value templates (or AVTs):

   <xsl:template match="link">
      <!-- No literal "#" here: @anchor already includes it -->
      <a href="{@value}{@anchor}">Click Here</a>
   </xsl:template>

(A reminder: an AVT, coded as an XPath expression enclosed in "curly braces" -- { and } characters -- takes content directly from the source tree, particularly attribute values, and plugs it directly into the result without the somewhat clunkier intermediate step of using xsl:attribute, etc.)

Of the two approaches, I prefer the latter. For one thing, it's more concise (though arguably more cryptic); it also avoids potential problems with some XSLT processors, having to do with the embedding of whitespace -- particularly newlines -- within an attribute value. (You could also avoid these problems by removing the newlines from the xsl:attribute element above, at the expense of readability.)
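
For anyone who wants to try the AVT approach end to end, here is a minimal sketch of a complete XSLT 1.0 style sheet wrapped around that template. The xsl:output settings are my own assumptions, not anything from the original question. Because the sample anchor attribute already starts with "#", this version joins the two attribute values without adding a second one.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- Assumption: HTML output, indented for readability -->
   <xsl:output method="html" indent="yes"/>

   <!-- Turn each link element into an HTML anchor -->
   <xsl:template match="link">
      <!-- The AVTs concatenate the two source attributes;
           @anchor already carries its leading "#" -->
      <a href="{@value}{@anchor}">Click Here</a>
   </xsl:template>

</xsl:stylesheet>
```

Run against the link element shown in the question, this should yield an a element whose href is business.html#top.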

I'm not sure what led you to believe you needed the xsl:attribute-set element. That's useful when you need to establish a group of attributes which will be used repeatedly throughout your XSLT style sheet:

   <xsl:attribute-set name="xlink_stuff">
      <xsl:attribute name="xlink:type">simple</xsl:attribute>
      <xsl:attribute name="xlink:title">A simple link</xsl:attribute>
   </xsl:attribute-set>

Then, when you need to "clone" this group of attributes for a given element in the result tree, just use the use-attribute-sets attribute on the xsl:element element, adding any other attributes with xsl:attribute:

   <xsl:element name="a" use-attribute-sets="xlink_stuff">
      <xsl:attribute name="href">mylink.html</xsl:attribute>
   </xsl:element>

This creates an a element with the href attribute as indicated, but also with the two attributes (xlink:type and xlink:title) whose values are hard-wired by way of the xsl:attribute-set element. As you found, it's not easy to create an xsl:attribute-set element with content which varies from one portion of the result tree to another. (xsl:attribute-set is a top-level XSLT element. Among other things, this means that its contents can't see variables or parameters local to the template that uses it. Strictly speaking, the XSLT 1.0 Recommendation does evaluate the set's xsl:attribute contents against the current node at the point of use, so an expression such as @value would work there; but for a one-off attribute like your href, that's a roundabout substitute for xsl:attribute or an AVT.)
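
If you want both a shared group of fixed attributes and a per-link href, the two techniques combine readily. The following is a sketch only, assuming the xlink prefix is bound to the standard XLink namespace on the xsl:stylesheet element. Note that on a literal result element such as a, the attribute is spelled xsl:use-attribute-sets.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xlink="http://www.w3.org/1999/xlink">

   <!-- Fixed attributes shared by every link -->
   <xsl:attribute-set name="xlink_stuff">
      <xsl:attribute name="xlink:type">simple</xsl:attribute>
      <xsl:attribute name="xlink:title">A simple link</xsl:attribute>
   </xsl:attribute-set>

   <xsl:template match="link">
      <!-- The set supplies the fixed attributes;
           the AVTs supply the href that varies per link -->
      <a xsl:use-attribute-sets="xlink_stuff"
         href="{@value}{@anchor}">Click Here</a>
   </xsl:template>

</xsl:stylesheet>
```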

As an aside, the code you posted on the forum used all-uppercase HTML element names. I strongly advise getting used to all-lowercase instead (which is why I changed it in the code samples above); this will help, in a small way, to shepherd you into the brave new world of XHTML.

Q: Declaring entities with XML Schema?

I was trying to find a way to declare entities in a W3C [XML] Schema (&amp;, &lt;, but also &ecirc;, &euml;, etc.) so they can be implemented in the final XML document, and then transformed with an XSLT style sheet to the correct HTML equivalents. Now of course I can declare one or multiple elements in my XSD, but there must be another way. (John here showed me, but with DTDs, and I don't want to use them.)

A: Welcome to an ugly truth about the W3C's XML Schema language, freely (or otherwise) conceded by supporters as well as detractors. There is currently no way to declare entities with XML Schema.

The first time most people encounter this reality, it comes as something of a shock. Isn't XML Schema supposed to be "DTDs on steroids"? Well, yes. XML Schema can do loads of things which DTDs cannot. But DTDs can do one thing which XML Schema can't even touch, namely, declare entities.

The reasons for this are pretty straightforward if you think about entities and entity references in the right way. A general or character entity reference of the kind you're asking about -- like &amp; for the ampersand, or &#169; for the copyright symbol, or &ora; for the string "O'Reilly and Associates" -- is simply a convenience. It's a way of representing in an XML document something which otherwise has special meaning to an XML parser (like the ampersand), or is unavailable on an input device (e.g., a keyboard), or in the chosen character encoding for the document (like the copyright symbol), or is simply too verbose and difficult to maintain in its fully expanded form. Thus, an entity reference works something like a word-processor macro -- a hot key, if you will, which has effects (that is, behaves) in certain predefined ways. And as with a word-processor macro, the entity reference's behavior exists on a conceptual plane outside the scope of the document's contents. 

Note that I'm talking here about entity references, not the text (in this case) to which they refer. This substitution text is very much a part of the document's content, but it "exists" -- and hence can be subject to a schema's structural constraints, or even a DTD's content model -- only after the substitution has taken place. The entity reference exists in the document prior to parsing and substitution; following parsing and substitution, the reference has, as it were, simply evaporated.
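
To make the evaporation concrete, here's a small sketch of a document that declares the hypothetical ora entity in an internal DTD subset. The memo and publisher element names are invented purely for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE memo [
   <!-- The declaration lives in the DTD, outside the document's content -->
   <!ENTITY ora "O'Reilly and Associates">
]>
<memo>
   <!-- Before parsing, the content holds only a reference -->
   <publisher>&ora;</publisher>
</memo>
```

Once the document has been parsed, the tree an XSLT processor works from holds a publisher element containing the text "O'Reilly and Associates"; nothing in that tree records that an entity reference was ever there.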

Take a look at the XML Recommendation, Sections 3 (Logical Structures) and 4 (Physical Structures). Notice anything unusual? All the familiar pieces of content models and attributes -- those things which both DTDs and XML Schema address -- fall into the category of logical structures. The physical-structures section is devoted entirely to entity and notation declarations: that is, syntactic or lexical constructs which in themselves do not comprise logical chunks of an XML document and are hence invisible to XML Schema processing.

By the way, an XSD is itself an XML document, of course, so there's nothing preventing you from using entities within the Schema itself. (This is a little perverse, requiring the Schema to use a DTD to declare those entities.) You just can't use XML Schema to declare entities for use in other documents. (Appendix C, "Using Entities," of the XML Schema "Part 0: Primer" Recommendation, describes what seems to me a twisted approximation of entity declaration using XML Schema. Feel free to refer to this Frankenstein's monster of a "solution" if you're determined to go the Schema route but still want to declare "entities" -- really, in this case, just elements with fixed content.)
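
For the curious, the Primer's workaround looks roughly like the following sketch, in which an element declared with fixed content stands in for the entity. The element name eacute follows the Primer's example as I recall it; treat the details as illustrative rather than definitive.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <!-- An "entity" faked as an element whose content is
        fixed to the single character e-acute (U+00E9) -->
   <xsd:element name="eacute" type="xsd:token" fixed="&#233;"/>
</xsd:schema>
```

An instance document would then write Montr<eacute/>al where a DTD-based document could write Montr&eacute;al, which also means the containing element has to be declared with mixed content.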

It may be small consolation, but consider this: as bad as your problem (wanting to construct character entities) seems, the poor folks who want to use more exotic kinds of entities (as with notations) are really cast adrift by XML Schema. It will be tedious for you to include the literal substitution text in your Schema-validated documents, instead of entity references. But at least you can get that substitution text into the final document somehow. There's no counterpart in XML Schema at all for referring to non-XML content like notations.