Finding IDs

June 25, 2003

Before getting into this month's questions, I wanted to discuss a change in this column's standard operating procedures. In the nearly three years that I've been writing "XML Q&A," all questions I've answered have been drawn from a single source: the O'Reilly Network XML Forum. (Annual exceptions -- the August columns -- have been the "Nobody Asked Me, But..." pieces, based on questions that no one in particular asked but that I wanted to tackle anyway.) In May, the XML Forum was discontinued by O'Reilly; its subject matter, after all, overlapped with numerous other online resources.

Starting this month, I'll be perusing those other online resources for questions to answer here. Included are the following mailing lists and newsgroups, in addition to others (links are to archives, subscription pages, or the Google Groups root for the given resource):

In all cases, I'll focus on questions which haven't yet been answered and continue to focus on questions of broad interest.

Now on to this month's items.

Q: XPath to IDs?

I want something like this:

<!-- a.xml -->
<a>
  <elt id="a" value="1"/>
  <elt id="b" value="2"/>
  <elt id="c" value="3"/>
</a>

<!-- a.xsl -->
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <output>
      <xsl:value-of select="#b/@value"/>
    </output>
  </xsl:template>
</xsl:stylesheet>

and to get the output:

<output>2</output>

Is there a syntax for this in XPath 1 or one being considered for XPath 2.0?

A: First, I assume you know that simply naming an attribute "id" doesn't make it an ID-type attribute. The only way to ensure that an attribute is of the ID type is to declare it so, via either DTD or XML Schema. So most of my answer takes it for granted that the id attributes in your document are formally declared as such.
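As a minimal sketch of what such a declaration could look like, here is the sample document with an internal DTD subset added. The DTD is an assumption for illustration (the questioner supplied none), and the content models are simply the loosest ones that fit the sample:

```xml
<?xml version="1.0"?>
<!-- Hypothetical DTD: not part of the original question. -->
<!DOCTYPE a [
  <!ELEMENT a (elt*)>
  <!ELEMENT elt EMPTY>
  <!-- Declaring the type as ID is what makes id an ID-type attribute. -->
  <!ATTLIST elt
    id    ID    #REQUIRED
    value CDATA #REQUIRED>
]>
<a>
  <elt id="a" value="1"/>
  <elt id="b" value="2"/>
  <elt id="c" value="3"/>
</a>
```

Because the declaration sits in the internal subset, even a non-validating parser will see it when it reads the document.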

Just to highlight the portion of your stylesheet which you're proposing will locate the correct element:

  <xsl:value-of select="#b/@value"/>

It's not quite that easy, but it's not much harder, either. (The above is also syntactically incorrect: an XPath-aware processor, such as an XSLT engine, will complain about the # character.) Instead of a "named anchor"-style selection, use the id() function to locate the element in question. It takes one argument, the ID value(s) you're looking for. Replace your xsl:value-of element with this one:

  <xsl:value-of select="id('b')/@value"/>
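Put together, the whole stylesheet might look like the following sketch. The version attribute and the xsl:output element are additions of mine, not part of the question: XSLT requires version on xsl:stylesheet, and omit-xml-declaration just keeps the result down to the single element the questioner asked for. It assumes the id attributes of a.xml are declared as ID-type.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: suppress the XML declaration in the result. -->
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <output>
      <!-- id('b') selects the element whose ID-type attribute equals "b". -->
      <xsl:value-of select="id('b')/@value"/>
    </output>
  </xsl:template>

</xsl:stylesheet>
```

Run against a.xml with the id attribute declared as an ID, this should yield <output>2</output>.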

And what if the id attributes are undeclared? You can still locate the right element (assuming no two elements share the same id value) with:

  <xsl:value-of select="descendant::*[@id='b']/@value"/>

As an aside, note that you needn't pass the id() function just a single value. You can pass it multiple values in a whitespace-delimited list, as here:

  <xsl:value-of select="id('b c')/@value"/>

Because xsl:value-of outputs the string-value of only the first node (in document order) of the node-set it's given, this prints just the value for element b, even though id() itself selects both elements. Furthermore, the argument needn't be a string. If it's a number or Boolean, the argument will be converted to a string. This behavior is consistent with that of other XPath functions. But, and this is the interesting part, if the argument is a node-set, the id() function behaves somewhat differently. Rather than splitting a single string into ID values, it takes the string-value of each node in the passed node-set and returns a node-set containing all element nodes whose ID-type attributes match any of them. Thus the id() function can actually locate more than one node, which may seem to be a contradiction given that each ID value is unique.
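To see every node that id() selects for a whitespace-delimited list, you can iterate over its result rather than hand it to xsl:value-of, which reports only the first node in document order. The following template is a sketch under the same assumption as before (the id attributes of a.xml are declared as ID-type); the value element in the output is a name invented for this example.

```xml
<xsl:template match="/">
  <output>
    <!-- id('b c') returns a node-set holding both matching elt elements. -->
    <xsl:for-each select="id('b c')">
      <!-- "value" is a hypothetical wrapper element, one per match. -->
      <value><xsl:value-of select="@value"/></value>
    </xsl:for-each>
  </output>
</xsl:template>
```

For the sample document, the output element should then contain two value elements, holding 2 and 3 respectively.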

The notion is hard to visualize with the sample document the questioner has provided, since there's no correspondence between any string-values in the document and the values of the id attributes.

But consider a common scenario: You control your own XML vocabulary, but not some other XML-based resource whose contents you want to use to locate ID-based values in one of your own documents. For instance, say you've got a document listing book titles (call it, say, books_details.xml):

<books_details>
  <book isbn="b0684833395">Catch-22</book>
  <book isbn="b0440180295">Slaughterhouse 5</book>
  <book isbn="b0764547771">XML: A Primer</book>
  <book isbn="b0446670251">The Virgin Suicides</book>
  <book isbn="b0440215625">Dragonfly in Amber</book>
  <book isbn="b088184800X">Crossed Wires</book>
  <book isbn="b0679736379">Sophie's Choice</book>
  <book isbn="b0596002521">XML Schema</book>
</books_details>

(Note that isbn is declared as an ID-type attribute. Also note a side-effect of this declaration: the attribute's value may not start with a digit.)
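The declaration itself isn't shown above; one way it could be written, as an internal DTD subset at the top of books_details.xml, is sketched here. The content models are my assumption, chosen only to fit the sample document.

```xml
<!-- Hypothetical DTD for books_details.xml. -->
<!DOCTYPE books_details [
  <!ELEMENT books_details (book*)>
  <!ELEMENT book (#PCDATA)>
  <!-- An ID value must be an XML name, hence the leading "b" on each ISBN. -->
  <!ATTLIST book isbn ID #REQUIRED>
]>
```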

Elsewhere, in some other document, you have a list of books arranged by subject (as they might be shelved in a bookstore, for example). This document (books_shelves.xml) might look something like this:

<books_details>
  <category shelf="fiction">
    <isbn>b0684833395</isbn>
    <isbn>b0440180295</isbn>
    <isbn>b0446670251</isbn>
    <isbn>b0679736379</isbn>
  </category>
  <category shelf="tech">
    <isbn>b0764547771</isbn>
    <isbn>b0596002521</isbn>
  </category>
  <category shelf="romance">
    <isbn>b0440215625</isbn>
  </category>
  <category shelf="mystery">
    <isbn>b088184800X</isbn>
  </category>
</books_details>

Obviously, if you controlled both of these vocabularies, a simple solution would be to merge the two documents into one. But if you can't do so, for any of a thousand reasons, you can still use the second document to locate in the first all books which are shelved as, say, fiction. A stylesheet template to achieve this, by transforming books_details.xml, might look like the following:

<xsl:template match="/">
  <xsl:for-each select="id(document('books_shelves.xml')//isbn[../@shelf='fiction'])">
    <output>
      <xsl:value-of select="."/>
    </output>
  </xsl:for-each>
</xsl:template>

The operative portion of this template, the value of the xsl:for-each element's select attribute, uses the id() function, in conjunction with the document() function, to locate multiple nodes in the first document (books_details.xml) based on the string-values of nodes in the second (books_shelves.xml). Translated into English, that select attribute might read something like this:

The inner call to the document() function locates, in books_shelves.xml, a node-set consisting of all isbn elements whose parents have a shelf attribute with a value of "fiction."
The outer call to id() locates, in books_details.xml, each element with an ID-type attribute equal to the string-value of one of the nodes in the node-set located in the preceding step.

The result tree from this transformation is:

<output>Catch-22</output>
<output>Slaughterhouse 5</output>
<output>The Virgin Suicides</output>
<output>Sophie's Choice</output>

By the way, note that this result tree isn't well-formed on its own, consisting as it does of more than one root element.

There's more than one way to obtain these results. Instead of using the id() function, for instance, you could use keys to locate the desired nodes. (This is absolutely the way to go if the attributes in question aren't ID-type attributes in the first place. See Bob DuCharme's "Declaring Keys and Performing Lookups" here on XML.com for more details.) Still, if you've got ID-type attributes you might as well take advantage of their uniqueness.
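For comparison, a key-based version of the fiction-shelf lookup might look like the sketch below. The key name book-by-isbn is invented for this example, and the stylesheet assumes that books_shelves.xml sits alongside it, since document() resolves a relative URI given as a string against the stylesheet's location. Unlike the id() version, it does not depend on isbn being declared as an ID-type attribute.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index every book element by the value of its isbn attribute.
       The name "book-by-isbn" is hypothetical. -->
  <xsl:key name="book-by-isbn" match="book" use="@isbn"/>

  <xsl:template match="/">
    <!-- Given a node-set as its second argument, key() looks up the
         string-value of each node, here each fiction isbn element,
         in the document containing the context node. -->
    <xsl:for-each select="key('book-by-isbn',
        document('books_shelves.xml')//isbn[../@shelf='fiction'])">
      <output>
        <xsl:value-of select="."/>
      </output>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Applied to books_details.xml, this should select the same four book elements as the id() version does.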

Follow-up: XML-based résumés

In last month's column, I reported on the XML Résumé Library for capturing curriculum vitae information. Shortly after that column appeared, I was contacted by Aaron Straup Cope, who has taken it upon himself to extend the XML Résumé Library with some (IMO) notable improvements.

At a minimum, Cope's extensions add to the XML Résumé Library's DTD a new element, activities, and several offspring elements. The activities element, says Cope, identifies "personal, or group, projects that are not directly 'work' related." For instance, you could include memberships in civic organizations under this category. Much more interesting is the set of stylesheets which Cope has prepared; these provide you with the ability to exclude certain information (address and phone number, for example) from the output, to define more than one CSS stylesheet depending on output device, and so on.

If you found the XML Résumé Library interesting, by all means head over to Cope's aaronland.info XSLT tools page.