From One String to Many
This month, I found two questions from two different people on essentially the same subject:
Q: Can I use XSLT to parse a string of characters?
Question 1: Right now I'm sending values to our XSLT
stylesheet which is defining some xsl:params we
have. Here's the way the command line is typically structured:
foo.xsl -param Name "'Chuck'"
And here's the XSLT to acquire the value of the Name parameter:
<xsl:param name="Name" select="''" />
Is there a way to send any number of multiple values
for Name and have them define, say, the XSLT equivalent
of an array? I was thinking of using something like this variation on
the command line above, with multiple names delimited by
semicolons:
foo.xsl -param Name "'Chuck;Steve;Sara;Jane;'"
Question 2: I have an element like this:
<para one-of="|11|22|33|44|55|"/>
I want to get the value list of its attribute one-of,
but I don't know how to parse it.
A: Both of these questions ask of XSLT that it perform what might
be called a sidebar task; that is, something apart from its central
mission, which is the manipulation of a source tree. They both want
to use XSLT to examine a text string, breaking it apart into
substrings at locations defined by some delimiting character. In
Question 1, the values passed in by way of the Name
parameter are delimited by semicolons; in Question 2, the values
contained in the one-of attribute are delimited by
pipes ("vertical bars").
The choice of delimiting character is completely arbitrary, although it may be forced on you by some outside constraint such as the interface with another application. You might just as well choose hyphens, underscores, even slashes or backslashes (for parsing the path of a file in a local directory, say, or a URL). The separator might be just a blank space, for that matter, enabling you to extract words from a sentence, or a period, enabling you to extract sentences from a paragraph. And so on.
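There are some minor differences between the two questions. First, Question 1's string arrives as the value of a stylesheet parameter, while Question 2's is the value of an attribute in the source document. Second, Question 1's list merely ends with a delimiter, while Question 2's list begins with one as well.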
I don't think this second issue matters much. At worst, you could simply trim off the opening delimiter and then process the rest of the string exactly as when dealing with Question 1's more conventional list form. It's an atypical way to delimit a list of items, to be sure. But it does highlight the need to know your data, as always.
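Trimming the opening delimiter takes only a couple of XPath 1.0 string functions. What follows is a minimal sketch rather than a definitive solution; it assumes the para element from Question 2 is the document element of the source, and the variable names raw and list are invented for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:template match="para">
    <!-- "raw" and "list" are illustrative names, not from the question -->
    <xsl:variable name="raw" select="@one-of" />
    <xsl:variable name="list">
      <xsl:choose>
        <!-- Drop a leading pipe if there is one... -->
        <xsl:when test="starts-with($raw, '|')">
          <xsl:value-of select="substring($raw, 2)" />
        </xsl:when>
        <!-- ...otherwise leave the string alone. -->
        <xsl:otherwise>
          <xsl:value-of select="$raw" />
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <!-- For the sample element, this writes 11|22|33|44|55| -->
    <xsl:value-of select="$list" />
  </xsl:template>
</xsl:stylesheet>
```

After this step the string has the same shape as Question 1's list: values separated by a delimiter, with one trailing delimiter at the end.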
Enough talk about the details. What's the answer?
When XPath finally makes the leap to its second version, dragging
in its wake new versions of both XQuery and XSLT, you should be able
to take advantage of a new string
function called tokenize(). This function takes at
least two arguments: the first is the string to be broken up, and the
second is the delimiter character(s) which mark the boundaries between
adjacent tokens. For instance, to handle the first
question's Name parameter and its semicolon delimiters,
a call to this function might look as follows:
tokenize($Name, ";")
You might be curious what exactly the tokenize()
function returns at the point of the call. What it returns is an XML
Schema sequence: that is, a series of discrete values. While you can't
do much with this sequence by itself, you can use the likewise new
XPath 2.0 functions item-at() and index-of()
to process it, including extracting or enumerating the individual
values.
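To make the call above concrete, here is a hedged sketch of a complete stylesheet that enumerates the values by iterating over the sequence with xsl:for-each. It assumes an XSLT 2.0 processor and that tokenize() behaves as XPath 2.0 describes it, where a trailing delimiter yields a final empty string; the numbered output layout is purely illustrative.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <!-- Same parameter declaration as in Question 1 -->
  <xsl:param name="Name" select="''" />

  <xsl:template match="/">
    <!-- The predicate discards the empty string that the
         trailing ";" would otherwise leave at the end -->
    <xsl:for-each select="tokenize($Name, ';')[. != '']">
      <xsl:value-of select="position()" />
      <xsl:text>: </xsl:text>
      <xsl:value-of select="." />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

One design point to keep in mind: the second argument of tokenize() is a regular expression, so a delimiter such as Question 2's pipe would have to be escaped (as '\|') before it could be used the same way.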
See Bob DuCharme's "Transforming XML" column of May, 2003, "XSLT 2 and Delimited Lists," for more information and some examples of these new features of XPath 2.0.
Right about now, you may be thinking to yourself something along these lines: having XPath 2.0 on the horizon is all well and good, but what about now? After all, there aren't many XPath/XSLT processors today capable of handling XPath 2.0 novelties, however useful (Saxon being the notable exception).
Another alternative to consider is the EXSLT extension
function str:tokenize().
EXSLT is, as its home page states plainly, "a community initiative to provide extensions to XSLT." These extensions are of three kinds: named templates, extension functions, and extension elements.
Consider the EXSLT extension functions category. These work like
other functions you might be familiar with from XPath/XSLT 1.0, such
as name(),
count(), translate(), document(),
and key(). They do, however, require you to take a few
extra steps: declaring EXSLT-specific namespaces and importing EXSLT
stylesheets into your own. The exact steps to take depend on the
function(s) you're interested in using, and the XSLT processor in your
environment.
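As a rough illustration of the namespace step, the sketch below declares the EXSLT strings namespace (http://exslt.org/strings) and uses the standard XSLT 1.0 function-available() test to find out whether the processor supplies str:tokenize() natively. The messages it writes are placeholders; what you do in the fallback branch depends on your processor.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://exslt.org/strings"
    extension-element-prefixes="str">
  <xsl:output method="text" />

  <xsl:template match="/">
    <xsl:choose>
      <!-- True only if the processor implements the function itself -->
      <xsl:when test="function-available('str:tokenize')">
        <xsl:text>str:tokenize() is available natively.</xsl:text>
      </xsl:when>
      <!-- Otherwise an imported template or script version is needed -->
      <xsl:otherwise>
        <xsl:text>str:tokenize() is not built in here.</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Listing the str prefix in extension-element-prefixes keeps the EXSLT namespace declaration from being copied onto literal result elements in the output.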
EXSLT offers several
implementations of a tokenization routine, including a JavaScript
version and a named template as well as processor-specific
functions. The EXSLT str:tokenize() function (note the
namespace prefix), like XPath 2.0's version, takes two arguments (or
parameters, if you're using a template-based solution such as
Jeni Tennison's); the first is the string to be tokenized, the
second is the delimiter. What it returns to you at the point of the
function call, though, isn't anything exotic like an XML Schema
sequence, requiring that your XSLT processor include support not only
for XPath 2.0 but for XML Schema as well. What it returns to your
stylesheet is a simple node-set, consisting of N
token elements, the value of each of which is a token
extracted from the first argument. (If you're using a template-based
call to str:tokenize(), what you get back is a result
tree fragment, or RTF, rather than a true node-set.)
For instance, to handle the first questioner's situation with the
EXSLT str:tokenize function, your stylesheet would
include (in addition to any requisite namespace
declarations, xsl:import elements, and so on, depending
on the version of the function you're using) a call like this:
str:tokenize($Name, ";")
If you're using the template-based str:tokenize, the
call would look like this:
<xsl:call-template name="str:tokenize">
<xsl:with-param name="string" select="$Name" />
<xsl:with-param name="delimiters" select="';'" />
</xsl:call-template>
Note the double-quoting (single quotes nested inside double quotes)
in the value of the second select attribute; the inner quotes make
the value an XPath string literal. Without them, the XSLT processor
would try to evaluate a bare semicolon as an XPath expression.
What you'd get back in either case would be a node-set (or RTF) like this:
<token>Chuck</token>
<token>Steve</token>
<token>Sara</token>
<token>Jane</token>
Such a node-set/RTF, of course, can be processed by any old XSLT processor.
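Here is a minimal sketch of that processing for the function version, assuming a processor that implements str:tokenize() natively (libxslt's xsltproc is one such processor). It walks the token elements with xsl:for-each and writes one numbered line per name; the output layout is illustrative only.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://exslt.org/strings"
    extension-element-prefixes="str">
  <xsl:output method="text" />

  <!-- Same parameter declaration as in Question 1 -->
  <xsl:param name="Name" select="''" />

  <xsl:template match="/">
    <!-- Each node in the returned node-set is one token element -->
    <xsl:for-each select="str:tokenize($Name, ';')">
      <xsl:value-of select="position()" />
      <xsl:text>: </xsl:text>
      <!-- The string value of a token element is the token itself -->
      <xsl:value-of select="." />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

If the tokens come from the template-based version instead, they arrive as a result tree fragment captured in a variable. XSLT 1.0 lets you copy such a fragment or take its string value, but not step through its children, so iterating over the tokens typically means converting it first with an extension such as EXSLT's exsl:node-set() function (namespace http://exslt.org/common), where your processor supports it.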
By the way, don't be shy about appropriating EXSLT functions and named templates for your own use if they're not exactly what you need; simply download the code and modify it to your own purposes. Give credit where credit is due, though: include a reference in your code's documentation to the work of the EXSLT folks.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.