XPathScript: An Alternative To XSLT
Introduction
XPathScript is a stylesheet language for transforming XML documents into
other formats. It has only a few features, but by combining those
features with the power and flexibility of Perl, XPathScript is a very
capable system. Like all XML stylesheet languages, including XSLT, an
XPathScript style sheet is always executed in the context of a source XML
file. In many cases, the source XML file will actually define which
style sheets to use via the <?xml-stylesheet?>
processing instruction.
XPathScript was conceived as part of AxKit, an application server environment for Apache servers running mod_perl (see my Introduction to AxKit article). Its primary goal was to achieve the kind of transformations that XSLT can do without being restricted by XSLT's XML-based syntax, and to provide full programming facilities within that environment. I also wanted it to be completely agnostic about output formats, without having to program special after-effect filters. The result is a language for server-side transformation that combines the power and flexibility of XSLT with the full capabilities of the Perl language, and lets you write style sheets in any ASP-capable or ordinary text editor. The Introduction to AxKit article is recommended reading before continuing with this article.
The Syntax
XPathScript follows the basic ASP syntax of introducing code with the
<% %> delimiters. Here's a brief example of a
fully compatible XPathScript style sheet:
<html> <body> <%= 5+5 %> </body> </html>
This simply outputs the value 10 in an HTML document. The delimiters used
here are the <%= %> delimiters, which are slightly
different in that they send the results of the expression to the browser
(or to the next processing stage in AxKit).
This example does absolutely nothing with the source XML file,
which is completely separate from this style sheet. Here's another
example:
<html> <body> <% $foo = 'World' %> Hello <%= $foo %> !!! </body> </html>
This outputs the text "Hello World !!!". Again, we're
not actually doing anything here with our source document, so all XML
files using this style sheet will look identical. This seems rather
uninteresting, until we discover the library of functions that are
accessible to our XPathScript style sheets for accessing the source
document contents.
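To make the difference between the two delimiters concrete, here is a minimal sketch of a toy renderer in plain Perl. It is not AxKit's implementation; the function name render_toy and the use of string eval are assumptions made purely for illustration. It runs <% %> blocks for their side effects and substitutes the value of <%= %> blocks into the output.

```perl
use strict;
use warnings;

# Toy renderer, a hypothetical stand-in and not AxKit's own code.
# <% ... %> blocks run for their side effects; <%= ... %> blocks are
# replaced by the value of the expression they contain.
sub render_toy {
    my ($template) = @_;
    my $out = '';

    # With a capturing group, split returns literal text and delimiter
    # contents alternately: text, code, text, code, ...
    my @parts = split /<%(.*?)%>/s, $template;

    while (@parts) {
        $out .= shift @parts;               # literal text is copied as-is
        last unless @parts;
        my $code = shift @parts;
        my $is_expression = ($code =~ s/^=//);

        # 'no strict vars' lets the toy share plain globals such as $foo
        # between blocks, as the examples in the text do.
        my $value = eval "no strict 'vars'; $code";
        die $@ if $@;
        $out .= $value if $is_expression;
    }
    return $out;
}

my $sum   = render_toy('<html> <body> <%= 5+5 %> </body> </html>');
my $hello = render_toy(q{<html> <body> <% $foo = 'World' %> Hello <%= $foo %> !!! </body> </html>});

print "$sum\n$hello\n";
```

Fed the two style sheets shown above, the sketch puts 10 in the first document and "Hello World !!!" in the second. The assignment block itself contributes nothing to the output.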
The XPathScript API
Along with the code delimiters, XPathScript provides stylesheet developers with a full API for accessing and transforming the source XML file. This API can be used in conjunction with the delimiters above to provide a stylesheet language that is as powerful as XSLT, and yet provides all the features of a full programming language (in this case, Perl, but I'm certain that other implementations such as Python or Java would be possible).
A simple example to get us started is to use the API to bring in the title from a DocBook article. A DocBook article title looks like this:
<article> <artheader> <title>XPathScript: An Alternative To XSLT</title> ...
The XPath expression to retrieve the text in the title element is
/article/artheader/title/text()
To make this text into the HTML title, we need the following XPathScript style sheet:
<html>
<head>
<title><%= findvalue("/article/artheader/title/text()") %></title>
</head>
<body>
This was a DocBook Article. We're only extracting the title for now!
<p>
The title was: <%= findvalue("/article/artheader/title/text()") %>
</body>
</html>
The syntax we are using to find the document node we want is XPath, a W3C Recommendation for finding and matching XML document nodes. The specification is fairly readable and is available at http://www.w3.org/TR/xpath. Alternatively, I can recommend Norm Walsh's XPath introduction; it covers a slightly older version of the specification, but I didn't notice anything in it that is missing from or different from the current recommendation.
The above example showed us how to extract single values, but what if we wish to extract a list of values? Here's how we might get a table of contents from DocBook article sections:
...
<%
for my $sect1 (findnodes("/article/sect1")) {
    print findvalue("title/text()", $sect1), "<br>\n";
    for my $sect2 (findnodes("sect2", $sect1)) {
        print " + ", findvalue("title/text()", $sect2), "<br>\n";
        for my $sect3 (findnodes("sect3", $sect2)) {
            print " + + ", findvalue("title/text()", $sect3), "<br>\n";
        }
    }
}
%>
...
This gives us a table of contents down to three levels (adding links to
the actual part of the document is left as an exercise). The first call
to findnodes gives us all sect1 nodes that are children of the root
element (article). The XPath expressions following that are
relative to the current node. You can see that by the absence of the leading
/.
Note in the above how we specify the current
$sectX variable in the calls to
the API. This is the context for the XPath expression, and it is vital so
that we get the right values for the expression. The context in
XPathScript is never set automatically. This is something that XSLT
authors might miss, and expect to be done for them. This way, however,
we have some added flexibility, in that you can always specify your
own context, and pass context nodes around in your script.
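Because the context is just an ordinary Perl value, it can be handed to your own subroutines. The fragment below is a sketch of that idea using only the findnodes and findvalue calls shown above; the helper name section_title is hypothetical, and the fragment is an XPathScript style sheet excerpt that runs only inside AxKit, not a standalone Perl program.

```perl
<%
# Hypothetical helper: return the title of whatever section node it is given.
# The node is passed on explicitly as the context for the relative XPath.
sub section_title {
    my ($context) = @_;
    return findvalue("title/text()", $context);
}

for my $sect1 (findnodes("/article/sect1")) {
    print section_title($sect1), "<br>\n";
    for my $sect2 (findnodes("sect2", $sect1)) {
        print " + ", section_title($sect2), "<br>\n";
    }
}
%>
```

The helper never needs to know how deep in the document it is being called. The caller decides the context, which is the flexibility described above.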
Declarative Templates
The examples up to now have all been based around a single global template with search-and-replace style lookups into the source XML document. This is a powerful concept in itself, especially when combined with loops and the ability to change the context of searches. But that style of template is best suited to well-structured data and is awkward for processing large documents. To ease the processing of documents, XPathScript also includes a declarative template processing model, so that you can simply specify the format for a particular element and let XPathScript do the work for you.
In order to support this method, XPathScript introduces one more API
function: apply_templates(). The name is intended to
appeal to people already familiar with XSLT. The
apply_templates() function takes either a list of
start nodes, or an XPath expression (which must result in a node set) and
optional context. Starting at the start nodes, it traverses the document
tree applying the templates defined by the $t hash
reference.
First, a simple example to introduce this feature. Let's assume for a moment that our source XML file is valid XHTML, and we want to change all anchor links to italics. Here is the very simple XPathScript template that will do that:
<%
$t->{'a'}{pre} = '<i>';
$t->{'a'}{post} = '</i>';
$t->{'a'}{showtag} = 1;
%>
<%= apply_templates() %>
Note that apply_templates() has to be called using
<%= %>. That's because
apply_templates() actually returns a string
representation of the transformation--it doesn't do the output to the
browser for you.
The first thing this example does is set up a hash reference
$t that XPathScript knows about. The keys of $t are element names (including
namespace prefix, if we are using namespaces). The hash can have the
following sub-keys:
pre
post
showtag
testcode
We'll cover testcode in more depth later in The Template Hash, but we'll note here that it is a placeholder
for code that allows for more complex templates.
Unlike XSLT's declarative transformation syntax, the keys of
$t do not specify XPath match
expressions. Instead they are simple element names. This is a trade-off
between speed of execution and flexibility. Perl hash lookups are extremely
quick compared to XPath matching. Luckily, because of the
testcode option, more complex matches are quite
possible with XPathScript.
The simple explanation for now is that pre specifies
output to appear before the tag, post specifies
output to appear after the tag, and showtag specifies
that the tag itself should be output as well as the pre
and post values.
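The effect of pre, post, and showtag can be sketched with a toy tree walker in plain Perl. This is an illustration only, not XPathScript's traversal code: the function apply_toy_templates, the hash-based node layout, and the rule that elements without a template entry are copied through with their tags are all assumptions made for the example.

```perl
use strict;
use warnings;

# Toy element layout (hypothetical): { name, attrs (raw string), kids }.
# Plain strings in kids stand for text nodes.
sub apply_toy_templates {
    my ($t, $node) = @_;
    return $node unless ref $node;          # text nodes pass through unchanged

    my $rule   = $t->{ $node->{name} };
    my $inner  = join '', map { apply_toy_templates($t, $_) } @{ $node->{kids} };
    my $tagged = "<$node->{name}$node->{attrs}>$inner</$node->{name}>";

    # Assumption: an element with no template entry is copied with its tag.
    return $tagged unless $rule;

    # showtag decides whether the element's own tag survives;
    # pre and post are wrapped around whatever is kept.
    my $body = $rule->{showtag} ? $tagged : $inner;
    return ($rule->{pre} // '') . $body . ($rule->{post} // '');
}

# The same template hash as in the anchor-link example.
my $t = {};
$t->{'a'}{pre}     = '<i>';
$t->{'a'}{post}    = '</i>';
$t->{'a'}{showtag} = 1;

my $doc = {
    name  => 'p',
    attrs => '',
    kids  => [
        'See ',
        { name => 'a', attrs => ' href="/axkit"', kids => ['AxKit'] },
        '.',
    ],
};

my $out = apply_toy_templates($t, $doc);
print "$out\n";
```

For this input the sketch prints <p>See <i><a href="/axkit">AxKit</a></i>.</p>, with the link kept intact and wrapped in italics. Setting showtag to 0 in the toy would drop the a tag and leave only <i>AxKit</i>.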