XML.com: XML From the Inside Out

XPathScript: An Alternative To XSLT

July 05, 2000

Introduction

Table of Contents

The Syntax
The XPathScript API
Declarative Templates
A Complete Example
Stepping Through The Example
The Template Hash
The "testcode" Option
Copying Styles
Conclusion

XPathScript is a stylesheet language for transforming XML documents into other formats. It has only a few features, but by combining those features with the power and flexibility of Perl, XPathScript is a very capable system. As with all XML stylesheet languages, including XSLT, an XPathScript style sheet is always executed in the context of a source XML file. In many cases, the source XML file itself declares which style sheets to use via the <?xml-stylesheet?> processing instruction.

XPathScript was conceived as part of AxKit, an application server environment for Apache servers running mod_perl (see my Introduction to AxKit article). Its primary goal was to achieve the kind of transformations that XSLT can do without being restricted by XSLT's XML-based syntax, and to provide full programming facilities within that environment. I also wanted it to be completely agnostic about output formats, without having to program special after-effect filters. The result is a language for server-side transformation that provides the power and flexibility of XSLT, combined with the full capabilities of the Perl language, and the ability to write style sheets in any ASP-capable or ordinary text editor. The Introduction to AxKit article is recommended reading before continuing with this one.

The Syntax

XPathScript follows the basic ASP syntax of introducing code with the <% %> delimiters. Here's a brief example of a fully compatible XPathScript style sheet:

<html>
 <body>
  <%= 5+5 %>
 </body>
</html> 

This simply outputs the value 10 in an HTML document. The delimiters used here are <%= %>, a variant of <% %> that sends the result of the expression to the browser (or to the next processing stage in AxKit). This example does absolutely nothing with the source XML file, which is completely separate from this style sheet. Here's another example:

<html>
 <body>
  <% $foo = 'World'; %>
Hello
  <%= $foo %> !!!
 </body>
</html>

This outputs the text "Hello World !!!". Again, we're not actually doing anything here with our source document, so all XML files using this style sheet will look identical. This seems rather uninteresting, until we discover the library of functions that are accessible to our XPathScript style sheets for accessing the source document contents.
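To make the difference between the two delimiters concrete, here is a rough sketch of how an ASP-style template could be turned into Perl code and run. It is not AxKit's implementation: compile_template and run_template are hypothetical names invented for this illustration, and the sketch ignores the source document entirely.

```perl
use strict;
use warnings;

# Hypothetical helper: translate an ASP-style template into Perl source.
# Literal text and <%= %> results are appended to $out; <% %> code runs as-is.
sub compile_template {
    my ($template) = @_;
    my $code = "my \$out = '';\n";
    # The capturing group makes split return the <% ... %> blocks as well.
    for my $chunk (split /(<%.*?%>)/s, $template) {
        next if $chunk eq '';
        if ($chunk =~ /^<%=(.*)%>$/s) {
            $code .= "\$out .= do { $1\n};\n";      # expression: keep its value
        }
        elsif ($chunk =~ /^<%(.*)%>$/s) {
            $code .= "$1\n;\n";                     # statement: just execute it
        }
        else {
            (my $text = $chunk) =~ s/([\\'])/\\$1/g;  # escape for single quotes
            $code .= "\$out .= '$text';\n";
        }
    }
    return $code . "\$out;\n";
}

# Hypothetical helper: compile the template, evaluate it, return the output.
sub run_template {
    my ($template) = @_;
    my $result = eval compile_template($template);
    die $@ if $@;
    return $result;
}

print run_template(q{<b><% my $foo = 'World'; %>Hello <%= $foo %>!</b>}), "\n";
# prints: <b>Hello World!</b>
```

Because every chunk ends up in the same generated code, a variable set in one <% %> block is still visible to a later <%= %> block, which is what the "Hello World" example relies on.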

The XPathScript API

Along with the code delimiters, XPathScript provides stylesheet developers with a full API for accessing and transforming the source XML file. This API can be used in conjunction with the delimiters above to provide a stylesheet language that is as powerful as XSLT, and yet provides all the features of a full programming language (in this case, Perl, but I'm certain that other implementations such as Python or Java would be possible).

Extracting Values

A simple example to get us started is to use the API to bring in the title from a DocBook article. A DocBook article title looks like this:

<article>
 <artheader>
  <title>XPathScript: An Alternative To XSLT</title>
  ...

The XPath expression to retrieve the text in the title element is

/article/artheader/title/text()

To make this text into the HTML title, we need the following XPathScript style sheet:

<html>
 <head>
  <title><%= findvalue("/article/artheader/title/text()") %></title>
 </head>
 <body>
  This was a DocBook Article. We're only extracting the title for now!
  <p>
  The title was: <%= findvalue("/article/artheader/title/text()") %>
 </body>
</html>

The syntax we are using to find the document node we want is XPath, a W3C Recommendation for finding and matching XML document nodes. The specification is fairly readable and is available at http://www.w3.org/TR/xpath. Alternatively, I can recommend Norm Walsh's XPath introduction; it covers a slightly older version of the specification, but I didn't notice anything in it that is missing or different from the current recommendation.

Extracting Nodes

The above example showed us how to extract single values, but what if we wish to extract a list of values? Here's how we might get a table of contents from DocBook article sections:

...
<%
for my $sect1 (findnodes("/article/sect1")) {
 print findvalue("title/text()", $sect1), "<br>\n";
 for my $sect2 (findnodes("sect2", $sect1)) {
  print " + ", findvalue("title/text()", $sect2), "<br>\n";
  for my $sect3 (findnodes("sect3", $sect2)) {
   print " + + ", findvalue("title/text()", $sect3), "<br>\n";
  }
 }
}
%>
...

This gives us a table of contents down to three levels (adding links to the actual part of the document is left as an exercise). The first call to findnodes gives us all sect1 nodes that are children of the root element (article). The XPath expressions following that are relative to the current node. You can see that by the absence of the leading /.

Note how we pass the current $sectX variable in the calls to the API above. This is the context for the XPath expression, and it is vital for getting the right values. The context in XPathScript is never set automatically. XSLT authors, who are used to having this done for them, may find it easy to forget. In exchange we gain some flexibility: you can always specify your own context, and pass context nodes around in your script.
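The same explicit-context pattern can be tried outside AxKit. The sketch below is a standalone Perl script that assumes the XML::XPath module from CPAN is installed; its findnodes and findvalue methods take a path and an optional context node, much like the XPathScript functions used above. The toc_lines function and the sample document are made up for this illustration.

```perl
use strict;
use warnings;
use XML::XPath;   # CPAN module, assumed to be installed

# Hypothetical helper: return two levels of section titles as a list of lines.
sub toc_lines {
    my ($xml) = @_;
    my $xp = XML::XPath->new(xml => $xml);
    my @lines;
    # Absolute path: evaluated from the document root.
    for my $sect1 ($xp->findnodes('/article/sect1')) {
        # Relative paths: the second argument is the context node.
        push @lines, '' . $xp->findvalue('title/text()', $sect1);
        for my $sect2 ($xp->findnodes('sect2', $sect1)) {
            push @lines, ' + ' . $xp->findvalue('title/text()', $sect2);
        }
    }
    return @lines;
}

# Invented sample data, loosely shaped like a DocBook article.
my $xml = <<'XML';
<article>
 <sect1><title>Syntax</title>
  <sect2><title>Delimiters</title></sect2>
 </sect1>
 <sect1><title>API</title></sect1>
</article>
XML

print "$_\n" for toc_lines($xml);
```

Dropping the $sect1 argument from the inner calls would evaluate the relative path against the default context instead of the current section, which is exactly the mistake the explicit context guards against.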

Declarative Templates

The examples up to now have all been based around a single global template with search/replace type functionality from the source XML document. This is a powerful concept in itself, especially when combined with loops and the ability to change the context of searches. But that style of template is better suited to well-structured data than to processing large documents. To ease the processing of documents, XPathScript also includes a declarative template processing model, so that you can simply specify the format for a particular element and let XPathScript do the work for you.

In order to support this method, XPathScript introduces one more API function: apply_templates(). The name is intended to appeal to people already familiar with XSLT. The apply_templates() function takes either a list of start nodes, or an XPath expression (which must result in a node set) and optional context. Starting at the start nodes, it traverses the document tree applying the templates defined by the $t hash reference.

First, a simple example to introduce this feature. Let's assume for a moment that our source XML file is valid XHTML, and we want to change all anchor links to italics. Here is the very simple XPathScript template that will do that:

<%
$t->{'a'}{pre} = '<i>';
$t->{'a'}{post} = '</i>';
$t->{'a'}{showtag} = 1;
%>
<%= apply_templates() %>

Note that apply_templates() has to be called using <%= %>. That's because apply_templates() actually returns a string representation of the transformation--it doesn't do the output to the browser for you.

The first thing this example does is set up a hash reference $t that XPathScript knows about. The keys of $t are element names (including namespace prefix, if we are using namespaces). The hash can have the following sub-keys:

  • pre
  • post
  • showtag
  • testcode

We'll cover testcode in more depth later in The Template Hash, but we'll note here that it is a placeholder for code that allows for more complex templates.

Unlike XSLT's declarative transformation syntax, the keys of $t do not specify XPath match expressions. Instead they are simple element names. This is a trade-off between speed of execution and flexibility. Perl hash lookups are extremely quick compared to XPath matching. Luckily, because of the testcode option, more complex matches are quite possible with XPathScript.

The simple explanation for now is that pre specifies output to appear before the tag, post specifies output to appear after the tag, and showtag specifies that the tag itself should be output as well as the pre and post values.
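As a way of picturing how pre, post, and showtag combine, here is a small standalone Perl sketch that walks a hand-built tree and applies a template hash keyed by element name. It is only a model of the behaviour described above, not XPathScript's code: the render function and the node layout (name, attrs, kids) are invented for the illustration, elements without a template entry are assumed to be copied through unchanged, and no escaping of text or attributes is attempted.

```perl
use strict;
use warnings;

# Hypothetical node layout: { name => ..., attrs => {...}, kids => [...] };
# plain strings in kids stand for text nodes.
sub render {
    my ($node, $t) = @_;
    return $node unless ref $node;                 # text node: output as-is

    my $name  = $node->{name};
    my $inner = join '', map { render($_, $t) } @{ $node->{kids} || [] };
    my $attrs = join '', map { qq( $_="$node->{attrs}{$_}") }
                         sort keys %{ $node->{attrs} || {} };
    my $tagged = "<$name$attrs>$inner</$name>";

    # A plain hash lookup on the element name, no XPath matching involved.
    my $rule = $t->{$name} or return $tagged;      # no template: copy through

    my $body = $rule->{showtag} ? $tagged : $inner;
    return ($rule->{pre} // '') . $body . ($rule->{post} // '');
}

my $doc = {
    name => 'p',
    kids => [ 'See ', { name => 'a', attrs => { href => '/x' }, kids => ['here'] } ],
};

my $t = { a => { pre => '<i>', post => '</i>', showtag => 1 } };
print render($doc, $t), "\n";
# prints: <p>See <i><a href="/x">here</a></i></p>
```

With showtag set to 0 in this model, the a element itself is dropped and only its contents remain between the pre and post strings.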
