
XPathScript, part 2: A Complete Example

July 5, 2000

Matt Sergeant

Now let's put all these ideas together into an (almost) complete example. This is part of the style sheet I use to process my DocBook articles online:


<!--#include file="docbook_tags.xps"-->
<%

my %links;
my $linkid = 0;
$t->{'ulink'}{testcode} = sub {
  my $node = shift;
  my $t = shift;
  my $url = findvalue('@url', $node);
  if (!exists $links{$url}) {
   $linkid++;
   $links{$url} = $linkid;
  }
  my $link_number = $links{$url};
  $t->{pre} = "<i><a href=\"$url\">";
  $t->{post} = " [$link_number]</a></i>";
  return 1;
 };

%>
<html>
<head>
 <title><%= findvalue('/article/artheader/title/text()') %></title>
</head>
<body bgcolor="white">

<%
# display title/TOC page
print apply_templates('/article/artheader/*');
%>

<hr>

<%
# display particular page
foreach my $section (findnodes("/article/sect1")) {
 print apply_templates($section);
}
%>

<h1>List of Links</h1>
<table border="1">
<tr><th>URL</th></tr>
<%
for my $link (sort {$links{$a} <=> $links{$b}} keys %links) {
%>
<tr>
<td><%= "[$links{$link}] $link" %></td>
</tr>
<% } %>
</table>

</body>
</html>

The first line imports a library of tags that are shared between this style sheet and one that is easier for web viewing with clickable links between sections (which can be downloaded here). The import system is based on Server Side Includes (SSI), although only SSI file includes are supported at this time (SSI virtual includes can be implemented using mod_include). Here is part of the docbook_tags.xps file:


<%

$t->{'attribution'}{pre} = "<i>";
$t->{'attribution'}{post} = "</i><br>\n";

$t->{'para'}{pre} = '<p>';
$t->{'para'}{post} = '</p>';

$t->{'ulink'}{testcode} = sub {
  my $node = shift;
  my $t = shift;
  $t->{pre} = "<i><a href=\"" .
      findvalue('./@url', $node) . "\">";
  $t->{post} = '</a></i>';
  return 1;
 };

$t->{'title'}{testcode} = sub {
  my $node = shift;
  my $t = shift;
  if (findvalue('parent::blockquote', $node)) {
   $t->{pre} = "<b>";
   $t->{post} = "</b><br>\n";
  }
  elsif (findvalue('parent::artheader', $node)) {
   $t->{pre} = "<h1>";
   $t->{post} = "</h1>";
  }
  else {
   my $parent = findvalue('name(..)', $node);
   if (my ($level) = $parent =~ m/sect(\d+)$/) {
    $t->{pre} = "<h$level>";
    $t->{post} = "</h$level>";
   }
  }

  return 1;
 };

%>

Stepping Through The Example

Careful readers will note that the first thing we see is a $t specification for <ulink> tags, and that the included docbook_tags.xps file also contains a specification for <ulink>. The local one overrides the default behavior for <ulink> tags in the print version of my articles, so that each link carries a reference number that we can use later in a list of links. We can also see that this specification uses a testcode parameter that we haven't encountered before. We'll see how and why that's used later in The Template Hash.

Next, we see the findvalue() function used exactly as we saw above in Extracting Values. Then we have a section with the comment "display title/TOC page". This uses the apply_templates() function with an XPath expression. Note that rather than use the <%= %> delimiters around the apply_templates() call, we simply use the print function. This has the same effect, and is used here to show the flexibility in this approach.

The main part of the code loops through all sect1 tags, and calls apply_templates() on those nodes. Note how this is another demonstration of Perl's TMTOWTDI (There's More Than One Way To Do It) approach--the same code could have been written as follows:


<%= apply_templates("/article/sect1") %>

Finally, because this is the print version of our article, we provide a list of links so that people viewing a printed version can type in those links, and so that they can also refer to the link by reference number, as we saw earlier. We use the hash of links in the %links variable that we built in the testcode handler for our ulink template.
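To see the link bookkeeping on its own, here is a minimal standalone sketch that runs outside XPathScript. The URLs are made up for illustration; the numbering and sorting logic mirrors what the ulink handler and the table loop above do.

```perl
use strict;
use warnings;

# Hypothetical URLs, in the order a document might reference them.
my @seen = (
    'http://axkit.org/',
    'http://www.w3.org/TR/xpath',
    'http://axkit.org/',          # repeated: keeps its first number
);

my %links;
my $linkid = 0;

# Same bookkeeping as the ulink testcode handler: number on first sight only.
for my $url (@seen) {
    $links{$url} = ++$linkid unless exists $links{$url};
}

# Sort numerically by assigned number, as the table-building loop does.
my @rows = map  { "[$links{$_}] $_" }
           sort { $links{$a} <=> $links{$b} } keys %links;

print "$_\n" for @rows;
```

Because the hash is keyed on the URL, a link that appears several times in the article gets one number and one row in the list.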

The other file, docbook_tags.xps, is included (only in part here) to demonstrate a few of the transformations we're applying to various DocBook article tags. We can see that we're turning <para> tags into <p> tags, and doing some more complex processing on <title> tags with testcode. The next section provides more detail on what can be achieved with testcode.

The Template Hash

The apply_templates() function iterates over the nodes specified as parameters, applying the templates in the $t hash reference. This is the most important feature of XPathScript, because it allows you to define the appearance for individual tags without having to do it programmatically. This is the declarative part of XPathScript.

There is an important point to make here: XSLT is a purely declarative syntax, and people are having to work procedural code into XSLT via workarounds. XPathScript takes a much more pragmatic approach (much like Perl itself)--it is both declarative and procedural, allowing you the flexibility to use real code for real problems. It is important to note that apply_templates() returns a string, so you must either call print apply_templates('path') from a Perl section of code, or use <%= apply_templates('path') %>.

The keys of $t are the names of the elements, including namespace prefixes. When you call apply_templates(), every element visited is looked up in the $t hash, and the template items stored in that hash are applied to the node. It's worth noting at this point that, unlike XSLT, XPathScript does not perform tree transformations from one tree to another. It simply sends its output to the browser directly. This has advantages and disadvantages, a discussion of which is beyond the scope of this article.

The following sub-keys define the transformation:

  • pre - the output to occur before the tag.
  • post - the output to occur after the tag.
  • prechildren - the output to occur before the children of this tag are output.
  • postchildren - the output to occur after the children of this tag are output.
  • prechild - the output to occur before each child of this tag.
  • postchild - the output to occur after each child of this tag.
  • showtag - set to a true value to display the tag as well as the pre and post values. If unset or false, the tag itself is not displayed.
  • testcode - code to execute upon visiting this tag. See below.

The showtag option is mostly equivalent to the XSLT <xsl:copy> tag, only less verbose. The pre and post options are useful, because generally in transformations we want to specify what comes before and after a tag. For example, to change an HTML A tag to be in italics but still have the link, we would use the following:


$t->{A}{pre} = "<i>";
$t->{A}{post} = "</i>";
$t->{A}{showtag} = 1;
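To make the ordering of the sub-keys concrete, here is a toy renderer that runs outside AxKit. It is only a sketch: render() and the hash-based node format are invented for this example, attributes are ignored, and the order in which the keys are applied is my reading of the descriptions above rather than a copy of the real implementation.

```perl
use strict;
use warnings;

# Toy stand-in for apply_templates(): NOT AxKit's code.
# A node is a hash { name, kids }; a plain string is a text node.
sub render {
    my ($node, $t) = @_;
    return $node unless ref $node;              # text node: output as-is

    my $tmpl = $t->{ $node->{name} } || {};
    my $get  = sub { defined $tmpl->{ $_[0] } ? $tmpl->{ $_[0] } : '' };

    my $out = $get->('pre');
    $out .= "<$node->{name}>" if $tmpl->{showtag};   # attributes omitted
    $out .= $get->('prechildren');
    for my $kid (@{ $node->{kids} || [] }) {
        $out .= $get->('prechild') . render($kid, $t) . $get->('postchild');
    }
    $out .= $get->('postchildren');
    $out .= "</$node->{name}>" if $tmpl->{showtag};
    $out .= $get->('post');
    return $out;
}

my $t = {
    A    => { pre => '<i>', post => '</i>', showtag => 1 },
    list => { prechildren => '<ul>', postchildren => '</ul>',
              prechild    => '<li>', postchild    => '</li>' },
};

# Hypothetical document: a <list> holding two <A> elements.
my $doc = { name => 'list', kids => [
    { name => 'A', kids => ['one'] },
    { name => 'A', kids => ['two'] },
] };

my $html = render($doc, $t);
print "$html\n";
```

The list element itself disappears (showtag is unset), while each A element is kept and wrapped in italics.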

The "testcode" Option

The testcode option is where we perform really powerful transformations. It's how we can do more complex tests on the nodes, and locally modify the transformation based on what we find.

The value stored in testcode is simply a reference to a subroutine. In Perl, these are incredibly simple to create using the anonymous sub keyword. The sub is called every time one of these elements is visited. The subroutine is passed two parameters: the node itself, and an empty hash reference that you can populate using the pre, post, prechildren, prechild, postchildren, postchild and showtag values that we've discussed already. Unlike the global $t hashref, you don't have to first specify the element name as a key. Here's the <ulink> example from the global tags code above:


$t->{'ulink'}{testcode} = sub {
 my ($node, $t) = @_;
 $t->{pre} = '<i><a href="' . findvalue('@url', $node) . '">';
 $t->{post} = '</a></i>';
 return 1;
};

The equivalent XSLT code looks like this:


<xsl:template match="ulink">
 <i><a>
  <xsl:attribute name="href">
   <xsl:value-of select="@url"/>
  </xsl:attribute>
  <xsl:apply-templates/>
 </a></i>
</xsl:template>

Note in the XPathScript above that the inner $t is lexically scoped, so changes to it don't affect the outer $t. To save some confusion we might have named that variable $localtransforms, but some people, like me, hate typing....

The return value from the testcode subroutine is important. A return value of 1 means to process this node and continue processing all the children of this node. A return value of -1 means to process this node and stop, and a return value of 0 means do not process this node at all. This is useful in conditional tests, where you may not wish to process the nodes under certain conditions.
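The three return values can be modelled with another toy renderer. Again this is a sketch under assumptions, not AxKit's code: render(), the node format, and the element names note, secret and stub are all hypothetical.

```perl
use strict;
use warnings;

# Toy model of the three testcode return values; NOT AxKit's implementation.
sub render {
    my ($node, $t) = @_;
    return $node unless ref $node;               # text node

    my %tmpl    = %{ $t->{ $node->{name} } || {} };
    my $do_kids = 1;
    if (my $code = $tmpl{testcode}) {
        my %local;                               # the "inner $t"
        my $rv = $code->($node, \%local);
        return '' if $rv == 0;                   #  0: skip this node entirely
        $do_kids = 0 if $rv == -1;               # -1: this node, no children
        %tmpl = (%tmpl, %local);                 # local settings win
    }

    my $out = defined $tmpl{pre} ? $tmpl{pre} : '';
    if ($do_kids) {
        $out .= render($_, $t) for @{ $node->{kids} || [] };
    }
    $out .= defined $tmpl{post} ? $tmpl{post} : '';
    return $out;
}

my $t = {
    note   => { testcode => sub { my ($n, $l) = @_;
                                  $l->{pre} = '['; $l->{post} = ']';
                                  return 1 } },
    secret => { testcode => sub { return 0 } },
    stub   => { testcode => sub { $_[1]{pre} = '(omitted)'; return -1 } },
};

my $doc = { name => 'note', kids => [
    'a ',
    { name => 'secret', kids => ['hidden'] },
    { name => 'stub',   kids => ['long text'] },
] };

my $out = render($doc, $t);
print "$out\n";
```

Here the secret element vanishes along with its text, and the stub element emits its pre value but none of its children.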

We can do things in XPathScript based on XPath lookups, just as we can in XSLT. While it is a little more verbose than a simple XSLT pattern match, the trade-off is in performance. An example: in XSLT you might match artheader/title and elsewhere you might match title[name(..) != "artheader"]. In XPathScript we can only match "title" in the template hash. But we can use the testcode section to extend the match:


$t->{'title'}{testcode} = sub {
 my $node = shift;
 my $t = shift;
 if (findvalue('parent::blockquote', $node)) {
  $t->{pre} = "<b>";
  $t->{post} = "</b><br>\n";
 }
 elsif (findvalue('parent::artheader', $node)) {
  $t->{pre} = "<h1>";
  $t->{post} = "</h1>";
 }
 else {
  my $parent = findvalue('name(..)', $node);
  if (my ($level) = $parent =~ m/sect(\d+)$/) {
   $t->{pre} = "<h$level>";
   $t->{post} = "</h$level>";
  }
 }

 return 1;
};

In this code, we check the parent node before performing our modification to the local $t hashref. Particularly useful is the ability to use Perl regular expressions to extract values.
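The regular expression idiom in the else branch is worth isolating. In this small standalone sketch, heading_for() is a hypothetical helper that takes a parent element name and returns the heading tag name the handler would use, or nothing when the name doesn't end in sect followed by digits.

```perl
use strict;
use warnings;

# Hypothetical helper: map a parent element name to a heading tag name.
sub heading_for {
    my ($parent) = @_;
    # List-context match: $level receives the captured digits, and the
    # whole condition is false when the pattern does not match.
    if (my ($level) = $parent =~ m/sect(\d+)$/) {
        return "h$level";
    }
    return;    # no match: the caller leaves the template untouched
}

for my $name (qw(sect1 sect3 appendix)) {
    my $tag = heading_for($name);
    print "$name => ", (defined $tag ? $tag : '(no heading)'), "\n";
}
```

Putting my ($level) in parentheses is what makes this work: the match runs in list context and hands back its captures, rather than a simple true or false.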

Copying Styles

One thing that is easy in XPathScript, and really hard in XSLT, is copying a style completely:


<%
$t->{'foo'}{pre} = "<i>";
$t->{'foo'}{post} = "</i>";
$t->{'foo'}{showtag} = 1;

$t->{'bar'} = $t->{'foo'};
%>

While this would be possible in XSLT using entities, it's certainly not very practical or neat. With XPathScript, many tags can share the same template. Be careful though--this is a reference copy, not a deep copy, so the following may not do what you think it should:


<%
$t->{'foo'}{pre} = "<i>";
$t->{'foo'}{post} = "</i>";
$t->{'foo'}{showtag} = 1;

$t->{'bar'} = $t->{'foo'};
$t->{'bar'}{post} = "</i><br>";
%>

Because this is a reference, the last line changes the values for 'foo' as well as 'bar'.
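If you do want 'bar' to start out like 'foo' and then diverge, one option in plain Perl is to copy the inner hash rather than the reference. This is a minimal standalone sketch; $t is declared locally here only so that the snippet runs outside XPathScript.

```perl
use strict;
use warnings;

my $t = {};    # stand-in for the XPathScript template hash
$t->{'foo'} = { pre => '<i>', post => '</i>', showtag => 1 };

# Build a new anonymous hash from foo's key/value pairs, so that
# 'bar' no longer points at the same hash as 'foo'.
$t->{'bar'} = { %{ $t->{'foo'} } };
$t->{'bar'}{post} = '</i><br>';

print "foo post: $t->{'foo'}{post}\n";
print "bar post: $t->{'bar'}{post}\n";
```

This is a one-level copy, which is enough for templates whose values are plain strings and flags. A testcode entry would still be the same subroutine in both templates, which is usually what you want.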

A "Catch All"?

Does XPathScript have a "catch all" option for elements that don't have a $t entry? Yes indeed! Simply set $t->{'*'} to the template you want to use. You can even do some really clever things, such as using the testcode section to output a warning to the Apache error log about an unrecognized tag, rather than having to place some output in the resulting document and bother your users!
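Here is a sketch of what such a warning handler might look like. To keep it runnable on its own, $t is declared locally and findvalue() is replaced by a stub that only understands 'name()' on a hash-based fake node; inside XPathScript you would drop both and rely on the real environment. That warn output ends up in the Apache error log assumes a typical mod_perl setup.

```perl
use strict;
use warnings;

my $t = {};    # stand-in for the XPathScript template hash

# Stub for this sketch only: the real findvalue() is supplied by
# XPathScript and evaluates XPath against a real node.
sub findvalue {
    my ($path, $node) = @_;
    return $path eq 'name()' ? $node->{name} : '';
}

$t->{'*'}{testcode} = sub {
    my ($node, $t) = @_;
    my $name = findvalue('name()', $node);
    warn "Unrecognized tag: <$name>\n";    # goes to STDERR
    return 1;                              # still process the children
};

# Simulate visiting an element that has no template of its own.
my $rv = $t->{'*'}{testcode}->({ name => 'mystery' }, {});
print "handler returned $rv\n";
```

Returning 1 means the element's content still reaches the page; returning 0 instead would drop unknown elements silently from the output while still logging them.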

Conclusion

XPathScript brings the power of XPath into a more familiar environment for most web developers. It enables developers to retain their existing investment in mod_perl pages while moving to using XML for underlying content. Its pragmatic mix of the declarative and procedural ensures flexibility and performance.

Resources

AxKit
Introduction to AxKit
XPath
Norm Walsh's XPath Introduction