XML.com: XML From the Inside Out


Introducing XML::SAX::Machines, Part Two

March 20, 2002


In last month's column we began our introduction to XML::SAX::Machines, a group of modules that greatly simplifies the creation of complex SAX applications with multiple filters. This month we pick up where we left off by further illustrating how XML::SAX::Machines can remove most of the drudgery of building SAX-based XML processing applications. If you have not read last month's offering, please do so now.

Example One - MachinePages Revisited

In last month's column we created a simple mod_perl handler that uses SAX::Machines to allow developers and HTML authors to use custom tag libraries in their HTML documents. This example was fine, as far as it went, but it can be made a lot more robust and flexible with very little effort. For example, the list of SAX filters in the previous example was hard-coded into the handler script itself. One of the best features of the interface that SAX::Machines provides is that the filter chains and other machine definitions can be built dynamically at run-time using simple Perl arrays.

For our first example this month we will extend the previous MachinePages handler to capitalize on SAX::Machines' dynamic abilities by allowing the SAX filters applied to a given document to be passed in through Apache's configuration API. In addition, we will give developers the option to apply one or more XSLT stylesheets to the filtered SAX event streams, again letting the stylesheets be chosen via a configuration directive.

package SAXWeb::MachinePages;

use strict;
use Apache::Constants;
use XML::SAX::Machines qw( :all );
use XML::Filter::XSLT;
use XML::SAX::Writer;

sub handler {
  my $r = shift;

  my @filters;

With the basic initialization out of the way, we can begin reading in the list of filters to apply to the given request. We do this by calling the dir_config method on the Apache request object, splitting the string held in a custom MachineFilters option (if one exists) into an array, and pushing that array onto our top-level list of SAX filters.

  if ( defined( $r->dir_config('MachineFilters') ) ) {
      my @widgets = split /\s+/, $r->dir_config('MachineFilters');
      push @filters, @widgets;
  }
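The option parsing can be tried outside Apache. The following is a minimal sketch using core Perl only; parse_option_list is a hypothetical helper, not part of the handler. It uses split ' ' rather than split /\s+/ because the latter returns an empty first element when the option value begins with whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original handler: turn a
# whitespace-separated option string into a list of class names.
sub parse_option_list {
    my ($value) = @_;
    return () unless defined $value;

    # split ' ' skips leading whitespace; split /\s+/ would instead
    # return an empty first element for " My::FilterOne".
    return split ' ', $value;
}

my @filters = parse_option_list("  My::FilterOne My::FilterTwo ");
print join( ',', @filters ), "\n";
```

An unset option yields an empty list, so the result can be pushed onto the filter list without a separate defined check.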

Next, we use dir_config to check for a MachineStyles option and, if it is present, we process that option and use the resulting filenames to configure a chain of XML::Filter::XSLT instances. As above, we append those instances to the top-level list of filters to be applied to the event stream.

  if ( defined( $r->dir_config('MachineStyles') ) ) {
      my @stylesheets = split /\s+/, $r->dir_config('MachineStyles');
      foreach my $stylesheet ( @stylesheets ) {
          my $xsl_filter = XML::Filter::XSLT->new();
          $xsl_filter->set_stylesheet_uri( $stylesheet );
          push @filters, $xsl_filter;
      }
  }

Note the difference between this block and the previous one. In the MachineFilters block we simply added strings containing the class names of the SAX filters to the list of filters, while, here, we have created instances of the XML::Filter::XSLT class and pushed those blessed objects onto the list. XML::SAX::Machines invisibly copes with both cases by autoloading and creating new instances of those filter classes passed in as plain strings, while working in the predictable way for those filters which are passed as blessed references.
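The general idea of accepting both class names and ready-made objects in one list can be sketched in a few lines of core Perl. This is an illustration only and makes no claim about how XML::SAX::Machines is implemented internally; resolve_stage is a hypothetical helper, and IO::Handle stands in for a filter class so that the sketch runs without any CPAN modules.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );
use IO::Handle;

# Hypothetical helper illustrating the general idea only; this is not
# the actual XML::SAX::Machines implementation.
sub resolve_stage {
    my ($stage) = @_;

    # Blessed references are assumed to be ready-made objects.
    return $stage if blessed $stage;

    # Plain strings are treated as class names: load the module, then
    # build an instance with its default constructor.
    ( my $file = "$stage.pm" ) =~ s{::}{/}g;
    require $file;
    return $stage->new;
}

# One class name and one existing object, mixed in a single list.
my @stages = map { resolve_stage($_) } ( 'IO::Handle', IO::Handle->new );
print ref($_), "\n" for @stages;
```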

Moving on, we create a new XML::SAX::Writer object and set its output stream to point at a plain scalar variable inventively named $output.

  my $output = '';
  my $writer = XML::SAX::Writer->new( Output => \$output );

Next, we create a Pipeline machine, which gives us a linear chain of SAX filters, passing in the list of filters we have collected and setting the instance of XML::SAX::Writer as the final handler.

    my $machine = Pipeline( @filters, $writer );

We then call the machine's parse_uri method, passing in the file name of the document that the client requested.

    $machine->parse_uri(  $r->filename );

Note that we did not create an instance of a SAX parser class, but, rather, called parse_uri on the Pipeline object. Again, XML::SAX::Machines "does what you mean" in this case by creating an instance of an XML::SAX parser behind the scenes.
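For comparison, here is a minimal sketch of the manual wiring that the Pipeline spares us: asking XML::SAX::ParserFactory for a parser and attaching a handler by hand. It assumes XML::SAX and XML::SAX::Writer are installed, feeds the parser straight into the writer with no filters in between, and uses a made-up sample document.

```perl
use strict;
use warnings;
use XML::SAX::ParserFactory;
use XML::SAX::Writer;

# Collect the serialized result in a plain scalar, as in the handler.
my $output = '';
my $writer = XML::SAX::Writer->new( Output => \$output );

# Create the parser explicitly and attach the writer as its handler.
my $parser = XML::SAX::ParserFactory->parser( Handler => $writer );

# The sample document is invented for illustration.
$parser->parse_string('<page><title>Hello</title></page>');

print $output, "\n";
```

With filters in play, each one would also need its handler set by hand, which is the bookkeeping that the machine takes over.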

To finish off our Apache handler, we have to set the appropriate HTTP headers and send the result of the SAX process to the client.

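    # Content type assumed to be text/html; adjust it to match the output.
    $r->send_http_header( 'text/html' );
    $r->print( $output );
    return OK;
}

1;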


Our new MachinePages handler now allows for fine-grained control over which filters are applied to which documents using options from the server-wide httpd.conf file or .htaccess files.

PerlSetVar MachineFilters "My::FilterOne My::FilterTwo"
PerlSetVar MachineStyles "/www/htdocs/stylesheets/default.xsl ..."

These options can be used as-is or wrapped in <Directory>, <Files>, or <FilesMatch> containers, or any of the other common Apache configuration blocks, for finer control.


Example Two - Creating A Smart SAX Controller

In complex applications where producing XML is only one part of the overall functionality, it is often wise to keep the XML processing facilities as separate from the core application as possible. One way to achieve this is to create an abstract "controller" class that handles the gory details, allowing the core application to call a few simple methods to achieve complex results. XML::SAX::Machines is especially well-suited for creating these simple but powerful abstract controllers. For our second and final example we will build a class that implements this pattern.

Consider the following illustration:

Diagram of a SAX Controller

We see from this diagram that the SAX controller class is responsible for establishing the SAX processing chain from end to end, while providing a simple one-stop interface to the rest of the application. The application simply calls one or two methods in the controller class to obtain the result it expects.
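To make the pattern concrete, here is a minimal sketch of what such a controller might look like. It is not the class developed in this column: the names My::SAXController, process_string, and My::Filter::Upcase are all hypothetical, and the sketch assumes XML::SAX::Machines and XML::SAX::Writer are installed. The caller hands over filters once and afterwards sees only a single method.

```perl
use strict;
use warnings;

# Hypothetical filter used only to give the controller something to run.
package My::Filter::Upcase;
use base 'XML::SAX::Base';

sub characters {
    my ( $self, $data ) = @_;

    # Pass a modified copy of the event downstream.
    $self->SUPER::characters( { %$data, Data => uc $data->{Data} } );
}

# Hypothetical controller class; the name and methods are assumptions.
package My::SAXController;
use XML::SAX::Machines qw( Pipeline );
use XML::SAX::Writer;

sub new {
    my ( $class, @filters ) = @_;
    return bless { filters => [@filters] }, $class;
}

# Run a string of XML through the configured filters and return the result.
sub process_string {
    my ( $self, $xml ) = @_;

    my $output  = '';
    my $writer  = XML::SAX::Writer->new( Output => \$output );
    my $machine = Pipeline( @{ $self->{filters} }, $writer );

    $machine->parse_string($xml);
    return $output;
}

package main;

my $controller = My::SAXController->new( My::Filter::Upcase->new );
print $controller->process_string('<greeting>hello</greeting>'), "\n";
```

The application code never touches a parser, a writer, or a machine; swapping filters in or out changes only the arguments passed to the constructor.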
