XML.com: XML From the Inside Out

Introducing XML::SAX::Machines, Part Two
by Kip Hampton

Typical SAX Controller Program Flow

  1. The developer initializes a new Controller object inside the core application, passing in instances of the desired Generator, Handler, and any desired Filters.

  2. The developer calls the Controller's parse() method, passing in whatever data the Generator needs to initialize the SAX event stream.

  3. The Controller passes that data along to the Generator via the prepare() method (if it implements one).

  4. If implemented, the Generator's prepare() method examines or alters the data passed and returns the (possibly altered) data back to the Controller. During this examination, the Generator has a chance to see what it is about to parse and has the opportunity to set additional Filters.

  5. The Controller calls the Generator's get_filters() method (if implemented) to retrieve any additional Filters.

  6. The Controller initializes the SAX filter chain (via XML::SAX::Machines), setting the passed Handler as the Machine's final SAX Handler, and the Machine itself as the event handler for the passed Generator.

  7. The Controller calls the Generator's parse() method, passing in the (possibly altered) data returned from the prepare() method.

  8. The Generator begins the SAX event stream, firing the events (start_document, start_element, end_document, etc.) at the first Filter in the chain, if any.

  9. The final Filter (or Generator, if no Filters were added to the chain) fires the SAX events at the Handler. The Handler does something with the data passed through the event methods (builds a DOM tree, writes an XML document to the file system or browser, etc.).

  10. The result of the parse is returned to the application.
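The hooks in this flow can be exercised without any SAX modules installed. The following is a minimal, self-contained sketch, not the controller itself: the Demo::* packages are hypothetical stand-ins, events are reduced to plain hashrefs, and the chaining that XML::SAX::Machines would normally perform in step 6 is done by hand. The method names (prepare(), get_filters(), set_handler(), parse()) match the ones the controller code calls.

```perl
use strict;
use warnings;

# Hypothetical final stage: records the element names it receives.
package Demo::Handler;
sub new { return bless { seen => [] }, shift }
sub start_element {
    my ( $self, $el ) = @_;
    push @{ $self->{seen} }, $el->{Name};
}
sub end_document {
    my $self = shift;
    return join ',', @{ $self->{seen} };    # the "result of the parse"
}

# Hypothetical filter: upper-cases element names and passes events on.
package Demo::UpperFilter;
sub new { return bless {}, shift }
sub set_handler { my ( $self, $next ) = @_; $self->{Handler} = $next }
sub start_element {
    my ( $self, $el ) = @_;
    $self->{Handler}->start_element( { %$el, Name => uc $el->{Name} } );
}
sub end_document { my $self = shift; return $self->{Handler}->end_document }

# Hypothetical generator: turns a whitespace-separated string into events.
package Demo::Generator;
sub new { return bless {}, shift }
sub set_handler { my ( $self, $next ) = @_; $self->{Handler} = $next }
sub prepare {                                   # step 4: inspect/alter input
    my ( $self, $data ) = @_;
    $data =~ s/^\s+|\s+$//g;
    return $data;
}
sub get_filters { return Demo::UpperFilter->new }    # step 5
sub parse {                                     # steps 8-9: fire the events
    my ( $self, $data ) = @_;
    $self->{Handler}->start_element( { Name => $_ } ) for split ' ', $data;
    return $self->{Handler}->end_document;
}

package main;

# Steps 1-3: the "controller" holds a generator and a handler.
my $generator = Demo::Generator->new;
my $handler   = Demo::Handler->new;
my $data      = '  title para  ';

$data = $generator->prepare($data) if $generator->can('prepare');
my @filters = $generator->can('get_filters') ? $generator->get_filters : ();

# Step 6, by hand: link each stage to the next, ending at the Handler.
my @stages = ( @filters, $handler );
$stages[$_]->set_handler( $stages[ $_ + 1 ] ) for 0 .. $#stages - 1;
$generator->set_handler( $stages[0] );

# Steps 7-10: parse and hand the result back to the application.
my $result = $generator->parse($data);
print "$result\n";    # TITLE,PARA
```

In the real controller, the hand-written linking loop is exactly the part that Pipeline() replaces.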

Let's get down to business and create the actual controller.

package MyController;

use strict;
use vars qw( $DefaultSAXHandler $DefaultSAXGenerator );
use XML::SAX::Machines qw( :all );

# Default classes; '||=' keeps any value assigned to these package
# variables before this module is loaded.
$DefaultSAXHandler   ||= 'StringWriter';
$DefaultSAXGenerator ||= 'XML::SAX::ParserFactory';

After a bit of initialization, we create the constructor for our controller. Borrowing from XML::SAX::Machines' DWIM nature, we will provide default classes for the Generator and Handler options, allowing developers to pass these in either as simple class names or blessed instances.

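sub new {
    my $class = shift;
    my %args  = @_;

    # Handler: accept a class name (loaded and instantiated here) or a
    # blessed instance; fall back to the default class if none was given.
    if ( defined $args{Handler} ) {
        if ( ! ref( $args{Handler} ) ) {
            my $handler_class = $args{Handler};
            eval "require $handler_class; 1" or die $@
                unless $handler_class->can('new');
            $args{Handler} = $handler_class->new();
        }
    }
    else {
        eval "require $DefaultSAXHandler; 1" or die $@
            unless $DefaultSAXHandler->can('new');
        $args{Handler} = $DefaultSAXHandler->new();
    }

    # Generator: same rules as the Handler.
    if ( defined $args{Generator} ) {
        if ( ! ref( $args{Generator} ) ) {
            my $driver_class = $args{Generator};
            eval "require $driver_class; 1" or die $@
                unless $driver_class->can('new');
            $args{Generator} = $driver_class->new();
        }
    }
    else {
        eval "require $DefaultSAXGenerator; 1" or die $@
            unless $DefaultSAXGenerator->can('new');
        # XML::SAX::ParserFactory hands out parsers through parser();
        # calling new() on it would return the factory, not a parser.
        $args{Generator} = $DefaultSAXGenerator->can('parser')
                         ? $DefaultSAXGenerator->parser()
                         : $DefaultSAXGenerator->new();
    }

    # Filters passed in by the application (may be empty).
    $args{FilterList} ||= [];

    return bless \%args, $class;
}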
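The class-name-or-instance logic appears in the constructor and again in the setters, so it could be factored into a single helper. The sketch below assumes a hypothetical function named instantiate() (it is not part of XML::SAX::Machines): it returns blessed objects untouched, skips loading for classes that are already defined (for example, in the same file), and otherwise loads the module by file name so that a missing module fails loudly instead of surfacing later as a method-lookup error. Demo::Writer is a stand-in class.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );

# Hypothetical helper: accept either a class name or a blessed object
# and always return an object.
sub instantiate {
    my ($thing) = @_;
    return $thing if blessed $thing;    # already an instance: use as-is

    my $class = $thing;
    unless ( $class->can('new') ) {     # not loaded yet (or has no new())
        ( my $file = "$class.pm" ) =~ s{::}{/}g;
        require $file;                  # dies with "Can't locate ..." if missing
    }
    return $class->new;
}

# Stand-in handler class defined in this file, so no require is needed.
package Demo::Writer;
sub new { return bless {}, shift }

package main;

my $from_name     = instantiate('Demo::Writer');
my $from_instance = instantiate( Demo::Writer->new );
my $io            = instantiate('IO::Handle');    # core module, loaded on demand
my $missing_ok    = eval { instantiate('No::Such::Class'); 1 };

print ref($from_name), ' ', ref($from_instance), ' ', ref($io), "\n";
```

Checking can('new') before require is what lets classes defined inline (which have no .pm file) pass through.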

Next, we get to the meat of the controller class, the parse() method. In addition to allowing developers to pass SAX filters in during initialization, we will go a step further by allowing the SAX generator to set additional filters by giving it a chance to see what it's about to parse (via an optional prepare() method).

Also, we will send the result of the generator's prepare() as the sole argument to its parse() method. This matters because SAX event streams can be generated from more than just XML documents. For example, we could easily write a custom SAX generator that subclasses XML::Generator::DBI and implements a prepare() method that maps URLs to specific SQL queries. In that case, prepare() would return the SQL SELECT statement rather than the URL passed in from the core application.

sub parse {
    my $self         = shift;
    my $to_be_parsed = shift;
    my $generator    = $self->{Generator};

    # Filters passed to the object from the application side
    # (an empty list if none were set).
    my @filterlist = @{ $self->{FilterList} || [] };

    # Give the Generator a peek at what it's about to parse and alter it, if needed.
    if ( $generator->can('prepare') ) {
        $to_be_parsed = $generator->prepare( $to_be_parsed );
    }

    # Allow filters to be passed from the generator
    # (could be hard-coded, or filters set during prepare()).
    if ( $generator->can('get_filters') ) {
        push @filterlist, $generator->get_filters;
    }

    # Build the filter machine, setting the last stage to the passed Handler.
    my $machine = Pipeline( @filterlist, $self->{Handler} );

    # Set the generator to fire its events at the pipeline.
    $generator->set_handler( $machine );

    # Get the result and return it to the app.
    return $generator->parse( $to_be_parsed );
}
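When copying a list that is held by reference, where the `||` goes matters. `@{ $ref } || ()` evaluates the array in scalar context, so a two-element filter list turns into the single value 2; defaulting the reference instead, as in `@{ $ref || [] }`, keeps list context and also tolerates a key that was never set. The filter class names below are placeholders.

```perl
use strict;
use warnings;

# Placeholder filter class names, stored by reference as in the controller.
my $self = { FilterList => [ 'My::Filter::One', 'My::Filter::Two' ] };

# Pitfall: '||' imposes scalar context, so the array yields its length.
my @wrong = @{ $self->{FilterList} } || ();

# Safer: default the reference, then dereference in list context.
my @right = @{ $self->{FilterList} || [] };

# The same form also copes with a key that was never set.
my @none = @{ $self->{NoSuchKey} || [] };

print scalar(@wrong), " element(s): @wrong\n";    # 1 element(s): 2
print scalar(@right), " element(s): @right\n";    # 2 element(s): My::Filter::One My::Filter::Two
```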

To keep things flexible, we will also provide a few simple configuration methods for setting up the SAX controller.

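sub set_handler {
    my $self    = shift;
    my $handler = shift;
    return unless defined $handler;

    if ( ! ref( $handler ) ) {
        # A class name: load it if necessary, then instantiate it.
        my $handler_class = $handler;
        eval "require $handler_class; 1" or die $@
            unless $handler_class->can('new');
        $self->{Handler} = $handler_class->new();
    }
    else {
        # Already a blessed instance: use it as-is.
        $self->{Handler} = $handler;
    }
}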

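sub set_generator {
    my $self      = shift;
    my $generator = shift;
    return unless defined $generator;

    if ( ! ref( $generator ) ) {
        # A class name: load it if necessary, then instantiate it.
        my $generator_class = $generator;
        eval "require $generator_class; 1" or die $@
            unless $generator_class->can('new');
        $self->{Generator} = $generator_class->new();
    }
    else {
        # Already a blessed instance: use it as-is.
        $self->{Generator} = $generator;
    }
}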

sub set_filterlist {
    my $self = shift;
    my @filters = @_;
    $self->{FilterList} = \@filters;
}

1;

That's it; our controller is done. Here are a few examples of how it might be called from the core application.

For those who prefer methods:

my $sax_controller = MyController->new();
$sax_controller->set_generator( 'Some::Generator' );
$sax_controller->set_handler( $my_blessed_instance );
$sax_controller->set_filterlist( 'XML::Filter::Foo', 'XML::Filter::Bar' );
my $result = $sax_controller->parse( $something );

And the same, for those who prefer constructor arguments:

my $sax_controller =
    MyController->new( Generator  => 'Some::Generator',
                     Handler    => $my_blessed_instance,
                     FilterList => \@list_of_filter_names);

my $result = $sax_controller->parse( $something );
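The Some::Generator slot could hold the kind of generator described for parse(): one whose prepare() maps a URL to an SQL query and, along the way, decides on extra filters. The sketch below is hypothetical throughout (My::URLQueryGenerator, its lookup tables, and My::Filter::CurrencyFormat are invented names) and shows only the prepare()/get_filters() side. A real version might subclass XML::Generator::DBI and would still need a parse() method that hands the query to the database layer; that part is omitted.

```perl
use strict;
use warnings;

# Hypothetical generator sketch: only the prepare()/get_filters() hooks.
package My::URLQueryGenerator;

# Hypothetical URL-to-query and URL-to-filter tables.
my %QUERY_FOR = (
    '/products' => 'SELECT id, name FROM products',
    '/orders'   => 'SELECT id, total FROM orders',
);
my %FILTERS_FOR = (
    '/orders' => ['My::Filter::CurrencyFormat'],
);

sub new { return bless { filters => [] }, shift }

# Map the URL to SQL and remember any extra filters it calls for.
sub prepare {
    my ( $self, $url ) = @_;
    ( my $path = $url ) =~ s{\?.*\z}{}s;    # drop any query string
    my $sql = $QUERY_FOR{$path}
        or die "No query mapped for '$path'\n";
    $self->{filters} = $FILTERS_FOR{$path} || [];
    return $sql;
}

sub get_filters { my $self = shift; return @{ $self->{filters} } }

package main;

my $gen = My::URLQueryGenerator->new;

my $products_sql     = $gen->prepare('/products?page=2');
my @products_filters = $gen->get_filters;

my $orders_sql     = $gen->prepare('/orders');
my @orders_filters = $gen->get_filters;

print "$products_sql\n$orders_sql\n@orders_filters\n";
```

Because the controller calls prepare() before get_filters(), a generator like this can choose its filters based on the request it is about to serve.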

As with the MachinePages example above, XML::SAX::Machines adds significant value to our application by making the filter chain both easy to configure and trivial to create dynamically.
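Because the chain is just a list handed to Pipeline(), building it dynamically is ordinary Perl list manipulation. One way to use that, sketched here with hypothetical option names and placeholder filter classes, is to map request options onto filter classes and pass the resulting list to set_filterlist().

```perl
use strict;
use warnings;

# Hypothetical option-to-filter table; the class names are placeholders.
my %FILTER_FOR = (
    strip_comments => 'My::Filter::StripComments',
    lowercase_tags => 'My::Filter::LowercaseTags',
    add_toc        => 'My::Filter::TableOfContents',
);

# Fixed order, so the chain comes out the same however options are passed.
my @ORDER = qw( strip_comments lowercase_tags add_toc );

# Return the filter classes for the options that are switched on.
sub filters_for {
    my %opts = @_;
    return map { $FILTER_FOR{$_} } grep { $opts{$_} } @ORDER;
}

my @filters = filters_for( add_toc => 1, strip_comments => 1, lowercase_tags => 0 );
print "$_\n" for @filters;

# The list could then be handed to the controller, e.g.:
#   $sax_controller->set_filterlist(@filters);
```

Keeping the order in a separate list matters because SAX filters run in sequence, and hash key order in Perl is not stable.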

Conclusions

XML::SAX::Machines makes the task of creating complex SAX-based applications extremely simple and straightforward, while providing a level of flexibility that would be painful at best to duplicate by hand. It offers a modern, easy-to-use interface that, like all Perlish things, makes easy things easy and makes hard things... well, not just possible, but easy, too. If you are considering SAX as the API of choice for your XML processing applications, XML::SAX::Machines should be at the top of your evaluation list.

Resources