
Transforming XML With SAX Filters
Introduction
Last month we began our exploration of more advanced SAX topics with a look at how SAX events can be generated from non-XML data. This month, we conclude the series by introducing SAX filters and their use in XML data transformation.
What Is A SAX Filter?
A SAX filter is simply a class that is passed as the event handler to another class that generates SAX events, and that then forwards all or some of those events on to the next handler (or filter) in the processing chain. A filter may prune the document tree by not forwarding events for elements with a given name (or that meet some other condition). In other cases, a filter might generate new events of its own to add parent or child elements to certain elements in the existing document stream. Element attributes can also be added or removed, or the character data altered in some way. Really, any class that can receive SAX events and then call event methods on another SAX handler in a way that alters the document stream can be seen as a SAX filter.
In practice, SAX filters are like conceptual cousins of many of the standard UNIX tools. By themselves, these tools often perform only a single, simple task, but when piped together they are capable of astonishing feats. In the same way, the real power of SAX filters is derived from the fact that simpler, easy-to-maintain filters may be chained together to produce complex XML data transformations.
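To make the idea concrete, here is a minimal sketch of a pruning filter that uses no SAX modules at all. The class names (EventLogger, PruneFilter), the Prune option, and the element names are hypothetical and chosen only for illustration. The events are fired by hand rather than by a parser, using the PerlSAX-style hashes (a Name key for elements, a Data key for character data). Two instances of the same filter are chained together to show how small filters compose.

```perl
use strict;
use warnings;

# Hypothetical end-of-chain handler: records every event it receives.
package EventLogger;
sub new { my $class = shift; return bless { log => [] }, $class; }
sub start_element { my ($self, $el) = @_; push @{ $self->{log} }, "start:$el->{Name}"; }
sub end_element   { my ($self, $el) = @_; push @{ $self->{log} }, "end:$el->{Name}"; }
sub characters    { my ($self, $ch) = @_; push @{ $self->{log} }, "chars:$ch->{Data}"; }

# Hypothetical filter: drops the element named by the Prune option,
# along with everything nested inside it, and forwards all other events.
package PruneFilter;
sub new {
    my ($class, %options) = @_;
    return bless { %options, depth => 0 }, $class;
}
sub start_element {
    my ($self, $el) = @_;
    # Inside a pruned subtree, or at the start of one: swallow the event.
    if ($self->{depth} || $el->{Name} eq $self->{Prune}) {
        $self->{depth}++;
        return;
    }
    $self->{Handler}->start_element($el);
}
sub end_element {
    my ($self, $el) = @_;
    if ($self->{depth}) {
        $self->{depth}--;
        return;
    }
    $self->{Handler}->end_element($el);
}
sub characters {
    my ($self, $ch) = @_;
    return if $self->{depth};
    $self->{Handler}->characters($ch);
}

package main;

# Build the chain in reverse order: the last handler is created first.
my $logger      = EventLogger->new;
my $drop_draft  = PruneFilter->new(Handler => $logger,     Prune => 'draft');
my $drop_secret = PruneFilter->new(Handler => $drop_draft, Prune => 'secret');

# Fire events by hand, standing in for a real SAX parser.
$drop_secret->start_element({ Name => 'doc' });
$drop_secret->start_element({ Name => 'secret' });
$drop_secret->characters({ Data => 'hidden' });
$drop_secret->end_element({ Name => 'secret' });
$drop_secret->start_element({ Name => 'draft' });
$drop_secret->characters({ Data => 'unfinished' });
$drop_secret->end_element({ Name => 'draft' });
$drop_secret->start_element({ Name => 'note' });
$drop_secret->characters({ Data => 'visible' });
$drop_secret->end_element({ Name => 'note' });
$drop_secret->end_element({ Name => 'doc' });

print "$_\n" for @{ $logger->{log} };
```

Only the events for the doc and note elements reach the logger. Each filter knows nothing about the other. It simply forwards whatever it does not prune to whatever object it was given as its Handler.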
Transforming Data Within Existing Events
For our first example we will create a simple SAX filter that transforms
the character data passed from XML::Parser::PerlSAX, then hands
it on to Michael Koehne's XML::Handler::YAWriter to produce the
final XML document.
use strict;
use XML::Parser::PerlSAX;
use XML::Handler::YAWriter;
use IO::File;

my $file = $ARGV[0] || die "Please pass a file name to process\n";
With the necessary modules included, we get to the section that reveals
exactly how SAX filters work. Notice that we create a new instance of
XML::Handler::YAWriter, then pass that object as the Handler
for our custom filter, the instance of which is in turn passed as the
Handler to XML::Parser::PerlSAX. When the script
is executed, the parser will call its SAX events on the methods in our
FilterPorcus class, which, in turn, will call the event methods
on the writer class to print the result to STDOUT.
Note that when defining event chains, the classes are created in reverse order, with the first handler being the last class that is actually called. This may seem a bit confusing at first but with a little practice, you will get the hang of it.
my $writer = XML::Handler::YAWriter->new(Output => IO::File->new( ">-" ));
my $filter = FilterPorcus->new(Handler => $writer);
my $parser = XML::Parser::PerlSAX->new(Handler => $filter);
my %parser_args = (Source => {SystemId => $file});
$parser->parse(%parser_args);
# end main
Next we create our custom filter class as an inline Perl package. Pay
special attention to the fact that our class inherits from Matt Sergeant's
XML::Filter::Base class. This allows us to implement only those
handler methods that are relevant to our filter, since
XML::Filter::Base automatically forwards, by default, all SAX
events to the next handler class in the chain. If our class were not a subclass of
XML::Filter::Base, we would have to explicitly forward each and every
event that the previous class could potentially generate.
# silly text transformer
package FilterPorcus;
use strict;
use base qw(XML::Filter::Base);
sub new {
    my $class = shift;
    my %options = @_;
    return bless \%options, $class;
}