XML.com: XML From the Inside Out


Transforming XML With SAX Filters


October 10, 2001


Last month we began our exploration of more advanced SAX topics with a look at how SAX events can be generated from non-XML data. This month, we conclude the series by introducing SAX filters and their use in XML data transformation.

What Is A SAX Filter?

A SAX filter is simply a class that is passed as the event handler to another class that generates SAX events, and that then forwards all or some of those events on to the next handler (or filter) in the processing chain. A filter may prune the document tree by not forwarding events for elements with a given name (or that meet some other condition); in other cases, a filter might generate new events of its own to add parent or child elements to certain elements in the existing document stream. Element attributes can also be added or removed, or the character data altered in some way. In fact, any class that is able to receive SAX events and then call event methods on another SAX handler in a way that alters the document stream can be seen as a SAX filter.
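The pruning case can be sketched in plain Perl without any CPAN modules. This is a hypothetical illustration, not code from this article: the PruneFilter and Collector classes are invented here, and the events are hand-fed hash references modeled loosely on the PerlSAX convention of passing a hash (Name, Data) to each handler method, rather than coming from a real parser.

```perl
use strict;
use warnings;

# Hypothetical end-of-chain handler: serializes events into a string
# (no escaping; a sketch only).
package Collector;
sub new { return bless { out => '' }, shift }
sub start_element { my ($s, $e) = @_; $s->{out} .= "<$e->{Name}>" }
sub end_element   { my ($s, $e) = @_; $s->{out} .= "</$e->{Name}>" }
sub characters    { my ($s, $c) = @_; $s->{out} .= $c->{Data} }

# Hypothetical pruning filter: swallows the named element and everything
# inside it, and forwards all other events unchanged.
package PruneFilter;
sub new {
    my ($class, %opts) = @_;
    my $self = { %opts, depth => 0 };
    return bless $self, $class;
}
sub start_element {
    my ($self, $el) = @_;
    if ($self->{depth} || $el->{Name} eq $self->{Prune}) {
        $self->{depth}++;    # entering, or nested inside, a pruned subtree
        return;
    }
    $self->{Handler}->start_element($el);
}
sub end_element {
    my ($self, $el) = @_;
    if ($self->{depth}) { $self->{depth}--; return }
    $self->{Handler}->end_element($el);
}
sub characters {
    my ($self, $c) = @_;
    return if $self->{depth};
    $self->{Handler}->characters($c);
}

package main;

my $out    = Collector->new;
my $filter = PruneFilter->new(Handler => $out, Prune => 'secret');

# Events a parser would fire for <doc><p>hi</p><secret>x</secret></doc>
$filter->start_element({ Name => 'doc' });
$filter->start_element({ Name => 'p' });
$filter->characters({ Data => 'hi' });
$filter->end_element({ Name => 'p' });
$filter->start_element({ Name => 'secret' });
$filter->characters({ Data => 'x' });
$filter->end_element({ Name => 'secret' });
$filter->end_element({ Name => 'doc' });

print $out->{out}, "\n";    # prints <doc><p>hi</p></doc>
```

The depth counter is what lets the filter drop a whole subtree: once the pruned element opens, every event is discarded until its matching end tag brings the count back to zero.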

SAX filters are conceptual cousins of many of the standard UNIX tools. By themselves, these tools often perform only a single, simple task, but when piped together they are capable of astonishing feats. In the same way, the real power of SAX filters comes from the fact that simple, easy-to-maintain filters may be chained together to produce complex XML data transformations.
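To see the pipe analogy in action, here is a hypothetical two-stage chain in plain Perl. The classes PassThrough, RenameElements, UpperText, and StringWriter are invented for this sketch and are not part of any CPAN distribution; as before, the events are hand-fed hash references rather than output from a real parser.

```perl
use strict;
use warnings;

# Hypothetical pass-through base: forwards each event unchanged.
package PassThrough;
sub new {
    my ($class, %opts) = @_;
    my $self = {%opts};
    return bless $self, $class;
}
sub start_element { my ($s, $e) = @_; $s->{Handler}->start_element($e) }
sub end_element   { my ($s, $e) = @_; $s->{Handler}->end_element($e) }
sub characters    { my ($s, $c) = @_; $s->{Handler}->characters($c) }

# Stage 1: renames elements according to a lookup table.
package RenameElements;
use parent -norequire, 'PassThrough';
sub _rename {
    my ($self, $e) = @_;
    my %copy = (%$e, Name => $self->{Map}{ $e->{Name} } // $e->{Name});
    return \%copy;
}
sub start_element { my ($s, $e) = @_; $s->{Handler}->start_element($s->_rename($e)) }
sub end_element   { my ($s, $e) = @_; $s->{Handler}->end_element($s->_rename($e)) }

# Stage 2: upper-cases all character data.
package UpperText;
use parent -norequire, 'PassThrough';
sub characters {
    my ($self, $c) = @_;
    my %copy = (%$c, Data => uc($c->{Data}));
    $self->{Handler}->characters(\%copy);
}

# End of the chain: serializes events into a string (no escaping).
package StringWriter;
sub new { return bless { out => '' }, shift }
sub start_element { my ($s, $e) = @_; $s->{out} .= "<$e->{Name}>" }
sub end_element   { my ($s, $e) = @_; $s->{out} .= "</$e->{Name}>" }
sub characters    { my ($s, $c) = @_; $s->{out} .= $c->{Data} }

package main;

# Built back to front; events flow RenameElements -> UpperText -> StringWriter.
my $writer = StringWriter->new;
my $upper  = UpperText->new(Handler => $writer);
my $chain  = RenameElements->new(Handler => $upper, Map => { para => 'p' });

$chain->start_element({ Name => 'doc' });
$chain->start_element({ Name => 'para' });
$chain->characters({ Data => 'hello' });
$chain->end_element({ Name => 'para' });
$chain->end_element({ Name => 'doc' });

print $writer->{out}, "\n";    # prints <doc><p>HELLO</p></doc>
```

Each stage knows nothing about its neighbors beyond the Handler it forwards to, so stages can be reordered, added, or dropped independently, much like commands in a shell pipeline.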

Transforming Data Within Existing Events


For our first example, we will create a simple SAX filter that transforms the character data passed from XML::Parser::PerlSAX and then hands it on to Michael Koehne's XML::Handler::YAWriter, which produces the final XML document.

use strict;
use XML::Parser::PerlSAX;
use XML::Handler::YAWriter;
use IO::File;

my $file = $ARGV[0] || die "Please pass a file name to process\n";

With the necessary modules included, we get to the section that reveals exactly how SAX filters work. Notice that we create a new instance of XML::Handler::YAWriter, then pass that object as the Handler for our custom filter; the filter object is in turn passed as the Handler to XML::Parser::PerlSAX. When the script is executed, the parser fires its SAX events by calling the corresponding methods in our FilterPorcus class, which in turn calls the event methods on the writer to print the result to STDOUT.

Note that when defining event chains, the objects are created in reverse order: the first handler created is the last one actually called. This may seem a bit confusing at first, but with a little practice you will get the hang of it.

my $writer = XML::Handler::YAWriter->new(Output => IO::File->new( ">-" ));
my $filter = FilterPorcus->new(Handler => $writer);
my $parser = XML::Parser::PerlSAX->new(Handler => $filter);

my %parser_args = (Source => {SystemId => $file});

# start the parse; events flow parser -> FilterPorcus -> writer
$parser->parse(%parser_args);

# end main
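Because the reverse-order wiring trips up many newcomers, the following sketch traces it. Three instances of a hypothetical Tracer class (not from any library) stand in for the writer, the filter, and the parser; each records its name in a shared log before forwarding the event.

```perl
use strict;
use warnings;

# Hypothetical tracing stage: logs its own name, then forwards the
# event to the next handler if there is one.
package Tracer;
sub new {
    my ($class, %opts) = @_;
    my $self = {%opts};
    return bless $self, $class;
}
sub start_element {
    my ($self, $el) = @_;
    push @{ $self->{Log} }, $self->{Name};
    $self->{Handler}->start_element($el) if $self->{Handler};
}

package main;

my @log;
# Created in reverse: the writer first, the parser stand-in last.
my $writer = Tracer->new(Name => 'writer', Log => \@log);
my $filter = Tracer->new(Name => 'filter', Log => \@log, Handler => $writer);
my $source = Tracer->new(Name => 'parser', Log => \@log, Handler => $filter);

# One event fired at the head of the chain walks it front to back.
$source->start_element({ Name => 'doc' });
print join(' -> ', @log), "\n";    # prints parser -> filter -> writer
```

Construction has to run back to front simply because each object needs a reference to its Handler at creation time, and that handler must already exist.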

Next we create our custom filter class as an inline Perl package. Pay special attention to the fact that our class inherits from Matt Sergeant's XML::Filter::Base class. Because XML::Filter::Base forwards all SAX events to the next handler in the chain by default, we only need to implement the handler methods that are relevant to our filter. If our class were not a subclass of XML::Filter::Base, we would have to explicitly forward each and every event that the previous class could potentially generate.
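One way to get this kind of default forwarding in Perl is the AUTOLOAD mechanism. The sketch below is an assumption about how such a base class could work, not a description of XML::Filter::Base's actual implementation; ForwardingBase, Shout, and Recorder are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical base class (NOT the real XML::Filter::Base): any event
# method a subclass does not define lands in AUTOLOAD and is forwarded.
package ForwardingBase;
our $AUTOLOAD;

sub new {
    my ($class, %opts) = @_;
    my $self = {%opts};
    return bless $self, $class;
}

sub DESTROY { }    # defined so object cleanup never reaches AUTOLOAD

sub AUTOLOAD {
    my $self = shift;
    (my $method = $AUTOLOAD) =~ s/.*:://;    # strip the package name
    my $next = $self->{Handler} or return;
    # Forward if the next handler implements the event itself, or can
    # forward it in turn (i.e. is another AUTOLOAD-based filter).
    if ($next->can($method) || $next->can('AUTOLOAD')) {
        return $next->$method(@_);
    }
    return;    # otherwise the event is silently dropped
}

# Only characters() is overridden; every other event passes through.
package Shout;
use parent -norequire, 'ForwardingBase';
sub characters {
    my ($self, $c) = @_;
    my %copy = (%$c, Data => uc($c->{Data}));
    return $self->{Handler}->characters(\%copy);
}

# End of the chain: records the events it receives.
package Recorder;
sub new { return bless { events => [] }, shift }
sub start_element { my ($s, $e) = @_; push @{ $s->{events} }, "start:$e->{Name}" }
sub end_element   { my ($s, $e) = @_; push @{ $s->{events} }, "end:$e->{Name}" }
sub characters    { my ($s, $c) = @_; push @{ $s->{events} }, "text:$c->{Data}" }

package main;

my $rec    = Recorder->new;
my $filter = Shout->new(Handler => $rec);

$filter->start_element({ Name => 'greeting' });     # forwarded by AUTOLOAD
$filter->characters({ Data => 'hello' });           # handled by Shout
$filter->end_element({ Name => 'greeting' });       # forwarded by AUTOLOAD
$filter->processing_instruction({ Target => 'x', Data => '' });    # dropped

print join(', ', @{ $rec->{events} }), "\n";
# prints start:greeting, text:HELLO, end:greeting
```

The subclass stays as small as the transformation it performs; the cost of this design is that a misspelled event method is forwarded silently instead of failing loudly.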

# silly text transformer
package FilterPorcus;
use strict;
use base qw(XML::Filter::Base);

sub new {
  my $class = shift;
  my %options = @_;
  return bless \%options, $class;
