XML.com: XML From the Inside Out

Introducing XML::SAX::Machines, Part One

February 13, 2002

Introduction

In recent columns we have seen that SAX provides a modular way to generate and filter XML content. For those just learning how SAX works, though, hooking up to the correct parser, generator, or driver and building chains of filters can be tricky. More experienced SAX users may have a clearer picture of how to proceed, but they often find that initializing complex filter chains is tedious and invites duplicated code.

Consider the following simple filter chain script:

use XML::SAX::ParserFactory;
use XML::SAX::Writer;
use My::SAXFilter::One;
use My::SAXFilter::Two;
use My::SAXFilter::Three;

# Build the chain back to front: each stage needs its downstream handler.
my $writer  = XML::SAX::Writer->new();
my $filter3 = My::SAXFilter::Three->new( Handler => $writer );
my $filter2 = My::SAXFilter::Two->new( Handler => $filter3 );
my $filter1 = My::SAXFilter::One->new( Handler => $filter2 );
my $parser  = XML::SAX::ParserFactory->parser( Handler => $filter1 );

# $xml_file holds the path or URI of the document to parse.
$parser->parse_uri( $xml_file );

Not too bad for this tiny example, perhaps, but imagine how it might look in a complex system with 10 or 15 filters all doing their part. Also, new SAX users often stumble over the fact that the handler chain must be built in reverse order ($filter3 has to be initialized before $filter2 so that it can be passed in as $filter2's handler object, for example). Yet another potential weakness in this script is that the filters in the chain are hard-coded from the start. While it is possible to make some aspects more flexible, adding the ability to have a dynamic list of filters only adds to the complexity of the script.
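To make the reverse-order wiring concrete, here is a minimal sketch of what a hand-built dynamic chain involves. It deliberately uses stand-in classes so that it runs with core Perl alone: Toy::Filter, Toy::Sink, the label parameter, and the event method are all hypothetical and are not part of XML::SAX. Only the Handler convention mirrors the script above.

```perl
use strict;
use warnings;

# Toy::Filter is a hypothetical stand-in for a SAX filter: it appends its
# label to each event and forwards the result to its Handler.
{
    package Toy::Filter;

    sub new {
        my ( $class, %args ) = @_;
        my $self = {%args};
        return bless $self, $class;
    }

    sub event {
        my ( $self, $data ) = @_;
        return $self->{Handler}->event( $data . '>' . $self->{label} );
    }
}

# Toy::Sink stands in for the writer at the end of the chain.
{
    package Toy::Sink;

    sub new {
        my ($class) = @_;
        my $self = { seen => [] };
        return bless $self, $class;
    }

    sub event {
        my ( $self, $data ) = @_;
        push @{ $self->{seen} }, $data;
        return;
    }
}

my @labels = qw( One Two Three );    # desired processing order

# Walk the list backwards: each filter needs its downstream handler
# to exist before it can be constructed.
my $sink    = Toy::Sink->new;
my $handler = $sink;
for my $label ( reverse @labels ) {
    $handler = Toy::Filter->new( label => $label, Handler => $handler );
}

# $handler is now the head of the chain (the "One" filter).
$handler->event('doc');
print "$sink->{seen}[0]\n";    # prints: doc>One>Two>Three
```

The loop is short, but every script that wants a configurable chain has to repeat it, along with the parser and writer setup.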

Barrie Slaymaker's outstanding new XML::SAX::Machines addresses both the complexity and the tedium of creating SAX systems. Compare the following snippet to the one above.

use XML::SAX::Machines qw( :all );

my $machine = Pipeline(
    "My::SAXFilter::One",
    "My::SAXFilter::Two",
    "My::SAXFilter::Three",
    \*STDOUT
  );

$machine->parse_uri( $xml_file );

This version is less verbose and more intuitive (note that the chain is declared in processing order). Perhaps most importantly, making the filter chain dynamic is as simple as creating a list of strings containing module names:

my $machine = Pipeline(
    @filter_list,
    \*STDOUT
  );

Here, @filter_list is built dynamically elsewhere in the application.
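How that list gets built is up to the application. As one possible sketch, the helper below turns short names (which might come from the command line or a configuration file) into class names. The filter_classes function and the name check are illustrative assumptions; the My::SAXFilter:: namespace is simply the placeholder used throughout this article.

```perl
use strict;
use warnings;

# Hypothetical helper: map short names to filter class names under the
# article's placeholder My::SAXFilter:: namespace.
sub filter_classes {
    my @names = @_;
    my @classes;
    for my $name (@names) {
        # Accept only plain identifiers so that arbitrary strings never
        # end up being treated as module names.
        die "Bad filter name '$name'\n"
          unless $name =~ /\A[A-Za-z_]\w*\z/;
        push @classes, "My::SAXFilter::$name";
    }
    return @classes;
}

my @filter_list = filter_classes(qw( One Three ));
print join( ', ', @filter_list ), "\n";
# prints: My::SAXFilter::One, My::SAXFilter::Three
```

The resulting @filter_list can then be handed to Pipeline in place of the hard-coded class names.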

The story does not end there, however. XML::SAX::Machines and its associated Machine classes provide a small host of options for building easy-to-maintain SAX-based XML processing systems. Over the next two months we will be looking at this inventive distribution, beginning with this month's introduction.

Machine Types

XML::SAX::Machines is a high-level wrapper class that allows its various Machine classes (which may also be used as standalone libraries) to be easily chained together to create complex SAX filtering systems. XML::SAX::Machines currently installs and knows about several Machines by default.

Pipeline

Implemented by XML::SAX::Pipeline, a Pipeline provides a way to set up a linear series of filters (or other Machines) that works like the traditional hand-rolled SAX filter chain that we looked at in the introduction. That is, the events fired go directly to the next filter or handler on the chain with no intervention.

my $machine = Pipeline(
    "My::SAXFilter::One",
    "My::SAXFilter::Two",
    "My::SAXFilter::Three",
    \*STDOUT
  );

In this example, the three filter classes are fired in linear order with the results of My::SAXFilter::One being sent to My::SAXFilter::Two and so on.

Manifold

Manifold Machines provide a way to create multi-pass filters. The events are cached at the beginning of the Manifold's run and duplicate copies of that event stream are sent through the filters one by one and recompiled into a single document upon completion. It is implemented by XML::SAX::Manifold.

my $machine = Pipeline(
    Manifold(
        "My::SAXFilter::A",
        "My::SAXFilter::B",
        "My::SAXFilter::C",
    ),
    \*STDOUT
  );

Here, events fired during parsing are buffered and sent directly to each of the three filters (in order) and the output of each of the filters is merged into a single stream before being handed off to the Writer class.
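The buffer, replay, and merge idea can be illustrated without any SAX modules. The sketch below is a conceptual model only, not how XML::SAX::Manifold is implemented: events are plain strings, each "filter" is a code ref, and the merge is a simple concatenation, whereas the real Machine produces a single well-formed document. The names make_branch and run_manifold are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical branch "filters": each is a code ref that takes a list of
# events and returns the events it would emit, tagged with its own name.
sub make_branch {
    my ($tag) = @_;
    return sub {
        my @out;
        push @out, "${tag}:$_" for @_;
        return @out;
    };
}

sub run_manifold {
    my ( $branches, @events ) = @_;

    my @buffer = @events;    # cache the incoming event stream once
    my @merged;
    for my $branch (@$branches) {
        # Replay the full buffer through one branch at a time and
        # append its output to the merged stream.
        push @merged, $branch->(@buffer);
    }
    return @merged;
}

my @branches = map { make_branch($_) } qw( A B C );
my @merged   = run_manifold( \@branches, qw( title body ) );
print "@merged\n";    # prints: A:title A:body B:title B:body C:title C:body
```

Note that every branch sees the original events, not the output of the branch before it; that is the difference between a Manifold and a Pipeline.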

Tap

Implemented by XML::SAX::Tap, a Tap offers a way to insert a class that examines one or more SAX events, but in no way alters the data passed to the next filter or handler. This can be extremely useful for cases where you want to examine the result of a given filter or other Machine part for debugging purposes. The handler that you use for your Tap need not forward the events as a typical filter would, since the same events will also be sent to the next handler in the chain as if the Tap did not exist. For example:

my $machine = Pipeline(
    "My::SAXFilter::One",
    "My::SAXFilter::Two",
    Tap(
        "My::SAXDumper"
    ),
    "My::SAXFilter::Three",
    \*STDOUT
  );

In this case, we have taken the Pipeline from above and added a Tap to send events fired by My::SAXFilter::Two to our SAXDumper for debugging.
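Conceptually, a Tap behaves like a tee: the observer sees each event, and the very same event continues down the main chain. The following sketch models that with code refs in core Perl. It is not the XML::SAX::Tap implementation, and make_tap and the handler variables are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# make_tap returns a handler that shows each event to $observer and then
# passes the same, unmodified event on to $next.
sub make_tap {
    my ( $observer, $next ) = @_;
    return sub {
        my ($event) = @_;
        $observer->($event);       # side channel: look, but do not touch
        return $next->($event);    # main stream continues as if untapped
    };
}

my ( @log, @output );

my $writer = sub { push @output, $_[0]; return };
my $dumper = sub { push @log, "saw: $_[0]"; return };   # never forwards
my $three  = sub { return $writer->( uc $_[0] ) };      # downstream filter
my $tap    = make_tap( $dumper, $three );
my $two    = sub { return $tap->( $_[0] . '!' ) };      # upstream filter

$two->('hello');
print "$log[0]\n";       # prints: saw: hello!
print "$output[0]\n";    # prints: HELLO!
```

The observer records what the upstream filter produced, and the downstream filter still receives exactly that event.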

ByRecord

ByRecord carves up record-oriented XML documents and sends each record through each filter in the ByRecord machine as a separate event stream delimited by start_document and end_document events. All other events (data outside of the records) are forwarded appropriately to the downstream filter or handler. It is implemented by XML::SAX::ByRecord.

my $machine = Pipeline(
    ByRecord(
        "My::RecordFilter::One",
        "My::RecordFilter::Two",
    ),
    "My::SAXFilter::One",
    "My::SAXFilter::Two",
    "My::SAXFilter::Three",
    \*STDOUT
  );

In this case, we have taken the Pipeline from above and added a ByRecord Machine to process the record-oriented parts of the document before beginning the rest of the Pipeline chain.
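To picture what "carving up" a document means, consider this rough model in core Perl. Events are represented as [ type, payload ] pairs, and the sketch assumes that every child of the root element counts as one record; that rule, the split_records function, and the event representation are assumptions made for illustration, so check the XML::SAX::ByRecord documentation for how the real Machine identifies records.

```perl
use strict;
use warnings;

# Events are [ type, payload ] pairs. Treating every child of the root
# element (depth 2) as one record is an assumption of this sketch.
sub split_records {
    my @events = @_;
    my $depth  = 0;
    my ( @outside, @records, @current );

    for my $event (@events) {
        my $type = $event->[0];
        $depth++ if $type eq 'start_element';

        if   ( $depth >= 2 ) { push @current, $event }    # inside a record
        else                 { push @outside, $event }    # forwarded as-is

        next unless $type eq 'end_element';
        $depth--;
        if ( $depth == 1 && @current ) {
            # A record just closed: wrap it as a document of its own.
            push @records,
              [ ['start_document'], @current, ['end_document'] ];
            @current = ();
        }
    }
    return ( \@outside, \@records );
}

my ( $outside, $records ) = split_records(
    [ start_element => 'root' ],
    [ start_element => 'record' ],
    [ characters    => 'a' ],
    [ end_element   => 'record' ],
    [ start_element => 'record' ],
    [ characters    => 'b' ],
    [ end_element   => 'record' ],
    [ end_element   => 'root' ],
);

printf "%d records, %d events each, %d events outside\n",
  scalar @$records, scalar @{ $records->[0] }, scalar @$outside;
# prints: 2 records, 5 events each, 2 events outside
```

Each entry in the records list is what a record filter would receive as a self-contained event stream, while the outside events correspond to the data that is forwarded downstream.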

Now that we have an idea of the various Machines that are currently available, let's get straight to this month's code example.
