Introducing XML::SAX::Machines, Part One
In recent columns we have seen that SAX provides a modular way to generate and filter XML content. For those just learning how SAX works, though, the task of hooking up to the correct parser-generator-driver and building chains of filters can be tricky. More experienced SAX users may have a clearer picture of how to proceed, but they often find that initializing complex filter chains is tedious and lends itself to lots of duplicated code.
Consider the following simple filter chain script:
use XML::SAX::ParserFactory;
use XML::SAX::Writer;
use My::SAXFilter::One;
use My::SAXFilter::Two;
use My::SAXFilter::Three;
my $writer = XML::SAX::Writer->new();
my $filter3 = My::SAXFilter::Three->new( Handler => $writer );
my $filter2 = My::SAXFilter::Two->new( Handler => $filter3 );
my $filter1 = My::SAXFilter::One->new( Handler => $filter2 );
my $parser = XML::SAX::ParserFactory->parser( Handler => $filter1 );
$parser->parse_uri( $xml_file );
Not too bad for this tiny example, perhaps, but imagine
how it might look in a complex system with 10 or 15 filters all
doing their part. Also, new SAX users often stumble over the fact
that the handler chain must be built in reverse order
($filter3 has to be initialized before
$filter2 so it can be passed in as the handler,
for example). Yet another potential weakness in this script is that
the filters in the chain are hard-coded from the start. While it is
possible to make some aspects more flexible, adding the ability to
have a dynamic list of filters only adds to the complexity of the
script.
Barrie Slaymaker's outstanding new
XML::SAX::Machines addresses both the complexity and the
tedium of creating SAX systems. Compare the following snippet to the
one above.
use XML::SAX::Machines qw( :all );
my $machine = Pipeline(
"My::SAXFilter::One",
"My::SAXFilter::Two",
"My::SAXFilter::Three",
\*STDOUT
);
$machine->parse_uri( $xml_file );
Less verbose, more intuitive (note that the chain is declared in processing order) and, perhaps most importantly, making the filter chain dynamic is as simple as creating a list of strings containing module names:
my $machine = Pipeline(
@filter_list,
\*STDOUT
);
Where @filter_list is built dynamically elsewhere in
the application.
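As one possible way to build that list, here is a minimal sketch. It assumes a hypothetical %enabled configuration hash that switches individual filters on or off; neither the hash nor its contents come from XML::SAX::Machines itself.

```perl
use strict;
use warnings;

# Hypothetical configuration: which filters are switched on for this run.
# The hash is illustrative only; it is not part of XML::SAX::Machines.
my %enabled = (
    'My::SAXFilter::One'   => 1,
    'My::SAXFilter::Two'   => 0,
    'My::SAXFilter::Three' => 1,
);

# Keep the declared processing order, dropping the disabled filters.
my @filter_list = grep { $enabled{$_} } qw(
    My::SAXFilter::One
    My::SAXFilter::Two
    My::SAXFilter::Three
);

# Show the chain that would be handed to Pipeline().
print join( ' -> ', @filter_list, 'STDOUT' ), "\n";
```

The resulting @filter_list can then be passed to Pipeline exactly as in the snippet above.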
The story does not end there,
however. XML::SAX::Machines and its associated
Machine classes provide a small host of options for
building easy-to-maintain SAX-based XML processing systems. Over the
next two months we will be looking at this inventive distribution,
beginning with this month's introduction.
XML::SAX::Machines is a high-level wrapper class that
allows its various Machine classes (which may also be
used as standalone libraries) to be easily chained together to
create complex SAX filtering
systems. XML::SAX::Machines currently installs and
knows about several Machines by default.
Implemented by XML::SAX::Pipeline, a
Pipeline provides a way to set up a linear series of
filters (or other Machines) that works like the traditional
hand-rolled SAX filter chain that we looked at in the introduction.
That is, the events fired go directly to the next filter or handler
on the chain with no intervention.
my $machine = Pipeline(
"My::SAXFilter::One",
"My::SAXFilter::Two",
"My::SAXFilter::Three",
\*STDOUT
);
In this example, the three filter classes are fired in linear order with the
results of My::SAXFilter::One being sent to My::SAXFilter::Two
and so on.
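To see a Pipeline run end to end, the following sketch may help. It assumes a hypothetical My::SAXFilter::Upcase filter defined inline, passes a filter object rather than a class name, and collects the output in a scalar reference instead of writing to STDOUT. The exact serialization produced by the writer may differ from system to system.

```perl
use strict;
use warnings;
use XML::SAX::Machines qw( :all );

# Hypothetical filter for illustration: upper-cases all character data.
package My::SAXFilter::Upcase;
use base qw( XML::SAX::Base );

sub characters {
    my ( $self, $chars ) = @_;
    # Forward a modified copy of the event to the next handler.
    $self->SUPER::characters( { Data => uc( $chars->{Data} ) } );
}

package main;

# Collect the serialized result in a scalar rather than printing it.
my $output  = '';
my $machine = Pipeline( My::SAXFilter::Upcase->new, \$output );
$machine->parse_string( '<doc><p>hello</p></doc>' );

print $output, "\n";
```

Because the filter overrides only characters, every other event passes through XML::SAX::Base untouched.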
Manifold Machines provide a way to create multi-pass
filters. The events are cached at the beginning of the
Manifold's run and duplicate copies of that event
stream are sent through the filters one by one and recompiled into a
single document upon completion. It is implemented by
XML::SAX::Manifold.
my $machine = Pipeline(
Manifold(
"My::SAXFilter::A",
"My::SAXFilter::B",
"My::SAXFilter::C",
),
\*STDOUT
);
Here, events fired during parsing are buffered and sent directly to each of the three filters (in order) and the output of each of the filters is merged into a single stream before being handed off to the Writer class.
Implemented by XML::SAX::Tap, a Tap offers
a way to insert a class that examines one or more SAX events, but in
no way alters the data passed to the next filter or handler. This
can be extremely useful for cases where you want to examine the
result of a given filter or other Machine part for debugging
purposes. The handler that you use for your Tap need
not forward the events as a typical filter would since the same
events will also be sent to the next handler in the chain as if the
Tap did not exist. For example:
my $machine = Pipeline(
"My::SAXFilter::One",
"My::SAXFilter::Two",
Tap(
"My::SAXDumper"
),
"My::SAXFilter::Three",
\*STDOUT
);
In this case, we have taken the Pipeline from above and
added a Tap to send events fired by
My::SAXFilter::Two to our SAXDumper for debugging.
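The My::SAXDumper class is not shown in this column, so what follows is only a guess at what such a handler might look like: a minimal sketch that prints an indented outline of element names to STDERR. The DumperDepth key is a hypothetical name chosen for this illustration.

```perl
package My::SAXDumper;
use strict;
use warnings;
use base qw( XML::SAX::Base );

# Print each opening element, indented by its nesting depth.
sub start_element {
    my ( $self, $el ) = @_;
    my $indent = '  ' x ( $self->{DumperDepth} || 0 );
    print STDERR "$indent<$el->{Name}>\n";
    $self->{DumperDepth}++;
    # No SUPER::start_element call here: a handler used inside a Tap
    # does not need to forward events.
}

# Step back out one level when an element closes.
sub end_element {
    my ( $self, $el ) = @_;
    $self->{DumperDepth}--;
}

1;
```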
ByRecord carves up record-oriented XML documents and
sends each record through each filter in the ByRecord
machine as a separate event stream delimited by
start_document and end_document
events. All other events (data outside of the records) are forwarded
appropriately to the downstream filter or handler. It is implemented
by XML::SAX::ByRecord.
my $machine = Pipeline(
ByRecord(
"My::RecordFilter::One",
"My::RecordFilter::Two",
),
"My::SAXFilter::One",
"My::SAXFilter::Two",
"My::SAXFilter::Three",
\*STDOUT
);
In this case, we have taken the Pipeline from above and
added a ByRecord Machine to process the record-oriented
parts of the document before beginning the rest of the
Pipeline chain.
Now that we have an idea of the various Machines that are currently available, let's get straight to this month's code example.
One of the more interesting ideas to emerge in the Web development world in recent years is the notion of custom tag libraries (or taglibs, for short). In a taglib implementation one or more custom tags are defined and the server application evaluates and expands or replaces those tags with the result of running some chunk of code on the server. This allows document authors to add reusable bits of server-side functionality to their pages without the hair loss associated with embedding code in the documents.
For this month's example we will write a mod_perl handler that allows us
to create our own custom taglibs. We will do this by creating SAX
filters that transform the various tags in our library into the
desired results. And we'll use XML::SAX::Machines within our
Apache handler to manage the filter chain.
First, we need to define our taglib. To keep the example simple we
start off with only two tags: an <include> tag that
provides a way to insert the contents of an external document defined
by the uri attribute at the location of the tag, and a
<fortune> tag that inserts a random quote.
To avoid possible collision with the elements allowed in the documents that will contain the tags from our taglib, we will quarantine them in their own XML namespace and bind that namespace to the prefix "widget".
Here is an example of a simple XHTML document containing our custom tags:
<?xml version="1.0"?>
<html xmlns:widget="http://localhost/saxpages/widget">
<head>
<title>My Cool Taglib-Enabled Page</title>
</head>
<body>
<widget:include uri="/path/to/widgets/common_header.xml"/>
<p>
Today quote is:
</p>
<pre><widget:fortune/></pre>
<p>
Thanks for stopping by.
</p>
<widget:include uri="/path/to/widgets/common_footer.xml"/>
</body>
</html>
Now let's create our SAX filters to expand our custom tags. We'll write the filter that includes an external XML document first.
package Widget::Include;
use strict;
use XML::SAX::Base;
use XML::SAX::ParserFactory;
use vars qw(@ISA $WidgetURI);
@ISA = qw(XML::SAX::Base);
$WidgetURI = 'http://localhost/saxpages/widget';
After a bit of initialization we get straight to the SAX event
handlers. In the start_element handler we examine the
current element's NamespaceURI and
LocalName properties to see if we have an "include"
element in our widget namespace. If so, we check
for a uri attribute and, if it is present, pass its
value to a new parser instance that uses the current filter
as its handler.
sub start_element {
my ( $self, $el ) = @_;
if ( $el->{NamespaceURI} eq $WidgetURI &&
$el->{LocalName} eq 'include' ) {
if ( defined $el->{Attributes}->{'{}uri'} ) {
my $uri = $el->{Attributes}->{'{}uri'}->{Value};
my $parser = XML::SAX::ParserFactory->parser( Handler => $self );
$parser->parse_uri( $uri );
}
}
If we did not get an element with the right name in the right namespace we forward the event to the next filter in the chain.
else {
$self->SUPER::start_element( $el );
}
}
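For reference, the tests in start_element rely on the element structure that a namespace-aware Perl SAX 2 parser hands to the handler. The sketch below builds an approximate version of that hash by hand for the first include tag in our sample document. Some keys are omitted, and details such as how an empty namespace is represented can vary between parsers.

```perl
use strict;
use warnings;

my $WidgetURI = 'http://localhost/saxpages/widget';

# Approximate shape of the hash passed to start_element for
# <widget:include uri="/path/to/widgets/common_header.xml"/>
my $el = {
    Name         => 'widget:include',
    Prefix       => 'widget',
    LocalName    => 'include',
    NamespaceURI => $WidgetURI,
    Attributes   => {
        # Attribute keys take the form "{NamespaceURI}LocalName".
        # An unprefixed attribute is in no namespace, hence the empty braces.
        '{}uri' => {
            Name      => 'uri',
            LocalName => 'uri',
            Value     => '/path/to/widgets/common_header.xml',
        },
    },
};

# The same checks the filter performs.
my $is_include = $el->{NamespaceURI} eq $WidgetURI
              && $el->{LocalName}    eq 'include';
my $uri = $el->{Attributes}{'{}uri'}{Value};

print 'include? ', ( $is_include ? 'yes' : 'no' ), ", uri: $uri\n";
```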
We do a similar test in the end_element event handler, forwarding the events
that we are not interested in.
sub end_element {
my ( $self, $el ) = @_;
$self->SUPER::end_element( $el ) unless
$el->{NamespaceURI} eq $WidgetURI and
$el->{LocalName} eq 'include';
}
That's it. Since this filter inherits from
XML::SAX::Base we need only implement the event
handlers that are required for the task at hand. All other events
will be safely forwarded to the next filter/handler.
The filter that implements the <widget:fortune> tag is very similar. We check to see
if the current element is named "fortune" and is bound to the correct namespace. If so,
we replace the element with the text returned from a system call to the fortune
program. If not, the events are forwarded to the next filter.
package Widget::Fortune;
use strict;
use XML::SAX::Base;
use vars qw(@ISA $WidgetURI);
@ISA = qw(XML::SAX::Base);
$WidgetURI = 'http://localhost/saxpages/widget';
sub start_element {
my ( $self, $el ) = @_;
if ( $el->{NamespaceURI} eq $WidgetURI &&
$el->{LocalName} eq 'fortune' ) {
my $fortune = `/usr/games/fortune`;
$self->SUPER::characters( { Data => $fortune } );
}
else {
$self->SUPER::start_element( $el );
}
}
sub end_element {
my ( $self, $el ) = @_;
$self->SUPER::end_element( $el ) unless
$el->{NamespaceURI} eq $WidgetURI and
$el->{LocalName} eq 'fortune';
}
With the filters out of the way we turn to the Apache handler that
will make our filters work as expected for the files on our
server. The basic Apache handler module that makes our taglibs work
is astonishingly small considering what it provides. We simply load
XML::SAX::Machines then, inside the
required handler subroutine, we create a
Pipeline machine, passing in the names of the widget
filter classes we just created. Then we send the required HTTP
headers and call parse_uri on the file being requested
by the client.
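package SAXWeb::MachinePages;
use strict;
use Apache::Constants qw( OK );
use XML::SAX::Machines qw( :all );
sub handler {
    my $r = shift;
    # Chain the widget filters in processing order; output goes to the client.
    my $machine = Pipeline(
        "Widget::Include" =>
        "Widget::Fortune" =>
        \*STDOUT
    );
    $r->content_type('text/html');
    $r->send_http_header;
    $machine->parse_uri( $r->filename );
    # Tell Apache the request was handled successfully.
    return OK;
}
1;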
Finally, we need to upload the XML documents to the server and add a small bit to one of our Apache configuration files so that our handler is called appropriately. I used:
<Directory /www/sites/myhostdocroot >
<FilesMatch "\.(xml|xhtml)">
SetHandler perl-script
PerlHandler SAXWeb::MachinePages
</FilesMatch>
</Directory>
After restarting Apache, a request for the XML document we created earlier will return something like the following:
<html xmlns:widget='http://localhost/saxpages/widget'>
<head>
<title>My Cool Page</title>
</head>
<body>
<div class='header'>
<h2>MySite.tld</h2>
<hr />
</div>
<p>
Today quote is:
</p>
<pre>The faster we go, the rounder we get.
-- The Grateful Dead
</pre>
<p>
Thanks for stopping by.
</p>
<div class='footer'>
<hr />
<p>Copyright 2000 MySite.tld, Ltd. All rights reserved.</p>
</div>
</body>
</html>
No Webby awards here, to be sure, but the basic foundation is sound
and implementing new tags for our tag library is a matter of
creating new SAX filter classes and adding them to the
Pipeline in the Apache handler.
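As an illustration of that last point, here is a minimal sketch of a third filter. The widget:date tag and the Widget::Date class are hypothetical additions invented for this example; they follow the same pattern as Widget::Fortune but insert the server's local time.

```perl
package Widget::Date;    # hypothetical widget, not part of the taglib above
use strict;
use warnings;
use base qw( XML::SAX::Base );

our $WidgetURI = 'http://localhost/saxpages/widget';

# True when the element is a "date" element in the widget namespace.
sub is_date_tag {
    my ( $el ) = @_;
    my $ns = defined $el->{NamespaceURI} ? $el->{NamespaceURI} : '';
    return $ns eq $WidgetURI && $el->{LocalName} eq 'date';
}

sub start_element {
    my ( $self, $el ) = @_;
    if ( is_date_tag( $el ) ) {
        # Replace the tag with the server's current local time.
        $self->SUPER::characters( { Data => scalar( localtime() ) } );
    }
    else {
        $self->SUPER::start_element( $el );
    }
}

sub end_element {
    my ( $self, $el ) = @_;
    # Swallow the closing widget tag; forward everything else.
    $self->SUPER::end_element( $el ) unless is_date_tag( $el );
}

1;
```

Enabling it would then be a matter of adding "Widget::Date" to the list of filters passed to Pipeline in the handler.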
We've only touched the surface of what
XML::SAX::Machines can do. Tune in next month when we
will delve deeper into the API and show off some of its advanced
features.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.