Transforming XML With SAX Filters
by Kip Hampton
Our filter is only interested in transforming the text nodes of the
input document, so we will only implement the characters
method. After passing the character data to the local porcus
subroutine for transformation, we forward the result to the next handler by
calling the characters event on that handler.
sub characters {
    my ($self, $chars) = @_;
    my $out = $self->porcus($chars->{Data});
    $self->{Handler}->characters({Data => $out});
}
Finally, we get to the porcus method, which returns the string
passed to it, transformed into the desired format using a little regular
expression voodoo.
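sub porcus {
    my ($self, $chars) = @_;
    # Lowercase the text, then prefix words that start with a vowel with "w".
    $chars =~ tr/A-Z/a-z/;
    $chars =~ s/\b([aeiou])/w$1/g;
    my $cons = q{[bcfghjklmnpqrstvwxz]};
    # $1 is the leading "qu", consonant cluster, or single letter; $3 is the
    # rest of the word. Move $1 to the end and append "ay".
    $chars =~ s/\b(qu|$cons($cons$cons?)?|[a-z])([a-z]*)/$3$1ay/g;
    return $chars;
}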
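To see what the two substitutions do, it can help to run the same logic on a few words outside of any SAX machinery. The sketch below copies the body of porcus into a plain function; the name pig_latin and the sample words are only for illustration and are not part of the filter.

```perl
use strict;
use warnings;

# Stand-alone copy of the porcus logic. pig_latin is an illustrative name.
sub pig_latin {
    my ($chars) = @_;
    $chars =~ tr/A-Z/a-z/;
    $chars =~ s/\b([aeiou])/w$1/g;
    my $cons = q{[bcfghjklmnpqrstvwxz]};
    $chars =~ s/\b(qu|$cons($cons$cons?)?|[a-z])([a-z]*)/$3$1ay/g;
    return $chars;
}

# Print each sample word next to its transformed form.
for my $word (qw(To of this language)) {
    printf "%-10s => %s\n", $word, pig_latin($word);
}
```

For these four inputs the function returns otay, ofway, isthay, and anguagelay. A word that starts with a vowel first gains a leading "w", which the second substitution then moves to the end, which is why "of" comes out as "ofway".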
Feeding this script a snippet of Larry Wall's latest Perl 6 Apocalypse produces the following result:
<html> <body> <p> otay emay, oneway ofway ethay ostmay agonizingway aspectsway ofway anguagelay esignday isway omingcay upway ithway away usefulway ystemsay ofway operatorsway. otay otherway anguagelay esignersday, isthay aymay eemsay ikelay away illysay ingthay otay agonizeway overway. afterway allway, ouyay ancay iewvay allway operatorsway asway eremay yntacticsay ugarsay -- operatorsway areway ustjay unnyfay ookinglay unctionfay allscay. </p> </body> </html>
Okay, the result is admittedly pretty silly -- there may even be those who would argue that converting Uncle Larry's prose to pig latin is a bit redundant -- but the script does illustrate the basics of creating a simple SAX filter:
- It accepts SAX events from a SAX filter or other event generator.
- It alters the document stream (in this case, by transforming all text data to pig latin).
- It forwards SAX events to the next handler or filter in the chain.
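The three steps above can be sketched without any XML parser at all. In this minimal example, the package names My::Collector and My::TextFilter, the transform constructor argument, and the hand-fired event are all assumptions made for illustration; a real script would let a SAX parser generate the events and a writer consume them.

```perl
use strict;
use warnings;

# Hypothetical end-of-chain handler: it simply records the text it receives.
package My::Collector;
sub new {
    my ($class) = @_;
    return bless { text => '' }, $class;
}
sub characters {
    my ($self, $chars) = @_;
    $self->{text} .= $chars->{Data};
}

# Hypothetical filter: accepts an event, alters the data, forwards it.
package My::TextFilter;
sub new {
    my ($class, %args) = @_;
    my $self = { Handler => $args{Handler}, transform => $args{transform} };
    return bless $self, $class;
}
sub characters {
    my ($self, $chars) = @_;                          # 1. accept the event
    my $out = $self->{transform}->($chars->{Data});   # 2. alter the data
    $self->{Handler}->characters({ Data => $out });   # 3. forward it
}

package main;

my $collector = My::Collector->new;

# A filter looks like a handler to whatever precedes it, so filters stack.
my $exclaim = My::TextFilter->new(
    Handler   => $collector,
    transform => sub { $_[0] . '!' },
);
my $upper = My::TextFilter->new(
    Handler   => $exclaim,
    transform => sub { uc $_[0] },
);

# Stand in for the parser by firing a single characters event by hand.
$upper->characters({ Data => 'hello' });
print $collector->{text}, "\n";   # prints "HELLO!"
```

Because each filter only knows about the handler it was given, filters can be reordered or removed without touching the others.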
If we wanted to transform the element and attribute names and
values in addition to the text data, we would only need to add the following
start_element and end_element handlers.
sub start_element {
    my ($self, $element) = @_;
    # Build a fresh hash rather than adding and deleting keys in the hash
    # that each() is iterating over; the effect of that is unspecified.
    my %attrs;
    while ( my ($name, $value) = each %{ $element->{Attributes} } ) {
        $attrs{ $self->porcus($name) } = $self->porcus($value);
    }
    $element->{Attributes} = \%attrs;
    $element->{Name} = $self->porcus($element->{Name});
    $self->{Handler}->start_element($element);
}
sub end_element {
    my ($self, $element) = @_;
    my $elname = $self->porcus($element->{Name});
    $element->{Name} = $elname;
    $self->{Handler}->end_element($element);
}
Again, the principles are the same: accept events, alter the data, then forward that altered data by calling events on the filter's designated handler.
Enough silliness. Let's look at a more practical example.