XML.com: XML From the Inside Out


Transforming XML With SAX Filters
by Kip Hampton

Our filter is only interested in transforming the text nodes of the input document, so we will only implement the characters method. After passing the character data to the local porcus subroutine for transformation, we forward the result to the next handler by calling the characters event on that handler.

sub characters {
  my ($self, $chars) = @_;
  # Transform the text, then forward it to the next handler in the chain.
  my $out = $self->porcus($chars->{Data});
  $self->{Handler}->characters({Data => $out});
}

Finally, we get to the porcus method, which takes a string and returns it transformed into the desired format with a little regular expression voodoo.

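sub porcus {
  my ($self, $chars) = @_;
  $chars =~ tr/A-Z/a-z/;                # lowercase everything
  $chars =~ s/\b([aeiou])/w$1/g;        # prefix vowel-initial words with "w"
  my $cons = q{[bcfghjklmnpqrstvwxz]};
  # Move the leading consonant cluster to the end of the word and append "ay".
  $chars =~ s/\b(qu|$cons($cons$cons?)?|[a-z])([a-z]*)/$3$1ay/g;
  return $chars;
}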
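
To watch the two substitutions at work without any SAX machinery, the same logic can be run as a plain function. This is only a sketch: the function form (no $self) and the sample strings are assumptions made for illustration, and the first sample is the opening of the passage used in this article.

```perl
use strict;
use warnings;

# Standalone copy of the porcus logic, written as a plain function
# (not a method) so it runs without a filter object.
sub porcus {
    my ($chars) = @_;
    $chars =~ tr/A-Z/a-z/;             # lowercase everything
    $chars =~ s/\b([aeiou])/w$1/g;     # vowel-initial words get a leading "w"
    my $cons = q{[bcfghjklmnpqrstvwxz]};
    # Move the leading consonant cluster to the end and append "ay".
    $chars =~ s/\b(qu|$cons($cons$cons?)?|[a-z])([a-z]*)/$3$1ay/g;
    return $chars;
}

print porcus("To me, one of the most"), "\n";  # otay emay, oneway ofway ethay ostmay
print porcus("quick string"), "\n";            # ickquay ingstray
```

The leading "w" added in the first substitution is what turns "one" into "oneway": the second substitution treats that "w" as the word's initial consonant and moves it to the end.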

Feeding this script a snippet of Larry Wall's latest Perl 6 Apocalypse produces the following result:

  otay emay, oneway ofway ethay ostmay
  agonizingway aspectsway ofway
  anguagelay esignday isway omingcay upway
  ithway away usefulway ystemsay ofway
  operatorsway.  otay otherway
  anguagelay esignersday, isthay aymay
  eemsay ikelay away illysay ingthay
  otay agonizeway overway.  afterway
  allway, ouyay ancay iewvay allway
  operatorsway asway eremay yntacticsay
  ugarsay -- operatorsway
  areway ustjay unnyfay ookinglay
  unctionfay allscay.

Okay, the result is admittedly pretty silly -- there may even be those who would argue that converting Uncle Larry's prose to pig latin is a bit redundant -- but the script does illustrate the basics of creating a simple SAX filter:

  • It accepts SAX events from a SAX filter or other event generator.
  • It alters the document stream (in this case, by transforming all text data to pig latin).
  • It forwards SAX events to the next handler or filter in the chain.
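
The three steps above can be sketched with nothing but core Perl. The Collector and UpcaseFilter packages below are hypothetical stand-ins invented for this illustration (they are not part of any SAX distribution), and the direct method calls at the end stand in for a real parser generating events.

```perl
use strict;
use warnings;

# Hypothetical terminal handler: collects the character data it receives.
package Collector;
sub new {
    my ($class) = @_;
    return bless { text => '' }, $class;
}
sub characters {
    my ($self, $chars) = @_;
    $self->{text} .= $chars->{Data};
}

# Hypothetical filter: accepts an event, alters it, forwards it.
package UpcaseFilter;
sub new {
    my ($class, %args) = @_;
    return bless { Handler => $args{Handler} }, $class;
}
sub characters {
    my ($self, $chars) = @_;                         # 1. accept the event
    my $out = uc $chars->{Data};                     # 2. alter the data
    $self->{Handler}->characters({ Data => $out });  # 3. forward it
}

package main;

my $collector = Collector->new;
my $filter    = UpcaseFilter->new(Handler => $collector);

# Stand-in for a parser: fire two characters events at the filter.
$filter->characters({ Data => 'hello, ' });
$filter->characters({ Data => 'world' });

print $collector->{text}, "\n";  # HELLO, WORLD
```

Because the filter only knows its Handler by the methods it calls on it, the Collector could be swapped for another filter without changing UpcaseFilter at all.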

If we also wanted to transform the element names and the attribute names and values in addition to the text data, we would only need to add the following start_element and end_element handlers.

sub start_element {
  my ($self, $element) = @_;
  my %attrs;

  # Build a fresh hash; adding keys to a hash while iterating over it with each is unsafe.
  while ( my ($name, $value) = each %{$element->{Attributes}} ) {
    $attrs{ $self->porcus($name) } = $self->porcus($value);
  }

  $element->{Attributes} = \%attrs;
  $element->{Name} = $self->porcus($element->{Name});
  # Forward the altered event to the next handler.
  $self->{Handler}->start_element($element);
}

sub end_element {
  my ($self, $element) = @_;
  my $elname = $self->porcus($element->{Name});
  $element->{Name} = $elname;
  # Forward the altered event to the next handler.
  $self->{Handler}->end_element($element);
}
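
One caution when renaming attribute keys: Perl's documentation for each warns that adding elements to a hash while iterating over it has unspecified effects, so it is safest to build the renamed attributes in a new hash. A minimal sketch, in which rename_attributes is a hypothetical helper, uc stands in for porcus, and the simple name-to-value attribute hash mirrors the handlers in this article:

```perl
use strict;
use warnings;

# Hypothetical helper: return a new attribute hash with every name and
# value passed through $transform. The input hash is left untouched.
sub rename_attributes {
    my ($attrs, $transform) = @_;
    my %renamed;
    for my $name (keys %$attrs) {
        $renamed{ $transform->($name) } = $transform->($attrs->{$name});
    }
    return \%renamed;
}

my $element = { Name => 'para', Attributes => { id => 'intro', class => 'lead' } };
my $upcase  = sub { return uc $_[0] };  # stand-in for porcus

$element->{Attributes} = rename_attributes($element->{Attributes}, $upcase);
$element->{Name}       = $upcase->($element->{Name});

print join(' ', $element->{Name},
    map { "$_=$element->{Attributes}{$_}" } sort keys %{ $element->{Attributes} }), "\n";
# PARA CLASS=LEAD ID=INTRO
```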

Again, the principles are the same: accept events, alter the data, then forward that altered data by calling events on the filter's designated handler.

Enough silliness; let's look at a more practical example.
