by Kip Hampton
No doubt about it, the hype associated with Web Services has reached astronomical proportions. Notably missing from the current flood of information, however, is a nuts-and-bolts level examination of how to build applications that provide both browser-based access for human users and programmatic access for automated clients. This month we will take a look at just how easy it is (using Perl and XML) to build these multi-interface services.
Note: this column is not about the relative merits or weaknesses of SOAP, XML-RPC or REST, nor will it attempt to address the reasons why you might choose one and not another. The goal here is to demonstrate that, with a little forethought and a few Perl modules, you can easily create useful Web applications that can be accessed from any or all of these types of clients.
We have a lot to cover, so let's get straight to business.
For this month's sole code example we will build a Web interface to my XML::SemanticDiff module. For those who may not be familiar, XML::SemanticDiff compares the contents of two XML documents while ignoring common things like formatting differences, and not-so-common things like otherwise identical elements and attributes with divergent namespace prefixes bound to the same URI.
Before we begin, a basic understanding of Christian Glahn's CGI::XMLApplication is suggested. It's not a requirement; you will still be able to follow along, but the more detailed introduction given in a previous column can only help. If you are impatient and decide to skip reading that column, it is enough to understand that the typical CGI::XMLApplication application consists of three parts: a tiny CGI script that connects the client to the application, a Perl module that handles the heavy lifting, and one or more XSLT stylesheets that transform the DOM tree returned from the Perl module into something palatable for the requesting client.
Understanding CGI::XMLApplication's basic architecture is important because, from a high level, it is exactly the same model used by Paul Kulchenko's wildly popular SOAP::Lite module. The key to providing simple, multi-client access lies in understanding how these two fine modules can be used together.
First, we will look at the base module that both CGI::XMLApplication
and
SOAP::Lite
will use to compare the files uploaded to the server:
package WebSemDiff;

use strict;
use CGI::XMLApplication;
use XML::SemanticDiff;
use XML::LibXML::SAX::Builder;
use XML::Generator::PerlData;
use vars qw( @ISA );
@ISA = qw( CGI::XMLApplication );
After importing the necessary modules and declaring the package's inheritance from CGI::XMLApplication, we need to implement the methods required to make the browser interface work.
The browser interface has two states: a default state that prompts the user to upload two XML documents to compare, and a result state that shows the result of the comparison (or any errors that may have occurred while processing). The selectStylesheet() method returns the path to the appropriate stylesheet that will transform the DOM tree built by our application. To keep things on track we will not look at the two stylesheets (semdiff_default.xsl and semdiff_result.xsl) themselves, but they are available in this month's sample code if you are curious.
sub selectStylesheet {
    my ( $self, $context ) = @_;
    my $style = $context->{style} || 'default';
    my $style_path = '/www/site/stylesheets/';
    return $style_path . 'semdiff_' . $style . '.xsl';
}
By default, the required getDOM()
method is expected to return an
XML::LibXML::Document
object. This document object (which we will create
later) is what is transformed by the XSLT stylesheet set by the selectStylesheet()
method above before delivering the result to the browser.
sub getDOM {
    my ( $self, $context ) = @_;
    return $context->{domtree};
}
The getXSLParameter()
method provides a way to pass values from this class
out to the stylesheets (the values are available via <xsl:param> elements). Here
we just push all the request parameters and leave it to the stylesheet to pick
and choose which fields are relevant.
sub getXSLParameter {
    my $self = shift;
    return $self->Vars;
}
With the low-level details out of the way we will now create the event callbacks that will be called in response to the state of the browser interface. Since the default state is a simple prompt that requires no application logic or special processing, we need only implement the callback for the result state.
# event registration and event callbacks
sub registerEvents {
    return qw( semdiff_result );
}

sub event_semdiff_result {
    my ( $self, $context ) = @_;
    my ( $file1, $file2, $error );
    my $fh1 = $self->upload('file1');
    my $fh2 = $self->upload('file2');
    $context->{style} = 'result';
After retrieving the filehandles for the uploaded XML documents and setting the appropriate style for the application state, we check that both filehandles are defined and, if so, read their contents into plain scalars.
    if ( defined( $fh1 ) and defined( $fh2 ) ) {
        # slurp mode: read each uploaded file in a single operation
        local $/ = undef;
        $file1 = <$fh1>;
        $file2 = <$fh2>;
Next we create the DOM tree that contains the results of the comparison by calling the
compare_as_dom()
method. Wrapping this call in an eval
block ensures
that we can safely capture any parsing errors encountered while processing the uploaded
documents. We will look at the details of the compare_as_dom()
and
dom_from_data()
methods shortly.
        eval {
            $context->{domtree} = $self->compare_as_dom( $file1, $file2 );
        };
        if ( $@ ) {
            $error = $@;
        }
    }
    else {
        $error = 'You must select two XML files to compare and wait for them to finish uploading';
    }

    if ( $error ) {
        $context->{domtree} = $self->dom_from_data( { error => $error } );
    }
The compare_as_dom() method returns undef if the two documents are identical. If no DOM object was returned and no error occurred, we create a document with a single <message> element telling the user that the documents are semantically the same.
    unless ( defined( $context->{domtree} ) ) {
        my $msg = "Files are semantically identical.";
        $context->{domtree} = $self->dom_from_data( { message => $msg } );
    }
}
Having completed the single event callback, we can move on to writing the core methods that both it and the SOAP dispatcher will share.
First, we will create the compare()
method. Not much more than a wrapper for
the XML::SemanticDiff
method of the same name, it accepts two scalars containing
the XML documents to be compared and returns the results, if any, as an array reference.
sub compare {
    my $self = shift;
    my ( $xmlstring1, $xmlstring2 ) = @_;
    my $diff = XML::SemanticDiff->new( keeplinenums => 1 );
    my @results = $diff->compare( $xmlstring1, $xmlstring2 );
    return \@results;
}
We will finish up the WebSemDiff
class with a couple of handy convenience methods.
The dom_from_data()
method creates an XML::LibXML::Document
object (an XML document in the form of a DOM tree) by processing a reference to any common
Perl data structure through XML::Generator::PerlData
and hooking that generator
to XML::LibXML::SAX::Builder
to populate the tree. Recall that we call this
method in the result event callback to create the DOM tree containing the appropriate
messages if an error occurred, or if the documents being compared are identical.
sub dom_from_data {
    my ( $self, $ref ) = @_;
    my $builder = XML::LibXML::SAX::Builder->new();
    my $generator = XML::Generator::PerlData->new( Handler => $builder );
    my $dom = $generator->parse( $ref );
    return $dom;
}
Finally, we will create the compare_as_dom()
method. A simple wrapper for the last two
methods, it returns the results of a semantic comparison between two documents as a DOM object.
sub compare_as_dom {
    my $self = shift;
    my $diff_messages = $self->compare( @_ );
    return undef unless scalar( @{$diff_messages} ) > 0;
    return $self->dom_from_data( { difference => $diff_messages } );
}

1;
With the foundation now in place, we need only create the CGI script that will provide
access to the various clients. Here is where the architectural overlap between
CGI::XMLApplication
and SOAP::Lite
really pays off.
#!/usr/bin/perl -w
use strict;
use SOAP::Transport::HTTP;
use WebSemDiff;

if ( defined( $ENV{'HTTP_SOAPACTION'} ) ) {
    SOAP::Transport::HTTP::CGI
        -> dispatch_to('WebSemDiff')
        -> handle;
}
else {
    my $app = WebSemDiff->new();
    $app->run();
}
Yes, that's all there is to it.
SOAP::Lite's dispatch_to() method connects the SOAP plumbing to a given module (or directory of modules). In this case, it allows us to reuse the same WebSemDiff class that also implements the browser interface. Sharing that module means that the publicly visible CGI is nothing more than a request broker that provides access to the methods in a single application class based on the type of client making the connection. Users accessing the application through a Web browser are prompted to upload two XML files, and the posted data is run through the compare_as_dom() method to obtain the result, while SOAP clients have direct access to compare_as_dom(), the lower-level compare(), and the other methods.
Now that we have a working (if not totally complete and sanity-checked) application, let's connect a few clients to it, compare two XML documents, and check out the results.
In the interest of clarity we will keep the documents being compared simple. We'll call the first doc1.xml:
<?xml version="1.0"?>
<root>
  <el1 el1attr="good"/>
  <el2 el2attr="good">Some Text</el2>
  <el3/>
</root>
and the second, doc2.xml:
<?xml version="1.0"?>
<root>
  <el1 el1attr="bad"/>
  <el2 bogus="true"/>
  <el4>Rogue</el4>
</root>
A request to /cgi-bin/semdiff.cgi prompts the user to upload two documents and, after the files are compared, displays the results.
SOAP::Lite provides both a server and a client implementation. We will use it here to create the client that connects to the SOAP interface of our application. Note that, for brevity's sake, we will skip over the parts of the client script that are concerned with argument processing and with opening and reading the XML files to be compared, and focus on the SOAP-related parts. The complete script is available in this month's sample code as soap_semdiff1.pl.
#!/usr/bin/perl -w
use strict;
use SOAP::Lite;

...

my $soap = SOAP::Lite
    -> uri('http://my.host.tld/WebSemDiff')
    -> proxy('http://my.host.tld/cgi-bin/semdiff.cgi')
    -> on_fault( \&fatal_error );

my $result = $soap->compare( $file1, $file2 )->result;
print "Comparing $f1 and $f2...\n";

if ( defined $result and scalar( @{$result} ) == 0 ) {
    print "Files are semantically identical\n";
    exit;
}

foreach my $diff ( @{$result} ) {
    print $diff->{context} . ' ' .
          $diff->{startline} . ' - ' .
          $diff->{endline} . ' ' .
          $diff->{message} . "\n";
}
Passing this script the paths to our two tiny XML documents produces the following result:
Comparing docs/doc1.xml and docs/doc2.xml...
/root[1]/el1[1] 3 - 3 Attribute 'el1attr' has different value in element 'el1'.
/root[1]/el2[1] 4 - 4 Character differences in element 'el2'.
/root[1]/el2[1] 4 - 4 Attribute 'el2attr' missing from element 'el2'.
/root[1]/el2[1] 4 - 4 Rogue attribute 'bogus' in element 'el2'.
/root[1] 5 - 5 Child element 'el3' missing from element '/root[1]'.
/root[1] 5 - 5 Rogue element 'el4' in element '/root[1]'.
As an alternative, we could use SOAP::Lite's autodispatch mechanism to make the code a little easier to read:
use SOAP::Lite +autodispatch =>
    uri      => 'http://my.host.tld/WebSemDiff',
    proxy    => 'http://my.host.tld/cgi-bin/semdiff.cgi',
    on_fault => \&fatal_error;

my $result = SOAP->compare( $file1, $file2 );
print "Comparing $f1 and $f2...\n";
# etc ..
Fans of the REST architecture will appreciate the fact that our application (and indeed, every application built using CGI::XMLApplication) offers the ability to access the untransformed XML used to create the browser interface by including a "passthru" parameter either in the query string of a GET request or as a POSTed field.
#!/usr/bin/perl -w
use strict;
use HTTP::Request::Common;
use LWP::UserAgent;

my ( $f1, $f2 ) = @ARGV;

usage() unless defined $f1 and -f $f1
           and defined $f2 and -f $f2;

my $ua = LWP::UserAgent->new;
my $uri = "http://my.host.tld/cgi-bin/semdiff.cgi";

my $req = HTTP::Request::Common::POST(
    $uri,
    Content_Type => 'form-data',
    Content      => [
        file1          => [ $f1 ],
        file2          => [ $f2 ],
        passthru       => 1,
        semdiff_result => 1,
    ]
);

my $result = $ua->request( $req );

if ( $result->is_success ) {
    print $result->content;
}
else {
    warn "Request Failure: " . $result->message . "\n";
}

sub usage {
    die "Usage:\nperl $0 file1.xml file2.xml \n";
}
This script (restful_semdiff.pl
in the sample code) prints
the following XML document to STDOUT (formatted here for readability).
<?xml version="1.0" encoding="UTF-8"?>
<document>
  <difference>
    <context>/root[1]/el1[1]</context>
    <message>
      Attribute 'el1attr' has different value in element 'el1'.
    </message>
    <startline>3</startline>
    <endline>3</endline>
  </difference>
  <difference>
    <context>/root[1]/el2[1]</context>
    <message>
      Character differences in element 'el2'.
    </message>
    <startline>4</startline>
    <endline>4</endline>
  </difference>
  ...
</document>
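Because the passthru output is plain XML, an automated client can pick it apart with any XML parser. The following is a minimal sketch, assuming XML::LibXML is installed on the client and that the response has the shape shown above. The extract_differences() helper is a hypothetical name used for illustration only; it is not part of the sample code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# A sample passthru response, trimmed to the two differences shown above.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<document>
  <difference>
    <context>/root[1]/el1[1]</context>
    <message>
      Attribute 'el1attr' has different value in element 'el1'.
    </message>
    <startline>3</startline>
    <endline>3</endline>
  </difference>
  <difference>
    <context>/root[1]/el2[1]</context>
    <message>
      Character differences in element 'el2'.
    </message>
    <startline>4</startline>
    <endline>4</endline>
  </difference>
</document>
XML

# Hypothetical helper: turn the passthru XML into a list of hash references,
# one per <difference> element, mirroring what the SOAP client receives.
sub extract_differences {
    my ($xml_string) = @_;
    my $doc = XML::LibXML->load_xml( string => $xml_string );

    my @diffs;
    foreach my $node ( $doc->findnodes('/document/difference') ) {
        my %diff;
        foreach my $field (qw( context message startline endline )) {
            my $value = $node->findvalue($field);
            $value =~ s/^\s+|\s+$//g;    # trim pretty-printing whitespace
            $diff{$field} = $value;
        }
        push @diffs, \%diff;
    }
    return @diffs;
}

my @diffs = extract_differences($xml);

foreach my $diff (@diffs) {
    printf "%s %s - %s %s\n",
        @{$diff}{qw( context startline endline message )};
}
```

In a real client the string passed to extract_differences() would be the content of the HTTP response rather than an inline sample.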
Careful readers will have noticed that we did not touch on
XML-RPC at all in this column. There are two reasons for this.
First, the XML-RPC client and server interfaces provided by SOAP::Lite
are
nearly identical to those used for SOAP, so showing the example code would add little
value to the overall package. Second, unlike SOAP clients, XML-RPC clients have no standardized,
unambiguous HTTP header associated with their requests. This means that our CGI request broker
would have to resort to some level of voodoo to differentiate between XML-RPC clients and
regular Web browsers. Detecting XML-RPC requests might be possible by checking for a
combination of a POST
request and a Content-Type
of "text/xml",
but, at best, this solution seems brittle and naive and would only cloud the example code
(assuming it works at all). If you have a special expertise in this area and know of a more
robust way to detect requests from XML-RPC clients, then, please, share your knowledge by
posting a comment to this article.
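To make the heuristic described above concrete, here is a rough sketch of how a request broker might classify clients from the CGI environment. It is an illustration of the brittle POST plus "text/xml" check, not a recommended solution: the classify_request() function is a hypothetical name, and any non-XML-RPC client that posts text/xml without a SOAPAction header would be misclassified.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: guess the client type from a CGI-style environment
# hash (pass \%ENV in a real CGI script).
sub classify_request {
    my ($env) = @_;

    # SOAP clients send a SOAPAction header, as checked in semdiff.cgi.
    return 'soap' if defined $env->{HTTP_SOAPACTION};

    my $method = uc( defined $env->{REQUEST_METHOD} ? $env->{REQUEST_METHOD} : '' );
    my $type   = lc( defined $env->{CONTENT_TYPE}   ? $env->{CONTENT_TYPE}   : '' );

    $type =~ s/^\s+//;          # drop leading whitespace
    $type =~ s/\s*;.*\z//s;     # drop parameters such as "; charset=utf-8"

    # The naive XML-RPC guess: a POST whose body is declared as text/xml.
    return 'xmlrpc' if $method eq 'POST' and $type eq 'text/xml';

    # Everything else is treated as a regular Web browser.
    return 'browser';
}

# Example usage with hand-built environments.
my @samples = (
    { HTTP_SOAPACTION => '""', REQUEST_METHOD => 'POST', CONTENT_TYPE => 'text/xml' },
    { REQUEST_METHOD => 'POST', CONTENT_TYPE => 'text/xml; charset=utf-8' },
    { REQUEST_METHOD => 'POST', CONTENT_TYPE => 'multipart/form-data; boundary=xYz' },
    { REQUEST_METHOD => 'GET' },
);

print classify_request($_), "\n" for @samples;
```

Keeping the decision in a small function of its own means it can be exercised without a Web server, which matters for a check this easy to get wrong.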
We've covered a lot of ground this month and have glossed over a number of details in an effort to keep things focussed. The complete, working application and all client examples are available in the sample code if you need clarification.
Putting aside the debates about which architecture is best for implementing automated Web
services, or whether or not those services add anything new to Web technology, the bottom
line is: if you do the Web for a living, chances are good that you will be asked about
your knowledge of Web services. It is my sincerest hope that this introduction to how SOAP::Lite and CGI::XMLApplication can be combined to create clean, modular solutions that support access via SOAP, REST, and HTML browsers will give you a head start.