Multi-Interface Web Services Made Easy
by Kip Hampton | Pages: 1, 2

SOAP::Lite's dispatch_to() method connects the SOAP plumbing to a given module (or directory of modules). In this case, it allows us to reuse the same WebSemDiff class that also implements the browser interface. Sharing that module means that the publicly visible CGI is nothing more than a request broker that provides access to the methods of a single application class based on the type of client making the connection. Users accessing the application through a Web browser are prompted to upload two XML files, and the posted data is run through the compare_as_dom() method to obtain the result, while SOAP clients have direct access to compare_as_dom() as well as the lower-level compare() and other methods.
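The broker logic described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the actual semdiff.cgi from the sample code: it assumes SOAP clients can be recognized by the SOAPAction HTTP header (which a CGI script sees as HTTP_SOAPACTION), that WebSemDiff is started with CGI::XMLApplication's run() method, and the helper names client_type() and handle_request() are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a request from its CGI environment.
# Assumption: SOAP 1.1 clients send a SOAPAction header, exposed to CGI
# scripts as HTTP_SOAPACTION; anything else is treated as a browser.
sub client_type {
    my ($env) = @_;
    return defined $env->{HTTP_SOAPACTION} ? 'soap' : 'browser';
}

# Hypothetical broker: hand the request to the interface that matches
# the client. Both branches end up in the same WebSemDiff class.
sub handle_request {
    my ($env) = @_;
    require WebSemDiff;    # the shared application class

    if ( client_type($env) eq 'soap' ) {
        # SOAP clients call the methods of WebSemDiff directly.
        require SOAP::Transport::HTTP;
        SOAP::Transport::HTTP::CGI
            ->dispatch_to('WebSemDiff')
            ->handle;
    }
    else {
        # Browsers get the CGI::XMLApplication interface.
        WebSemDiff->new->run;
    }
}
```

A CGI script built this way would end with a single call to handle_request( \%ENV ). Keeping the client detection in its own small function makes it easy to check without a Web server.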

Comment on this article: Have you developed any techniques for easily building web applications with multiple interfaces? Share your experience in our forum.

Now that we have a working (if not totally complete and sanity-checked) application, let's connect a few clients to it, compare two XML documents, and check out the results.

In the interest of clarity, we will keep the documents being compared simple. We'll call the first doc1.xml:

<?xml version="1.0"?>
<root>
  <el1 el1attr="good"/>
  <el2 el2attr="good">Some Text</el2>
  <el3/>
</root>

and the second, doc2.xml:

<?xml version="1.0"?>
<root>
  <el1 el1attr="bad"/>
  <el2 bogus="true"/>
  <el4>Rogue</el4>
</root>

Access From Web Browser

A request to /cgi-bin/semdiff.cgi prompts the user to upload two documents:

screenshot

and after the files are compared, the results are given:

screenshot

Access From A SOAP Client

SOAP::Lite provides both a server and a client implementation. We will use it here to create the client that connects to the SOAP interface of our application. For brevity's sake we will skip over the parts of the client script that are concerned with argument processing and with opening and reading the XML files to be compared, and focus on the SOAP-related parts. The complete script is available in this month's sample code as soap_semdiff1.pl.

#!/usr/bin/perl -w
use strict;
use SOAP::Lite;
 ...
my $soap = SOAP::Lite
  -> uri('http://my.host.tld/WebSemDiff')
  -> proxy('http://my.host.tld/cgi-bin/semdiff.cgi')
  -> on_fault( \&fatal_error );

my $result = $soap->compare( $file1, $file2 )->result;

print "Comparing $f1 and $f2...\n";

if ( defined $result and scalar( @{$result} ) == 0 ) {
    print "Files are semantically identical\n";
    exit;
}

foreach my $diff ( @{$result} ) {
    print $diff->{context} . ' ' .
          $diff->{startline} . ' - '  .
          $diff->{endline} . ' '  .
          $diff->{message} .
          "\n";
}

Passing this script the paths to our two tiny XML documents produces the following result:

Comparing docs/doc1.xml and docs/doc2.xml...
/root[1]/el1[1] 3 - 3 Attribute 'el1attr' has different value in element 'el1'.
/root[1]/el2[1] 4 - 4 Character differences in element 'el2'.
/root[1]/el2[1] 4 - 4 Attribute 'el2attr' missing from element 'el2'.
/root[1]/el2[1] 4 - 4 Rogue attribute 'bogus' in element 'el2'.
/root[1] 5 - 5 Child element 'el3' missing from element '/root[1]'.
/root[1] 5 - 5 Rogue element 'el4' in element '/root[1]'.
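The client passes \&fatal_error to on_fault(), but that handler falls in the part of the script we skipped. One plausible shape for it is sketched below; this is not necessarily what soap_semdiff1.pl does. It relies on SOAP::Lite calling the handler with the client object and, when the server answered with a SOAP fault, the response object. The fault_message() helper is a hypothetical name introduced for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build a readable message from the arguments that
# SOAP::Lite hands to an on_fault callback. $res is an object only when
# the service replied with a SOAP fault; otherwise the failure happened
# at the transport (HTTP) level.
sub fault_message {
    my ( $soap, $res ) = @_;
    return ref $res
        ? 'SOAP fault: ' . $res->faultstring
        : 'Transport error: ' . $soap->transport->status;
}

# A possible body for the fatal_error handler named in the client:
# report the problem and stop.
sub fatal_error {
    die fault_message(@_) . "\n";
}
```

Separating the message building from the die keeps the handler trivial and lets the two failure cases be told apart in the output.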

As an alternative, we could use SOAP::Lite's autodispatch mechanism to make the code a little easier to read:

use SOAP::Lite +autodispatch =>
   uri      => 'http://my.host.tld/WebSemDiff',
   proxy    =>'http://my.host.tld/cgi-bin/semdiff.cgi',
   on_fault =>  \&fatal_error ;

my $result = SOAP->compare( $file1, $file2 );

print "Comparing $f1 and $f2...\n";

# etc ..

Access From A RESTful Client

Fans of the REST architecture will appreciate the fact that our application (and indeed, every application built using CGI::XMLApplication) offers the ability to access the untransformed XML used to create the browser interface by including a "pass thru" parameter (passthru in the script below) either in the query string of a GET request or as a POSTed field.

#!/usr/bin/perl -w
use strict;
use HTTP::Request::Common;
use LWP::UserAgent;

my ( $f1, $f2 ) = @ARGV;

usage() unless defined $f1 and -f $f1
        and defined $f2 and -f $f2;


my $ua = LWP::UserAgent->new;
my $uri = "http://my.host.tld/cgi-bin/semdiff.cgi";


my $req = HTTP::Request::Common::POST( $uri,
                                       Content_Type => 'form-data',
                                       Content => [
                                           file1 => [ $f1 ],
                                           file2 => [ $f2 ],
                                           passthru => 1,
                                           semdiff_result => 1,
                                       ]
                                      );


my $result = $ua->request( $req );

if ( $result->is_success ) {
   print $result->content;
}
else {
   warn "Request Failure: " . $result->message . "\n";
}

sub usage {
   die "Usage:\nperl $0 file1.xml file2.xml \n";
}

This script (restful_semdiff.pl in the sample code) prints the following XML document to STDOUT (formatted here for readability).

<?xml version="1.0" encoding="UTF-8"?>
<document>
  <difference>
    <context>/root[1]/el1[1]</context>
    <message>
      Attribute 'el1attr' has different
      value in element 'el1'.
    </message>
    <startline>3</startline>
    <endline>3</endline>
  </difference>
  <difference>
    <context>/root[1]/el2[1]</context>
    <message>
      Character differences in element 'el2'.
    </message>
    <startline>4</startline>
    <endline>4</endline>
  </difference>
  ...
</document>
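Because the pass-through result is plain XML, a client can post-process it however it likes. As a rough sketch, the function below turns that document into the same one-line-per-difference report the SOAP client prints. It assumes XML::LibXML is installed and that the response has the document/difference structure shown above; format_differences() is a hypothetical name, not part of the sample code.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: convert the pass-through XML into report lines of
# the form "context startline - endline message".
sub format_differences {
    my ($xml) = @_;
    my $doc = XML::LibXML->new->parse_string($xml);

    my @lines;
    foreach my $diff ( $doc->findnodes('/document/difference') ) {
        push @lines, join ' ',
            $diff->findvalue('context'),
            $diff->findvalue('startline'), '-',
            $diff->findvalue('endline'),
            # normalize-space() collapses pretty-printing whitespace
            $diff->findvalue('normalize-space(message)');
    }
    return @lines;
}
```

In restful_semdiff.pl this could stand in for the plain print of the response body, for example: print "$_\n" for format_differences( $result->content );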

Conclusions

Careful readers will have noticed that we did not touch on XML-RPC at all. There are two reasons. First, the XML-RPC client and server interfaces provided by SOAP::Lite are nearly identical to those used for SOAP, so showing the example code would add little value to the overall package. Second, unlike SOAP clients, XML-RPC clients have no standardized, unambiguous HTTP header associated with their requests. This means that our CGI request broker would have to resort to some level of voodoo to differentiate between XML-RPC clients and regular Web browsers. Detecting XML-RPC requests might be possible by checking for a combination of a POST request and a Content-Type of "text/xml", but, at best, this solution seems brittle and naive and would only cloud the example code (assuming it works at all). If you know a more robust way to detect requests from XML-RPC clients, please share your knowledge by posting a comment to this article.
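For the curious, the heuristic just described might look something like the sketch below. It is offered only to make the idea concrete, not as a recommendation: the function name looks_like_xmlrpc() is hypothetical, and the extra check for a missing SOAPAction header is an assumption added here because SOAP 1.1 clients also POST text/xml.

```perl
use strict;
use warnings;

# Hypothetical helper: *guess* whether a request comes from an XML-RPC
# client. True only for a POST with a Content-Type of text/xml and no
# SOAPAction header.
sub looks_like_xmlrpc {
    my ($env) = @_;

    # A SOAPAction header marks the request as SOAP, not XML-RPC.
    return 0 if defined $env->{HTTP_SOAPACTION};
    return 0 unless ( $env->{REQUEST_METHOD} || '' ) eq 'POST';

    # Drop parameters such as "; charset=utf-8" after the media type.
    my ($type) = split /\s*;\s*/, lc( $env->{CONTENT_TYPE} || '' );
    return ( defined $type && $type eq 'text/xml' ) ? 1 : 0;
}
```

Any other client that happens to POST text/xml would be misclassified by this check, which is exactly the brittleness that kept it out of the example application.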

We've covered a lot of ground this month and have glossed over a number of details in an effort to keep things focused. The complete, working application and all client examples are available in the sample code if you need clarification.

Putting aside the debates about which architecture is best for implementing automated Web services, or whether or not those services add anything new to Web technology, the bottom line is that if you do the Web for a living, chances are good that you will be asked about your knowledge of Web services. It is my sincere hope that this introduction to how SOAP::Lite and CGI::XMLApplication can be combined to create clean, modular solutions that support access via SOAP, REST, and HTML browser will give you a head start.


  • Security
    2002-05-09 03:32:10 Chris Morris

    This code very elegantly passes user-supplied data to deeper and deeper levels, without any filtering.


    1) return $style_path . 'semdiff_' . $style . '.xsl';
    can be attacked with the double dot/poisoned null exploit..
    2) sub getXSLparameter passes user data to XSLT style sheets without examining it. Stylesheet authors are not used to looking at security issues - what if they use one of these parameters as the name of an included file?
    3) SOAP::Lite has a serious security hole, see
    http://www.phrack.com/show.php?p=58&a=9


    Is it responsible to publish code with so many security deficiencies? If your aim is to illustrate how to use some new technologies, you could at least put a comment in:
    #TODO validate data here


    Your fundamental point - how easy it is to wrap an existing service for remote access - is exactly the source of the security issue, which is "What facilities do we want to expose?". The big advantage of the REST paradigm is it invites you to begin by thinking about that.

    • Security
      2002-05-09 06:59:31 Kip Hampton

      chrishmorris wrote:


      "return $style_path . 'semdiff_' . $style . '.xsl'; can be attacked with the double dot/poisoned null exploit.."


      Careful examination reveals that the $style_path property is set internally within the unexposed application class ( via $context->{style} ) and is at no time exposed to the world (or based on data passed in from the user).


      "sub getXSLparameter passes user data to XSLT style sheets without examining it. Stylesheet authors are not used to looking at security issues - what if they use one of these parameters as the name of an included file?"


      Putting aside your presumptions about what XSLT stylesheet authors typically do or think about, no external document can be included (accidentally or otherwise) into a stylesheet via an <xsl:param/> element. Also, any key/value pairs passed to a stylesheet processor (which I'm sure you're aware is neither a script interpreter, nor able to call other executables) that are not explicitly addressed in the stylesheet *by name* are ignored. So, assuming that one did hack a "mystery field" into the POST, it would have no effect whatsoever on the stylesheet transformation, or its result.


      That said though, yes, it is good practice, for production code to explicitly pass only that data to the XSLT processor that the stylesheet requires.


      "SOAP::Lite has a serious security hole, see http://www.phrack.com/show.php?p=58&a=9"


      Which is fixed in the current version (.55). See soaplite.com


      "If your aim is to illustrate how to use some new technologies, you could at least put a comment in:
      #TODO validate data here"


      You mean like: "Now that we have a working (if not totally complete and sanity-checked) application..." ? Maybe I need to dust off the <blink> tag?


      Yes, it is true, in this column I often do presume that the reader is smart enough to take the code samples as intended; that is, as merely illustrative of a specific concept and not something to be dropped as-is into production. I also realize that treating my readers as capable peers that do not require each jot and tittle to be pre-chewed for them puts me at odds with some of the accepted conventions of technical writing.


      You do raise a very good point about security and "web services". Diligence in this area is key, especially as more and more services become available.


      Thanks for reading, and for your comments.


      -kip

      • Middleware and software contracts
        2002-05-13 04:31:55 Chris Morris

        Thank you for your thoughtful reply.


        You rightly object to my statement that "Stylesheet authors are not used to looking at security issues". It isn't a question of anyone's skills or character - it is a question of the division of responsibilities between modules.


        As you say, <xsl:include> does not accept parameters. However, document() and <xsl:document> do, and unfortunately some people work around the limitation in include by writing a stylesheet whose output is another stylesheet. So it seems likely that there will be some exploitable XSLT created.


        Your perl script knows that the parameters are user input, and the XSLT will know whether one of them is being used as a file path. One or the other has to clean the input. Since the perl is taking care of / hiding the network interface, it seems to me that it is responsible for security.


        I agree that $context->{style} can't be exploited in your script. But you are publishing model code here. The context object that CGI::XMLApplication uses has the power and risk of global variables. If someone decides for other reasons to copy the CGI parameters into $context - much like you did with the XSLT call - then this could create an exploit.


        Thinking about these issues has made me reconsider a script I recently wrote which attempts to match the CGI parameter names to SQL column names.


        I think the issue is one of middleware design, and goes wider than security. To make middleware powerful, we try to make it transparent. We look for a design that enforces few preconditions, and promises few postconditions. If we succeed, any design contract must be agreed between the outer layers of the sandwich. Unfortunately such contracts can get neglected. At least, the documentation of the middleware must point out to its clients which responsibilities still lie with them.


        It is good to hear that the problem in SOAP::Lite is fixed. I think it came about by starting with considerations of power, transparency, and elegance, instead of starting with the question "What do we want this component to do?".


        I agree that you couldn't cover everything in a short article ... I hope you agree that published code is fair game for criticism!