Enterprise Application Integration using Apache Cocoon 2.1

by Tony Culshaw
November 12, 2003

Apache Cocoon has typically been categorized as a web publishing framework, but since the release of version 2.1 it has started to look more like an XML application server.

I've just completed a project with a travel company to build a web-based travel agency desktop that integrates several common backend systems. These are systems that a typical agent would use in day-to-day business, and they were chosen to demonstrate a variety of integration techniques. In this article I outline how Cocoon 2.1 was the key to building this product, covering both its advantages and its disadvantages.

The Problem Space

The travel industry is an old industry in IT terms. The core systems have been around a long time and are truly legacy systems. Even today there may still be the occasional dumb terminal plugged into an ALC (Airline Link Control) network. Most airlines and global distribution systems still have user interfaces based on a 64 by 15 character screen.

These days we are getting new interfaces to the core of these central reservation systems, usually based on proprietary XML APIs. Standards such as those evolving at the Open Travel Alliance will eventually provide a much improved web service development space for suppliers and clients alike.

The unfortunate reality is that the problem space we occupy today is a fairly horrendous mix of rapidly changing technologies. Any system that integrates the variety of supplier systems we are faced with is going to have to be an agile system. Give us any system, and we have to be able to talk to it and integrate it.

As system architect, I also had to build the system against a very fluid and incomplete requirements specification, with a small number of developers who had varying skill sets in both open source and Microsoft technologies. Extreme Programming seemed the only way forward in this environment. An XML foundation appeared to be the compromise that would bridge the developers' skills, since any developer would be able to maintain and develop any part of the system with minimal training.

As time went on, it became increasingly clear that "Everything is XML" was the mantra. Even dumb terminal feeds could be "XMLized".

The next step was answering the question: "If all my system feeds are in XML, what benefit do I get in converting the XML into an object graph (as in Java code), and then back out into a user interface as HTML?" The question caused some healthy debate amongst the developers, but in the end the answer was that there is no benefit.

We had to develop or find a framework that handled input and output in XML, had a suite of XML-aware components to handle authentication and form validation, and was easily extensible to call a variety of external interfaces. Cocoon appeared ideal in that it was based on XML pipeline processing and we had the skill set to build extensions in Java.

Cocoon 2.1 was chosen because, although it was not even in alpha stage, the anticipated timeline would mean that any extensions we built would be based on the 2.1 infrastructure, not the old 2.0 infrastructure. This proved a good decision because our product is now in beta testing, and Cocoon 2.1 has just been released.

Five Second Introduction to Cocoon

There are a number of excellent articles on Cocoon basics, not to mention the Cocoon site itself. The samples that come with the Cocoon distribution are recommended and hint at the many possibilities. For the purposes of this article I will only introduce the basic components and the concepts of XML pipeline processing.

A Cocoon Web Application consists of a number of hierarchical sitemaps. Each sitemap consists of a number of Matchers (typically to match a URL pattern) and each match can kick off the assembly of a Pipeline. A pipeline gets assembled by Action and Selector components. A pipeline, once assembled, is typically started with a Generator, followed by one or more Transformers, and finally a Serializer.

Figure: Air Availability Example

The main components in Cocoon are

  • Matchers. Matches on some input; for example, the Wildcard URI Matcher, which matches on the URI delivered via a browser request.
  • Actions. Takes some action based on input parameters and results in success or failure. Typically takes on the role of the "C" in traditional MVC.
  • Selectors. Similar to actions, but allows multiple outcomes as in 'if else if else if...'
  • Generators. Most commonly the Request Generator, which takes a browser request (POST/GET) and generates XML as SAX events.
  • Transformers. XML SAX input is transformed to XML SAX output. The best example would be the XSLT Transformer.
  • Serializers. Accepts SAX Events as input and serializes them to an output stream. For example the HTMLSerializer serializes to the Cocoon Servlet response output stream.
  • Readers. A reader ties the input stream directly to the output stream. Ideal for inputs that can't be readily XMLized, like JPGs.
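
As a rough illustration of how these components fit together, a sitemap fragment might look like the sketch below. The URL patterns, file names, and the `dest` parameter are hypothetical, and the fragment assumes that the `request` generator, `xslt` transformer, and `html` serializer are declared in the sitemap's `map:components` section (omitted here).

```xml
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <!-- map:components omitted; this sketch assumes "request", "xslt"
       and "html" are declared there. -->
  <map:pipelines>
    <map:pipeline>

      <!-- Matcher: the wildcard URI matcher binds * to {1} -->
      <map:match pattern="availability/*">
        <!-- Generator: turn the browser request into SAX events -->
        <map:generate type="request"/>
        <!-- Transformer: hypothetical stylesheet, given {1} as a parameter -->
        <map:transform type="xslt" src="stylesheets/availability2html.xsl">
          <map:parameter name="dest" value="{1}"/>
        </map:transform>
        <!-- Serializer: write the SAX events to the response as HTML -->
        <map:serialize type="html"/>
      </map:match>

      <!-- Reader: stream a binary resource straight to the response -->
      <map:match pattern="images/*.jpg">
        <map:read src="images/{1}.jpg" mime-type="image/jpeg"/>
      </map:match>

    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```

Actions and Selectors would sit inside a match, wrapping the generate/transform/serialize steps they choose between.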

The Systems

The Systems to be integrated were

  • a tour management system via a dumb terminal interface.
  • a global distribution system via a published XML API accessing air, car and hotel reservations.
  • an air fare system via its web site.
  • a travel insurance system via HTTP XML.
  • an authentication and customization server via web service.
  • a persistence engine, the 'data integrator', to provide the value-added integration services (e.g., combined travel itineraries, customer relationship management) via published web service.

The diagram summarizes these systems and their methods of access. Note Cocoon as the aggregator of these systems.

Figure: System Summary

Cocoon Extensions

We required three main extensions, either because they didn't yet exist in Cocoon or because the components that did exist weren't ready for prime time.

These were

  • a custom form action extension including validation (using Jakarta Commons Validator);
  • a web service transformer which could deal with both basic HTTP XML POST operations and full blown SOAP calls;
  • an HTML transformer which could HTTP GET/POST to a regular web site and return the response as well-formed HTML.

Both the web service transformer and the HTML transformer had to maintain any remote session information transparently (as would a regular session through a browser using cookies).

Note that all the code for these extensions has previously been submitted to the Cocoon Developer mailing list.

The Big Issues

The big issues were "pipeline lock-in", debugging, standards drift, and performance.

"Pipeline lock-in" means that XML data flowing through a pipeline cannot influence the path through the pipeline. This isn't actually a limitation, rather a state of mind that has to be adopted. Imagine a pipeline that has to call two remote systems to achieve its end result. If the call to the first system fails (call 1) we require that no call to the second system be made, and that a suitable form of error handling be invoked, including returning an error message to the user.

...
  <map:generate type="request"/>
  <map:transform type="xalan" src="prepareCallToHost1.xsl"/>
  <map:transform type="webservice">
    <map:parameter name="uri" value="http://host1/service"/>  <!-- call 1 -->
  </map:transform>

  <map:transform type="xalan" src="logger.xsl">
    <map:parameter name="file" value="c:/tmp/call1.log"/>  <!-- log result of call 1 -->
  </map:transform>
  <map:transform type="xalan" src="prepareCallToHost2.xsl"/>
  <map:transform type="webservice">
    <map:parameter name="uri" value="http://host2/service"/>  <!-- call 2 -->
  </map:transform>
  <map:transform type="xalan" src="result2html.xsl"/>
  <map:serialize type="html"/>
...

Thus in the above sample prepareCallToHost2.xsl would have to determine whether an error had occurred in call 1. It would either prepare an agreed-upon XML error structure, without the tags that trigger call 2, or prepare the XML for call 2. Similarly, result2html.xsl would have to determine whether to render an error or a normal result to the user.
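
A minimal sketch of what such a prepareCallToHost2.xsl could look like follows. The element names are assumptions made for illustration only: a failed call 1 is taken to return an `<error>` root element, a successful one a `<response>` root element, and the web service transformer is assumed to act only on a `<request>` element. None of these names come from the actual project.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical: call 1 failed. Pass the error structure through
       unchanged, so no <request> is emitted and call 2 is not triggered. -->
  <xsl:template match="/error">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Hypothetical: call 1 succeeded. Build the request for call 2
       from the parts of the response it needs. -->
  <xsl:template match="/response">
    <request>
      <xsl:copy-of select="booking"/>
    </request>
  </xsl:template>

  <!-- Anything else is unexpected: turn it into the same error structure
       rather than letting it leak into the next stage. -->
  <xsl:template match="/*" priority="-1">
    <error>
      <message>Unexpected response from call 1</message>
    </error>
  </xsl:template>

</xsl:stylesheet>
```

The explicit low priority on the catch-all template keeps it from conflicting with the two specific templates, which would otherwise share the same default priority.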

But surely it would be better to change the direction through the pipeline and have another component just after call 1 to say 'if success do endPipeWithA else do endPipeWithB'?

It takes a while to grasp, but think about how the XML flows through the pipeline. The key is in the SAX events. Each event is fired all the way through the chain before the next. If we were allowed to change direction in a pipeline, we would essentially break the chain (startElement doesn't match endElement scenarios). We would get halfway through an HTMLSerializer, and then we could change to a PDFSerializer; it just wouldn't make any sense.

The bottom line is that all possible error conditions must be accounted for at each stage in the pipeline and propagated through the pipeline.
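
At the end of the pipeline, the same idea applies to rendering. The sketch below shows one way a result2html.xsl might branch on a propagated error. It assumes a hypothetical `<error><message>…</message></error>` structure and a hypothetical `<result>` root element for the normal case.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
      <body>
        <xsl:choose>
          <!-- Hypothetical error structure propagated from an earlier stage -->
          <xsl:when test="/error">
            <p class="error">
              <xsl:value-of select="/error/message"/>
            </p>
          </xsl:when>
          <!-- Normal rendering of the (hypothetical) result document -->
          <xsl:otherwise>
            <xsl:apply-templates select="result"/>
          </xsl:otherwise>
        </xsl:choose>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="result">
    <p><xsl:value-of select="."/></p>
  </xsl:template>

</xsl:stylesheet>
```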

Debugging is more difficult, primarily due to the lack of an IDE with breakpoints that you would get in an ordinary development environment. Each stage in a pipeline must be serialized back to the browser in XML for effective debugging. An alternative is to insert a logging-type transformer in the pipeline where a suspected error occurs and view the output that way. The above example logs the result of call 1 using the logger.xsl transform below.

<?xml version="1.0"?>

<!-- Logger Transform - Must use xalan! -->

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:lxslt="http://xml.apache.org/xslt"
    xmlns:redirect="org.apache.xalan.xslt.extensions.Redirect"
    extension-element-prefixes="redirect">

  <xsl:param name="file" select="'log.xml'"/>

  <xsl:template match="/">

    <redirect:open select="$file"/>

    <redirect:write select="$file">
      <xsl:copy-of select="."/>
    </redirect:write>

    <redirect:close select="$file"/>

    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="node()|@*">
    <!-- Copy the current node -->
    <xsl:copy>
      <!-- Including any attributes it has and any child nodes -->
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
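
The other approach mentioned above, serializing an intermediate stage back to the browser, could be sketched as a separate debug match that repeats the pipeline up to the stage of interest and ends with the XML serializer. The `debug/call1` pattern is hypothetical, and the fragment assumes an `xml` serializer is declared in the sitemap.

```xml
<!-- Hypothetical debug entry: same stages as the main pipeline up to call 1 -->
<map:match pattern="debug/call1">
  <map:generate type="request"/>
  <map:transform type="xalan" src="prepareCallToHost1.xsl"/>
  <map:transform type="webservice">
    <map:parameter name="uri" value="http://host1/service"/>
  </map:transform>
  <!-- Stop here and return the intermediate XML instead of HTML -->
  <map:serialize type="xml"/>
</map:match>
```

The drawback is duplication: the debug match has to be kept in step with the real pipeline by hand.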

Hopefully we'll see some decent debugging tools emerging as Cocoon's popularity grows.

Standards Drift can occur simply because there's nothing to enforce the quality of the XML flowing through a pipeline. In addition, we require best-practice enforcement in the authorship of XSLT. Pair programming can certainly help here. One possibility would be schema enforcement at various stages in the pipelines, although this would be removed prior to release for performance reasons.

Performance can be a big issue. XML transforms are still slow, although the last few years have given us great leaps in efficiency. Cocoon has an advanced caching mechanism, but this doesn't help us much where most of the content is dynamic.

A few tidbits of advice:

  • Use IFrames to selectively repopulate a page instead of rebuilding entire pages each time.
  • Minimize the number of transforms by using input modules where possible.
  • Minimize XML volume through each pipeline by using aggregation.
  • Cocoon supports multiple XSLT types (xalan, xslt) and others will be added in the future, including Saxon (XSLT 2.0) and Gregor (very performant). It would be useful to experiment with different implementations.
  • Load test for scalability every couple of development iterations.
  • Identify individual bottlenecks and consider writing custom components to optimize.
  • Use Apache with mod_jk in front of your Cocoon servlet engine to cache content from Cocoon Readers and any other static content.
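
To illustrate one reading of the input-module tip above: a value from the request can be handed straight to a stylesheet as a parameter, rather than generating the request as XML and merging it in with an extra transform. The pattern, file names, and the `agent` parameter below are hypothetical; the `{request-param:…}` syntax is the sitemap's input-module notation.

```xml
<map:match pattern="fares">
  <map:generate src="fares.xml"/>
  <map:transform type="xslt" src="fares2html.xsl">
    <!-- Resolved by the request-param input module at pipeline setup -->
    <map:parameter name="agent" value="{request-param:agent}"/>
  </map:transform>
  <map:serialize type="html"/>
</map:match>
```

The stylesheet would need a matching top-level `<xsl:param name="agent"/>` declaration to receive the value.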

Development Methodology

Did I mention Extreme Programming? A tight specify/code/build/test/release cycle was vital for this project.

Use cases (or story cards) were implemented alongside JMeter test scripts. Once a script ran successfully, it was incorporated into the Ant build script for the project. Wherever Java components were developed, unit tests were built and incorporated into the build. The JMeter scripts were used to enforce continuous quality control, for load testing, and later for system checks in the live system.

Without this strict adherence to testing as part of the build process, the project would have failed due to lack of quality control. If a bug is introduced, it's far better to catch it quickly, while the developer's mind is still "in context". Needless to say, until our custom components were bedded down, we had a lot of pipeline breakages. Our testing regime ensured we kept on top of things.

Due to the wide variation in developers' skills, pair programming was used to cross-train and enhance skill sets. Pair programming is the best training a programmer can get.

Looking Forward by Looking Back

This is always an important part of a project, enhancing your ability to see where you can improve, not only for the next iteration of the current project, but also for the next project.

What did we do wrong? What would be done differently next time around? How has the technology changed since we started?

We need to use more XML Schemas to enforce standardization. I believe that we made the right decision in not designing and enforcing schemas from day one, but now that the project has matured and entered the maintenance phase, it is the right time.

Our form handling mechanisms are totally proprietary simply because there were no satisfactory solutions in Cocoon. Cocoon 2.1 now incorporates Flow/Continuations, JXPath, and Woody. These are very interesting components that should be given serious consideration for any new development work.


Comments on this Article

  • XML to object graph
    2003-11-24 06:35:31 Taylor Cowan

    -What benefit do I get in converting the XML into an object graph?


    If the availability list requires filtering, and the rules for filtering the availability list change or grow over time, an object graph would be useful. Items within the avail list may have properties that are sent to and evaluated by other systems. Avail list info may need to be appended. XSLT has the capacity to perform program logic. With a 100% XML/XSLT pipeline, logic creeps into the XSLT and the stylesheets become difficult to change over time. It all depends on how happy the developers are editing and modifying style sheets as opposed to Java code.


    The other option is to filter before transforming into XML.


    Taylor

  • Conditionals
    2003-11-21 15:40:31 Erik Bruchez

    This is not a bad article at all, but I thought I would comment on one
    point.


    "Pipeline lock-in", as the author says, is just a way of saying that
    Cocoon pipelines do not support conditionals, i.e. they do not allow
    you to take a different branch of execution based on the results of a
    previous step in the pipeline. If this is "by design", as I have read
    and heard a few times by now, it is a design that is hard to
    defend. The author's argument that Cocoon is SAX-based does not hold
    water either: consider that some components in a Cocoon pipeline do
    not support pure streaming, starting with XSLT. Xalan can be put in a
    mode where it starts outputting SAX events before reading its input is
    completed, but this typically will not happen: in many cases, an XSLT
    transformation needs to have read the entire input document before
    being able to generate most of its output. The bottom line is that
    using conditionals in a SAX pipeline may impact streaming in different
    ways, yes, but so does using XSLT!


    This is how you would implement a condition in XPL, the OXF
    (http://www.orbeon.com/oxf/) pipeline language:


    <p:config xmlns:p="http://www.orbeon.com/oxf/pipeline">
        <!-- First transformation -->
        <p:processor uri="oxf/processor/xslt">
            <p:input name="data" href="foo.xml"/>
            <p:input name="config" href="foo.xsl"/>
            <p:output name="data" id="result"/>
        </p:processor>
        <p:choose href="#result">
            <p:when test="/error"><!-- This is an XPath expression -->
                <!-- Execute part of pipeline when there was an error -->
                ...
            </p:when>
            <p:otherwise>
                <!-- Execute part of pipeline when there was no error -->
                ...
            </p:otherwise>
        </p:choose>
    </p:config>


    The problems the author encountered with the limitations of Cocoon
    pipelines are I think one of the major reasons so many people have
    been (and will be for a long time) frustrated with Cocoon.


    -Erik


  • Scalability, Maintainability
    2003-11-16 01:39:33 Bernd Hofner

    Nice article and nice to see what cocoon can do!


    It would be interesting to see how your architecture scales. Can you disclose how many transactions/hour are handled by the system?
    How much time is spent in the Cocoon part in relation to the back-end calls?
    And what kind of hardware infrastructure do you need to support your software architecture?


    Could you name how much of the actual processing logic has been done in XSLT vs. Java? Do you think that the excessive use of XSLT provides for a maintainable system?


    Personally, I do get along using XSLT, but I still don't feel too comfortable using it for complex tasks. Sometimes the pattern-matching approach of XSLT makes it difficult to determine what really happens (which leads to an unsatisfying try-and-error development).


    In comparison to modern programming languages, XML/XSLT seems to lack mechanisms to organize complexity and reuse. What happened to the praised object-oriented concepts like inheritance, interfaces, modules, information hiding and so forth? Are they no longer important?


    Or am I overestimating the logic needed for a personalized front-end?





    • Scalability, Maintainability
      2003-11-19 02:05:08 Tony Culshaw

      Glad you enjoyed the article, sorry I took so long to reply.


      A high end wintel with plenty of memory should handle up to 30ish trans/sec with Apache Web Server in front of the servlet engine. There's no doubt that XSLT and the pipeline model cost more in terms of processing power, but then the same can be said about java compared to C/C++, or C/C++ compared to assembler!


      Spend more on hardware and have an elegant solution rather than cost more in time and developer pain.


      Regarding your excessive use of xslt ... in theory you can do the whole thing in XSLT, but I would call that excessive. It is however interesting to see how far you can take it. Our main business logic and persistence engine is all J2EE/JBoss.


      The middle of the road approach is to use whatever framework gets the job done. If we all believed in pure OO development why aren't we all using Object Databases instead of our old clunky Relational Databases?


      My final comment regards 'try-and-error' development. I have to admit that all my development is in fact 'try-and-error'!


  • Pipeline Debugging
    2003-11-13 12:41:21 Tony Collen

    ... or the author could use views:


    http://cocoon.apache.org/2.1/userdocs/concepts/views.html


    Tony

  • For debugging, try the profiler block
    2003-11-13 05:08:16 Bruno Dumon

    Subject says it all. Instead of modifying your pipeline in order to view the XML between transformation steps, the profiler block can record this XML for you automatically.


    Check out the samples for the profiler, and see the documentation over here:
    http://cocoon.apache.org/2.1/userdocs/concepts/profiler.html