Enterprise Application Integration using Apache Cocoon 2.1

November 12, 2003

Apache Cocoon has typically been categorized as a web publishing framework, but since the release of version 2.1 it has started to look more like an XML application server.

I've just completed a project with a travel company to build a web-based travel agency desktop which integrates several common backend systems. These systems are ones that a typical agent would use in day-to-day business and were chosen to demonstrate a variety of integration techniques. In this article I outline how Cocoon 2.1 was the key to building this product, including both advantages and disadvantages.

The Problem Space

The travel industry is an old industry in IT terms. The core systems have been around a long time and are truly legacy systems. Even today there may still be the occasional dumb terminal plugged into an ALC (Airline Link Control) network. Most airlines and global distribution systems still have user interfaces based on a 64 by 15 character screen.

These days we are getting new interfaces to the core of these central reservation systems, usually based on proprietary XML APIs. Standards such as those evolving at the Open Travel Alliance will eventually provide for a much improved web service development space for both suppliers and clients alike.

The unfortunate reality is that the problem space we occupy today is a fairly horrendous mix of rapidly changing technologies. Any system that integrates the variety of supplier systems we are faced with is going to have to be an agile system. Give us any system, and we have to be able to talk to it and integrate it.

As system architect I also had to build the system against a very fluid and incomplete requirements specification, and with a small number of developers whose skill sets varied across open source and Microsoft technologies. Extreme Programming seemed the only way forward in this environment. An XML foundation appeared to be the compromise that would bridge the developers' skills, since any developer would be able to maintain and develop any part of the system with minimal training.

As time went on, the more it appeared that "Everything is XML" was the mantra. Even dumb terminal feeds could be "XMLized".

The next step was answering the question: "If all my system feeds are in XML, what benefit do I get in converting the XML into an object graph (as in Java code), and then back out into a user interface as HTML?" The question caused some healthy debate amongst the developers, but in the end the answer was that there is no benefit.

We had to develop or find a framework that handled input and output in XML, had a suite of XML-aware components to handle authentication and form validation, and was easily extensible to call a variety of external interfaces. Cocoon appeared ideal in that it was based on XML pipeline processing and we had the skill set to build extensions in Java.

Cocoon 2.1 was chosen because, although it was not even in alpha stage, the anticipated timeline would mean that any extensions we built would be based on the 2.1 infrastructure, not the old 2.0 infrastructure. This proved a good decision because our product is now in beta testing, and Cocoon 2.1 has just been released.

Five Second Introduction to Cocoon

There are a number of excellent articles on Cocoon basics, not to mention the Cocoon site itself. The samples that come with the Cocoon distribution are recommended and hint at the many possibilities. For the purposes of this article I will only introduce the basic components and the concepts of XML pipeline processing.

A Cocoon Web Application consists of a number of hierarchical sitemaps. Each sitemap consists of a number of Matchers (typically to match a URL pattern) and each match can kick off the assembly of a Pipeline. A pipeline gets assembled by Action and Selector components. A pipeline, once assembled, is typically started with a Generator, followed by one or more Transformers, and finally a Serializer.

The main components in Cocoon are

Matchers. Matches on some input; for example, the Wildcard URI Matcher, which matches on the URI delivered via a browser request.
Actions. Takes some action based on input parameters and results in success or failure. Typically takes on the role of the "C" in traditional MVC.
Selectors. Similar to actions, but allows multiple outcomes as in 'if else if else if...'
Generators. Most commonly the Request Generator, which takes a browser request (POST/GET) and generates XML as SAX events.
Transformers. XML SAX input is transformed to XML SAX output. The best example would be the XSLT Transformer.
Serializers. Accepts SAX Events as input and serializes them to an output stream. For example the HTMLSerializer serializes to the Cocoon Servlet response output stream.
Readers. A reader ties the input stream directly to the output stream. Ideal for inputs that can't be readily XMLized, like JPGs.
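
As a rough illustration of how these components fit together, a minimal sitemap might look something like the sketch below. The URL patterns, file names and the validate-form action are hypothetical, and the sketch assumes the standard components are declared in a parent sitemap; it is not taken from the project itself.

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch only: patterns, file names and the "validate-form"
     action are hypothetical. Component declarations (map:components) are
     assumed to be inherited from a parent sitemap. -->
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:pipelines>
    <map:pipeline>

      <!-- Matcher plus Reader: binary content is streamed straight through -->
      <map:match pattern="images/*.jpg">
        <map:read src="images/{1}.jpg" mime-type="image/jpeg"/>
      </map:match>

      <!-- Matcher: kicks off assembly of an XML pipeline -->
      <map:match pattern="booking">

        <!-- Action: the nested elements are used only if the action succeeds -->
        <map:act type="validate-form">
          <map:generate type="request"/>
          <map:transform type="xslt" src="stylesheets/booking.xsl"/>

          <!-- Selector: picks the serializer while the pipeline is assembled -->
          <map:select type="request-parameter">
            <map:parameter name="parameter-name" value="format"/>
            <map:when test="xml">
              <map:serialize type="xml"/>
            </map:when>
            <map:otherwise>
              <map:serialize type="html"/>
            </map:otherwise>
          </map:select>
        </map:act>

        <!-- Reached only if the action fails -->
        <map:generate src="content/form-error.xml"/>
        <map:transform type="xslt" src="stylesheets/error2html.xsl"/>
        <map:serialize type="html"/>
      </map:match>

    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```

Note that the action and the selector make their decisions from the request, before any XML flows; once the generator starts, the route is fixed.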

The Systems

The Systems to be integrated were

a tour management system via a dumb terminal interface.
a global distribution system via a published XML API accessing air, car and hotel reservations.
an air fare system via its web site.
a travel insurance system via HTTP XML.
an authentication and customization server via web service.
a persistence engine, the 'data integrator', to provide the value-added integration services (e.g. combined travel itineraries, customer relationship management) via published web service.

The diagram summarizes these systems and their methods of access. Note Cocoon as the aggregator of these systems.

Cocoon Extensions

There were primarily three extensions that we required, either because they didn't currently exist in Cocoon or because the components that did exist weren't ready for prime time.

These were

a custom form action extension including validation (using Jakarta Commons Validator);
a web service transformer which could deal with both basic HTTP XML POST operations and full blown SOAP calls;
an HTML transformer which could HTTP GET/POST to a regular web site and return the response as well-formed HTML.

Both the web service transformer and the HTML transformer had to maintain any remote session information transparently (as would a regular session through a browser using cookies).

Note that all the code for these extensions has previously been submitted to the Cocoon Developer mailing list.

The Big Issues

The big issues were "pipeline lock-in", debugging, standards drift, and performance.

"Pipeline lock-in" means that XML data flowing through a pipeline cannot influence the path through the pipeline. This isn't actually a limitation, rather a state of mind that has to be adopted. Imagine a pipeline that has to call two remote systems to achieve its end result. If the call to the first system fails (call 1) we require that no call to the second system be made, and that a suitable form of error handling be invoked, including returning an error message to the user.

...
  <map:generate type="request"/>
  <map:transform type="xalan" src="prepareCallToHost1.xsl"/>
  <map:transform type="webservice">
    <map:parameter name="uri" value="http://host1/service"/>  <!-- call 1 -->
  </map:transform>

  <map:transform type="xalan" src="logger.xsl">
    <map:parameter name="file" value="c:/tmp/call1.log"/>  <!-- log result of call 1 -->
  </map:transform>
  <map:transform type="xalan" src="prepareCallToHost2.xsl"/>
  <map:transform type="webservice">
    <map:parameter name="uri" value="http://host2/service"/>  <!-- call 2 -->
  </map:transform>
  <map:transform type="xalan" src="result2html.xsl"/>
  <map:serialize type="html"/>
...

Thus in the above sample prepareCallToHost2.xsl would have to determine whether an error had occurred in call 1. It would either prepare an agreed-upon XML error structure, without the tags that trigger call 2, or prepare the XML for call 2. Similarly result2html.xsl would have to determine whether to render an error or to render a normal result to the user.
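
To make that concrete, here is a minimal sketch of the kind of check prepareCallToHost2.xsl might perform. The element names are assumptions for illustration only: call 1 is assumed to report failure as a fault element, error stands in for the agreed-upon error structure, and host2-request stands in for whatever tags actually trigger the web service transformer.

```xml
<?xml version="1.0"?>
<!-- Sketch of the error check in prepareCallToHost2.xsl.
     Assumptions: call 1 reports failure as a <fault> element, the agreed
     error structure is <error>, and <host2-request> stands in for whatever
     tags trigger the web service transformer. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <xsl:choose>
      <!-- Call 1 failed: emit only the error structure, so call 2 never fires -->
      <xsl:when test="//fault">
        <error stage="call1">
          <message><xsl:value-of select="//fault/message"/></message>
        </error>
      </xsl:when>
      <!-- Call 1 succeeded: build the request for call 2 -->
      <xsl:otherwise>
        <host2-request>
          <xsl:copy-of select="/*/booking"/>
        </host2-request>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

This only works if the transformer that makes call 2 passes through, untouched, any XML it does not recognize. result2html.xsl would then need a template matching the error element alongside its normal result templates.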

But surely it would be better to change the direction through the pipeline and have another component just after call 1 to say 'if success do endPipeWithA else do endPipeWithB'?

It takes a while to grasp, but think about how the XML flows through the pipeline. The key is in the SAX events. Each event is fired all the way through the chain before the next. If we were allowed to change direction in a pipeline, we would essentially break the chain (startElement doesn't match endElement scenarios). We would get halfway through an HTMLSerializer, and then we could change to a PDFSerializer; it just wouldn't make any sense.

The bottom line is that all possible error conditions must be accounted for at each stage in the pipeline and propagated through the pipeline.

Debugging is more difficult, primarily due to the lack of an IDE with breakpoints that you would get in an ordinary development environment. Each stage in a pipeline must be serialized back to the browser in XML for effective debugging. An alternative is to insert a logging type transformer in the pipeline where a suspected error occurs and view the output that way. The above example logs the result of call 1 using the logger.xsl transform below.

<?xml version="1.0"?>

<!-- Logger Transform - Must use xalan! -->

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:lxslt="http://xml.apache.org/xslt"
    xmlns:redirect="org.apache.xalan.xslt.extensions.Redirect"
    extension-element-prefixes="redirect">

  <xsl:param name="file" select="'log.xml'"/>

  <xsl:template match="/">

    <redirect:open select="$file"/>

    <redirect:write select="$file">
      <xsl:copy-of select="."/>
    </redirect:write>

    <redirect:close select="$file"/>

    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="node()|@*">
    <!-- Copy the current node -->
    <xsl:copy>
      <!-- Including any attributes it has and any child nodes -->
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
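
The other technique mentioned above, serializing an intermediate stage back to the browser, can be approximated with a throwaway sitemap entry that repeats the pipeline up to the stage of interest and ends it with the XML serializer. This is only a sketch; the debug/call1 pattern is a hypothetical name.

```xml
<!-- Hypothetical debug entry: same stages as the real pipeline up to call 1,
     then serialized as XML so the intermediate result shows in the browser. -->
<map:match pattern="debug/call1">
  <map:generate type="request"/>
  <map:transform type="xalan" src="prepareCallToHost1.xsl"/>
  <map:transform type="webservice">
    <map:parameter name="uri" value="http://host1/service"/>
  </map:transform>
  <map:serialize type="xml"/>
</map:match>
```

Entries like this would be removed, or moved to a sitemap that is not deployed, before release.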

Hopefully we'll see some decent debugging tools emerging as Cocoon's popularity grows.

Standards drift can occur simply because there's nothing to enforce the quality of the XML flowing through a pipeline. In addition we require best practice enforcement in the authorship of XSLT. Pair programming can certainly help here. One possibility would be schema enforcement at various stages in the pipelines, although this would be removed prior to release for performance reasons.

Performance can be a big issue. XML transforms are still slow, although the last few years have given us great leaps in efficiency. Cocoon has an advanced caching mechanism, but this doesn't help us much where most of the content is dynamic.

A few tidbits of advice:

Use IFrames to selectively repopulate a page instead of rebuilding entire pages each time.
Minimize the number of transforms by using input modules where possible.
Minimize XML volume through each pipeline by using aggregation.
Cocoon supports multiple XSLT types (xalan, xslt) and others will be added in the future, including Saxon (XSLT 2.0) and Gregor (very performant). It would be useful to experiment with different implementations.
Load test for scalability every couple of development iterations.
Identify individual bottlenecks and consider writing custom components to optimize.
Use Apache with mod_jk in front of your Cocoon servlet engine to cache content from Cocoon Readers and any other static content.
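
As one reading of the aggregation advice above, a page can be assembled from small internal pipelines, each of which trims its own data before the parts are combined, so no single pipeline carries the full volume. The sketch below uses the sitemap's aggregate element; the patterns, element name and stylesheets are hypothetical.

```xml
<!-- Sketch: element name, patterns and stylesheets are hypothetical -->
<map:match pattern="desktop">
  <!-- Combine two small fragments under one root element -->
  <map:aggregate element="desktop">
    <map:part src="cocoon:/fragment/itinerary"/>
    <map:part src="cocoon:/fragment/insurance-quote"/>
  </map:aggregate>
  <map:transform type="xslt" src="stylesheets/desktop2html.xsl"/>
  <map:serialize type="html"/>
</map:match>

<!-- Each fragment pipeline keeps only what the page needs -->
<map:match pattern="fragment/itinerary">
  <map:generate type="request"/>
  <map:transform type="xslt" src="stylesheets/trimItinerary.xsl"/>
  <map:serialize type="xml"/>
</map:match>

<map:match pattern="fragment/insurance-quote">
  <map:generate type="request"/>
  <map:transform type="xslt" src="stylesheets/trimInsuranceQuote.xsl"/>
  <map:serialize type="xml"/>
</map:match>
```

Whether this actually helps depends on the data; it is worth measuring under load before and after.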

Development Methodology

Did I mention Extreme Programming? A tight specify/code/build/test/release cycle was vital for this project.

Use cases (or Story cards) were implemented alongside JMeter test scripts. Once a script run was successful it was incorporated into the Ant build script for the project. Wherever Java components were developed, unit tests were built and incorporated into the build. The JMeter scripts were used to enforce continuous quality control and were used for load testing as well as later system checks in the live system.
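
A build file along these lines could wire both kinds of test into the build. It is a sketch under assumptions, not the project's actual script: the property values, directory layout and test plan name are hypothetical, and it assumes JUnit is available to Ant and JMeter is installed at the location given.

```xml
<?xml version="1.0"?>
<!-- Sketch only: property values, directory layout and the test plan name
     are hypothetical. -->
<project name="agency-desktop" default="verify" basedir=".">

  <property name="jmeter.home" location="/opt/jmeter"/>
  <property name="test.classes" location="build/test-classes"/>

  <!-- Unit tests for the custom Java components; a failure stops the build -->
  <target name="unit-test">
    <junit haltonfailure="true" fork="true">
      <classpath>
        <pathelement location="build/classes"/>
        <pathelement location="${test.classes}"/>
      </classpath>
      <formatter type="plain" usefile="false"/>
      <batchtest>
        <fileset dir="${test.classes}" includes="**/*Test.class"/>
      </batchtest>
    </junit>
  </target>

  <!-- Run a JMeter test plan without the GUI (-n) and log the samples (-l) -->
  <target name="story-test" depends="unit-test">
    <mkdir dir="build/jmeter"/>
    <java jar="${jmeter.home}/bin/ApacheJMeter.jar" fork="true" failonerror="true">
      <arg value="-n"/>
      <arg value="-t"/>
      <arg file="tests/book-tour.jmx"/>
      <arg value="-l"/>
      <arg file="build/jmeter/book-tour.jtl"/>
    </java>
  </target>

  <target name="verify" depends="story-test"/>
</project>
```

In this sketch failonerror only reacts to JMeter's exit status, so failed assertions inside a test plan would still need to be picked out of the result log by a further step.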

Without this strict adherence to testing as part of the build process, the project would have failed due to lack of quality control. If a bug is introduced, it's far better to catch it quickly, while the developer's mind is still "in context". Needless to say, until our custom components were bedded down, we had a lot of pipeline breakages. Our testing regime ensured we kept on top of things.

Due to the wide variation in developers' skills, pair programming was used to cross-train and enhance skill sets. Pair programming is the best training a programmer can get.

Looking Forward by Looking Back

This is always an important part of a project, enhancing your ability to see where you can improve, not only for the next iteration of the current project, but also for the next project.

What did we do wrong? What would be done differently next time around? How has the technology changed since we started?

We need to use more XML Schemas to enforce standardization. I believe that we made the right decision in not designing and enforcing schemas from day one, but now that the project has matured and entered the maintenance phase, it is the right time.

Our form handling mechanisms are totally proprietary simply because there were no satisfactory solutions in Cocoon. Cocoon 2.1 now incorporates Flow/Continuations, JXPath, and Woody. These are very interesting components that should be given serious consideration for any new development work.