Enterprise Application Integration using Apache Cocoon 2.1
Apache Cocoon has typically been categorized as a web publishing framework, but since the release of version 2.1 it has started to look more like an XML application server.
I've just completed a project with a travel company to build a web-based travel agency desktop which integrates several common backend systems. These systems are ones that a typical agent would use in day to day business and were chosen to demonstrate a variety of integration techniques. In this article I outline how Cocoon 2.1 was the key to building this product, including both advantages and disadvantages.
The travel industry is an old industry in IT terms. The core systems have been around a long time and are truly legacy systems. Even today there may still be the occasional dumb terminal plugged into an ALC (Airline Link Control) network. Most airlines and global distribution systems still have user interfaces based on a 64 by 15 character screen.
These days we are getting new interfaces to the core of these central reservation systems, usually based on proprietary XML APIs. Standards such as those evolving at the Open Travel Alliance will eventually provide a much improved web service development space for suppliers and clients alike.
The unfortunate reality is that the problem space we occupy today is a fairly horrendous mix of rapidly changing technologies. Any system that integrates the variety of supplier systems we are faced with is going to have to be an agile system. Give us any system, and we have to be able to talk to it and integrate it.
As system architect I faced further constraints: the system had a very fluid and incomplete requirements specification, and it had to be built by a small number of developers with varying skillsets in both open source and Microsoft technologies. Extreme Programming seemed the only way forward in this environment. An XML foundation appeared to be the compromise that would bridge the developers' skills, since any developer would be able to maintain and develop any part of the system with minimal training.
As time went on, the more it appeared that "Everything is XML" was the mantra. Even dumb terminal feeds could be "XMLized".
The next step was answering the question: "If all my system feeds are in XML, what benefit do I get in converting the XML into an object graph (as in Java code), and then back out into a user interface as HTML?" The question caused some healthy debate amongst the developers, but in the end the answer was that there is no benefit.
We had to develop or find a framework that handled input and output in XML, had a suite of XML-aware components to handle authentication and form validation, and was easily extensible to call a variety of external interfaces. Cocoon appeared ideal in that it was based on XML pipeline processing and we had the skillset to build extensions in Java.
Cocoon 2.1 was chosen because, although it was not even in alpha stage, the anticipated timeline would mean that any extensions we built would be based on the 2.1 infrastructure, not the old 2.0 infrastructure. This proved a good decision because our product is now in beta testing, and Cocoon 2.1 has just been released.
There are a number of excellent articles on Cocoon basics, not to mention the Cocoon site itself. The samples that come with the Cocoon distribution are recommended and hint at the many possibilities. For the purposes of this article I will only introduce the basic components and the concepts of XML pipeline processing.
A Cocoon Web Application consists of a number of hierarchical sitemaps. Each sitemap consists of a number of Matchers (typically to match a URL pattern) and each match can kick off the assembly of a Pipeline. A pipeline gets assembled by Action and Selector components. A pipeline, once assembled, is typically started with a Generator, followed by one or more Transformers, and finally a Serializer.
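As a rough illustration of that structure, a minimal sitemap might look like the sketch below. The URL pattern and file names are hypothetical, and the component declarations (the map:components section) are left out on the assumption that they are inherited from a parent sitemap. It relies on the default generator and transformer types and is not taken from the project.

```xml
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <!-- map:components omitted; assumed to be inherited from a parent sitemap -->
  <map:pipelines>
    <map:pipeline>
      <!-- Matcher: wildcard match on the request URI (hypothetical pattern) -->
      <map:match pattern="itinerary/*.html">
        <!-- Generator: starts the pipeline by producing SAX events;
             {1} is replaced with the text matched by * -->
        <map:generate src="content/itinerary-{1}.xml"/>
        <!-- Transformer: one or more, applied in order -->
        <map:transform src="stylesheets/itinerary2html.xsl"/>
        <!-- Serializer: ends the pipeline and writes the response -->
        <map:serialize type="html"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```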
The main components in Cocoon are
The Systems to be integrated were
The diagram summarizes these systems and their methods of access. Note Cocoon as the aggregator of these systems.
We required three main extensions, either because they did not yet exist in Cocoon or because the components that did exist weren't ready for prime time.
These were
Both the web service transformer and the HTML transformer had to maintain any remote session information transparently (as would a regular session through a browser using cookies).
Note that all the code for these extensions has previously been submitted to the Cocoon Developer mailing list.
The big issues were "pipeline lock-in", debugging, standards drift, and performance.
"Pipeline lock-in" means that XML data flowing through a pipeline cannot influence the path through the pipeline. This isn't actually a limitation, rather a state of mind that has to be adopted. Imagine a pipeline that has to call two remote systems to achieve its end result. If the call to the first system fails (call 1) we require that no call to the second system be made, and that a suitable form of error handling be invoked, including returning an error message to the user.
...
<map:generate type="request"/>
<map:transform type="xalan" src="prepareCallToHost1.xsl"/>
<map:transform type="webservice">
  <map:parameter name="uri" value="http://host1/service"/> <!-- call 1 -->
</map:transform>
<map:transform type="xalan" src="logger.xsl">
  <map:parameter name="file" value="c:/tmp/call1.log"/> <!-- log result of call 1 -->
</map:transform>
<map:transform type="xalan" src="prepareCallToHost2.xsl"/>
<map:transform type="webservice">
  <map:parameter name="uri" value="http://host2/service"/> <!-- call 2 -->
</map:transform>
<map:transform type="xalan" src="result2html.xsl"/>
<map:serialize type="html"/>
...
Thus in the above sample prepareCallToHost2.xsl would have to determine whether an error had occurred in call 1. It would either prepare an agreed-upon XML error structure, without the tags that trigger call 2, or prepare the XML for call 2. Similarly, result2html.xsl would have to determine whether to render an error or a normal result to the user.
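To make the idea concrete, here is a minimal sketch of how a stylesheet in the role of prepareCallToHost2.xsl could branch on the outcome of call 1. The element names (error, host1Response, host2Request, bookingRef) are hypothetical and stand in for whatever structures the team has agreed upon. It also assumes the web service transformer acts only on specific request elements, so that an error structure passes through it untouched.

```xml
<?xml version="1.0"?>
<!-- Sketch only; all element names are hypothetical -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Call 1 failed: pass the agreed error structure through unchanged.
       It contains nothing that triggers call 2. -->
  <xsl:template match="/error">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Call 1 succeeded: build the request for call 2 -->
  <xsl:template match="/host1Response">
    <host2Request>
      <bookingRef><xsl:value-of select="bookingRef"/></bookingRef>
    </host2Request>
  </xsl:template>

  <!-- Anything unexpected is also turned into an error structure;
       the low priority lets the two templates above win -->
  <xsl:template match="/*" priority="-1">
    <error source="prepareCallToHost2">Unexpected input from call 1</error>
  </xsl:template>
</xsl:stylesheet>
```

A stylesheet in the role of result2html.xsl would carry a matching pair of templates, one rendering the error structure and one rendering the normal result.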
But surely it would be better to change the direction through the pipeline and have another component just after call 1 to say 'if success do endPipeWithA else do endPipeWithB'?
It takes a while to grasp, but think about how the XML flows through the pipeline. The key is in the SAX events. Each event is fired all the way through the chain before the next. If we were allowed to change direction in a pipeline, we would essentially break the chain (startElement doesn't match endElement scenarios). We would get halfway through an HTMLSerializer, and then we could change to a PDFSerializer; it just wouldn't make any sense.
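The choices a sitemap does allow are made while the pipeline is being assembled, before any SAX event flows. The sketch below shows a selector that decides on a request parameter. The parameter name view, its value raw, and the file names are hypothetical, and the request-parameter selector is assumed to be declared and configurable as in the sample sitemap shipped with Cocoon.

```xml
<map:match pattern="booking">
  <map:generate src="content/booking.xml"/>
  <!-- Evaluated against the request during assembly,
       not against the XML produced by the generator -->
  <map:select type="request-parameter">
    <map:parameter name="parameter-name" value="view"/>
    <map:when test="raw">
      <!-- booking?view=raw returns the untransformed XML -->
      <map:serialize type="xml"/>
    </map:when>
    <map:otherwise>
      <map:transform src="stylesheets/booking2html.xsl"/>
      <map:serialize type="html"/>
    </map:otherwise>
  </map:select>
</map:match>
```

Either branch yields one complete, fixed chain from generator to serializer, so every startElement still meets its endElement in the same serializer.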
The bottom line is that all possible error conditions must be accounted for at each stage in the pipeline and propagated through the pipeline.
Debugging is more difficult, primarily due to the lack of an IDE with breakpoints that you would get in an ordinary development environment. Each stage in a pipeline must be serialized back to the browser in XML for effective debugging. An alternative is to insert a logging transformer in the pipeline where a suspected error occurs and view the output that way. The above example logs the result of call 1 using the logger.xsl transform below.
<?xml version="1.0"?>
<!-- Logger Transform - Must use xalan! -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:lxslt="http://xml.apache.org/xslt"
    xmlns:redirect="org.apache.xalan.xslt.extensions.Redirect"
    extension-element-prefixes="redirect">
  <xsl:param name="file" select="'log.xml'"/>
  <xsl:template match="/">
    <redirect:open select="$file"/>
    <redirect:write select="$file">
      <xsl:copy-of select="."/>
    </redirect:write>
    <redirect:close select="$file"/>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="node()|@*">
    <!-- Copy the current node -->
    <xsl:copy>
      <!-- Including any attributes it has and any child nodes -->
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Hopefully we'll see some decent debugging tools emerging as Cocoon's popularity grows.
Standards Drift can occur simply because there's nothing to enforce the quality of the XML flowing through a pipeline. In addition, we need to enforce best practice in the authorship of XSLT. Pair programming can certainly help here. One possibility would be schema enforcement at various stages in the pipelines, although this would be removed prior to release for performance reasons.
Performance can be a big issue. XML transforms are still slow, although the last few years have given us great leaps in efficiency. Cocoon has an advanced caching mechanism, but this doesn't help us much where most of the content is dynamic.
A few tidbits of advice:
Did I mention Extreme Programming? A tight specify/code/build/test/release cycle was vital for this project.
Use cases (or Story cards) were implemented alongside JMeter test scripts. Once a script run was successful it was incorporated into the Ant build script for the project. Wherever Java components were developed, unit tests were built and incorporated into the build. The JMeter scripts were used to enforce continuous quality control, and were used for load testing as well as later system checks in the live system.
Without this strict adherence to testing as part of the build process, the project would have failed due to lack of quality control. If a bug is introduced, it's far better to catch it quickly, while the developer's mind is still "in context". Needless to say, until our custom components were bedded down, we had a lot of pipeline breakages. Our testing regime ensured we kept on top of things.
Due to the wide variation in developers' skills, pair programming was used to cross-train and enhance skillsets. Pair programming is the best training a programmer can get.
This is always an important part of a project, enhancing your ability to see where you can improve, not only for the next iteration of the current project, but also for the next project.
What did we do wrong? What would be done differently next time around? How has the technology changed since we started?
We need to use more XML Schemas to enforce standardization. I believe that we made the right decision in not designing and enforcing schemas from day one, but now that the project has matured and entered the maintenance phase, it is the right time.
Our form handling mechanisms are totally proprietary simply because there were no satisfactory solutions in Cocoon. Cocoon 2.1 now incorporates Flow/Continuations, JXPath, and Woody. These are very interesting components that should be given serious consideration for any new development work.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.