
Getting Started With Cocoon 2

July 10, 2002

Steve Punte

Introduction

Cocoon 2, part of the Apache XML Project, is a highly flexible web publishing framework built from reusable components. Although reusability is an oft-touted quality of software frameworks, Cocoon stands out because of the simplicity of the interface between the components. Cocoon 2 uses XML documents, via SAX, as its intercomponent API. As long as a component accepts and emits XML, it works.

The purpose of this article is to provide an overview of Cocoon 2's functionality and to get you started writing small applications using it.

What is Cocoon 2?

Cocoon 2 is an XML publishing framework. What does that mean? It is neither a database that stores XML content, nor a J2EE application server that provides web server facilities to serve the content. Instead, Cocoon 2 fits architecturally between these two layers. It is a framework for processing content. The processing is done by an assembly line, or pipeline, of components, and these pipelines are defined by the designer.

Simple Examples

Let's start with the easy case. A document written in XML is stored in a file (file.xml), processed by an XSL stylesheet (stylesheet.xsl), and then served up as HTML. A Cocoon pipeline suitable for this task is shown in figure 1.

Figure 1: Three-stage pipeline

All pipelines begin with a generator. In figure 1, the generator reads a file from the file system and turns it into an XML SAX stream. The middle component, in this case an XSL transformer, applies the HTML presentation tags, accepting an XML stream and emitting one, too. Finally the end component, a serializer, terminates the stream and writes the contents out as the HTTP response. The same three-stage pipeline can serve some or all of the pages in a site, as the designer chooses. This example may seem trivial in that two out of the three components have to do with starting and ending the pipeline, but it illustrates the simplest situation.
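The generator, transformer, and serializer roles can be approximated outside Cocoon with the JAXP classes that ship with the JDK. The following is only a sketch of the idea, not Cocoon's implementation: the class name `ThreeStagePipeline`, the sample document, and the sample stylesheet are made up for illustration, and the cast to `SAXTransformerFactory` assumes the default JDK XSLT processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class ThreeStagePipeline {

    /** Pushes the XML text through the stylesheet and returns the serialized HTML. */
    static String run(String xml, String xsl) {
        try {
            // Assumption: the default JDK processor supports the SAX extensions.
            SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();

            // Serializer role: an identity handler that writes SAX events as HTML text.
            StringWriter html = new StringWriter();
            TransformerHandler serializer = factory.newTransformerHandler();
            serializer.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
            serializer.setResult(new StreamResult(html));

            // Transformer role: applies the stylesheet, then feeds the serializer.
            TransformerHandler transformer =
                factory.newTransformerHandler(new StreamSource(new StringReader(xsl)));
            transformer.setResult(new SAXResult(serializer));

            // Generator role: parses the source text into a stream of SAX events.
            SAXParserFactory parsers = SAXParserFactory.newInstance();
            parsers.setNamespaceAware(true);
            XMLReader generator = parsers.newSAXParser().getXMLReader();
            generator.setContentHandler(transformer);
            generator.parse(new InputSource(new StringReader(xml)));

            return html.toString();
        } catch (Exception e) {
            throw new IllegalStateException("pipeline failed", e);
        }
    }

    /** Runs the pipeline on a tiny, made-up document and stylesheet. */
    static String demo() {
        String xml = "<page><title>Hello</title></page>";
        String xsl =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:template match='/page'>"
          + "<html><body><h1><xsl:value-of select='title'/></h1></body></html>"
          + "</xsl:template></xsl:stylesheet>";
        return run(xml, xsl);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The stages are wired back to front, because each stage needs a reference to the handler that receives its output. The components know nothing about each other beyond the SAX event interface, which is the property the article attributes to Cocoon pipelines.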

Figure 2: Four-stage pipeline

Figure 2 depicts a more typical situation. Pages contain both static content and dynamic content obtained from a database. The new component introduced here is the SQL Transformer. SQL statements embedded in the original XML document are processed and replaced with an XML result set tree fragment. For example, if the source content document (i.e. file.xml) contains:


<guest-list>
  <sql:execute-query>
    <sql:query>
      SELECT CONCAT(lastName, ', ', firstName) AS name, age
          FROM guest WHERE status = 'ARRIVING';
    </sql:query>
  </sql:execute-query>
</guest-list>

Then a possible document coming out of the SQL Transformer would be


<guest-list>
  <row-set>
    <row>
      <name>Bush, George</name>
      <age>56</age>
    </row>
    <row>
      <name>Jackson, Michael</name>
      <age>42</age>
    </row>
    <row>
      <name>Einstein, Albert</name>
      <age>105</age>
    </row>
  </row-set>
</guest-list>

The key architectural advantage here is that the source file becomes a very condensed business logic document. We neither need to know about nor deal with the JDBC API. Instead, the content of the starting document expresses the business problem at hand.
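To make the substitution step concrete, here is a rough sketch of a SAX filter that swallows the query elements and emits a row set in their place. It is not the code of Cocoon's SQL Transformer. The namespace URI `urn:example:sql`, the class name `QueryExpandingFilter`, and the canned query function that stands in for a JDBC call are all assumptions made for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Replaces each sql:query with a row-set built from the query's results. */
class QueryExpandingFilter extends XMLFilterImpl {
    static final String SQL_NS = "urn:example:sql"; // hypothetical URI for this sketch
    private static final Attributes NONE = new AttributesImpl();
    private final Function<String, List<Map<String, String>>> runQuery;
    private int sqlDepth;        // > 0 while inside any sql:* element
    private StringBuilder query; // non-null while inside sql:query

    QueryExpandingFilter(Function<String, List<Map<String, String>>> runQuery) {
        this.runQuery = runQuery;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (!SQL_NS.equals(uri)) { super.startElement(uri, local, qName, atts); return; }
        sqlDepth++; // swallow the sql:* wrapper elements
        if ("query".equals(local)) query = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (query != null) query.append(ch, start, length);          // collect SQL text
        else if (sqlDepth == 0) super.characters(ch, start, length); // pass through
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (!SQL_NS.equals(uri)) { super.endElement(uri, local, qName); return; }
        sqlDepth--;
        if (!"query".equals(local)) return;
        List<Map<String, String>> rows = runQuery.apply(query.toString().trim());
        query = null;
        super.startElement("", "row-set", "row-set", NONE);
        for (Map<String, String> row : rows) {
            super.startElement("", "row", "row", NONE);
            for (Map.Entry<String, String> col : row.entrySet()) {
                super.startElement("", col.getKey(), col.getKey(), NONE);
                super.characters(col.getValue().toCharArray(), 0, col.getValue().length());
                super.endElement("", col.getKey(), col.getKey());
            }
            super.endElement("", "row", "row");
        }
        super.endElement("", "row-set", "row-set");
    }

    static String demo() {
        String source = "<guest-list xmlns:sql='" + SQL_NS + "'><sql:execute-query>"
            + "<sql:query>SELECT name, age FROM guest</sql:query>"
            + "</sql:execute-query></guest-list>";
        // Stand-in for a JDBC call: returns one canned row whatever the query says.
        Function<String, List<Map<String, String>>> canned = sql -> {
            Map<String, String> row = new LinkedHashMap<>();
            row.put("name", "Bush, George");
            row.put("age", "56");
            return Collections.singletonList(row);
        };
        try {
            SAXParserFactory parsers = SAXParserFactory.newInstance();
            parsers.setNamespaceAware(true);
            QueryExpandingFilter filter = new QueryExpandingFilter(canned);
            filter.setParent(parsers.newSAXParser().getXMLReader());
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer().transform(
                new SAXSource(filter, new InputSource(new StringReader(source))),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("filter demo failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Everything outside the query namespace passes through untouched, which is why static content and database-driven content can share one source document. A real transformer would run the collected SQL text through a pooled JDBC connection instead of the canned function.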

Finally, suppose a local database contains a list of stock symbols for which we wish to obtain the current market prices. This page could be part of a portal like Yahoo. The business problem could be solved with the multi-component pipeline shown in figure 3 below:

Figure 3: Six-stage pipeline

The XML input fragment to the SOAP transformer might look something like


<soap:query url="http://www.mystock.org:8080">
  <soap:body>
    <getStockPrice>
      <stockName>IBM</stockName>
      <stockName>HWQ</stockName>
      <stockName>BEA</stockName>
    </getStockPrice>
  </soap:body>
</soap:query>

For this example, note that an intermediate XSL transformer converts the SQL transformer output into the exact format required by the SOAP transformer. XSL can be used for a wide variety of tasks far beyond that of HTML presentation.

Under the Hood

As seen above, there are three fundamental components to Cocoon 2. First, generators are responsible for creating an XML SAX stream. This stream can originate from a file on the local file system, a blob of XML in a database, another system, or elsewhere. Second, transformers are responsible for modifying an XML stream. These can be XSL, SQL, SOAP, LDAP, or custom. The primary requirement is to accept an XML stream and emit an XML stream. Finally, serializers are responsible for terminating an XML stream and emitting the content in a suitable format. This is typically an HTTP response, but it can be a graphics format, a file written to the file system, or practically anything else.

The Sitemap

Cocoon 2 pipelines are defined by the sitemap, which is the file sitemap.xmap located at the root of the web application. The key sitemap fragment for the example in Figure 3 would be


<map:match pattern="*.html">
  <map:generate src="file.xml"/>
  <map:transform type="sql">
    <map:parameter name="use-connection" value="mydatabase"/>
  </map:transform>
  <map:transform type="xsl" src="format-adjust.xsl"/>
  <map:transform type="soap" url="soap://www.stockquote.com"/>
  <map:transform type="xsl" src="html-presentation.xsl"/>
  <map:serialize type="html"/>
</map:match>
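The way a match pattern selects a pipeline can be sketched in a few lines of Java. This is an illustration of the idea only, not Cocoon's matcher: the class `SitemapSketch` is hypothetical, a pipeline is represented by a plain description string, and the sketch assumes that `*` matches any run of characters other than `/` and that the first matching rule wins.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class SitemapSketch {
    // Insertion order is kept so that the first matching rule wins.
    private final Map<Pattern, String> rules = new LinkedHashMap<>();

    /** Registers a wildcard pattern; '*' matches any run of characters except '/'. */
    SitemapSketch match(String wildcard, String pipeline) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            // Quote every literal character so that '.' and friends match themselves.
            regex.append(c == '*' ? "[^/]*" : Pattern.quote(String.valueOf(c)));
        }
        rules.put(Pattern.compile(regex.toString()), pipeline);
        return this;
    }

    /** Returns the pipeline of the first rule matching the URI, or null if none does. */
    String resolve(String uri) {
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            if (rule.getKey().matcher(uri).matches()) {
                return rule.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SitemapSketch sitemap = new SitemapSketch()
            .match("hello.html", "file -> xsl -> html")
            .match("*.html", "file -> sql -> xsl -> soap -> xsl -> html");
        System.out.println(sitemap.resolve("hello.html"));
        System.out.println(sitemap.resolve("quotes.html"));
        System.out.println(sitemap.resolve("mypage"));
    }
}
```

Because rules are tried in order, a specific pattern such as `hello.html` has to be registered before a catch-all such as `*.html`, or it would never be reached.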

Installation

Open source and production worthiness

Apache Cocoon is an open source project. What does this mean if you are seriously considering this platform?

  • State-of-the-art: Cocoon 2 is leading-edge. Technology-wise your project will be very well suited for moving forward in the 21st century, instead of rooted in older paradigms.
  • Production worthiness: Cocoon is neither tested nor qualified to the degree that a commercial product like WebLogic or Oracle DB is. Instead, any project should consider that part of their QA process is reconfirming the operation and performance of Cocoon. This aspect should not be much of a deal-killer. After all, Cocoon is certainly better used and tested than the custom software related to any one particular application project.
  • Stability: Unlike commercial software, the Cocoon project has been known to let go of the past. Significant architectural changes occurred from Cocoon 1 to Cocoon 2. Fundamentally, this is what it is all about. A project adopting Cocoon should anticipate and allocate some resources to address architectural paradigm shifts when upgrading to later releases.

The three main software packages needed to get Cocoon up and running are as follows.

  1. JDK. I recommend at least JDK 1.3.1, downloadable from Sun at http://java.sun.com/j2se/1.3.
  2. A J2EE servlet container. Apache Tomcat is very popular, but my favorite is the Orion Web Server. It is a very modern pure-Java solution that can be obtained at http://www.orionserver.com. As a side note, Oracle is using this software in the core of their new 9iAS product line.
  3. The Cocoon binaries. These can be obtained at http://xml.apache.org/cocoon/dist. I recommend version 2.0.1, as version 2.0.2 has some problems with the examples. Hopefully these will soon be fixed in release 2.0.3.

The Cocoon web page has detailed instructions on how to set up and configure Cocoon on a wide variety of servlet containers such as Tomcat, WebLogic, JRun, JBoss, Resin, etc.

I recommend the Orion web server because it is simple to install and portable across operating systems. Detailed, tested instructions for setting up and getting started can be found on my web site. It is particularly effective to test the installation at the several partial completion points recommended in the instructions, so that problems can be more readily identified as either application-server-related or Cocoon-related. In the near future the instructions will be updated for JDK 1.4.

Once the software is installed and configured, direct your web browser to http://localhost, http://localhost:8080, or whatever is appropriate to your configuration. Since Cocoon is configured by default to be loaded on demand, there will be a delay here for the initialization process, and then a page with Cocoon examples will appear.

The best place to start is by examining the very simple HTML "Hello World" example. There are three interesting elements to this example. The first is the portion of the sitemap that is responsible for it. Search the file sitemap.xmap for <map:match pattern="hello.html"> and you will find the matching pipeline rule:


  <map:match pattern="hello.html">
    <map:generate src="docs/samples/hello-page.xml"/>
    <map:transform src="stylesheets/page/simple-page2html.xsl"/>
    <map:serialize type="html"/>
  </map:match>

The second element, identified by this matching rule, is the source XML file at docs/samples/hello-page.xml. The third is the stylesheet at stylesheets/page/simple-page2html.xsl, which is worth a quick read.

Use this same process to evaluate all the other Cocoon examples. First identify the responsible matching rule in the sitemap file, and then examine the individual elements that compose the rule. There will normally be a generator component at the beginning of the pipeline, a serializer at the end of the pipeline, and some number of transformer components in between.

Your first Cocoon Page

To demonstrate some of the possibilities and capabilities of Cocoon, let's create a pipeline that expects a numeric value submitted via HTTP POST and then calculates the factorial. This example demonstrates how sources other than files are possible, and how a pipeline and XSL transformer can perform more interesting tasks than simply HTML tag additions.

This pipeline has three components:

  • The standard HTTP Request generator, which converts all available HTTP request elements, such as parameter-value pairs, host name, etc., into XML.
  • An XSL stylesheet to pick out the particular numeric field and calculate the factorial.
  • And finally a serializer to convert the SAX event stream into an HTTP format. In this case we are leaving the output results in XML.
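To picture what the first stage hands to the stylesheet, the sketch below builds a comparable XML document from a map of request parameters. It is an approximation and not the request generator's actual code or complete output. The element names and namespace URI are taken from the XPath expressions in the stylesheet for this example, while the `name` attribute on each parameter and the class name `RequestToXml` are assumptions.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class RequestToXml {
    // Namespace URI as declared by the stylesheet in this article.
    static final String NS = "http://xml.apache.org/cocoon/requestgenerator/2.0";

    /** Renders request parameters as request/requestParameters/parameter/value. */
    static String toXml(Map<String, String> parameters) {
        try {
            DocumentBuilderFactory builders = DocumentBuilderFactory.newInstance();
            builders.setNamespaceAware(true);
            Document doc = builders.newDocumentBuilder().newDocument();

            Element request = doc.createElementNS(NS, "http:request");
            doc.appendChild(request);
            Element list = doc.createElementNS(NS, "http:requestParameters");
            request.appendChild(list);

            for (Map.Entry<String, String> entry : parameters.entrySet()) {
                Element parameter = doc.createElementNS(NS, "http:parameter");
                // Assumption: the parameter name is carried in a "name" attribute.
                parameter.setAttributeNS(null, "name", entry.getKey());
                Element value = doc.createElementNS(NS, "http:value");
                value.setTextContent(entry.getValue());
                parameter.appendChild(value);
                list.appendChild(parameter);
            }

            // An identity transform serializes the DOM tree to text.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build request document", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> parameters = new LinkedHashMap<>();
        parameters.put("input", "5");
        System.out.println(toXml(parameters));
    }
}
```

Once the request is available in this form, picking out a submitted value is an ordinary XPath selection, and that is all the second stage of the pipeline has to do before it starts computing.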

The sitemap pipeline matching rule is shown below.

  <map:match pattern="mypage">
    <map:generate type="request"/>
    <map:transform src="mystylesheet.xsl"/>
    <map:serialize type="xml"/>
  </map:match>

A stylesheet to meet this objective might look like this. Note that two lines have been broken for readability.


<?xml version="1.0"?>
<!-- Author: Steven P. Punte "stevep@candlelightsoftware.com" -->
<!-- Description:  Computes Factorial -->

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:http="http://xml.apache.org/cocoon/requestgenerator/2.0"
  exclude-result-prefixes="http">

  <!-- Builds the result page from the submitted request parameter value -->
  <xsl:template match="/">
    <page>
      <incoming-value>
        <xsl:value-of select="/http:request/http:requestParameters/
http:parameter/http:value"/>
      </incoming-value>
      <computed-factorial>
        <xsl:call-template name="factorial">
          <xsl:with-param name="input" select="/http:request/
http:requestParameters/http:parameter/http:value"/>
        </xsl:call-template>
      </computed-factorial>
    </page>
  </xsl:template>

  <!-- Recursive factorial: input * factorial(input - 1) while input > 1, else 1 -->
  <xsl:template name="factorial">
    <xsl:param name="input"/>
    <xsl:choose>
      <xsl:when test="$input > 1">
        <xsl:variable name="tmp">
          <xsl:call-template name="factorial">
            <xsl:with-param name="input" select="$input - 1"/>
          </xsl:call-template>
        </xsl:variable>
        <xsl:value-of select="$tmp * $input"/>
      </xsl:when>
      <xsl:otherwise>1</xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

Once installed, point a browser to this page with an input parameter. For example,

http://localhost/mypage?input=5

And the resulting output on your browser should be:


  <?xml version="1.0" encoding="UTF-8" ?>
  <page>
    <incoming-value>5</incoming-value>
    <computed-factorial>120</computed-factorial>
  </page>
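The recursion in the factorial template is easy to check outside Cocoon. The following Java method mirrors the same rule, multiplying by the result for the next smaller value while the input is greater than 1 and returning 1 otherwise. It is a sanity check only: the class name `Factorial` is made up, and the sketch uses `long` arithmetic where XSLT 1.0 uses floating-point numbers, so it overflows for inputs above 20.

```java
class Factorial {

    /** Mirrors the named template: n * factorial(n - 1) while n > 1, otherwise 1. */
    static long factorial(long n) {
        return n > 1 ? n * factorial(n - 1) : 1;
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // prints 120
    }
}
```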

Summary

Cocoon implements the processing pipeline concept. Existing Cocoon components can access relational and XML databases, interact with LDAP, and generate graphics. I expect new components to generate and receive SOAP and ebXML messages, serialize to SNMP, provide COM and EJB/RMI bridges, and in general take on more tasks related to enterprise business logic.

Cocoon is an example of XML-directed software, an architecture in which business domain knowledge is captured primarily in the form of XML, and this XML drives generic procedural software. The XSL and SQL transformer components are clear examples. Software authors need only express their solutions in XML languages and can reuse off-the-shelf components.
