Introducing Cocoon 2.0

February 13, 2002

A Short History of Apache Cocoon

It took two years, but we finally released Apache Cocoon, the second generation. Cocoon started simply enough. In 1998 Jon Stevens -- of Apache JServ, Turbine, Velocity, Anakia, and Tigris Scarab fame -- and I created scripts that managed the automatic update of the java.apache.org site. The scripts were dead simple: iterate over all the CVS modules that java.apache.org had under the /docs and copy them to the right place.

The problem was that people were continuously messing up the docs. Few people want to write documentation for open source projects; when they do, you thank them and don't complain about coherence of style and stuff like that. Or you won't have any docs at all.

The solution was obvious: we needed a way to separate style from content. In late 1998 the first XSL working draft was released and IBM made a Java XSL processor, LotusXSL, available. I downloaded both and started to play around with what was later called XSLT. While playing with this stuff, I quickly grew tired of typing a command line, moving to the browser to see the result, over and over. I wanted a less tedious change-transform-reload cycle.

So I wrote a servlet that handled the tedious bits for me; I could modify the stylesheet, hit reload on the browser, and the servlet would handle everything. This was at the very end of 1998 and Ron Howard's movie Cocoon was playing on the television, which explains the weird name only partially. I believed at the time that these technologies were a key part of the future of the Web, so a cocoon was just what was needed to allow them to incubate and grow stronger.

Apache Cocoon 1.0 was a servlet, about 100 lines of code, that used XML4J (later Apache Xerces) and LotusXSL (later Apache Xalan) to transform an XML file with an XSL stylesheet. At that time, XSLT, XPath and XSL:FO were still part of one big spec. I didn't think it was very useful for anyone else so I kept it on my disk for a few months. Then, around March 1999, on the jserv-dev mail list somebody was asking about XSL, and I said that I'd written a servlet that did all that transformation on the server side. Many people asked for it, so I requested a formal vote and the Apache Cocoon project was started under the java.apache.org umbrella.

The 1.0 version contained very little code, but lots of examples and some simple docs that explained what XSL was and why I thought it was important to learn it. After its release, people started joining active development, and we turned a small servlet into a full XML-based publishing system, which is now used in many production sites around the world.

But Cocoon 1.x was designed when the XML world was very young and experience was scarce, and it was based on several design choices that turned out to be very limiting. So, around November 1999, I announced my intention to work on the next generation (what people started calling Cocoon2 or simply C2) to solve those architectural issues.

Cocoon 2.0

It took two years and three different project leaders to finish Cocoon 2.0 but we made it. It's an XML framework that raises the usage of XML and XSLT technologies for server applications to a new level. Designed for performance and scalability around pipelined SAX processing, Cocoon offers a flexible environment based on the separation of concerns between content, logic and style. A centralized configuration system and sophisticated caching enable you to create, deploy, and maintain rock-solid XML server applications.

Cocoon was designed as an abstract engine that could be connected to almost anything, but it ships with servlet and command line connectors. The servlet connector allows you to call Cocoon from your favorite servlet engine or application server. You can install it beside your existing servlets or JSPs. The command line interface allows you to generate static content as a batch process. It can be useful to pre-generate those parts of your site that are static, some of which may be easier to create by using Cocoon functionalities than directly (say, SVG rasterization or applying stylesheets). For example, the Cocoon documentation and web site are all generated by Cocoon from the command line.

Component Pipelines

Cocoon is now based on the concept of component pipelines. A pipeline works like a UNIX pipe, but instead of passing bytes between STDIN and STDOUT, Cocoon passes SAX events between components.

The three types of pipeline components are generators, which take a request and produce SAX events; transformers, which consume SAX events and produce SAX events; and serializers, which consume SAX events and produce a response. A Cocoon pipeline is composed of one generator, zero or more transformers, and one serializer. As with UNIX pipes, a small number of components give you an incredible number of possible combinations. Think of active Lego bricks for XML manipulation.
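To make the pipeline idea concrete, here is a minimal sketch that uses only the standard JAXP classes shipped with the JDK, not Cocoon's own component interfaces. The class name SaxPipelineSketch and its run method are made up for illustration. A SAX parser plays the generator, a TransformerHandler built from a stylesheet plays the transformer, and an identity TransformerHandler that writes to a string plays the serializer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical illustration class built on JAXP; not part of Cocoon.
class SaxPipelineSketch {

    // Runs generator -> transformer -> serializer and returns the serialized text.
    static String run(String xml, String xslt) {
        try {
            SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();

            // "Transformer": consumes SAX events, applies the stylesheet, emits SAX events.
            TransformerHandler transformer =
                factory.newTransformerHandler(new StreamSource(new StringReader(xslt)));

            // "Serializer": an identity handler that writes incoming SAX events as markup.
            StringWriter out = new StringWriter();
            TransformerHandler serializer = factory.newTransformerHandler();
            serializer.setResult(new StreamResult(out));

            // Connect the two stages: transformer output feeds the serializer.
            transformer.setResult(new SAXResult(serializer));

            // "Generator": a namespace-aware parser that turns the input into SAX events.
            SAXParserFactory parsers = SAXParserFactory.newInstance();
            parsers.setNamespaceAware(true);
            XMLReader generator = parsers.newSAXParser().getXMLReader();
            generator.setContentHandler(transformer);
            generator.parse(new InputSource(new StringReader(xml)));

            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("pipeline failed", e);
        }
    }
}
```

No intermediate document is written out and parsed again between the stages; each component hands events straight to the next one, which is the property the pipeline design relies on.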

Cocoon ships with a number of these components which were donated over the years by users and developers. If a component is general enough, we'll ship it with Cocoon. Some of Cocoon's generators include the following. FileGenerator acts as a parser, reading a file (or any other URL) and producing SAX events from it. DirectoryGenerator reads a directory listing, formats it as XML and produces SAX events. ServerPagesGenerator generates dynamic XML from XSP server pages. JSPGenerator is similar but parses the result of a JSP page. VelocityGenerator is also similar but uses Velocity as a template language.

Some of Cocoon's transformers include XSLTTransformer, which transforms a SAX stream depending on a given XSLT stylesheet. XIncludeTransformer augments the SAX stream by processing the xinclude namespace and including external sources into the stream. I18NTransformer transforms content based on an i18n dictionary and a language parameter.

Some of Cocoon's serializers include XMLSerializer, which streams the SAX events into XML. HTMLSerializer streams SAX events into browser-compatible HTML. TextSerializer streams only the textual SAX events, useful for non-XML languages like code or CSS or VRML. PDFSerializer produces a PDF stream out of XSL:FO SAX events, using Apache FOP. SVG2JPGSerializer produces a JPG stream out of SVG SAX events, using Apache Batik.
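A serializer is simply the last consumer in the chain. As a rough sketch of what a text-only serializer does (again with plain SAX classes from the JDK rather than Cocoon's TextSerializer, and with a hypothetical class name), a handler can ignore every event except character data:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; keeps only the character events of a SAX stream.
class TextOnlySketch extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();

    // Element, attribute, and document events fall through to the empty
    // DefaultHandler methods; only text content is collected.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Parses the given XML and returns its concatenated text content.
    static String textOf(String xml) {
        try {
            TextOnlySketch handler = new TextOnlySketch();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }
}
```

Given "<css><rule>a { color: red }</rule></css>", textOf returns just "a { color: red }", which is the kind of output you want when the pipeline produces CSS or VRML rather than markup.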

The Cocoon Sitemap

We call the following XML fragment the "sitemap". It's the configuration document which tells Cocoon how to create the resources identified by various URIs. The sitemap is akin to the blueprints of a site, telling Cocoon how to assemble components into pipelines that produce both static and dynamic resources.


  <map:pipeline>
   <map:match pattern="hello.html">
    <map:generate src="docs/samples/hello-page.xml"/>
    <map:transform src="stylesheets/page/simple-page2html.xsl"/>
    <map:serialize type="html"/>
   </map:match>

   <map:match pattern="hello.wml">
    <map:generate src="docs/samples/hello-page.xml"/>
    <map:transform src="stylesheets/page/simple-page2wml.xsl"/>
    <map:serialize type="wap"/>
   </map:match>

   <map:match pattern="hello.vml">
    <map:generate src="docs/samples/hello-page.xml"/>
    <map:transform src="stylesheets/page/simple-page2vml.xsl"/>
    <map:serialize type="xml"/>
   </map:match>

   <map:match pattern="hello.svg">
    <map:generate src="docs/samples/hello-page.xml"/>
    <map:transform src="stylesheets/page/simple-page2svg.xsl"/>
    <map:serialize type="svg2jpeg"/>
   </map:match>

   <map:match pattern="hello.wrl">
    <map:generate src="docs/samples/hello-page.xml"/>
    <map:transform src="stylesheets/page/simple-page2vrml.xsl"/>
    <map:serialize type="vrml"/>
   </map:match>

   <map:match pattern="hello.pdf">
    <map:generate src="docs/samples/hello-page.xml"/>
    <map:transform src="stylesheets/page/simple-page2fo.xsl"/>
    <map:serialize type="fo2pdf"/>
   </map:match>
  </map:pipeline>

This fragment is the "hello world" equivalent for Cocoon, allowing Cocoon to say "hello world" to your browser using HTML, to your WAP phone using WML, to your voice client using VoiceXML, or by rendering an SVG into a JPG image, presenting a VRML world, or even producing the PDF for your printing needs. And all using the following XML document as input:


  <?xml version="1.0"?>
  <page>
   <title>Hello</title>
   <content>
    <para>This is my first Cocoon2 page!</para>
   </content>
  </page>

If only I'd had this back in 1998 for java.apache.org.

Now that you understand components it's easier to understand the following part of the sitemap:


 1)  <map:match pattern="hello.html">
 2)   <map:generate src="hello-page.xml"/>
 3)   <map:transform src="hello-stylesheet.xsl"/>
 4)   <map:serialize type="html"/>
     </map:match>

The first line "matches" the incoming request for the given URI "hello.html". The second line generates using the default generator, FileGenerator in this case, which throws into the SAX pipeline the events that the parser emits from parsing the file hello-page.xml. Line 3 calls the default transformer, XSLTTransformer, with the given stylesheet. Finally, line 4 calls the HTMLSerializer.

In short, the server resource identified by "hello.html" is produced by parsing "hello-page.xml", applying an XSLT stylesheet, "hello-stylesheet.xsl", and serializing the results as HTML.

But what, I can imagine you saying, about verbosity? I don't want to do this for every URI I have to serve. Don't worry. Another example should suffice.


   <map:match pattern="sites/*.apache.org">
    <map:generate src="/sites/{1}_apache_org.xml"/>
    <map:transform src="/sites/{1}_apache_org-html.xsl"/>
    <map:serialize/>
   </map:match>

The "*" symbol is matched against a token. Then, this matched token replaces {1}. So when requesting "sites/xml.apache.org" we'll get the equivalent of


  <map:generate src="/sites/xml_apache_org.xml"/>
  <map:transform src="/sites/xml_apache_org-html.xsl"/>
  <map:serialize/>

Or when requesting "sites/jakarta.apache.org" we'll get the equivalent of


  <map:generate src="/sites/jakarta_apache_org.xml"/>
  <map:transform src="/sites/jakarta_apache_org-html.xsl"/>
  <map:serialize/>
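The substitution rule above can be sketched in a few lines of Java. This is only an approximation of the behavior described here, not Cocoon's matcher code: the class WildcardSketch and its resolve method are hypothetical, and the sketch assumes that a single "*" stands for one token that contains no "/".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of "*" matching with {1}, {2}, ... substitution.
class WildcardSketch {

    // Returns the template with its {n} placeholders filled in,
    // or null when the URI does not match the pattern.
    static String resolve(String pattern, String template, String uri) {
        // Translate the wildcard pattern into a regular expression: literal
        // pieces are quoted, and each "*" becomes a group matching one token.
        String[] literals = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append("([^/]*)");
            }
            regex.append(Pattern.quote(literals[i]));
        }

        Matcher matcher = Pattern.compile(regex.toString()).matcher(uri);
        if (!matcher.matches()) {
            return null;
        }

        // Replace {1} with the first matched token, {2} with the second, and so on.
        String resolved = template;
        for (int g = 1; g <= matcher.groupCount(); g++) {
            String token = matcher.group(g) == null ? "" : matcher.group(g);
            resolved = resolved.replace("{" + g + "}", token);
        }
        return resolved;
    }
}
```

Calling resolve("sites/*.apache.org", "/sites/{1}_apache_org.xml", "sites/xml.apache.org") yields "/sites/xml_apache_org.xml", and a URI that does not fit the pattern yields null, so the next matcher would get its chance.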

Perhaps you were wondering about regular expressions? The entire sitemap is extensible since the implementation of the <map:match> behavior is pluggable, just like any other sitemap component. So this sitemap fragment


 <map:match type="regexp" pattern="^/xerces-(j|c|p)/(.*)$">
  <map:generate src="/xerces/{1}/{2}.xml"/>
  <map:transform src="styles/document2html.xsl"/>
  <map:serialize/>
 </map:match>

indicates that you should use the "regexp" matcher instead of the default one. If you request /xerces-j/index, the file generator parses the file /xerces/j/index.xml; if you request /xerces-p/installing, it parses /xerces/p/installing.xml; and so on.
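The same mapping can be checked outside Cocoon with a short sketch. It uses java.util.regex as a stand-in for whatever regular expression library the matcher is built on, and the class and method names are invented for the example; only the pattern and the source template come from the fragment above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of a regular-expression matcher with group substitution.
class RegexpMatchSketch {

    // Returns the template with {1}, {2}, ... replaced by the captured groups,
    // or null when the expression does not match the request URI.
    static String resolve(String regex, String template, String uri) {
        Matcher matcher = Pattern.compile(regex).matcher(uri);
        if (!matcher.find()) {
            return null;
        }
        String resolved = template;
        for (int g = 1; g <= matcher.groupCount(); g++) {
            String group = matcher.group(g) == null ? "" : matcher.group(g);
            resolved = resolved.replace("{" + g + "}", group);
        }
        return resolved;
    }
}
```

With the pattern "^/xerces-(j|c|p)/(.*)$" and the template "/xerces/{1}/{2}.xml", the request /xerces-j/index resolves to /xerces/j/index.xml, while a request such as /xerces-x/index matches nothing and resolves to null.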

But while matchers match at the beginning of the pipeline, there is another component, "selector", that is capable of selecting different components inside the pipeline. For example, this sitemap fragment


   <map:match pattern="images/logo">
    <map:generate src="./images/logo.svg"/>
    <map:select type="browser">
     <map:when test="accepts('image/png')">
      <map:serialize type="svg2png"/>
     </map:when>
     <map:otherwise>
      <map:serialize type="svg2jpg"/>
     </map:otherwise>
    </map:select>
   </map:match>

selects the serializer depending on browser capabilities.
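In plain Java the decision made by this fragment might look like the sketch below. It assumes the selector learns about browser capabilities from the HTTP Accept header, which is an assumption of the example rather than a description of how Cocoon's browser selector is implemented, and the class and method names are hypothetical. The returned strings are the serializer names used in the fragment.

```java
// Hypothetical illustration of a selector choosing between two serializers.
class SelectorSketch {

    // Returns "svg2png" when the client lists image/png among the media types
    // it accepts, and falls back to "svg2jpg" otherwise.
    static String chooseSerializer(String acceptHeader) {
        if (acceptHeader != null) {
            for (String part : acceptHeader.split(",")) {
                // Drop parameters such as ";q=0.8" and compare the bare media type.
                int semicolon = part.indexOf(';');
                String mediaType =
                    (semicolon >= 0 ? part.substring(0, semicolon) : part).trim();
                if (mediaType.equalsIgnoreCase("image/png")) {
                    return "svg2png";
                }
            }
        }
        return "svg2jpg";
    }
}
```

The rest of the pipeline does not change; only the final component is swapped, which is what distinguishes a selector from a matcher.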

Other Features

In addition to its pipeline architecture and extensible sitemap, Cocoon has some other interesting features and design properties. It is very extensible. Almost everything in Cocoon is written as a component. You can write your own components to work with existing ones and to reuse in different applications.

Because the sitemap contains special semantics that allow you to aggregate content from different sources into the same document and to associate different namespaces with these different sources, Cocoon is an ideal tool for content aggregation.

Cocoon allows you to add your own protocol handlers to connect to sources that can be retrieved via URI. This wraps around the java.net classes to avoid the limitation of a single URLStreamHandlerFactory per JVM.

Cocoon is able to call itself via the cocoon: protocol. For example, this sitemap fragment is the one used to generate the Cocoon documentation:


   <map:match pattern="*.html">
    <map:aggregate element="site">
     <map:part src="cocoon:/book-{1}.xml"/>
     <map:part src="cocoon:/body-{1}.xml"/>
    </map:aggregate>
    <map:transform src="stylesheets/site2xhtml.xsl">
     <map:parameter name="use-request-parameters" value="true"/>
     <map:parameter name="header" value="graphics/{1}-header.jpg"/>
    </map:transform>
    <map:serialize/>
   </map:match>

It uses both content aggregation and recursive calls to aggregate a sidebar with the document body.
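At the SAX level, aggregation amounts to wrapping the event streams of several parts inside one enclosing element. The following sketch shows that idea with JDK classes only. AggregateSketch and PartFilter are hypothetical names, the parts are passed in as strings instead of being fetched through cocoon: URIs, and the namespace handling mentioned earlier is left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical illustration of content aggregation on SAX streams.
class AggregateSketch {

    // Forwards a part's events but swallows its document boundaries,
    // so that several parts can share one enclosing document.
    static class PartFilter extends XMLFilterImpl {
        @Override public void startDocument() { }
        @Override public void endDocument() { }
    }

    // Wraps the given XML parts in a single root element and serializes the result.
    static String aggregate(String rootElement, String... parts) {
        try {
            SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
            StringWriter out = new StringWriter();
            TransformerHandler serializer = factory.newTransformerHandler();
            serializer.setResult(new StreamResult(out));

            SAXParserFactory parsers = SAXParserFactory.newInstance();
            parsers.setNamespaceAware(true);

            serializer.startDocument();
            serializer.startElement("", rootElement, rootElement, new AttributesImpl());
            for (String part : parts) {
                // Each part is parsed separately and streamed into the same serializer.
                PartFilter filter = new PartFilter();
                filter.setParent(parsers.newSAXParser().getXMLReader());
                filter.setContentHandler(serializer);
                filter.parse(new InputSource(new StringReader(part)));
            }
            serializer.endElement("", rootElement, rootElement);
            serializer.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("aggregation failed", e);
        }
    }
}
```

Aggregating "<book>menu</book>" and "<body>text</body>" under "site" produces a document whose root element is site and whose children are the two parts in order, ready for a stylesheet such as site2xhtml.xsl to lay out.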

Another very important sitemap component is called "action", and it encapsulates headless logic which performs side effects on the flow of information. Examples include the FormValidatorAction, which validates the input of a form using a schema-like description; the series of Database*Actions, which deal with databases; and the series of Session*Actions, which deal with client state persistence.

Cocoon also provides adaptive caching. Site resources, even those which are dynamically generated (since Cocoon components may be implemented to be cache-aware), can be cached inside Cocoon. This prevents wasting CPU cycles to regenerate resources which can be cached.
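The principle can be illustrated with a deliberately small sketch. It is not Cocoon's cache: the class CacheSketch is hypothetical, it treats a source's last-modified timestamp as the only validity check, and it is not thread-safe. The point is simply that the pipeline runs again only when the cached entry is no longer valid.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration of validity-based caching of pipeline output.
class CacheSketch {

    // A cached response together with the source timestamp it was built from.
    private static final class Entry {
        final long sourceLastModified;
        final String content;

        Entry(long sourceLastModified, String content) {
            this.sourceLastModified = sourceLastModified;
            this.content = content;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // Counts how often the pipeline actually had to run.
    int regenerations = 0;

    // Returns the cached content for the URI while the source is unchanged;
    // otherwise runs the pipeline and stores the fresh result.
    String get(String uri, long sourceLastModified, Supplier<String> pipeline) {
        Entry cached = entries.get(uri);
        if (cached != null && cached.sourceLastModified == sourceLastModified) {
            return cached.content;
        }
        regenerations++;
        String fresh = pipeline.get();
        entries.put(uri, new Entry(sourceLastModified, fresh));
        return fresh;
    }
}
```

Two requests for the same URI with an unchanged timestamp run the pipeline once; a newer timestamp invalidates the entry and triggers one more run.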

eXtensible Server Pages (XSP) is a SAX-aware server page technology that extends the concept of JSP for XML content. XSPs are compiled into Cocoon Generators and generate SAX events directly, unlike JSPs, which are compiled into servlets and require a subsequent parse stage. XSP lets you associate dynamic behavior with tags in a particular namespace through "logicsheets", which are XSLT stylesheets that transform those tags into XSP logic and allow a better separation between code and content.

Conclusion

Cocoon is currently based on many other Apache projects -- Ant, Avalon, Xerces, Xalan, FOP, Batik, Velocity, Regexp -- but due to its high modularity, it has full support for alternative implementations of underlying W3C technologies.

The Cocoon development community is one of the more active under the Apache Software Foundation, boasting more than 15 active developers, around 500 subscribers to the development mail list, and around 1100 on the user list. We consider Cocoon 2.0 stable in both implementation and API, which means that we consider it safe to use in production environments. And it's already being used on many such projects.

You can find more information about Apache Cocoon at its home page, from which it may also be downloaded.