Menu

Excel Reports with Apache Cocoon and POI

January 22, 2003

Steve Punte

Introduction

Generating professional reports within a web application can be a difficult problem to solve. However, by combining two Apache projects (Cocoon and POI) you can produce Excel reports from a pure Java server application. The key to this solution is to embrace Excel on the client and deploy a Java solution on the server.

Why Use Microsoft Excel?

Excel is the business world's spreadsheet of choice. While most readers of this article are software technologists, many of the projects and solutions you're developing are meant for use by non-technologists in a corporate environment, in which Microsoft Office is the dominant software suite. Your end user is probably already familiar and skilled with Office. Providing reports as Excel spreadsheets allows the data to be manipulated to meet the end user's needs. As long as a sufficiently similar report and associated data are available, most end users can manipulate the report to obtain the desired results.

Although Excel is a Microsoft Windows application, its binary file format is well known and may be manipulated by many low cost solutions, including the pure Java Apache POI project. Rendering reports in Excel does not require that the web application runs Windows, only that the client runs Windows.

The Solution in a Nutshell

Basic Solution Design
Figure 1: Basic Solution Design

A report is defined in two parts: an Excel "Report Template" document and a solution specific "Report Scheme" XML document.

Excel "Report Template"

In addition to using Excel to render the report, this solution uses Excel as a client GUI with which to format the report. An Excel document supplies all the static text, desired data cell formatting, colorization, delineation lines, at-a-future-date formula manipulation, and all elements that are independent of the actual data. This strategy eliminates the need to create some other GUI. Since Excel is used for both the GUI and the rendering mechanism, the solution is feature rich and defect free.

TOMCAT Installation from cocoon-reports.war (Tested with Tomcat 4.1.12)

  1. Download war file from http://candlelightsoftware.com/downloads/cocoon-reports.war
  2. Place cocoon-reports in ~jakarta-tomcat-4.1.12/webapps/cocoon-reports.war
  3. Start up Tomcat
  4. Place browser URL to http://localhost:8080/cocoon-reports You should now see selection for the two demo report. This will also un-archive the war file so the additional configuration and setup steps below can be performed.
  5. Create MySql database table: mysql < ~/cocoon-reports/demo.sql on command line.
  6. Adjust Cocoon Sitemap file:
    1. Adjust URL for database if needed: see line 230 of sitemap file
    2. Adjust line 237 of sitemap file (i.e. parameter absolute-path) to reflect your installation.

XML "Report Scheme" File

The Excel file does not meet all needs for a report generator. In particular, three other components are needed to produce useful practical reports:

  1. The primary database query statement to retrieve the data. In this article's examples, standard SQL query statements are executed against the MySQL database. However, with slight modification of the solution, XQuery statements could be used against an XML database or SOAP web services over the net.
  2. Placement instructions of the resulting query data into the Excel spreadsheet.
  3. Many practical reports need input parameters. For example, it makes little sense to write unique report for each and every month. Instead, a more practical solution is for the report to accept a parameter to specify which month it should process.

A few other tidbits are tossed into the report schema file like title, description, and label.

Generation Solution

This solution is based on the Apache Cocoon Framework. The main mechanism to create a report is a Cocoon pipeline, which processes the two report definition documents (template and scheme), accepts optional input parameters, and then serializes the report into a binary stream compatible with Excel. For those unfamiliar with Apache Cocoon, XML.com published an introductory tutorial last year.

Rendering Machinery

This solution is driven primarily by HTTP request-response pair communications typically delivering HTML content. However, if response mime type is set to application/vnd.ms-excel, and the content is recognizable by Excel, the MS Internet Explorer browser will show the content using Excel. To the end user, it is a very smooth transition.

A Simple Example

Let's look at a very simple example report: calculate the daily sales total over a specified time period for a specific staff member. Both this example and the next assume that some financial transaction data is available in a relational database. Source code for both examples is available as a zipped download.

The Data

The two examples presented in this article both assume a relational database containing a single table of financial transactions is available as shown in Figure 2 below. 1000 entries in this table are supplied in the downloadable software. Note for simplicity only a single table is used, and it is not fully normalized.

Figure 2: Example Data

Id Staff Type Date Amount
1 mel Milk 2003-01-19 9.99
2 george Hardware 2003-01-19 0.10
3 kathy Produce 2003-01-13 5.72
4 mel Cannedfood 2003-01-23 1.21
5 cynthia Dairy 2003-02-13 9.68
6 george Bread 2003-01-08 3.71
7 george Bread 2003-01-01 1.19
8 jasson Bread 2003-02-20 1.06
9 cynthia Cannedfood 2003-02-24 8.62
10 jasson Hardware 2003-02-21 4.34

The Report Template

A Excel Report Template document, as show in figure 3, created by the report author, forms the base of the solution. All static text, cell formatting, colorization, etc. is supplied here by simply using Excel as a GUI. In this example, we will create a table of dates and associated dollar amounts.

Simple Example Report Template in Excel
Figure 3: Simple Summary Example Report Template

Input Parameters

For this report, the user must supply three parameters: Start Date, End Date, and Staff Member Name. The associated section in the Report Scheme file defining these input parameters looks like this:

<parameters>

  <input-select>

    <title>Staff Person Name</title>

    <label>staff-person</label>

    <init>george</init>

    <option value="george">George Fumpter</option>

    <option value="mel">Mellannie Miller</option>

    <option value="jasson">Jasson Joskophie</option>

    <option value="cynthia">Cynthia Clark</option>

    <option value="kathy">Kathy Kompton</option>

  </input-select>

  <input-date>

    <title>Start Date</title>

    <label>start-date</label>

    <init>2003-01-01</init>

  </input-date>

  <input-date>

    <title>End Date</title>

    <label>end-date</label>

    <init>2003-01-10</init>

  </input-date>

<parameters>

The resulting HTML page rendered by the Cocoon input parameter pipeline will appear to the user as show in Figure 4 below.

Simple Example HTML Input Parameter Page
Figure 4: Simple Example Input Parameter Page

Primary Query

The second section of the Report Scheme document supplies the actual SQL text. In this case, as with many reports, input parameters must be supplied. This solution provides several tags to achieve this. The two tags shown here are <rtp:get-select> and <rpt:get-date>. These tags will be replaced with the selected values before the document is sent to the SQLTransformer for processing.

<sql:execute-query>

  <sql:query>

    SELECT SUM(amount) as sum, date

      FROM transactions

      WHERE

        staff = '<rpt:get-select>staff-person</rpt:get-select>'

        AND transactions.date BETWEEN

            '<rpt:get-date>start-date</rpt:get-date>'

          AND

            '<rpt:get-date>end-date</rpt:get-date>'

      GROUP BY date

 </sql:query>

</sql:execute-query>

Formatting Information

The result of the query will be a table of column named "date" and "sum". The third main section of the report directs the generator to insert the queried data into the Excel report. There are several commands to do this; <rpt:column> is the most commonly used. In this particular example, it is interpreted as insert all "date" values beginning at Excel Cell Address B6 going downward. And insert this data as a "calendar date" type as opposed to a string (default), which makes more interesting Excel processing possible.

<column>

  <qry-name>date</qry-name>

  <adrs-column>B</adrs-column>

  <adrs-row>6</adrs-row>

  <type>java.util.Date</type>

</column>

The Final Result

The last stage of the Cocoon pipeline inserts the data into the supplied report template and then renders the final report in the users browser. In this case, the final report appears as

Final Simple Example Report in Excel
Figure 5: Simple Example Final Report

Solution Implementation Details

Cocoon facilitates XML-directed software. The entire solution is achieved by directing generic components with the contents of XML files. Figures 6 and 7 below demonstrate how the main pipelines are composed of reusable, generic XML processing components, which are in turn directed by XML documents.

Apache Cocoon Notation

This article uses the following Apache Cocoon Graphical Notation:

Cocoon Graphical Notation

The Parameter Pipeline

The report generator parameter pipeline is responsible for rendering the report input parameter page. Such parameters are typically date, staff member name, company name, and so on -- whatever makes sense as a generic feature of the report technology. The parameter definitions are obtained from the report scheme document.

Input Parameter Cocoon Pipeline
Figure 6: Input Parameter Cocoon Pipeline

The Rendering Pipeline

The report generator rendering pipeline (shown in Figure 7) is the realheart of the solution.

Report Rendering Cocoon Pipeline
Figure 7: Report Rendering Cocoon Pipeline

The pipeline processing steps are as follows:

  1. Incoming HTTP request parameters are converted into an XML document by the request generator so as to be compatible with later components.
  2. An XSL transformer, directed by render.xsl, generates XInclude statements so that key elements of the report scheme document can be incorporated.
  3. The XInclude statements are executed retrieving the appropriate report scheme query and formatting constructs.
  4. The original HTTP request parameters are properly merged into the database query statements using a second XSL transformer.
  5. The database query statements are executed using the standard Cocoon SQL transformer, retrieving the desired report data.
  6. The retrieved data is arranged according to the report scheme data-block formatting statements for appropriate insertion into the Excel spreadsheet. This is accomplished by the use of a third XSL transformer.
  7. The retrieved data is merged with the Excel template and the final binary Excel report is generated and sent back to the user using the report serializer.

This is a clear example of how significant and productive solutions can be achieved using XML-directed architectures. The exception is the last component in the pipeline, the report serializer. It's a component I've written to support the examples in this article, is free to download and use by all, and will likely be integrated or assimilated somehow into the POI-related components now being incorporated into Cocoon.

A More Complex Example

The first example was rather trivial. Suppose your customer wanted a two-dimensional report depicting the sales amount all staff members against calendar dates as show in figure 8 below:

Final Two Dimentional Report in Excel
Figure 8: A Two Dimensional Report

Due to lack of space I will not describe the solution in detail, rather a working Apache Cocoon solution can be download here. However, some interesting characteristics of this solution are worth noting:

  • The data for this report is obtained in two separate queries. The first retrieves staff member names and their associated dollar transactions. This query is used to populate the two horizontal bluish rows. The second query retrieves calendar dates, the total transaction on each date, and the main table of data. This data populates the columns and central table area. A report can contain any number of data queries and column/row/matrix insertions.
  • The Excel Report Template is created with all columns, row, and matrix areas of one element size with appropriate color and cell format. Additional report serializer tag commands copy and shift this formatting downward or to the right to achieve the appropriate size.

XML Direct Software

Apache Cocoon and the examples presented here clearly exhibit characteristic of XML-directed software. Cocoon is an XML-based framework for aggregating and assembling SAX-based components into a web application. The components are generic in nature without any hardcoded domain-specific content. Domain-specific content -- for example, business and presentation logic -- resides in "directing" XML files and sources.

How to Incorporate this Solution

If you are starting from scratch, Apache Cocoon can be integrated and configured to take responsibility for a URI subtree or the entire URI space by way of various deployment and configuration files available with any standard application server. See the Cocoon home page for integration instructions with all major application servers.

The best place to start is by downloading the report solution and examples; then setup a test instance based on the example from this article.

The intent is to supply the necessary collection of critical components to achieve professional web-based Excel report solutions. Variations could include querying an XML database instead of a relational database, obtaining the report data over the network by way of SOAP messaging, querying a database for input parameter selections, etc. Such variations require modification to the various Cocoon pipelines, but this is the strength and flexibility of the platform. Do the provided and standard Cocoon components have sufficient functionality to handle your situation? Probably. If in doubt, write me, and I'll respond.

While the solution described herein is built for and around Excel, this approach can be applied to any spreadsheet application whose file format can be manipulated. Indeed, the next release of Excel allows for a more readable XML document format as opposed to the current binary standard.

Summary

In this article we have described a simple, low-cost, modular, XML component oriented solution for generating and rendering real-time Excel reports. Future enhancements will probably include support for formulas, multiple sheets, and so on. Portions of this solution were developed for Amansi.com, which has donated this solution to open-source.

Resources