Excel Reports with Apache Cocoon and POI
Generating professional reports within a web application can be a difficult problem to solve. However, by combining two Apache projects (Cocoon and POI) you can produce Excel reports from a pure Java server application. The key to this solution is to embrace Excel on the client and deploy a Java solution on the server.
Excel is the business world's spreadsheet of choice. While most readers of this article are software technologists, many of the projects and solutions you're developing are meant for use by non-technologists in a corporate environment, in which Microsoft Office is the dominant software suite. Your end user is probably already familiar and skilled with Office. Providing reports as Excel spreadsheets allows the data to be manipulated to meet the end user's needs. As long as a sufficiently similar report and associated data are available, most end users can manipulate the report to obtain the desired results.
Although Excel is a Microsoft Windows application, its binary file format is well known and may be manipulated by many low cost solutions, including the pure Java Apache POI project. Rendering reports in Excel does not require that the web application runs Windows, only that the client runs Windows.
![]() |
| Figure 1: Basic Solution Design |
A report is defined in two parts: an Excel "Report Template" document and a solution specific "Report Scheme" XML document.
In addition to using Excel to render the report, this solution uses Excel as a client GUI with which to format the report. An Excel document supplies all the static text, desired data cell formatting, colorization, delineation lines, at-a-future-date formula manipulation, and all elements that are independent of the actual data. This strategy eliminates the need to create some other GUI. Since Excel is used for both the GUI and the rendering mechanism, the solution is feature rich and defect free.
TOMCAT Installation from cocoon-reports.war (Tested with Tomcat 4.1.12)
|
The Excel file does not meet all needs for a report generator. In particular, three other components are needed to produce useful practical reports:
A few other tidbits are tossed into the report schema file like title, description, and label.
This solution is based on the Apache Cocoon Framework. The main mechanism to create a report is a Cocoon pipeline, which processes the two report definition documents (template and scheme), accepts optional input parameters, and then serializes the report into a binary stream compatible with Excel. For those unfamiliar with Apache Cocoon, XML.com published an introductory tutorial last year.
This solution is driven primarily by HTTP request-response pair
communications typically delivering HTML content. However, if response
mime type is set to application/vnd.ms-excel, and the content
is recognizable by Excel, the MS Internet Explorer browser will show the
content using Excel. To the end user, it is a very smooth transition.
Let's look at a very simple example report: calculate the daily sales total over a specified time period for a specific staff member. Both this example and the next assume that some financial transaction data is available in a relational database. Source code for both examples is available as a zipped download.
The two examples presented in this article both assume a relational database containing a single table of financial transactions is available as shown in Figure 2 below. 1000 entries in this table are supplied in the downloadable software. Note for simplicity only a single table is used, and it is not fully normalized.
| Id | Staff | Type | Date | Amount |
| 1 | mel | Milk | 2003-01-19 | 9.99 |
| 2 | george | Hardware | 2003-01-19 | 0.10 |
| 3 | kathy | Produce | 2003-01-13 | 5.72 |
| 4 | mel | Cannedfood | 2003-01-23 | 1.21 |
| 5 | cynthia | Dairy | 2003-02-13 | 9.68 |
| 6 | george | Bread | 2003-01-08 | 3.71 |
| 7 | george | Bread | 2003-01-01 | 1.19 |
| 8 | jasson | Bread | 2003-02-20 | 1.06 |
| 9 | cynthia | Cannedfood | 2003-02-24 | 8.62 |
| 10 | jasson | Hardware | 2003-02-21 | 4.34 |
A Excel Report Template document, as show in figure 3, created by the report author, forms the base of the solution. All static text, cell formatting, colorization, etc. is supplied here by simply using Excel as a GUI. In this example, we will create a table of dates and associated dollar amounts.
|
| Figure 3: Simple Summary Example Report Template |
For this report, the user must supply three parameters: Start Date, End Date, and Staff Member Name. The associated section in the Report Scheme file defining these input parameters looks like this:
<parameters>
<input-select>
<title>Staff Person Name</title>
<label>staff-person</label>
<init>george</init>
<option value="george">George Fumpter</option>
<option value="mel">Mellannie Miller</option>
<option value="jasson">Jasson Joskophie</option>
<option value="cynthia">Cynthia Clark</option>
<option value="kathy">Kathy Kompton</option>
</input-select>
<input-date>
<title>Start Date</title>
<label>start-date</label>
<init>2003-01-01</init>
</input-date>
<input-date>
<title>End Date</title>
<label>end-date</label>
<init>2003-01-10</init>
</input-date>
<parameters>
The resulting HTML page rendered by the Cocoon input parameter pipeline will appear to the user as show in Figure 4 below.
![]() |
| Figure 4: Simple Example Input Parameter Page |
The second section of the Report Scheme document supplies the actual SQL text. In this case, as with many reports, input parameters must be supplied. This solution provides several tags to achieve this. The two tags shown here are <rtp:get-select> and <rpt:get-date>. These tags will be replaced with the selected values before the document is sent to the SQLTransformer for processing.
<sql:execute-query>
<sql:query>
SELECT SUM(amount) as sum, date
FROM transactions
WHERE
staff = '<rpt:get-select>staff-person</rpt:get-select>'
AND transactions.date BETWEEN
'<rpt:get-date>start-date</rpt:get-date>'
AND
'<rpt:get-date>end-date</rpt:get-date>'
GROUP BY date
</sql:query>
</sql:execute-query>
|
The result of the query will be a table of column named "date" and "sum". The third main section of the report directs the generator to insert the queried data into the Excel report. There are several commands to do this; <rpt:column> is the most commonly used. In this particular example, it is interpreted as insert all "date" values beginning at Excel Cell Address B6 going downward. And insert this data as a "calendar date" type as opposed to a string (default), which makes more interesting Excel processing possible.
<column> <qry-name>date</qry-name> <adrs-column>B</adrs-column> <adrs-row>6</adrs-row> <type>java.util.Date</type> </column>
The last stage of the Cocoon pipeline inserts the data into the supplied report template and then renders the final report in the users browser. In this case, the final report appears as
![]() |
| Figure 5: Simple Example Final Report |
Cocoon facilitates XML-directed software. The entire solution is achieved by directing generic components with the contents of XML files. Figures 6 and 7 below demonstrate how the main pipelines are composed of reusable, generic XML processing components, which are in turn directed by XML documents.
Apache Cocoon NotationThis article uses the following Apache Cocoon Graphical Notation:
|
The report generator parameter pipeline is responsible for rendering the report input parameter page. Such parameters are typically date, staff member name, company name, and so on -- whatever makes sense as a generic feature of the report technology. The parameter definitions are obtained from the report scheme document.
![]() |
| Figure 6: Input Parameter Cocoon Pipeline |
The report generator rendering pipeline (shown in Figure 7) is the realheart of the solution.
![]() |
| Figure 7: Report Rendering Cocoon Pipeline |
The pipeline processing steps are as follows:
This is a clear example of how significant and productive solutions can be achieved using XML-directed architectures. The exception is the last component in the pipeline, the report serializer. It's a component I've written to support the examples in this article, is free to download and use by all, and will likely be integrated or assimilated somehow into the POI-related components now being incorporated into Cocoon.
The first example was rather trivial. Suppose your customer wanted a two-dimensional report depicting the sales amount all staff members against calendar dates as show in figure 8 below:
![]() |
| Figure 8: A Two Dimensional Report |
Due to lack of space I will not describe the solution in detail, rather a working Apache Cocoon solution can be download here. However, some interesting characteristics of this solution are worth noting:
XML Direct SoftwareApache Cocoon and the examples presented here clearly exhibit characteristic of XML-directed software. Cocoon is an XML-based framework for aggregating and assembling SAX-based components into a web application. The components are generic in nature without any hardcoded domain-specific content. Domain-specific content -- for example, business and presentation logic -- resides in "directing" XML files and sources. |
If you are starting from scratch, Apache Cocoon can be integrated and configured to take responsibility for a URI subtree or the entire URI space by way of various deployment and configuration files available with any standard application server. See the Cocoon home page for integration instructions with all major application servers.
The best place to start is by downloading the report solution and examples; then setup a test instance based on the example from this article.
The intent is to supply the necessary collection of critical components to achieve professional web-based Excel report solutions. Variations could include querying an XML database instead of a relational database, obtaining the report data over the network by way of SOAP messaging, querying a database for input parameter selections, etc. Such variations require modification to the various Cocoon pipelines, but this is the strength and flexibility of the platform. Do the provided and standard Cocoon components have sufficient functionality to handle your situation? Probably. If in doubt, write me, and I'll respond.
While the solution described herein is built for and around Excel, this approach can be applied to any spreadsheet application whose file format can be manipulated. Indeed, the next release of Excel allows for a more readable XML document format as opposed to the current binary standard.
In this article we have described a simple, low-cost, modular, XML component oriented solution for generating and rendering real-time Excel reports. Future enhancements will probably include support for formulas, multiple sheets, and so on. Portions of this solution were developed for Amansi.com, which has donated this solution to open-source.
Resources
|
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.