
AxKit: XML Web Publishing with Apache and mod_perl

May 24, 2000

Matt Sergeant



Introduction

Table of Contents

Introduction

Overview

How AxKit Works

Mapping XML Files to Style Sheets

Choosing a Style Sheet

Cascading Style Sheets

Setting up AxKit

Conclusion

One of XML's major benefits to web developers is that it is a standard way to separate data from presentation and to create a consistent templating system for a web site. Yet for many that promise has not been fully realized, because XML tool support, especially for authoring, remains immature.

An important part of using XML for web publishing is content delivery. Although XML-to-HTML conversion is partially possible in browsers such as Internet Explorer, the reality is that HTML or XHTML will be served from web servers for a long time to come. This means server-side XML transformation is the most viable option for publishing with XML today.

Server-side transformations can be handled at various levels. The most basic is static transformation (e.g., running an XSLT processor from some shell scripts), but this method quickly becomes awkward and is unsuitable for dynamic web sites.

Another option is an application server environment such as Zope or Enhydra. If you have a real need for these products, they are a good choice, but keep in mind that they tend to operate within their own enclosed universe.

A third choice is an XML content delivery infrastructure such as the one provided by the Apache Cocoon project. Cocoon is a Java-based environment that transforms XML through a pipeline of stages to produce the web pages served to the user. It also offers more advanced features, such as active server pages.

AxKit, an XML content delivery solution built on Apache and mod_perl, takes an approach similar to Cocoon's. It provides simple ways for web developers to deliver XML through multiple processing stages and style sheets, all programmable in Perl. AxKit takes care of caching so that the developer doesn't have to. It's also tightly bound to the Apache web server, which makes it a good route forward for those with an existing investment in mod_perl and Apache.

The fundamental way in which XML is delivered to a client in AxKit is through transformation with one or more style sheets. AxKit does not see style sheets solely in terms of XSLT transformations, but as more generic processing stages allowing arbitrary languages and operations.

In this article, I will describe AxKit's architecture, and give details of its installation and future development. Some familiarity with transforming XML would be helpful in reading this article.

Overview

AxKit is based on a plugin architecture. This allows developers to quickly build modules, based on currently available technology, that provide

  • new style sheet (transformation) languages
  • new methods for delivering alternate style sheets
  • new methods for determining media types

Because AxKit is built in Perl, these plugins are simple to develop. Not long after AxKit was released, a developer wrote a suffix-based style sheet chooser module (which returns different style sheets depending on whether the user requests file.xml.html or file.xml.text) in just 15 lines of code!
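The real module is Perl and isn't reproduced here. Purely to illustrate how little logic such a chooser needs, here is a Python sketch of the same idea; the function name and the SUFFIX_STYLES table are hypothetical, not AxKit's plugin interface.

```python
import posixpath

# Hypothetical table mapping a trailing URL suffix to a style preference name.
SUFFIX_STYLES = {"html": "html", "text": "text"}


def choose_style_by_suffix(uri):
    """Split /docs/file.xml.html into ("/docs/file.xml", "html").

    Only a known suffix that directly follows ".xml" is treated as a style
    request; anything else is returned unchanged with no style.
    """
    base, ext = posixpath.splitext(uri)
    style = SUFFIX_STYLES.get(ext.lstrip("."))
    if style is not None and base.endswith(".xml"):
        return base, style
    return uri, None
```

The chooser only rewrites the request; deciding which style sheet the name "html" or "text" refers to is left to the rest of the pipeline.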

The plugin architecture also makes developing new style sheet modules easy, using some of the readily available code in Perl's excellent CPAN (the Comprehensive Perl Archive Network). A style sheet module to deliver XML-News files as HTML would only take a few lines of code based on David Megginson's XMLNews::HTMLTemplate module, and AxKit works out all the nuances of caching for you.

AxKit comes with a number of pre-built style sheet modules, including two XSLT modules. One is built around Perl's XML::XSLT module, a DOM-based XSLT implementation that is still in its early stages; the other is built around Ginger Alliance Ltd's Sablotron XSLT library, a much more complete and faster XSLT implementation written in C++.

For the closet XSLT haters out there, there's XPathScript: a language of my invention that takes some of XSLT's good features, such as matching and finding nodes using XPath, and combines them with the power of ASP-like code integration and inline Perl scripting. XPathScript also compiles your style sheet into native Perl code whenever the style sheet changes, so requests in between run the already-compiled code and XML style sheet processing is fast.
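The "compile whenever it changes" behavior amounts to a cache keyed on the style sheet's modification time. The Python sketch below shows that pattern only; it is not XPathScript's implementation, and compile_fn is a hypothetical stand-in for the step that turns style sheet source into executable code.

```python
import os


class CompiledStyleCache:
    """Keep one compiled form per style sheet, rebuilt only when the file changes.

    compile_fn is a hypothetical placeholder: any function that takes the
    style sheet's source text and returns something executable.
    """

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn
        self._entries = {}  # path -> (mtime in ns, compiled object)

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        entry = self._entries.get(path)
        if entry is not None and entry[0] == mtime:
            return entry[1]  # unchanged since the last compile: reuse it
        with open(path, encoding="utf-8") as f:
            compiled = self.compile_fn(f.read())
        self._entries[path] = (mtime, compiled)
        return compiled
```

With this arrangement the expensive compile step runs once per edit of the file rather than once per request.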

The core of AxKit delivers good performance: when serving cached results, it runs at about 80% of the speed of plain Apache. It achieves this primarily because it's built on mod_perl, whose tight coupling with Apache means that much of the work is done by compiled C code. To deliver a cached result, AxKit simply tells Apache where to find the cached file and that it doesn't want to handle the request itself. Apache then serves the page with its usual efficiency.

Finally, AxKit works hand-in-hand with Apache, so any skills you have in Apache administration won't go to waste. AxKit integrates directly with Apache's <Files>, <Location> and <Directory> directives. All of AxKit's configuration takes this approach, so you won't have to teach a webmaster new tricks to build up your XML site.

How AxKit Works

AxKit registers two "handlers" with Apache in order to do its work. In Apache terms, handlers are modules that run at particular phases of the request cycle (phases such as authentication, type checking, response, and logging). When a request for a file comes in, AxKit does some quick checking to see whether the file is XML: the main checks are whether the file extension is .xml and/or whether the first few characters of the file are an <?xml?> declaration. If the file is not XML, AxKit lets Apache deliver it as it normally would. Note that it's possible to apply AxKit only to certain parts of your web site.
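The two checks described above can be sketched in a few lines of Python. This is an illustration of the heuristic, not AxKit's code; skipping a leading UTF-8 byte-order mark is an assumption added so that such files are still recognized.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def looks_like_xml(path):
    """Rough XML check: a .xml extension, or a file that opens with <?xml.

    A leading UTF-8 byte-order mark is skipped before sniffing (an assumption,
    not something the AxKit description specifies).
    """
    if path.lower().endswith(".xml"):
        return True  # cheapest test first: no need to open the file
    with open(path, "rb") as f:
        head = f.read(len(UTF8_BOM) + len(b"<?xml"))
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    return head.startswith(b"<?xml")
```

Checking the extension before reading the file keeps the common case cheap; the content sniff only runs for files without an .xml name.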

When an XML file is detected, the next step is to call plugin modules to determine the media type and/or style sheet preference. Media type chooser plugins normally look at the User-Agent header, or possibly at the Accept header. However, it's possible to define any method at all to determine the media type.
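As a rough illustration of a media type chooser, here is a Python sketch that checks the Accept header for WML and then falls back to sniffing the User-Agent. The WAP_AGENT_HINTS values and the "handheld"/"screen" media names are assumptions chosen for the example, not AxKit's rules.

```python
# Hypothetical User-Agent fragments treated as signs of a WAP/WML browser.
WAP_AGENT_HINTS = ("UP.Browser", "Nokia")


def choose_media(headers):
    """Guess the target media type from request headers.

    An explicit WML entry in Accept wins; otherwise the User-Agent is
    checked for known WAP browsers; everything else is treated as "screen".
    """
    accept = headers.get("Accept", "")
    if "text/vnd.wap.wml" in accept:
        return "handheld"
    agent = headers.get("User-Agent", "")
    if any(hint in agent for hint in WAP_AGENT_HINTS):
        return "handheld"
    return "screen"
```

Preferring Accept over User-Agent is a design choice: a client that states what it can render is more reliable than a guess from its name.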

The existing style sheet choosers are based on examining the path info (this is a path following the filename, so you could request myfile.xml/mystyle), query string (for example myfile.xml?style=mystyle), and file suffix (myfile.xml.mystyle).
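The path-info and query-string choosers amount to simple URL parsing. The Python sketch below follows the article's examples (myfile.xml/mystyle and myfile.xml?style=mystyle); the function names are hypothetical and only illustrate the idea.

```python
from urllib.parse import parse_qs, urlsplit


def style_from_path_info(path):
    """Split /docs/myfile.xml/mystyle into ("/docs/myfile.xml", "mystyle")."""
    idx = path.find(".xml/")
    if idx == -1:
        return path, None  # no path info after an .xml file
    end = idx + len(".xml")
    style = path[end + 1:].strip("/") or None
    return path[:end], style


def style_from_query(url, param="style"):
    """Return the value of ?style=... (first occurrence), or None."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None
```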

The final part is plumbing all the style sheets together with the XML file in the right order, implementing cascading where appropriate, and "doing the right thing" with regard to the cache. AxKit invalidates the page cache when the original document changes, and also when any external entity (parsed or unparsed) changes. This means a modular style sheet can be changed one component at a time, and a change to any of those components still triggers a rebuild of the cache.
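A minimal sketch of that invalidation rule, in Python, assuming the list of files a page depends on (document, style sheets, external entities) was recorded when the page was built. How AxKit actually records and compares dependencies is not described here.

```python
import os


def cache_is_fresh(cache_path, source_paths):
    """Return True if the cached page is at least as new as every file it came from.

    source_paths is assumed to list the XML document, each style sheet, and
    every external entity (parsed or unparsed) used to build the page.
    """
    try:
        cached_mtime = os.stat(cache_path).st_mtime_ns
    except FileNotFoundError:
        return False  # nothing cached yet
    for path in source_paths:
        try:
            if os.stat(path).st_mtime_ns > cached_mtime:
                return False  # a dependency changed after the cache was written
        except FileNotFoundError:
            return False  # a dependency disappeared: rebuild
    return True
```

Because every dependency is checked, editing one small included component is enough to force a rebuild, which is exactly what makes modular style sheets safe to cache.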

Mapping XML Files to Style Sheets

AxKit uses two separate methods for mapping XML files to style sheets. The primary method is that specified in the W3C Recommendation at http://www.w3.org/TR/xml-stylesheet. This specifies that an <?xml-stylesheet?> processing instruction at the beginning of the XML file (after the <?xml?> declaration, and before the first element) defines the location and type of the style sheet.
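To make the processing instruction concrete, here is a Python sketch that reads <?xml-stylesheet?> instructions from a document's prolog using the standard library's xml.dom.minidom, which keeps prolog processing instructions as children of the Document node. The regular expression for pseudo-attributes is a simplification of the Recommendation's grammar.

```python
import re
from xml.dom import Node, minidom

# Simplified pseudo-attribute syntax: name="value" or name='value'.
PSEUDO_ATTR = re.compile(r'([\w:-]+)\s*=\s*(["\'])(.*?)\2')


def stylesheet_pis(xml_text):
    """Return the pseudo-attributes of each <?xml-stylesheet?> PI in the prolog."""
    doc = minidom.parseString(xml_text)
    found = []
    for node in doc.childNodes:
        if node.nodeType == Node.ELEMENT_NODE:
            break  # stop at the root element: only the prolog counts
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            found.append({m.group(1): m.group(3)
                          for m in PSEUDO_ATTR.finditer(node.data)})
    return found
```

For example, a document containing `<?xml-stylesheet href="page.xsl" type="text/xsl"?>` before its root element yields `[{"href": "page.xsl", "type": "text/xsl"}]`.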

The second method of mapping XML files to style sheets is used when no usable <?xml-stylesheet?> directives are found in the XML file. This uses a DefaultStyleMap option in your Apache configuration files. These directives can be used anywhere within Apache's <Files>, <Location>, and <Directory> sections, and .htaccess configuration system. In this way it's possible to define complex mapping rules for different file types and locations in whichever manner pleases you, without having each XML file individually specify its style sheet.

AxKit then uses the type of the style sheet (in the type="..." pseudo-attribute of the <?xml-stylesheet?> processing instruction, or the first parameter of the DefaultStyleMap option) to decide which module to use to process the file. Types are mapped to a module using another Apache configuration option: AddStyleMap. Again, this directive can appear anywhere within Apache's configuration structure. This allows you to try different modules for processing the same file.
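The two configuration options can be modeled as a fallback list plus a lookup table. In this Python sketch, DEFAULT_STYLE_MAP and ADD_STYLE_MAP are hypothetical stand-ins for DefaultStyleMap and AddStyleMap; the module name comes from the setup section below, while the default style sheet path is invented for the example.

```python
# Hypothetical stand-ins for the Apache options described above.
DEFAULT_STYLE_MAP = [("text/xsl", "/styles/default.xsl")]  # (type, href), used as a fallback
ADD_STYLE_MAP = {"text/xsl": "Apache::AxKit::Language::XSLT"}  # type -> module


def resolve_stages(declared):
    """Pair each style sheet with the module registered for its type.

    declared is a list of {"type": ..., "href": ...} dicts taken from the
    document's <?xml-stylesheet?> PIs; if it is empty, the default map is used.
    """
    sheets = declared or [{"type": t, "href": h} for t, h in DEFAULT_STYLE_MAP]
    stages = []
    for sheet in sheets:
        module = ADD_STYLE_MAP.get(sheet["type"])
        if module is None:
            raise LookupError(f"no style map registered for {sheet['type']}")
        stages.append((module, sheet["href"]))
    return stages
```

Keeping the type-to-module table separate from the document is what lets you swap in a different processing module for the same file without touching the XML.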

Choosing a Style Sheet

Often, AxKit will have more than one style sheet option for serving a particular file. How does it choose which one to use?

The choice is made based on media type and on "style sheet preference." For a style sheet to be chosen for the file currently being served, its media type must match the current media type, or it must be declared for the media type "all."

Style sheet preferences are slightly more complex. AxKit has three kinds of style sheet: persistent, preferred, and alternate. Without drowning in detail here, this facility allows a processor further up the "pipeline" to determine which style sheet is used; for instance, a user could personalize the site's look and feel by choosing which style sheet is applied.
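The Python sketch below shows one plausible selection rule. It assumes the persistent/preferred/alternate classification used by HTML 4 and the xml-stylesheet Recommendation (no title means persistent, a title means preferred, alternate="yes" means alternate) and treats a missing media pseudo-attribute as "all". AxKit's exact rules may differ.

```python
def _media_matches(sheet, media):
    # A missing media pseudo-attribute is treated as "all" (an assumption).
    listed = [m.strip() for m in sheet.get("media", "all").split(",")]
    return media in listed or "all" in listed


def _is_alternate(sheet):
    return sheet.get("alternate", "no") == "yes"


def select_stylesheets(sheets, media, requested_title=None):
    """Return the style sheets to apply, in document order."""
    usable = [s for s in sheets if _media_matches(s, media)]
    # The first titled, non-alternate sheet names the default (preferred) style.
    preferred = [s["title"] for s in usable if "title" in s and not _is_alternate(s)]
    target = requested_title or (preferred[0] if preferred else None)
    chosen = []
    for s in usable:
        if "title" not in s and not _is_alternate(s):
            chosen.append(s)  # persistent: always applied
        elif target is not None and s.get("title") == target:
            chosen.append(s)  # preferred by default, or the requested alternate
    return chosen
```

Here requested_title plays the role of the processor "further up the pipeline": a query-string or path-info chooser could supply it to switch to an alternate style.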

Cascading Style Sheets

It's easy to get confused by the term "style sheet" here: in AxKit, style sheets are not restricted to XSLT, and are best thought of as general processing and transformation stages. In AxKit's terms, a style sheet can do anything, provided you can build a Language module to parse it. That includes creating original XML content, as well as transforming and formatting it. So it becomes possible to, for instance, retrieve database results, add tags, and format the result into WML or HTML.

Cascading refers to one style sheet's results "cascading" into the next (alternatively, you can think of this as a pipeline of processing stages). AxKit offers several ways to achieve this. The first and simplest is to have all your style sheets work on DOM and produce DOM trees. When all the style sheets have finished processing, AxKit outputs the resulting tree to the user agent and takes care of disposing of it.

The second method of cascading is simply to pass along the textual output of each stage. This is necessary with modules like Sablotron, where no DOM tree is available. Modules further down the processing stream can parse this result as XML and continue processing.
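The DOM and text styles of cascading can be mixed if the pipeline converts between them at each boundary. This Python sketch uses xml.dom.minidom; the ("dom", func) / ("text", func) stage convention is an assumption made for illustration, not AxKit's module interface.

```python
from xml.dom import minidom


def run_pipeline(xml_text, stages):
    """Run stages in order, converting between DOM and text only when needed.

    Each stage is a (kind, func) pair: "dom" stages take and return a
    minidom Document, "text" stages take and return a string.
    """
    current = minidom.parseString(xml_text)
    for kind, func in stages:
        if kind == "dom":
            if isinstance(current, str):
                current = minidom.parseString(current)  # re-parse text output
            current = func(current)
        elif kind == "text":
            if not isinstance(current, str):
                current = current.documentElement.toxml()  # serialize the DOM
            current = func(current)
        else:
            raise ValueError(f"unknown stage kind: {kind}")
    return current if isinstance(current, str) else current.documentElement.toxml()
```

Consecutive DOM stages share one tree, so parsing and serializing only happen where a text-only stage (such as a Sablotron-style processor) sits between them.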

The final, and possibly most interesting, method of cascading processing stages is "end-to-end SAX." Here, AxKit sets up a chain of SAX handlers to process the document. Each SAX-based style sheet stage passes SAX events on to the next handler in the chain, and the final handler writes its results as text to the browser. The key advantage of this end-to-end system is that output starts flowing to the browser as soon as parsing begins.
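Python's standard library has the same building blocks, so an end-to-end SAX chain can be sketched with xml.sax.saxutils.XMLFilterBase (one stage) and XMLGenerator (the final handler that writes text). The RenameStage filter is a hypothetical example stage, not an AxKit module.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class RenameStage(XMLFilterBase):
    """Example stage: renames elements and passes every other event through."""

    def __init__(self, parent, renames):
        super().__init__(parent)
        self.renames = renames

    def startElement(self, name, attrs):
        super().startElement(self.renames.get(name, name), attrs)

    def endElement(self, name):
        super().endElement(self.renames.get(name, name))


def run_sax_chain(xml_bytes, stage_renames, out):
    """Parser -> stage 1 -> ... -> stage N -> XMLGenerator writing to out."""
    upstream = xml.sax.make_parser()
    for renames in stage_renames:
        upstream = RenameStage(upstream, renames)
    # The last handler serializes each event as it arrives, so output can
    # begin before the whole document has been parsed.
    upstream.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    upstream.parse(io.BytesIO(xml_bytes))
```

No stage ever holds the whole document; each one sees an event, possibly changes it, and hands it on.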

This system allows database modules to avoid building DOM trees in memory, which can be very resource-intensive, and simply fire SAX events instead, so output from the database appears as results become available.
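As an illustration of a database stage that fires SAX events directly, here is a Python sketch using the standard sqlite3 module and XMLGenerator. The table layout and the results/row element names are hypothetical.

```python
import sqlite3
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def rows_to_sax(cursor, handler):
    """Emit SAX events for each row as the cursor yields it; no DOM is built."""
    handler.startDocument()
    handler.startElement("results", AttributesImpl({}))
    for row_id, name in cursor:  # rows are pulled from the cursor one at a time
        handler.startElement("row", AttributesImpl({"id": str(row_id)}))
        handler.characters(name)  # XMLGenerator escapes &, < and >
        handler.endElement("row")
    handler.endElement("results")
    handler.endDocument()
```

Usage: `rows_to_sax(conn.execute("SELECT id, name FROM items"), XMLGenerator(out, encoding="utf-8"))` writes each row to `out` as soon as it is fetched. The handler could equally be the first stage of a SAX chain rather than a serializer.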

Setting up AxKit

Now that we've been through the theory of how AxKit works, it's time to install and start using it. I don't believe in tools like this being hard to use or set up, so provided you can use an editor and modify a few Apache configuration files, installation should be simple.

Obviously, AxKit requires an installed Apache web server. AxKit also requires mod_perl, so if you don't have it already installed, you need to add it into your Apache. Depending on your platform, this can be complex. More information is available at http://perl.apache.org/guide/.

To install AxKit, first download the distribution. Extract the archive and change to the directory it creates. Then type:

perl Makefile.PL

make

make test

make install

(If you don't have 'apxs' in your path, mod_perl versions below 1.24 will produce a warning at the first step. This warning can be ignored.)

Once AxKit is installed, you will need to edit your Apache web server's configuration files. First, enable AxKit so that Apache understands its configuration directives, by adding the following line to your httpd.conf file:

PerlModule AxKit

Finally, add the core components of AxKit, the XMLFinder and the StyleFinder. These can go in any .htaccess file or other Apache configuration file:


PerlTypeHandler Apache::AxKit::XMLFinder

PerlHandler Apache::AxKit::StyleFinder

AxAddStyleMap text/xsl Apache::AxKit::Language::XSLT

The last line associates the type text/xsl with its style sheet code module.

Now you're ready to start serving up XML files. To get started, try looking at the example files in the AxKit distribution.

Conclusion

AxKit provides web developers with the tools they need to deliver complex XML-based systems quickly, and eases them into the development process. It gives them the power to develop their own systems for style sheet negotiation, and the flexibility to design completely new style sheet languages.

Although AxKit is not finished yet, most of the features described above are built and working reliably. The most significant things missing from AxKit are SAX-based style sheet languages (which still need to be designed and built; I have a number of ideas for these) and alternative ways to generate the initial XML (as opposed to reading XML files from the file system). These will come in future releases.

As AxKit is an open source project, I hope people will jump in and help. We have the beginnings of an active mailing list, where you can vote on features, help develop them, or simply lurk. Features are being added extremely quickly. Developing in Perl allows us to do this while still maintaining readable code (something I deem very important, so don't assume that because it's written in Perl it's going to be a ball of spaghetti!). If there's something you'd like to see in AxKit, please join the mailing list and participate in the project.