Running Multiple XSLT Engines with Ant
Ant is a build
utility produced as part of the
Apache Jakarta project. It's
broadly equivalent to Unix's make or nmake
under Windows. make-like tools work by comparing the
date of an output file to the date of the input files required to
build it. If any of the input files is newer than the output file, the
output file needs to be rebuilt. This is a simple rule, and one that
generally produces the right results.
Unlike traditional make utilities, Ant is written in
Java, so Ant is a good cross-platform solution for controlling
automatic file building. That is good news for anyone developing
cross-platform XSLT scripts because you only need to target one build
environment. Anyone who has tried writing and maintaining equivalent
Windows and Unix batch scripts knows how hard it is to get the same
behavior across different platforms.
So why would you use Ant and XSLT together? If all you are doing is applying a single XSLT stylesheet to a single XML input file, using a single XSLT engine, then there is probably nothing to be gained. However, if
you need to apply one or more XSLT scripts to one or more XML input files in some sequence in order to build your final output file(s), or
you need to run multiple XSLT engines on the same XML input file(s) as part of your regression or integration testing,
then Ant is a good and quick way to implement the workflow you need to transform your input(s) into your output(s).
Using Ant for a simple "1 input, 1 stylesheet, 1 output"
transformation is overkill but also a good way to learn how to use
Ant. Assume that the input is input.xml, the stylesheet
is transform.xsl, and the output is
output.html. A matching Ant 1.5 project file
build.xml might look something like
<project default="do-it">
  <target name="do-it">
    <xslt processor="trax" in="input.xml"
          style="transform.xsl" out="output.html"/>
  </target>
</project>
The root element of an Ant build file is project. It
can contain a number of target elements. Its
default attribute contains the name of the target to
build if no targets are given on the command line. Since the example
project file defaults to building the target do-it, the
output file could be built equally using any of the following command
lines:
$ ant
$ ant do-it
$ ant -buildfile build.xml
$ ant -buildfile build.xml do-it
Unlike Unix's make and its clones, which can use
filenames for targets, Ant only uses target names defined in the build
file. So every target must have a unique name. Within a
target, any number of tasks can be performed. The xslt
task is included with Ant 1.5. With the processor
attribute set to trax, the xslt task uses
the default
JAXP/TraX
XSLT engine to perform the transformation.
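The xslt task is not limited to one input file per invocation. As a minimal sketch, assuming a hypothetical layout in which the XML inputs live in a src directory and the results should go to a build directory (neither directory is part of this article's examples), the task's basedir, destdir, includes, and extension attributes can be used in place of in and out:

```xml
<project default="do-all">
  <target name="do-all">
    <!-- Apply transform.xsl to every .xml file in src/ (hypothetical
         directory) and write a matching .html file into build/
         (also hypothetical). -->
    <xslt processor="trax"
          basedir="src" destdir="build"
          includes="*.xml" extension=".html"
          style="transform.xsl"/>
  </target>
</project>
```

Each output file takes its name from the corresponding input file, with the extension replaced by the value of the extension attribute.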
What about a more complicated XSLT workflow, in which there are
three input files (in1.xml, in2.xml and
in3.xml)? Each of these has the same kind of
information, but the formats are different. So they are normalized to
a common format by three separate stylesheets (norm1.xsl,
norm2.xsl and norm3.xsl respectively). A
standard merging stylesheet exists, merge.xsl, but it
only merges two inputs (the usual input plus a filename passed as a
parameter to the stylesheet). So it has to be used twice in order to
merge the three normalized files. The merged sum of the three is
sorted to produce the final output file, out.xml.
The following Ant build file does the trick:
<project default="sort">

  <target name="normalize">
    <xslt processor="trax" in="in1.xml" style="norm1.xsl" out="nm1.xml"/>
    <xslt processor="trax" in="in2.xml" style="norm2.xsl" out="nm2.xml"/>
    <xslt processor="trax" in="in3.xml" style="norm3.xsl" out="nm3.xml"/>
  </target>

  <target name="check12">
    <uptodate property="skip.merge12" targetfile="m12.xml">
      <srcfiles dir=".">
        <include name="nm1.xml"/>
        <include name="nm2.xml"/>
        <include name="merge.xsl"/>
      </srcfiles>
    </uptodate>
  </target>

  <target name="merge12" depends="normalize,check12" unless="skip.merge12">
    <xslt processor="trax" in="nm1.xml" style="merge.xsl" out="m12.xml"
          force="true">
      <param name="source2" expression="nm2.xml"/>
    </xslt>
  </target>

  <target name="check123">
    <uptodate property="skip.merge123" targetfile="123.xml">
      <srcfiles dir=".">
        <include name="m12.xml"/>
        <include name="nm3.xml"/>
        <include name="merge.xsl"/>
      </srcfiles>
    </uptodate>
  </target>

  <target name="merge123" depends="normalize,merge12,check123"
          unless="skip.merge123">
    <xslt processor="trax" in="m12.xml" style="merge.xsl" out="123.xml"
          force="true">
      <param name="source2" expression="nm3.xml"/>
    </xslt>
  </target>

  <target name="sort" depends="merge123">
    <xslt processor="trax" in="123.xml" style="sort.xsl" out="out.xml"/>
  </target>

  <target name="clean">
    <delete>
      <fileset dir=".">
        <include name="output.html"/>
        <include name="nm*.xml"/>
        <include name="m12.xml"/>
        <include name="123.xml"/>
        <include name="out.xml"/>
      </fileset>
    </delete>
  </target>

</project>
Ant takes account of timestamps on files, just like
make. It will not run the transformation unless either
the input file or the stylesheet is newer than the output file (which
usually means that the input file or the stylesheet has been modified
since the last build). So if in1.xml is modified,
nm2.xml and nm3.xml will not be rebuilt.
Alternatively, if in3.xml is modified,
m12.xml will not be rebuilt. This can save a lot of
development time in situations where one of the transformations takes
much longer than the others.
Some things to note about this Ant project file include:
The default target is sort. Sorting is the last thing
that needs to be done, so making sort the default target
means that the whole build process is carried out by default.
The normalize target is used to run the three
normalization stylesheets. Although you could use three separate
targets, there is no need, since each xslt task only runs
when its output (nm1.xml, nm2.xml, or
nm3.xml) needs to be rebuilt.
The merge.xsl stylesheet is special because one of its
input filenames is passed as a stylesheet parameter. There is no way
that the standard xslt task can know this, so it is necessary
to tell Ant explicitly when a rebuild is or is not required. The
check12 target uses Ant's uptodate task to
check whether m12.xml is newer than nm1.xml,
nm2.xml, and merge.xsl. If it is, the Ant property
skip.merge12 is set; otherwise the property is left unset.
The merge12 target is used to merge
nm1.xml and nm2.xml, but it only runs when
skip.merge12 has not been set; that is, when m12.xml
is not up to date. The xslt task that runs
merge.xsl has an extra attribute, force,
which is set to true to override the task's default check for whether a
rebuild is necessary.
This complexity comes about purely because a filename has been
passed to the stylesheet as a parameter. It is a special case, but
one which is not too difficult to solve. The same logic applies to
the targets check123 and merge123.
Finally, the sort target, which is the default target
for the project, applies sort.xsl to 123.xml
to produce the end result, out.xml.
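As a possible alternative to the check12/merge12 pair described above, Ant's dependset task can express the extra dependency directly: it deletes the listed target files whenever any of the listed source files is newer. The following is only a sketch, assuming the dependset task is available in your Ant installation; it is not how the article's example project is written.

```xml
<target name="merge12" depends="normalize">
  <!-- Delete m12.xml if nm1.xml, nm2.xml, or merge.xsl is newer than it. -->
  <dependset>
    <srcfilelist dir="." files="nm1.xml,nm2.xml,merge.xsl"/>
    <targetfilelist dir="." files="m12.xml"/>
  </dependset>
  <!-- No force attribute is needed here: if m12.xml was deleted above,
       the xslt task sees a missing output file and rebuilds it. -->
  <xslt processor="trax" in="nm1.xml" style="merge.xsl" out="m12.xml">
    <param name="source2" expression="nm2.xml"/>
  </xslt>
</target>
```

The trade-off is that a stale m12.xml is deleted before it is regenerated, so a failed transformation leaves no old copy behind.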
The files for this project are provided with the
zipped examples. You now know everything you
need to start using the standard Ant xslt task in your
own projects. However, you should also take the time to read the full
description of this task in the
Ant
documentation.
XSLT stylesheets can provide a good cross-platform solution for
manipulating XML, but different platforms use different XSLT engines.
Sites that are using the Apache Web server often use Apache Xalan.
Sites that are using PHP are likely to use Sablotron. Oracle sites
often use the Oracle XDK (as this may be the only XSLT engine that the
operations people will allow). Some XML consultants use and recommend
Saxon. Microsoft sites generally use MSXML. Although these XSLT
engines behave similarly, there are still some differences, so you
need to plan to test with all of the XSLT engines that are likely to
be used with your XSLT stylesheets. For this article, we will focus
on the Java XSLT engines, since they are the ones supported natively
by the Ant xslt task.
When testing with multiple engines, it's useful to be able to run
the same test using each XSLT engine from within one Ant build file.
But there's a problem: the JAXP/TraX interface uses the Java system
property javax.xml.transform.TransformerFactory to define
which class should be instantiated as a factory for creating XSLT
engines. In order to use the XSLT engine of your choice, this
property needs to be set appropriately. However, there is no easy way
to do that within Ant and, hence, no easy way to change XSLT engines
within a single Ant build file. The best you can do is to launch a
separate Java process and then call Ant from within that new process.
To overcome this problem, the best solution is to create a new XSLT
task for Ant, one which makes it easy to select the desired XSLT
TransformerFactory.
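For comparison, the separate-process workaround mentioned above might look roughly like the following sketch. It uses Ant's java task to fork a new JVM, sets the TransformerFactory system property for that JVM, and runs Ant's main class against the earlier build.xml. The saxon.jar location is a hypothetical path, com.icl.saxon.TransformerFactoryImpl is the factory class name used by Saxon 6.5.x, and the sketch assumes that ant.home is set (as it normally is when Ant is started from its launcher script) and that it is saved in a file other than build.xml.

```xml
<project default="run-with-saxon6">

  <!-- Hypothetical location of the Saxon 6 JAR; adjust for your system. -->
  <property name="saxon6.jar" value="lib/saxon.jar"/>

  <target name="run-with-saxon6">
    <!-- Fork a new JVM so that the system property below applies to
         the nested Ant run and not to the current one. -->
    <java classname="org.apache.tools.ant.Main" fork="true"
          failonerror="true">
      <classpath>
        <fileset dir="${ant.home}/lib" includes="*.jar"/>
        <pathelement location="${saxon6.jar}"/>
      </classpath>
      <sysproperty key="ant.home" value="${ant.home}"/>
      <sysproperty key="javax.xml.transform.TransformerFactory"
                   value="com.icl.saxon.TransformerFactoryImpl"/>
      <arg value="-buildfile"/>
      <arg value="build.xml"/>
      <arg value="do-it"/>
    </java>
  </target>

</project>
```

One such target is needed per engine, and every run pays the cost of starting a new JVM, which is part of why a dedicated task is more convenient.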
mtxslt (short for "multi-XSLT")
is an Ant task that makes it easy to select several Java XSLT engines
within an Ant build file. mtxslt extends the standard
Ant xslt task while maintaining full compatibility with
it. Anything that works with the xslt task also works
with mtxslt.
With mtxslt, it is possible to ignore the value of the
Java javax.xml.transform.TransformerFactory property and
simply load a particular XSLT engine directly. mtxslt
currently supports Xalan 2, Saxon 6/7, and Oracle XDK 9.
The example build file below uses a few new Ant elements. A taskdef
is required to associate the task name mtxslt with the
Java class which implements it. Actually, you can call
mtxslt anything you want just by changing the name in the
taskdef.
The property definitions are used to define values
that can be retrieved by name throughout the build file, which is
similar to defining a string variable in a programming language.
Property definitions are used to define short names for qualified Java
class names and for file paths, since both of these tend to be long
and reduce the readability and maintainability of the build file if
repeated.
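As a small illustration of these two elements working together, the fragment below defines a property for the location of the JAR that contains the task class and then registers the task under a different name. The JAR path lib/mtxslt.jar and the task name multixslt are hypothetical, chosen only for this sketch; the class name is the one used in this article's build file.

```xml
<!-- Hypothetical location of the JAR containing the mtxslt task class. -->
<property name="mtxslt.jar" value="lib/mtxslt.jar"/>

<!-- Register the task under the (arbitrary) name "multixslt" and tell
     Ant where to load the implementing class from. -->
<taskdef name="multixslt"
         classname="org.xmLP.ant.taskdefs.xslt.XSLTProcess"
         classpath="${mtxslt.jar}"/>
```

With this definition, the build file would use multixslt elements wherever the article's example uses mtxslt. The classpath attribute is only needed if the class is not already on Ant's own classpath.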
In this example, different XSLT engines are used to apply the same
stylesheet transform.xsl to the same input
input.xml. The resulting HTML files can then be
compared.
<project name="test" default="all">

  <taskdef name="mtxslt"
           classname="org.xmLP.ant.taskdefs.xslt.XSLTProcess"/>

  <property name="trax"
            value="org.xmLP.ant.taskdefs.optional.TraXLiaison"/>
  <property name="xalan2"
            value="org.xmLP.ant.taskdefs.optional.Xalan2Liaison"/>
  <property name="xalan2.classpath"
            value="D:\home\tony\XSLT\xalan-j_2_4_0\bin\xalan.jar"/>
  <property name="saxon6"
            value="org.xmLP.ant.taskdefs.optional.Saxon6Liaison"/>
  <property name="saxon6.classpath"
            value="D:\home\tony\XSLT\Saxon-6.5.2\saxon.jar"/>
  <property name="saxon7"
            value="org.xmLP.ant.taskdefs.optional.Saxon7Liaison"/>
  <property name="saxon7.classpath"
            value="D:\home\tony\XSLT\Saxon-7.1\saxon7.jar"/>
  <property name="oracle9"
            value="org.xmLP.ant.taskdefs.optional.Oracle9Liaison"/>
  <property name="oracle9.classpath"
            value="D:\home\tony\XSLT\xdk_java_9_2_0_3_0\lib\xmlparserv2.jar"/>

  <target name="all"
          depends="trax1,trax2,trax3,trax4,xalan2,saxon6,saxon7,oracle9"/>

  <target name="trax1">
    <xslt processor="trax" in="input.xml" style="transform.xsl"
          out="trax1.html">
      <param name="target" expression="trax1"/>
    </xslt>
  </target>

  <target name="trax2">
    <mtxslt processor="trax" in="input.xml" style="transform.xsl"
            out="trax2.html">
      <param name="target" expression="trax2"/>
    </mtxslt>
  </target>

  <target name="trax3">
    <xslt processor="${trax}" in="input.xml" style="transform.xsl"
          out="trax3.html">
      <param name="target" expression="trax3"/>
    </xslt>
  </target>

  <target name="trax4">
    <mtxslt processor="${trax}" in="input.xml" style="transform.xsl"
            out="trax4.html">
      <param name="target" expression="trax4"/>
    </mtxslt>
  </target>

  <target name="xalan2">
    <mtxslt processor="${xalan2}" in="input.xml" style="transform.xsl"
            out="xalan2.html" classpath="${xalan2.classpath}">
      <param name="target" expression="xalan2"/>
    </mtxslt>
  </target>

  <target name="saxon6">
    <mtxslt processor="${saxon6}" in="input.xml" style="transform.xsl"
            out="saxon6.html" classpath="${saxon6.classpath}">
      <param name="target" expression="saxon6"/>
    </mtxslt>
  </target>

  <target name="saxon7">
    <mtxslt processor="${saxon7}" in="input.xml" style="transform.xsl"
            out="saxon7.html" classpath="${saxon7.classpath}">
      <param name="target" expression="saxon7"/>
    </mtxslt>
  </target>

  <target name="oracle9">
    <mtxslt processor="${oracle9}" in="input.xml" style="transform.xsl"
            out="oracle9.html" classpath="${oracle9.classpath}">
      <param name="target" expression="oracle9"/>
    </mtxslt>
  </target>

  <target name="clean">
    <delete>
      <fileset dir="." includes="*.html"/>
    </delete>
  </target>

</project>
The target trax1 simply uses the standard
xslt task to transform the input file, as in the earlier
examples.
The target trax2 is identical to trax1,
except that it uses mtxslt instead of xslt.
This demonstrates that mtxslt implements the standard
behavior of the xslt task.
The target trax3 is similar to trax1,
except that the value of the processor attribute is the
value of the property trax (i.e.,
org.xmLP.ant.taskdefs.optional.TraXLiaison). This is a
feature of the xslt task that only becomes apparent when
you look at the Ant source code. The processor can
optionally be a qualified class name for an Ant XSLT liaison class.
This is the mechanism that mtxslt exploits to support
multiple XSLT engines.
This particular XSLT liaison class connects with the default
JAXP/TraX XSLT engine, so the result is identical to that produced by
the target trax1.
The target trax4 is identical to trax3,
except that it uses mtxslt instead of
xslt.
The targets xalan2, saxon6,
saxon7, and oracle9 use mtxslt
to call Xalan 2, Saxon 6, Saxon 7, and Oracle XDK 9 respectively.
Once the appropriate properties have been defined, mtxslt
attributes look nearly identical to standard xslt
attributes. Note, however, the addition of a classpath
attribute, which is required so that Ant loads the correct JAR archive
for each XSLT engine.
The target parameter that is passed to the stylesheet
allows the Ant target name to be embedded in each HTML product file to
make identification of the files easier. It serves no other
purpose.
That's all there is to it. You now not only know how to use Ant to
control XSLT, you also know how to use mtxslt to control
which XSLT engines are used within an Ant build. (All of the example
files from this article can be downloaded as a
ZIP archive.)
Ant is a powerful cross-platform tool for controlling build
processes and is ideal for controlling multifile builds involving XSLT
stylesheets. Using mtxslt, you can go further and invoke
multiple Java XSLT engines during a single build, which is ideal for
portability testing.
It may be worth mentioning that this article was written using an extended version of DocBook 4.2 and then converted to XHTML using an XSLT stylesheet -- a process controlled by an Ant build file. As well as building the article, Ant controlled the extraction of the Ant build file code out of the DocBook source and into the example build files, as well as the regression testing of the examples. It really works.
Ant;
Apache Jakarta project;
Ant: The Definitive Guide (O'Reilly, 2002);
JAXP/TraX API from JDK 1.4.1.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.