XML Pipelining with Ant
Ant is an extensible,
open-source build tool written in Java and sponsored by Apache's Jakarta project. Ant has developed
into something more than just a build tool, however. It has gone beyond
its predecessor make (and make's kin) to become
a framework for performing an even larger variety of operations in a
single step, not just compiling code or cleaning up after a build.
Ant's build files are written in XML, and Ant takes advantage of XML in a variety of ways. In my opinion, Ant is a suitable, if not ideal, framework for XML pipelining -- that is, a framework for performing a variety of XML processing steps in the desired order and in one fell swoop. I say ideal because Ant is open, somewhat mature, reasonably stable, readily available, widely known and used, easily extensible, and already amenable to XML processing. What else could you ask for?
In this article, I'll discuss the XML structures in an Ant build file,
named build.xml by default, talk about some common
XML-related tasks that Ant can perform, and then finish up with an example
of XML pipelining.
I assume that you already know something of Ant and have probably used it. I plan to review the basics of the tool, but I also suggest that you read Tony Coates' recent XML.com article, "Running Multiple XSLT Engines with Ant." Along with an interesting approach to processing multiple XSLT stylesheets with multiple engines, Tony's article also provides good introductory material on Ant.
To get the examples in this piece to work, you'll of course need a recent version of Java on your system. You'll also need to download and install Ant version 1.5.1 (or later) binaries. Because you'll be using a new task that validates with RELAX NG schemas, you'll also need to download and install James Clark's Jing. All the example files discussed in this article are available for download in a ZIP archive and have been tested on the Windows XP Professional platform running Java 2 v1.4.
You can refer to Ant's HTML manual either online or,
after installing Ant locally, by bringing up
docs/manual/index.html in a browser.
One of the first things I noticed about Ant was that it didn't have an
explicit DTD available in the archives I downloaded, either the binary or
source
archive. I wanted to see Ant's DTD so I could figure out what went into a
build file. Then I discovered the antstructure task. This
task in essence extracts a DTD from Ant's source code.
The following snippet is a simple Ant build file that uses the
antstructure task (build-dtd.xml in the example
archive):
<?xml version="1.0"?>
<project default="dtd">
<target name="dtd">
<antstructure output="ant.dtd"/>
</target>
</project>
Here's a quick review of some basics. The document begins with an
optional XML declaration. The root element of an Ant build file is
<project>. It has several possible attributes, but
only one is required: default. This attribute names the
default target for the project, and in this case the only target,
dtd. A target represents a way to achieve an expected outcome
from an operation, such as a set of compiled Java classes or, in the case
of antstructure, a DTD.
The <target> element is a child of
<project> and must have a name
attribute. The value of this attribute matches the value of the
default attribute of <project>. When
there is more than one target in a build file, the value of
default only matches the value of one name
attribute in one <target>. The
<target> element also has several other attributes such
as depends (which will come to light in later examples).
The <antstructure> task element is empty. One of
four possible attributes is output which gives the name of
the output file that will contain the DTD that the task produces. This
output file is written to the current directory by default; however, if
you add a basedir attribute to <project>,
you can specify a different output directory than the current one as a
value of basedir, such as:
<project default="dtd" basedir="c:/temp">
Now give it a try. The following command presupposes that Ant's
bin directory is in the path environment variable, that your
working directory is C:\Java\Ant, and that you have unzipped
the example archive there:
C:\Java\Ant>ant -f build-dtd.xml
Ant assumes that the build file is named build.xml. If it
isn't, you need to use the -f option (or the synonyms
-file or -buildfile), followed by a
filename. You should see output from this command like this:
Buildfile: build-dtd.xml
dtd:
BUILD SUCCESSFUL
Total time: 2 seconds
The output lists the build filename, the target name dtd,
and whether the build was successful. The target produces the file
ant.dtd in your current directory. This DTD is
straightforward (only three parameter entities), but is quite long (nearly
4000 lines). With this DTD available now, you can see for yourself how a
build file is put together. For any element name in the DTD, you are
likely to find a corresponding entry in the Ant manual.
At first I wondered how Ant validates build files. The answer lies in
the source code, where it is clear that Ant validates build files in its
own application-specific way rather than in a general-purpose way. (If you
want to see how Ant does this, a good place to start looking is in the
Java source of the class
org.apache.tools.ant.helper.ProjectHelperImpl.) Ant is in
effect self-validating and avoids the use of namespaces.
Ant has a task for validating XML documents called
xmlvalidate. By default Ant validates with Xerces version
2.2.0. Consider the small XML document date.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE date SYSTEM "date.dtd">
<date>2003-01-31T00:00:01</date>
And its equally small DTD date.dtd:
<!ELEMENT date (#PCDATA)>
You can validate date.xml with the build file
build-valid.xml by using the xmlvalidate
task:
<?xml version="1.0"?>
<project default="valid">
<target name="valid">
<xmlvalidate file="date.xml"/>
</target>
</project>
The attribute file specifies the document to
validate. Issuing the command
C:\Java\Ant>ant -f build-valid.xml
produces the following output, if successful:
Buildfile: build-valid.xml
valid:
[xmlvalidate] 1 file(s) have been successfully validated.
BUILD SUCCESSFUL
Total time: 2 seconds
In Ant, types are elements that help tasks do their work, for example by
selecting groups of files. Using the fileset type as a child of
xmlvalidate, you can validate a series of XML documents, as
shown in build-fileset.xml:
<?xml version="1.0"?>
<project default="valid">
<target name="valid">
<xmlvalidate>
<fileset file="date*.xml"/>
</xmlvalidate>
</target>
</project>
The file attribute of fileset allows you to
specify a series of files with wildcards. If you run this build file, you
will see that Ant validates six XML documents in one step (all XML
documents in the current directory beginning with the name
date).
The xmlvalidate task has several other features worth
mentioning:
lenient="true" means that the task will only do well-formedness checking.
The classname and classpathref attributes allow you to specify a different XML parser than the default and where to find it.
The <dtd> child element lets you indicate a formal public identifier (publicId attribute) as well as the local whereabouts (location attribute) of a DTD.
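As a rough sketch of how these options might be combined, the build file below runs a well-formedness-only pass over a group of files and then a validating pass that maps a public identifier to a local DTD. The target names wf and local-dtd are made up for illustration, and the public identifier is borrowed from the pipeline example later in this article. The mapping only matters if a document's DOCTYPE declaration actually uses that public identifier; date.xml as shown above uses a SYSTEM identifier, so for it the <dtd> child is harmless but has no effect.

```xml
<?xml version="1.0"?>
<project default="local-dtd">
  <!-- Hypothetical target: check well-formedness only -->
  <target name="wf">
    <xmlvalidate lenient="true">
      <fileset dir="." includes="date*.xml"/>
    </xmlvalidate>
  </target>
  <!-- Hypothetical target: validate, resolving the public identifier
       to a local copy of the DTD -->
  <target name="local-dtd" depends="wf">
    <xmlvalidate file="date.xml">
      <dtd publicId="-//Wy'east Communications//Date DTD//EN"
           location="date.dtd"/>
    </xmlvalidate>
  </target>
</project>
```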
As I mentioned earlier, Ant is extensible. One way that you can extend
Ant is by writing your own task (instructions on how to do this are found
in the Ant manual). James
Clark has written a task for
Jing that allows you to use Ant to validate XML documents against RELAX NG
schemas, in both XML and compact syntaxes. Jing's source code
is available for download, but for convenience I have included a copy of
JingTask.java in the example archive for easy inspection
(along with a copy of Jing's license).
The document date.xml is valid with regard to the RELAX NG
schema date.rng:
<?xml version="1.0"?>
<element name="date" xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<data type="dateTime"/>
</element>
RELAX NG supports externally defined datatype libraries, such as W3C XML Schema datatypes. The
XML Schema datatype dateTime more precisely defines the valid
content of <date> than just #PCDATA in a DTD. To
validate date.xml against date.rng with Ant, use
the build file build-jing.xml:
<?xml version="1.0"?>
<project default="rng">
<taskdef name="jing" classname="com.thaiopensource.relaxng.util.JingTask"/>
<target name="rng">
<echo message="Validating RELAX NG schema with Jing..."/>
<jing rngfile="date.rng" file="date.xml"/>
</target>
</project>
The <taskdef> element defines the jing
task, and its classname attribute identifies the class that
executes the task. This class is stored in jing.jar, part of
the Jing distribution. If you place jing.jar in Ant's
lib directory, Ant will be able to find the Jing task.
The echo task echoes the text in message. Jing is
silent upon success, as are other tasks. You can throw in an echo
task to augment what is normally reported.
The jing task's rngfile identifies a RELAX NG
schema, and the file attribute names the instance document to validate
against it. You can also use a fileset type as a child of
<jing>, allowing you to validate more than one document
at a time.
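Here is a minimal sketch of the fileset variation, assuming that every matching document is supposed to conform to date.rng (I make no claim about which files in the example archive actually do). The target name rng-all and the lib/jing.jar location are my own inventions; the classpath attribute on <taskdef> is only needed if you have not copied jing.jar into Ant's lib directory.

```xml
<?xml version="1.0"?>
<project default="rng-all">
  <!-- classpath is an assumption: point it at wherever jing.jar lives,
       or drop it if jing.jar is already in Ant's lib directory -->
  <taskdef name="jing"
           classname="com.thaiopensource.relaxng.util.JingTask"
           classpath="lib/jing.jar"/>
  <target name="rng-all">
    <echo message="Validating date*.xml against date.rng..."/>
    <!-- No file attribute here; the nested fileset selects the documents -->
    <jing rngfile="date.rng">
      <fileset dir="." includes="date*.xml"/>
    </jing>
  </target>
</project>
```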
Jing can also validate against schemas in the
compact syntax, RELAX NG's terse, non-XML format. The compact version reduces
date.rng to one short line in date.rnc:
element date { xsd:dateTime }
Compact syntax processors automatically declare the XML Schema datatype
library with the xsd prefix. The build file
build-rnc.xml validates date.xml against
date.rnc (note the addition of the compactsyntax
attribute):
<?xml version="1.0"?>
<project default="rng">
<taskdef name="jing" classname="com.thaiopensource.relaxng.util.JingTask"/>
<target name="rng">
<echo message="Validating RELAX NG compact syntax schema with Jing..."/>
<jing compactsyntax="true" rngfile="date.rnc" file="date.xml"/>
</target>
</project>
Kawaguchi Kohsuke is currently developing an Ant task for validators that support the Java API for Relax Verifiers (JARV). This task will work with Sun's Multi-schema Validator and other JARV validators.
This example places targets discussed earlier together into a single
build file and adds a few other targets as well. The resulting file,
build.xml, is an example of a simple XML pipeline. The basic
scenario is that a property is set (the current directory) using a local
XML document (properties.xml) and a remote, zipped file
(date.zip) is downloaded via the get task. The
file, which contains a RELAX NG schema (date.rng), is
unzipped and a local document (date.xml) is validated against
it. Then the same document is validated against a DTD
(date.dtd) and transformed into an HTML document
(date.html). Finally, an e-mail is sent, signaling the
completion of the process. Granted, this is a rather uncomplicated
example, and more complex operations are possible, but this gives you an
idea of how you can put your own pipeline together.
Here is the build file:
<?xml version="1.0"?>
<project default="mail">
<taskdef name="jing" classname="com.thaiopensource.relaxng.util.JingTask"/>
<target name="init">
<echo message="Load XML properties..."/>
<xmlproperty file="properties.xml"/>
</target>
<target name="get" depends="init">
<get src="http://www.wyeast.net/date.zip" dest="date.zip"/>
</target>
<target name="unzip" depends="get">
<unzip src="date.zip" dest="${build.dir}"/>
</target>
<target name="rng" depends="unzip">
<echo message="Jing validating..."/>
<jing rngfile="date.rng" file="date.xml"/>
</target>
<target name="val" depends="rng">
<xmlvalidate file="date.xml">
<xmlcatalog>
<dtd publicId="-//Wy'east Communications//Date DTD//EN"
location="date.dtd"/>
</xmlcatalog>
</xmlvalidate>
</target>
<target name="xform" depends="val">
<xslt in="date.xml" out="date.html"
style="date.xsl">
<outputproperty name="method" value="xml"/>
<outputproperty name="indent" value="yes"/>
</xslt>
</target>
<target name="mail" depends="xform">
<mail mailhost="mail.example.com" subject="Ant build">
<to address="schlomo@example.com"/>
<from address="hermes@example.com"/>
<message>Complete!</message>
</mail>
</target>
</project>
Before running this example, you should change the values of
mailhost and both the to and from addresses to something that
will work on your own mail server. You will also need to install the JAR
files from the
JavaMail project in Ant's lib directory (though MIME mail
may still not work). To run the build, all you have to do is type:
C:\Java\Ant>ant
Because the build file is named build.xml, Ant
automatically picks it up and runs it. The output will look like this,
provided you have a live Internet connection (for the get and
mail targets), and all files from the example archive are
still in place:
Buildfile: build.xml
init:
[echo] Load XML properties...
get:
[get] Getting: http://www.wyeast.net/date.zip
unzip:
[unzip] Expanding: C:\Java\Ant\date.zip into C:\Java\Ant
rng:
[echo] Jing validating...
val:
[xmlvalidate] 1 file(s) have been successfully validated.
xform:
[xslt] Processing C:\Java\Ant\date.xml to C:\Java\Ant\date.html
[xslt] Loading stylesheet C:\Java\Ant\date.xsl
mail:
[mail] Failed to initialise MIME mail
[mail] Sending email: Ant build
[mail] Sent email with 0 attachments
BUILD SUCCESSFUL
Total time: 7 seconds
Each of the targets except the one named init has a
depends attribute. The value of this attribute establishes a
hierarchy of dependencies between the targets. The default or starting
target is mail (identified in the
<project> element); in order for it to execute, the
xform target must first execute successfully and in order for
xform to execute, val must execute, and so
forth. So this dependency is not established structurally, as through a
parent-child relationship, but rather through attribute values. You can
put the targets in any order in the build file. They will still execute
according to the order of the values in the depends and
name attributes. These dependencies make up the segments of
the pipeline.
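To see that document order really doesn't matter, you could try a stripped-down build file like the following sketch. The target names first, middle, and last are invented for the illustration, and the targets are deliberately listed out of order.

```xml
<?xml version="1.0"?>
<project default="last">
  <!-- Declared first, but runs last because of its dependency chain -->
  <target name="last" depends="middle">
    <echo message="3: last"/>
  </target>
  <!-- No depends attribute: this is where the chain starts -->
  <target name="first">
    <echo message="1: first"/>
  </target>
  <target name="middle" depends="first">
    <echo message="2: middle"/>
  </target>
</project>
```

Ant resolves the depends chain before running anything, so the targets should execute as first, then middle, then last, regardless of where they sit in the file.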
The xform target uses the xslt task to transform
date.xml into date.html according to the XSLT
stylesheet date.xsl. The <outputproperty>
children contribute values that would normally be supplied by the
output element of XSLT. (Tony Coates' article deals with the
xslt task extensively, so I'll limit my comments here.)
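For comparison, this is roughly what the same settings look like when they are written into a stylesheet instead of the build file. The stylesheet below is a hypothetical stand-in, not the date.xsl from the example archive; its only purpose is to show where the output element sits and how it corresponds to the two <outputproperty> children.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Same effect as the outputproperty children named method and indent -->
  <xsl:output method="xml" indent="yes"/>
  <!-- Illustrative template: wrap the content of date in a small HTML page -->
  <xsl:template match="/date">
    <html>
      <head><title>Date</title></head>
      <body><p><xsl:value-of select="."/></p></body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Setting these values in the build file is convenient when you want to vary serialization per build without touching the stylesheet.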
The xmlvalidate task (in the val target) uses the xmlcatalog
type with a <dtd> child to specify a formal public
identifier for a DTD and the location of a local copy of that DTD. This
type is based on the XML Catalog
specification, an entity and URI resolution initiative from OASIS.
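Because xmlcatalog is a type, it should also be possible to declare it once at the project level and refer to it from more than one task. The following is a sketch under that assumption; the id value dtds is a name I made up, and you should check the manual for your Ant version to confirm that the xslt task accepts a nested xmlcatalog.

```xml
<?xml version="1.0"?>
<project default="xform">
  <!-- Declare the catalog once; "dtds" is a hypothetical id -->
  <xmlcatalog id="dtds">
    <dtd publicId="-//Wy'east Communications//Date DTD//EN"
         location="date.dtd"/>
  </xmlcatalog>
  <target name="val">
    <xmlvalidate file="date.xml">
      <!-- Reuse the catalog by reference -->
      <xmlcatalog refid="dtds"/>
    </xmlvalidate>
  </target>
  <target name="xform" depends="val">
    <xslt in="date.xml" out="date.html" style="date.xsl">
      <xmlcatalog refid="dtds"/>
    </xslt>
  </target>
</project>
```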
The get task retrieves a file from a URL, downloading it to a
specified location. The xmlproperty task reads the file
properties.xml:
<?xml version="1.0"?>
<build>
<dir>.</dir>
</build>
The arbitrary tags in the properties file determine the name or names
for the variable that you can use elsewhere in the build file to reference
values, such as ${build.dir}. The first part of the variable
name comes from the <build> tag and the second part
from <dir>. The content of <dir>
becomes the value of the variable. You can also use
attributes to create property names.
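Here is a small sketch of the attribute approach. It assumes a hypothetical file named properties-attr.xml (not part of the example archive) whose content is shown in the comment, and it relies on the collapseAttributes option of xmlproperty so that the attribute surfaces under the familiar dotted name.

```xml
<?xml version="1.0"?>
<project default="show">
  <!-- Assumes a hypothetical file properties-attr.xml containing:
         <build dir="."/>
  -->
  <target name="show">
    <!-- With collapseAttributes="true" the attribute becomes build.dir;
         without it, the property name would be build(dir) -->
    <xmlproperty file="properties-attr.xml" collapseAttributes="true"/>
    <echo message="build.dir is ${build.dir}"/>
  </target>
</project>
```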
Ant provides logging and event listening facilities. One such
logger-listener is defined in the class
org.apache.tools.ant.XmlLogger, which produces XML output. The
following command line puts the XML logger to work:
C:\Java\Ant>ant -logger org.apache.tools.ant.XmlLogger -v -l log.xml
The -v (or -verbose) option indicates verbose
output, all of which is sent to the log file; the -l option
(or -logfile) provides a name for the log file. You can find
an XSLT stylesheet for log files in the etc directory called
log.xsl. The following figure shows you how
log.xml will appear in a browser after it has been
transformed by log.xsl.
[Figure: log.xml after being transformed by log.xsl]
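If you would rather produce a static HTML file than rely on a browser to apply the stylesheet, one option is to let Ant do the transformation itself. The sketch below assumes that log.xml already exists from an earlier run and that the built-in ant.home property points at your Ant installation; the target name log-html and the output name log.html are made up. Run it as a separate build, without the XML logger, so that it does not try to read a log file that is still being written.

```xml
<?xml version="1.0"?>
<project default="log-html">
  <target name="log-html">
    <!-- ant.home is set by the Ant launcher scripts;
         log.html is a hypothetical output file name -->
    <xslt in="log.xml" out="log.html"
          style="${ant.home}/etc/log.xsl"/>
  </target>
</project>
```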
I realize that Ant was not intended to be an XML pipeline tool, but it turns out to be a pretty good one anyway. Other tools exist and may eventually do a better job, such as Sean McGrath's XPipes or Eric van der Vlist's XML Validation Interoperability Framework (XVIF). For now, though, Ant remains an attractive option. Like XML, Ant can do things that perhaps it was not originally intended to do. That's a good sign.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.