Using XML Catalogs with JAXP
XML documents often refer to other documents that an XML processor has to retrieve in order to make sense of the main document. These external resources, typically referred to by URIs, may be local files; or they may be remote, distributed across the web. In an ideal world the difference would be invisible, since it would be as cheap to access a remote resource as a local one. However, in the real world network failures do occur, and it is wise to design applications that take this into account.
XML Catalogs offer a way to manage local copies of public DTDs, schemas, or indeed any XML resource that exists outside of the referring XML instance document. Rather than modifying the XML instance document to refer directly to a local copy, you leave the reference to the remote resource and write an XML Catalog that maps remote references to local resources. Your application then installs a resolver, whose job it is to consult the catalog whenever an external resource is needed. The Apache xml-commons project's Resolver package, from Norman Walsh, is a collection of Java classes for working with XML Catalogs. This article looks at how to use the Resolver classes with JAXP by working through three XML processing examples that cover the main capabilities of XML Catalogs.
XML Catalogs is currently an OASIS Committee Specification, which is a draft specification on track to becoming an OASIS Standard. It is a direct descendant of work done on catalogs for SGML systems, the current standard being the OASIS Technical Resolution TR9401 plain-text catalog format. This standard can also be used for XML applications; indeed the xml-commons Resolver supports TR9401 catalogs too, although they are not covered in this article.
Example 1: Offline Validation of XHTML Pages
For the first example, let's look at a common situation where XML Catalogs are useful: in providing a local copy of a DTD. Suppose you want to check that a page is valid XHTML -- before you put it on your website, for example. Here's a sample XHTML page to be checked:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>OMELETTE</title>
  </head>
  <body>
    <h1>Omelette by Elizabeth David</h1>
    <h2>Ingredients</h2>
    <ul>
      <li>3-4 eggs</li>
      <li>1/2 oz. butter</li>
      <li>Salt and pepper</li>
    </ul>
    <h2>Method</h2>
    <p>Beat the eggs...</p>
  </body>
</html>
```
The obvious way to perform the check from Java would be to use an event-based parser, such as the JAXP SAX parser shown here:
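```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
factory.setValidating(true);                       // validate against the DOCTYPE's DTD
SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new DefaultErrorHandler());
reader.parse(inputSource);                         // an org.xml.sax.InputSource for the page
```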
DefaultErrorHandler is an implementation of org.xml.sax.ErrorHandler that prints warnings to standard error, and throws exceptions when errors or fatal errors occur during parsing. Since the parser is validating the XHTML document against the declared DOCTYPE, it will retrieve the DTD from W3C's site at http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd. (It is worth noting as an aside that the DTD may be retrieved even if the parser is not validating, as the XML specification's discussion of validating and non-validating processors explains.) For some applications this might not be a problem, but others might not have the luxury of a permanent net connection -- a device running the J2ME Connected Limited Device Configuration, for instance. Even if a net connection is available it might be slow, making the page checker unacceptably slow; or the resource might be unavailable (if W3C's site is down), causing the page checker to break.
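DefaultErrorHandler is the article's own class and its source is not reproduced here. The following is a minimal sketch of a handler with the behavior just described, plus a hypothetical PageChecker.isValid helper that runs the validating SAX setup above on an in-memory document. Both class bodies are assumptions, not the author's code; the test documents use an internal DTD subset so that no network access is involved.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Sketch of an ErrorHandler with the behavior described above
// (not the article's actual DefaultErrorHandler source).
class DefaultErrorHandler implements ErrorHandler {
    @Override
    public void warning(SAXParseException e) {
        // Warnings are reported but do not stop the parse
        System.err.println("Warning (line " + e.getLineNumber() + "): " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) throws SAXException {
        // Validity errors arrive here; rethrowing aborts the parse
        throw e;
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        // Well-formedness errors
        throw e;
    }
}

// Hypothetical helper: runs the validating SAX setup on a document held in a string.
class PageChecker {
    static boolean isValid(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            XMLReader reader = parser.getXMLReader();
            reader.setErrorHandler(new DefaultErrorHandler());
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false; // invalid or not well-formed
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For instance, with the internal subset <!DOCTYPE note [<!ELEMENT note (#PCDATA)>]>, isValid accepts <note>hi</note> but rejects <note><b/></note>, because b is not declared.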
We can solve all these potential problems by using a catalog. A catalog is made up of one or more catalog entry files. Here is the simplest catalog entry file, called catalog.xml, that can be used to resolve the public identifier for an XHTML document to a local copy of the XHTML 1.0 DTD:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="catalog/xhtml1-strict.dtd"/>
</catalog>
```
A catalog entry file is made up of a number of catalog entries. This one has a single public entry that describes a mapping between the public identifier of an entity -- in this case -//W3C//DTD XHTML 1.0 Strict//EN -- and a preferred URI to locate the entity -- in this case the file catalog/xhtml1-strict.dtd relative to catalog.xml. You need to download the DTD (and the external entity files it references) manually and put them in the correct local directory; the catalog simply provides the mapping, it doesn't download or cache anything automatically.
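To see the mapping on its own, without a parser, the sketch below uses the JDK's built-in javax.xml.catalog API (Java 9 and later) rather than the Apache Resolver used in this article; it reads the same OASIS catalog format. It writes the catalog entry file above into a temporary directory and asks the catalog what the XHTML public identifier maps to. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.Catalog;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;

// Hypothetical helper: looks up a public identifier in the catalog from Example 1,
// using the JDK's javax.xml.catalog API (not the Apache Resolver).
class PublicIdLookup {
    static final String CATALOG_XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
        + "  <public publicId=\"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
        + "          uri=\"catalog/xhtml1-strict.dtd\"/>\n"
        + "</catalog>\n";

    // Returns the absolute URI the public id maps to, or null if there is no entry.
    static String lookup(String publicId) {
        try {
            Path dir = Files.createTempDirectory("xmlcatalogs");
            Path catalogFile = dir.resolve("catalog.xml");
            Files.write(catalogFile, CATALOG_XML.getBytes(StandardCharsets.UTF_8));
            Catalog catalog = CatalogManager.catalog(
                CatalogFeatures.defaults(), catalogFile.toUri());
            // The relative uri attribute is resolved against catalog.xml's location
            return catalog.matchPublic(publicId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The result is a file: URI ending in catalog/xhtml1-strict.dtd inside the temporary directory; the DTD itself does not need to exist for the lookup to succeed, which underlines that the catalog only supplies the mapping.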
To plug the catalog into our application we need to use the Apache xml-commons project's Resolver component. For a JAXP application, the key class is org.apache.xml.resolver.tools.CatalogResolver, an implementation of org.xml.sax.EntityResolver, which, as the name suggests, is the interface JAXP parsers use to customize handling of external entities. To register the resolver, call the setEntityResolver method on the SAX XMLReader instance, passing in a new CatalogResolver. (Similarly, in the case of a JAXP DOM parser, the CatalogResolver is set on the DocumentBuilder.)
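```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.apache.xml.resolver.tools.CatalogResolver;
import org.xml.sax.XMLReader;

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
factory.setValidating(true);
SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setEntityResolver(new CatalogResolver());   // consult the catalog before fetching entities
reader.setErrorHandler(new DefaultErrorHandler());
reader.parse(inputSource);
```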
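The DOM variant mentioned above looks much the same. So that it runs with nothing but the JDK, this sketch swaps the Apache CatalogResolver for the JDK's javax.xml.catalog resolver (Java 9+), configured with RESOLVE set to strict so that an identifier missing from the catalog raises an error instead of triggering a download. The public identifier, the example.invalid system identifier, and the one-line DTD are made up for the illustration.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class OfflineDomCheck {
    // Hypothetical identifiers; example.invalid can never be fetched.
    static final String PUBLIC_ID = "-//EXAMPLE//DTD Note 1.0//EN";
    static final String DOC =
        "<!DOCTYPE note PUBLIC \"" + PUBLIC_ID + "\" \"http://example.invalid/note.dtd\">"
        + "<note>Beat the eggs...</note>";
    static final String CATALOG =
        "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
        + "<public publicId=\"" + PUBLIC_ID + "\" uri=\"catalog/note.dtd\"/>"
        + "</catalog>";

    // Parses DOC with DTD validation, loading the DTD through the catalog only.
    static String rootText() {
        try {
            Path dir = Files.createTempDirectory("xmlcatalogs");
            Files.createDirectories(dir.resolve("catalog"));
            Files.write(dir.resolve("catalog").resolve("note.dtd"),
                "<!ELEMENT note (#PCDATA)>".getBytes(StandardCharsets.UTF_8));
            Path catalogFile = dir.resolve("catalog.xml");
            Files.write(catalogFile, CATALOG.getBytes(StandardCharsets.UTF_8));

            // strict: an identifier with no catalog entry is an error, not a download
            CatalogFeatures features = CatalogFeatures.builder()
                .with(CatalogFeatures.Feature.RESOLVE, "strict")
                .build();

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // For DOM, the resolver is registered on the DocumentBuilder
            builder.setEntityResolver(
                CatalogManager.catalogResolver(features, catalogFile.toUri()));
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXParseException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            Document doc = builder.parse(new InputSource(new StringReader(DOC)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the system identifier points at a host that cannot exist, a successful, validated parse shows that the DTD came from the catalog.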
But how does the CatalogResolver find XML Catalog entry files? One way to configure this is by setting the xml.catalog.files system property to a semicolon-separated list of catalog entry files, for example by passing it to the Java Virtual Machine as a -Dxml.catalog.files command-line option whose value is the absolute path of catalog.xml.
However, using an absolute path is best avoided since it restricts the
portability of your application. Web applications, for instance, should
be written in such a way as not to depend on where they are deployed
on the filesystem, as this is typically out of their control.
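The JDK's own javax.xml.catalog API (Java 9+) has an analogous property, javax.xml.catalog.files, which it consults when no catalog URIs are passed in explicitly. The sketch below concerns that JDK API, not the Apache Resolver (whose property is xml.catalog.files); it sets the property programmatically from a location computed at run time, which is one way to avoid hard-coding an absolute path on the command line. Using a temporary directory as that location is an assumption made for the illustration.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

class SystemPropertyCatalog {
    static final String CATALOG =
        "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
        + "<system systemId=\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\""
        + " uri=\"catalog/xhtml1-strict.dtd\"/>"
        + "</catalog>";

    // Resolves systemId using only the catalogs named by javax.xml.catalog.files.
    static String resolve(String systemId) {
        try {
            // The catalog location is computed at run time (here: a temp directory)
            Path catalogFile = Files.createTempDirectory("xmlcatalogs").resolve("catalog.xml");
            Files.write(catalogFile, CATALOG.getBytes(StandardCharsets.UTF_8));
            // Same effect as -Djavax.xml.catalog.files=<uri> on the java command line
            System.setProperty("javax.xml.catalog.files", catalogFile.toUri().toString());

            CatalogFeatures features = CatalogFeatures.builder()
                .with(CatalogFeatures.Feature.RESOLVE, "continue") // no match -> null
                .build();
            // No URIs passed, so the catalog list comes from the system property
            CatalogResolver resolver = CatalogManager.catalogResolver(features);
            InputSource source = resolver.resolveEntity(null, systemId);
            return source == null ? null : source.getSystemId();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```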
A better way to specify catalogs is to provide a properties file with a relative path to the catalog entry files. The Resolver's CatalogManager class automatically looks for a properties file called CatalogManager.properties on the classpath. The following properties file achieves the same effect as setting the xml.catalog.files system property, but without an absolute path:
```properties
# Catalogs are relative to this properties file
relative-catalogs=false
# Catalog list
catalogs=catalog.xml
```
Notice that the property relative-catalogs is set to false, which may seem a little counterintuitive. If relative-catalogs is set to true then the filenames that appear in the catalogs property are left unchanged, so a relative path will be relative to the current directory of the JVM. On the other hand, if it is set to false, relative paths are made absolute with respect to the location of the CatalogManager.properties file. The full list of properties and their behavior is described in the API documentation for CatalogManager.
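To make the two settings concrete, here is a hypothetical method that restates the rule just described; it is not taken from the Resolver's source. With relative-catalogs=true a relative name is left for the JVM to interpret against its working directory, and with relative-catalogs=false it is resolved against the URI of CatalogManager.properties. Treating an already-absolute URI as unchanged is an assumption of this sketch.

```java
import java.net.URI;
import java.nio.file.Paths;

class RelativeCatalogs {
    // entry: one name from the "catalogs" property
    // propertiesFile: where CatalogManager.properties was found on the classpath
    static URI locate(String entry, URI propertiesFile, boolean relativeCatalogs) {
        URI uri = URI.create(entry);
        if (uri.isAbsolute()) {
            return uri; // assumption: absolute URIs are used as-is either way
        }
        if (relativeCatalogs) {
            // Left unchanged, so effectively relative to the JVM's current directory
            return Paths.get(entry).toAbsolutePath().toUri();
        }
        // Made absolute with respect to the properties file's location
        return propertiesFile.resolve(uri);
    }
}
```

For a properties file found at file:/app/WEB-INF/classes/CatalogManager.properties (a made-up path), locate("catalog.xml", props, false) gives file:/app/WEB-INF/classes/catalog.xml, wherever the JVM happens to be started from.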
Finally, we can run the page checker application offline, since the CatalogResolver will use the local catalog to load the DTD. To prove that no net connection is required, I have written a JUnit test that runs with a security manager that blocks all net access. This test, along with all the other examples in this article, is available in the download.
Example 2: W3C XML Schema Validation
In the same way that an XML document may associate itself with a DTD using a DOCTYPE declaration, an XML document may associate itself with a W3C XML Schema using a schema location hint. This example looks at how to validate a document against a schema specified in this way.
A schema location hint is an xsi:schemaLocation attribute on an element -- typically the root -- whose value is a whitespace-separated list of pairs, each consisting of a namespace URI followed by the URI of the schema for elements in that namespace. Alternatively, if the elements are not in a namespace, a schema location hint is an xsi:noNamespaceSchemaLocation attribute whose value is a URI for the schema. The xsi prefix is bound to the http://www.w3.org/2001/XMLSchema-instance namespace URI.
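Because the attribute value is one flat, whitespace-separated list, a processor has to pair the tokens up itself. The helper below is a hypothetical illustration of that pairing (it is not a JAXP API); applied to the schemaLocation value in the recipe document that follows, it yields a single entry mapping the recipe namespace to http://tiling.org/xmlcatalogs/schemas/recipe.xsd.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class SchemaLocationHint {
    // Hypothetical helper: turns an xsi:schemaLocation value into
    // namespace URI -> schema URI pairs, preserving document order.
    static Map<String, String> parse(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyMap();
        }
        String[] tokens = trimmed.split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("Unpaired URI in schemaLocation: " + value);
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < tokens.length; i += 2) {
            pairs.put(tokens[i], tokens[i + 1]); // namespace first, then schema location
        }
        return pairs;
    }
}
```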
For example, here is an XML instance document that describes a recipe, and declares itself to be valid with respect to the schema located at http://tiling.org/xmlcatalogs/schemas/recipe.xsd in the http://tiling.org/xmlcatalogs/namespaces/recipe namespace:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<recipe xmlns="http://tiling.org/xmlcatalogs/namespaces/recipe"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation=
          "http://tiling.org/xmlcatalogs/namespaces/recipe
           http://tiling.org/xmlcatalogs/schemas/recipe.xsd">
  <author>Elizabeth David</author>
  <name>Omelette</name>
  <ingredient>3-4 eggs</ingredient>
  <ingredient>1/2 oz. butter</ingredient>
  <ingredient>Salt and pepper</ingredient>
  <method>Beat the eggs...</method>
</recipe>
```
Although the schema location is not explicitly marked as a system identifier, we can use a catalog system element to associate the schema with a local copy:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="http://tiling.org/xmlcatalogs/schemas/recipe.xsd"
          uri="catalog/recipe.xsd"/>
</catalog>
```
Then we can use the same JAXP SAX code as before -- with one important change -- to validate the XML instance document using the local schema. The only change needed is to tell JAXP which schema language to use when performing validation. In this case it is W3C XML Schema, which is configured by setting a property on the SAXParser, as shown below. Note that if the JAXP parser you are using does not implement version 1.2 or later of the specification, attempting to set the property will fail with a SAXNotRecognizedException. It is worth mentioning in passing that for a DOM parser you set the same property name and value by calling the setAttribute method on the DocumentBuilderFactory.
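```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.apache.xml.resolver.tools.CatalogResolver;
import org.xml.sax.XMLReader;

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
factory.setValidating(true);
SAXParser parser = factory.newSAXParser();
// JAXP 1.2: validate against W3C XML Schema instead of a DTD
parser.setProperty(
    "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
    "http://www.w3.org/2001/XMLSchema"
);
XMLReader reader = parser.getXMLReader();
reader.setEntityResolver(new CatalogResolver());
reader.setErrorHandler(new DefaultErrorHandler());
reader.parse(inputSource);
```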
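Here is Example 2 as a self-contained program. The sketch writes a small recipe.xsd (a cut-down, hypothetical stand-in for the real schema) and the system-entry catalog above into a temporary directory, then validates recipe documents carrying the schema location hint. As a swap for the Apache CatalogResolver, it uses the JDK's javax.xml.catalog resolver (Java 9+) with RESOLVE set to strict, so an unmapped schema location fails instead of going to the network.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class RecipeValidation {
    static final String NS = "http://tiling.org/xmlcatalogs/namespaces/recipe";
    static final String SCHEMA_URL = "http://tiling.org/xmlcatalogs/schemas/recipe.xsd";
    // Hypothetical, cut-down stand-in for the real recipe.xsd
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
        + " targetNamespace=\"" + NS + "\" elementFormDefault=\"qualified\">"
        + "<xs:element name=\"recipe\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"author\" type=\"xs:string\"/>"
        + "<xs:element name=\"name\" type=\"xs:string\"/>"
        + "<xs:element name=\"ingredient\" type=\"xs:string\" maxOccurs=\"unbounded\"/>"
        + "<xs:element name=\"method\" type=\"xs:string\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";
    static final String CATALOG =
        "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
        + "<system systemId=\"" + SCHEMA_URL + "\" uri=\"catalog/recipe.xsd\"/>"
        + "</catalog>";

    // Wraps child elements in a <recipe> root carrying the schema location hint.
    static String recipe(String children) {
        return "<recipe xmlns=\"" + NS + "\""
            + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
            + " xsi:schemaLocation=\"" + NS + " " + SCHEMA_URL + "\">"
            + children + "</recipe>";
    }

    static boolean isValid(String xml) {
        try {
            Path dir = Files.createTempDirectory("xmlcatalogs");
            Files.createDirectories(dir.resolve("catalog"));
            Files.write(dir.resolve("catalog").resolve("recipe.xsd"),
                XSD.getBytes(StandardCharsets.UTF_8));
            Path catalogFile = dir.resolve("catalog.xml");
            Files.write(catalogFile, CATALOG.getBytes(StandardCharsets.UTF_8));
            CatalogFeatures features = CatalogFeatures.builder()
                .with(CatalogFeatures.Feature.RESOLVE, "strict") // never fall back to the network
                .build();

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            // JAXP 1.2: validate against W3C XML Schema rather than a DTD
            parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                "http://www.w3.org/2001/XMLSchema");
            XMLReader reader = parser.getXMLReader();
            reader.setEntityResolver(CatalogManager.catalogResolver(features, catalogFile.toUri()));
            reader.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXParseException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false; // schema validity (or well-formedness) error
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A recipe with an author, name, at least one ingredient and a method passes; dropping the ingredient and method elements makes it invalid against the local schema.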
Another benefit that catalogs offer, in addition to protection from network failure, is the ability to substitute a public resource with a local one that better fits your particular application's needs. For example, in the case of schema validation, it might be useful to validate against a local schema that imposes stronger constraints than the public one. Another way of achieving this effect -- but only in the case of schema validation -- is by explicitly instructing the parser to validate against a given schema, effectively overriding the schema location hint. Just set the http://java.sun.com/xml/jaxp/properties/schemaSource property to a value specifying the schema to use. This is explained in detail in the JAXP 1.2 maintenance specification.
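A sketch of the schemaSource approach with the JDK's built-in parser. It assumes a hypothetical local schema that is stricter than a plain string type (an ingredient may be at most 20 characters). To stay offline and deterministic, the test documents carry no schema location hint at all, so the schema supplied through the property is the only one available. The code sets schemaLanguage before schemaSource, since JAXP 1.2 defines schemaSource only for W3C XML Schema validation.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class SchemaSourceOverride {
    static final String LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";
    // Hypothetical local schema, stricter than a plain xs:string
    static final String STRICT_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"ingredient\"><xs:simpleType>"
        + "<xs:restriction base=\"xs:string\"><xs:maxLength value=\"20\"/></xs:restriction>"
        + "</xs:simpleType></xs:element>"
        + "</xs:schema>";

    static boolean isValid(String xml) {
        try {
            Path xsd = Files.createTempFile("ingredient", ".xsd");
            Files.write(xsd, STRICT_XSD.getBytes(StandardCharsets.UTF_8));

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            parser.setProperty(LANGUAGE, "http://www.w3.org/2001/XMLSchema"); // set first
            parser.setProperty(SOURCE, xsd.toFile()); // JAXP 1.2 accepts a File here
            XMLReader reader = parser.getXMLReader();
            reader.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXParseException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```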
Example 3: Remote Stylesheet Inclusions
For the third example of catalogs in action, we turn to XSLT transforms and see how one stylesheet can include or import another. The xsl:include instruction, which the XSLT processor replaces with the contents of the referenced stylesheet, allows stylesheet authors to split stylesheets into modular documents. For example, the following skeleton stylesheet for transforming the recipe XML file in the previous section into XHTML includes a set of public XSLT utilities.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:r="http://tiling.org/xmlcatalogs/namespaces/recipe"
    exclude-result-prefixes="r">
  <xsl:include href="http://tiling.org/xmlcatalogs/xslt/utils.xslt"/>
  ...
  <xsl:template match="r:recipe">
    ...
  </xsl:template>
</xsl:stylesheet>
```
This time the catalog uses a
uri element to specify the
match for the included file reference:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="http://tiling.org/xmlcatalogs/xslt/utils.xslt"
       uri="catalog/utils.xslt"/>
</catalog>
```
JAXP provides an interface called javax.xml.transform.URIResolver that allows applications to intercept the resolution of URIs in the xsl:include and xsl:import instructions (and in the document() function), and the CatalogResolver implements this interface too, using the URI mappings from its catalog to resolve resources. So in the transform code we simply call the setURIResolver method on the TransformerFactory, passing in an instance of CatalogResolver. Then we can create a new Transformer instance, and it will be set up to use the local file utils.xslt.
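```java
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import org.apache.xml.resolver.tools.CatalogResolver;

TransformerFactory factory = TransformerFactory.newInstance();
factory.setURIResolver(new CatalogResolver());    // resolve xsl:include/xsl:import via the catalog
Transformer transformer = factory.newTransformer(stylesheetSource);
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
transformer.transform(inputStreamSource, result);
```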
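A runnable sketch of this example, assuming Java 9+ and again using the JDK's javax.xml.catalog resolver in place of the Apache CatalogResolver (both implement URIResolver). The contents of utils.xslt and its named template greeting are hypothetical stand-ins; the catalog is the uri entry shown above. RESOLVE is set to strict so that, if the mapping failed, compilation would stop with an error rather than fetch the stylesheet from tiling.org.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CatalogInclude {
    static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";
    static final String UTILS_URL = "http://tiling.org/xmlcatalogs/xslt/utils.xslt";
    // Hypothetical stand-in for the public utilities stylesheet
    static final String UTILS =
        "<xsl:stylesheet xmlns:xsl=\"" + XSL_NS + "\" version=\"1.0\">"
        + "<xsl:template name=\"greeting\">Hello from utils.xslt</xsl:template>"
        + "</xsl:stylesheet>";
    // Main stylesheet: includes utils.xslt by its remote URI
    static final String MAIN =
        "<xsl:stylesheet xmlns:xsl=\"" + XSL_NS + "\" version=\"1.0\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:include href=\"" + UTILS_URL + "\"/>"
        + "<xsl:template match=\"/\"><xsl:call-template name=\"greeting\"/></xsl:template>"
        + "</xsl:stylesheet>";
    static final String CATALOG =
        "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
        + "<uri name=\"" + UTILS_URL + "\" uri=\"catalog/utils.xslt\"/>"
        + "</catalog>";

    static String transform(String inputXml) {
        try {
            Path dir = Files.createTempDirectory("xmlcatalogs");
            Files.createDirectories(dir.resolve("catalog"));
            Files.write(dir.resolve("catalog").resolve("utils.xslt"),
                UTILS.getBytes(StandardCharsets.UTF_8));
            Path catalogFile = dir.resolve("catalog.xml");
            Files.write(catalogFile, CATALOG.getBytes(StandardCharsets.UTF_8));
            CatalogFeatures features = CatalogFeatures.builder()
                .with(CatalogFeatures.Feature.RESOLVE, "strict") // unmapped href -> error
                .build();

            TransformerFactory factory = TransformerFactory.newInstance();
            // xsl:include hrefs now go through the catalog's uri entries
            factory.setURIResolver(CatalogManager.catalogResolver(features, catalogFile.toUri()));
            Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(MAIN)));
            StringWriter writer = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```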
Developing More Complex Catalogs
XML Catalogs offer several other useful features. For instance, you can delegate resolution of a match to another catalog (using the delegatePublic, delegateSystem and delegateURI elements), and you can chain catalogs together using the nextCatalog element. Also useful is the ability to map a set of mirrored resources using a single rewrite entry, as the following catalog entry file illustrates.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteSystem systemIdStartString="http://tiling.org/xmlcatalogs/schemas/"
                 rewritePrefix="catalog/"/>
</catalog>
```
The rewriteSystem entry instructs the resolver to replace the start string of any matching system identifier with the given prefix. In this case, all schemas whose URIs begin with http://tiling.org/xmlcatalogs/schemas/ are mirrored in the local directory catalog/, relative to the catalog entry file.
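The JDK's javax.xml.catalog API (Java 9+) also understands rewriteSystem, so the rule can be tried without the Apache Resolver. This sketch writes the catalog above to a temporary directory and resolves system identifiers through a resolver with RESOLVE set to continue, so that a non-matching identifier simply yields null. The sub/other.xsd identifier used in the checks is a hypothetical extra showing that the whole remainder of the identifier is carried over.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

class RewriteDemo {
    static final String CATALOG =
        "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
        + "<rewriteSystem systemIdStartString=\"http://tiling.org/xmlcatalogs/schemas/\""
        + " rewritePrefix=\"catalog/\"/>"
        + "</catalog>";

    // Returns the local URI for systemId, or null when no entry matches.
    static String resolve(String systemId) {
        try {
            Path catalogFile = Files.createTempDirectory("xmlcatalogs").resolve("catalog.xml");
            Files.write(catalogFile, CATALOG.getBytes(StandardCharsets.UTF_8));
            CatalogFeatures features = CatalogFeatures.builder()
                .with(CatalogFeatures.Feature.RESOLVE, "continue") // no match -> null
                .build();
            CatalogResolver resolver = CatalogManager.catalogResolver(features, catalogFile.toUri());
            // The relative rewritePrefix is resolved against catalog.xml's directory
            InputSource source = resolver.resolveEntity(null, systemId);
            return source == null ? null : source.getSystemId();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```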
The XML Catalogs we have seen so far have each consisted of just a single entry file with a single entry. An XML Catalog can be made up of a list of catalog entry files, each considered in turn, although subsequent files are not consulted if a match is found in an earlier file. Within each catalog entry file there are rules that govern resolution; for a full list, see the specification. For example, when resolving a system identifier, system entries are considered for matching before rewriteSystem entries.
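Within a single entry file, the OASIS specification gives a matching system entry priority over rewriteSystem entries when resolving a system identifier. The sketch below checks this with the JDK's javax.xml.catalog API (Java 9+), used here in place of the Apache Resolver. The catalog is hypothetical: a rewriteSystem entry, deliberately listed first, covers the whole schemas/ directory, and a system entry overrides one file within it.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

class EntryPrecedence {
    static final String CATALOG =
        "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
        // Listed first, but a matching system entry still takes priority
        + "<rewriteSystem systemIdStartString=\"http://tiling.org/xmlcatalogs/schemas/\""
        + " rewritePrefix=\"mirror/\"/>"
        + "<system systemId=\"http://tiling.org/xmlcatalogs/schemas/recipe.xsd\""
        + " uri=\"strict/recipe.xsd\"/>"
        + "</catalog>";

    // Returns the local URI chosen for systemId, or null when nothing matches.
    static String resolve(String systemId) {
        try {
            Path catalogFile = Files.createTempDirectory("xmlcatalogs").resolve("catalog.xml");
            Files.write(catalogFile, CATALOG.getBytes(StandardCharsets.UTF_8));
            CatalogFeatures features = CatalogFeatures.builder()
                .with(CatalogFeatures.Feature.RESOLVE, "continue")
                .build();
            CatalogResolver resolver = CatalogManager.catalogResolver(features, catalogFile.toUri());
            InputSource source = resolver.resolveEntity(null, systemId);
            return source == null ? null : source.getSystemId();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

recipe.xsd therefore resolves to the strict/ copy, while any other schema under the same prefix falls through to the mirror/ rewrite.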
When developing larger catalogs, an identifier may not be resolved to the URI you expect. It can pay to write unit tests that test resolution, perhaps with net access restricted (like the examples that accompany this article). Even with tests, however, diagnostic tools can be useful. The simplest way to see what is going on during resolution is to set the xml.catalog.verbosity system property (or the verbosity property in CatalogManager.properties) to a non-zero number: the higher the number, the more information you get.
You can manually try resolution from the command line using the resolver application that is supplied in the Resolver package. The following session shows resolution of an XHTML DOCTYPE, such as the one in the first example at the beginning of this article.
```
$ java -jar lib/resolver.jar -c catalog.xml \
    -p "-//W3C//DTD XHTML 1.0 Strict//EN" \
    -s http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd doctype
Cannot find CatalogManager.properties
Resolve DOCTYPE (name, publicid, systemid):
  public id: -//W3C//DTD XHTML 1.0 Strict//EN
  system id: http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
Result: file:/tom/workspace/xmlcatalogs/catalog/xhtml1-strict.dtd
```
Using XML Catalogs to manage a local store of external resources can make your JAXP applications more robust and faster by removing the dependency on the network. Furthermore, XML Catalogs is a standard with ever increasing support -- for example, the recently released Ant 1.6 supports XML Catalogs -- so it is easy to reuse your catalog entry files.