The Evolution of JAXP
by Rahul Srivastava
XML Transformation Using the TrAX APIs in JAXP
W3C XSL-T defines rules for transforming a source tree into a result tree. A transformation expressed in XSL-T is called a stylesheet. To transform an XML document using JAXP, you create a Transformer from the stylesheet. Once created, the Transformer takes the XML input as a JAXP Source and returns the transformed output as a JAXP Result. JAXP provides three types of sources (StreamSource, SAXSource, and DOMSource) and three types of results (StreamResult, SAXResult, and DOMResult), which can be used in any combination for a transformation.

Figure: XML Transformation
For example, to generate SAX events from DOM:
//parse the XML file to a W3C DOM
DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
dbfactory.setNamespaceAware(true);
DocumentBuilder domparser = dbfactory.newDocumentBuilder();
Document doc = domparser.parse(new File("data.xml"));
//prepare the DOM source
Source xmlsource = new DOMSource(doc);
//create a content handler to handle the SAX events
//(MyHandler stands for your own ContentHandler implementation)
ContentHandler handler = new MyHandler();
//prepare a SAX result using the content handler
Result result = new SAXResult(handler);
//create a transformer factory
TransformerFactory xfactory = TransformerFactory.newInstance();
//create a transformer
Transformer xformer = xfactory.newTransformer();
//transform to raise the SAX events from DOM
xformer.transform(xmlsource, result);
In the above example, we did not supply any XSL stylesheet when creating the Transformer. This means the Transformer merely copies the XML from the Source to the Result without any transformation. When you actually want to transform using an XSL stylesheet, create the Transformer from the XSL source as follows:
//create the xsl source
Source xslsource = new StreamSource(new File("mystyle.xsl"));
//create the transformer using the xsl source
Transformer xformer = xfactory.newTransformer(xslsource);
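To make the stylesheet case concrete, here is a minimal, self-contained sketch of a complete transformation. The stylesheet and the employees input are made up for illustration and are held in strings rather than in mystyle.xsl and data.xml; the class and method names are assumptions too, not part of JAXP.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StylesheetTransformDemo {

    // Hypothetical stylesheet: writes each employee name on its own line.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='employees/employee'>"
      + "<xsl:value-of select='name'/><xsl:text>\n</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the given stylesheet to the given XML and returns the result as text.
    static String transform(String xml, String xsl) throws TransformerException {
        Source xslsource = new StreamSource(new StringReader(xsl));
        Source xmlsource = new StreamSource(new StringReader(xml));
        TransformerFactory xfactory = TransformerFactory.newInstance();
        // Passing the XSL source is what makes this a real transformation
        Transformer xformer = xfactory.newTransformer(xslsource);
        StringWriter out = new StringWriter();
        xformer.transform(xmlsource, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<employees>"
                   + "<employee><name>e1</name></employee>"
                   + "<employee><name>e2</name></employee>"
                   + "</employees>";
        System.out.print(transform(xml, XSL));
    }
}
```

A StreamSource and a StreamResult are used here only to keep the sketch free of files; any other Source and Result combination could be substituted.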
What's New in JAXP 1.3
In addition to the features carried over from previous versions (SAX parsing, DOM parsing, validation against a DTD or XMLSchema while parsing, and transformation using XSL-T), JAXP 1.3 supports:
- XML 1.1 and Namespaces in XML 1.1
- XML Inclusions - XInclude 1.0
- Validation of an instance document against a preparsed schema (XMLSchema and RELAX NG).
- Evaluating XPath expressions.
- XML/Java type mappings for those datatypes in XMLSchema 1.0, XPath 2.0 and XQuery 1.0 for which there were no XML/Java mappings earlier.
Using JAXP 1.3
XML 1.1 and XInclude 1.0
The major additions in XML 1.1 are:
- forward compatibility for the ever-growing Unicode character set.
- addition of NEL (#x85) and the Unicode line separator character (#x2028) to the list of line-end characters.
Changes in XML 1.1 are not fully backward compatible with XML 1.0 and also break the well-formedness rules defined in XML 1.0. Therefore, a new specification, XML 1.1, was proposed rather than simply updating the existing XML 1.0 specification.
To use XML 1.1 and Namespaces in XML 1.1, you must set the version attribute in the XML declaration of your document to "1.1". For example:
<?xml version="1.1" encoding="UTF-8" standalone="yes"?>
XInclude allows an XML document to include other XML documents. For example:
<myMainXMLDoc xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="fragment.xml"/>
...
</myMainXMLDoc>
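To allow XML inclusions, you must turn on XInclude processing on the appropriate factory as follows:
DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
//XInclude elements are recognized by their namespace, so the
//parser generally needs to be namespace aware as well
dbfactory.setNamespaceAware(true);
dbfactory.setXIncludeAware(true);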
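The effect of the flag is easiest to see end to end. The following is a rough sketch, not a definitive recipe: it writes a hypothetical main.xml and fragment.xml into a temporary directory, parses the main document with XInclude processing enabled, and returns the text of the root element, which by then contains the included fragment. The file contents and the class and method names are assumptions made for the illustration.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XIncludeDemo {

    // Parses the given file with namespace and XInclude processing turned on.
    static Document parseWithXInclude(File mainDoc) throws Exception {
        DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
        dbfactory.setNamespaceAware(true);
        dbfactory.setXIncludeAware(true);
        DocumentBuilder parser = dbfactory.newDocumentBuilder();
        return parser.parse(mainDoc);
    }

    // Creates two temporary files, resolves the inclusion, and returns
    // the text content of the root element of the merged document.
    static String rootTextAfterInclusion() throws Exception {
        Path dir = Files.createTempDirectory("xinclude-demo");
        Path fragment = dir.resolve("fragment.xml");
        Path main = dir.resolve("main.xml");
        try {
            Files.write(fragment,
                "<fragment>included text</fragment>".getBytes(StandardCharsets.UTF_8));
            Files.write(main,
                ("<myMainXMLDoc xmlns:xi='http://www.w3.org/2001/XInclude'>"
               + "<xi:include href='fragment.xml'/>"
               + "</myMainXMLDoc>").getBytes(StandardCharsets.UTF_8));
            // The relative href is resolved against the location of main.xml
            Document doc = parseWithXInclude(main.toFile());
            return doc.getDocumentElement().getTextContent();
        } finally {
            Files.deleteIfExists(fragment);
            Files.deleteIfExists(main);
            Files.deleteIfExists(dir);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootTextAfterInclusion());
    }
}
```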
Validating a JAXP Source Against a Preparsed Schema
The javax.xml.validation package provides support for parsing a schema and for validating XML instance documents against the preparsed schema. A JAXP DOMSource or SAXSource can be validated against a preparsed schema, and the preparsed schema can be cached and reused if required. Note that in JAXP 1.3 a StreamSource is not supported as the input to validation, and that the schema can be either a W3C XML Schema or an OASIS RELAX NG schema. For example:
//parse an XML in non-validating mode and create a DOMSource
DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
dbfactory.setNamespaceAware(true);
dbfactory.setXIncludeAware(true);
DocumentBuilder parser = dbfactory.newDocumentBuilder();
Document doc = parser.parse(new File("data.xml"));
DOMSource xmlsource = new DOMSource(doc);
//create a SchemaFactory for loading W3C XML Schemas
SchemaFactory wxsfactory =
SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
//set the errorhandler for handling errors in schema itself
wxsfactory.setErrorHandler(schemaErrorHandler);
//load a W3C XML Schema
Schema schema = wxsfactory.newSchema(new File("myschema.xsd"));
// create a validator from the loaded schema
Validator validator = schema.newValidator();
//set the errorhandler for handling validation errors
validator.setErrorHandler(validationErrorHandler);
//validate the XML instance
validator.validate(xmlsource);
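The error handlers passed to setErrorHandler are ordinary org.xml.sax.ErrorHandler implementations. As a sketch of what one might look like, the hypothetical CollectingErrorHandler below records problems instead of stopping at the first one. The schema and the instance documents are invented for the illustration and are kept in strings; a StreamSource is used only to load the schema, while the instance is still validated as a DOMSource.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class SchemaValidationDemo {

    // Hypothetical handler: collects warnings and errors, rethrows fatal errors.
    static class CollectingErrorHandler implements ErrorHandler {
        final List<String> messages = new ArrayList<>();
        public void warning(SAXParseException e) { messages.add("warning: " + e.getMessage()); }
        public void error(SAXParseException e) { messages.add("error: " + e.getMessage()); }
        public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
    }

    // Made-up schema: <employees> holds one or more <employee>, each with a <name>.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='employees'><xs:complexType><xs:sequence>"
      + "<xs:element name='employee' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns the validation problems found in the given XML (empty if valid).
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
        dbfactory.setNamespaceAware(true);
        Document doc = dbfactory.newDocumentBuilder()
                                .parse(new InputSource(new StringReader(xml)));

        SchemaFactory wxsfactory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = wxsfactory.newSchema(new StreamSource(new StringReader(XSD)));

        Validator validator = schema.newValidator();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        validator.setErrorHandler(handler);
        validator.validate(new DOMSource(doc));
        return handler.messages;
    }

    public static void main(String[] args) throws Exception {
        String valid = "<employees><employee><name>e1</name></employee></employees>";
        String invalid = "<employees><employee><nickname>e1</nickname></employee></employees>";
        System.out.println("valid document:   " + validate(valid));
        System.out.println("invalid document: " + validate(invalid));
    }
}
```

If no error handler is set, the Validator instead throws a SAXException on the first validation error, which may be all that a simple application needs.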
Evaluating XPath Expressions
The javax.xml.xpath package provides support for evaluating XPath expressions against a given XML document. An XPath expression can be compiled for better performance if it is to be reused.
By the way, the XPath APIs in JAXP are designed to be stateless, which means every time you want to evaluate an XPath expression, you also need to pass in the XML document. Often, many XPath expressions are evaluated against a single XML document. In such a case, it would have been better if the XPath APIs in JAXP were made stateful by passing the XML document once. The underlying implementation would then have had a choice of storing the XML source in an optimized fashion (say, a DTM) for faster evaluation of XPath expressions.
The following example evaluates XPath expressions against this XML document:
<?xml version="1.0"?>
<employees>
<employee>
<name>e1</name>
</employee>
<employee>
<name>e2</name>
</employee>
</employees>
//parse an XML to get a DOM to query
DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
dbfactory.setNamespaceAware(true);
dbfactory.setXIncludeAware(true);
DocumentBuilder parser = dbfactory.newDocumentBuilder();
Document doc = parser.parse(new File("data.xml"));
//get an XPath processor
XPathFactory xpfactory = XPathFactory.newInstance();
XPath xpathprocessor = xpfactory.newXPath();
//set the namespace context for resolving the prefixes of QNames
//to namespace URIs, if the XPath expression uses QNames. An XPath
//expression uses QNames if the XML document uses namespaces.
//xpathprocessor.setNamespaceContext(NamespaceContext nsContext);
//create XPath expressions
String xpath1 = "/employees/employee";
XPathExpression employeesXPath = xpathprocessor.compile(xpath1);
String xpath2 = "/employees/employee[1]";
XPathExpression employeeXPath = xpathprocessor.compile(xpath2);
String xpath3 = "/employees/employee[1]/name";
XPathExpression empnameXPath = xpathprocessor.compile(xpath3);
//execute the XPath expressions
System.out.println("XPath1="+xpath1);
NodeList employees = (NodeList)employeesXPath.evaluate(doc,
XPathConstants.NODESET);
for (int i=0; i<employees.getLength(); i++) {
System.out.println(employees.item(i).getTextContent());
}
System.out.println("XPath2="+xpath2);
Node employee = (Node)employeeXPath.evaluate(doc, XPathConstants.NODE);
System.out.println(employee.getTextContent());
System.out.println("XPath3="+xpath3);
String empname = empnameXPath.evaluate(doc);
System.out.println(empname);
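The example above leaves the setNamespaceContext call commented out because its document uses no namespaces. As a hedged sketch of the namespace case, the code below assumes a variant of the employees document whose elements sit in a made-up default namespace, and supplies a small, hypothetical NamespaceContext that maps the prefix emp to it. The prefix used in the expression does not have to match anything in the document; only the namespace URI matters.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceXPathDemo {

    // Made-up namespace URI, used only for this illustration.
    static final String EMP_NS = "http://example.com/employees";

    // Hypothetical NamespaceContext that knows exactly one prefix/URI pair.
    static class SinglePrefixContext implements NamespaceContext {
        private final String prefix;
        private final String uri;

        SinglePrefixContext(String prefix, String uri) {
            this.prefix = prefix;
            this.uri = uri;
        }
        public String getNamespaceURI(String p) {
            if (p == null) throw new IllegalArgumentException("prefix is null");
            return prefix.equals(p) ? uri : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String namespaceURI) {
            return uri.equals(namespaceURI) ? prefix : null;
        }
        public Iterator<String> getPrefixes(String namespaceURI) {
            String p = getPrefix(namespaceURI);
            return (p == null ? Collections.<String>emptyList()
                              : Collections.singletonList(p)).iterator();
        }
    }

    // Returns the name of the first employee in a namespaced employees document.
    static String firstEmployeeName(String xml) throws Exception {
        DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
        dbfactory.setNamespaceAware(true);
        Document doc = dbfactory.newDocumentBuilder()
                                .parse(new InputSource(new StringReader(xml)));

        XPath xpathprocessor = XPathFactory.newInstance().newXPath();
        xpathprocessor.setNamespaceContext(new SinglePrefixContext("emp", EMP_NS));
        return xpathprocessor.evaluate("/emp:employees/emp:employee[1]/emp:name", doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<employees xmlns='" + EMP_NS + "'>"
                   + "<employee><name>e1</name></employee>"
                   + "<employee><name>e2</name></employee>"
                   + "</employees>";
        System.out.println(firstEmployeeName(xml));
    }
}
```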
XML/Java-type Mappings
Datatypes in XMLSchema 1.0 are quite exhaustive and popular, and are used by many other XML specifications as well, such as XPath, XQuery, and WSDL. Most of these datatypes map naturally to the primitive or wrapper datatypes in Java. The remaining datatypes, such as dateTime and duration, can be mapped to the new Java types: javax.xml.datatype.XMLGregorianCalendar, javax.xml.datatype.Duration, and javax.xml.namespace.QName. Thus, with the new datatypes defined in the javax.xml.datatype package, all the datatypes supported in XMLSchema 1.0, XPath 2.0 and XQuery 1.0 now have an equivalent datatype mapping in Java.
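As a brief illustration of these mappings, the sketch below uses DatatypeFactory to turn lexical dateTime and duration values into XMLGregorianCalendar and Duration objects, adds one to the other, and asks a value which schema type it corresponds to. The class name, method names, and sample values are assumptions chosen for the example.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;
import javax.xml.datatype.XMLGregorianCalendar;
import javax.xml.namespace.QName;

class DatatypeDemo {

    // Adds an xs:duration to an xs:dateTime and returns the lexical result.
    static String addDuration(String dateTime, String duration)
            throws DatatypeConfigurationException {
        DatatypeFactory dtfactory = DatatypeFactory.newInstance();
        XMLGregorianCalendar cal = dtfactory.newXMLGregorianCalendar(dateTime);
        Duration d = dtfactory.newDuration(duration);
        cal.add(d); // modifies the calendar in place
        return cal.toXMLFormat();
    }

    // Returns the local name of the schema type that the lexical value maps to.
    static String schemaTypeOf(String lexicalValue)
            throws DatatypeConfigurationException {
        DatatypeFactory dtfactory = DatatypeFactory.newInstance();
        QName type = dtfactory.newXMLGregorianCalendar(lexicalValue).getXMLSchemaType();
        return type.getLocalPart();
    }

    public static void main(String[] args) throws DatatypeConfigurationException {
        // One day and two hours after 10:00 UTC on 6 July 2005
        System.out.println(addDuration("2005-07-06T10:00:00Z", "P1DT2H"));
        System.out.println(schemaTypeOf("2005-07-06T10:00:00Z"));
        System.out.println(schemaTypeOf("2005-07-06"));
    }
}
```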
However, the datatype support would have been much better from a usability perspective if DatatypeFactory had methods to get a Java object representing a given WXS datatype, with that object in turn offering methods to constrain the datatype using facets and to validate a value against it.
An example using Oracle's XDK:
import oracle.xml.parser.schema.*;
. . .
//create a simpleType object
XSDSimpleType st = XSDSimpleType.getPrimitiveType(XSDSimpleType.iSTRING);
//set a constraining facet on the simpleType
st.setFacet(XSDSimpleType.LENGTH, "5");
//validate value
st.validateValue("hello");
Changing the Underlying Implementation
A JAXP implementation comes with a default parser, transformer, XPath engine, and schema validator, but, as mentioned earlier, JAXP is a pluggable API, and we can plug in any JAXP-compliant processor to change the defaults. To do that, we must set the appropriate javax.xml.xxx.yyyFactory property to the fully qualified class name of the new yyyFactory implementation. Then, when yyyFactory.newInstance() is invoked, JAXP uses the following ordered lookup procedure to determine the implementation class to load:
- Use the javax.xml.xxx.yyyFactory system property.
- Use the properties file "lib/jaxp.properties" in the JRE directory. The jaxp.properties file is read only once by the JAXP 1.3 implementation and its values are then cached for future use. If the file does not exist when the first attempt is made to read from it, no further attempts are made to check for its existence. It is not possible to change the value of any property in jaxp.properties after it has been read for the first time.
- Use the Services API (as detailed in the JAR specification), if available, to determine the classname. The Services API will look for the classname in the file META-INF/services/javax.xml.xxx.yyyFactory in jars available to the runtime.
- Use the platform default javax.xml.xxx.yyyFactory instance
where javax.xml.xxx.yyyFactory can be one of the following:
javax.xml.parsers.SAXParserFactory
javax.xml.parsers.DocumentBuilderFactory
javax.xml.transform.TransformerFactory
javax.xml.xpath.XPathFactory
javax.xml.validation.SchemaFactory:schemaLanguage (schemaLanguage is
the parameter passed to the newInstance method of SchemaFactory)
For example, to plug in a JAXP-compliant SAX parser, say Apache's Xerces, you must set the property javax.xml.parsers.SAXParserFactory to org.apache.xerces.jaxp.SAXParserFactoryImpl, using any of the first three mechanisms listed above. One of the ways is shown below:
java -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl MyApplicationProgram
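One simple way to check which implementation the lookup procedure actually selected is to ask each factory instance for its class name. The sketch below relies only on the standard factories; the class FactoryLookupDemo and its helper methods are hypothetical, and the names it prints depend on the JRE and on what is on the classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.xpath.XPathFactory;

class FactoryLookupDemo {

    // Returns the fully qualified name of the class implementing a factory.
    static String implementationOf(Object factory) {
        return factory.getClass().getName();
    }

    // Convenience for the SAX case discussed in the text.
    static String saxImplementation() {
        return implementationOf(SAXParserFactory.newInstance());
    }

    public static void main(String[] args) {
        String key = "javax.xml.parsers.SAXParserFactory";
        // null here means the system property step of the lookup was skipped
        System.out.println(key + " system property = " + System.getProperty(key));

        System.out.println("SAXParserFactory       -> " + saxImplementation());
        System.out.println("DocumentBuilderFactory -> "
            + implementationOf(DocumentBuilderFactory.newInstance()));
        System.out.println("TransformerFactory     -> "
            + implementationOf(TransformerFactory.newInstance()));
        System.out.println("XPathFactory           -> "
            + implementationOf(XPathFactory.newInstance()));
    }
}
```

Running the program once as is and once with the -D option shown above (with the named implementation on the classpath) should make the effect of the system property visible.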