DOM for Web Services, Part 3
In the first article of this series I discussed the XML authoring and processing requirements in web services, explained the DOM architecture along with the features in the three DOM levels, and introduced MSXML and Xerces, two popular DOM implementations.
In the second article I showed readers how to use MSXML, especially how to process WSDL files and develop web service user interfaces on the client side using MSXML inside JavaScript code. I also showed the use of MSXML on the server via an ASP.NET page.
In this third and final article of this series I demonstrate the use of Xerces, which is the most popular Java-based implementation of DOM. In this article's first section I develop a couple of Java classes that can create and process SOAP messages. This will demonstrate the basic DOM features of Xerces. In the second section I demonstrate the use of some other important features, including copying nodes from one document into another, DOM events, and DOM ranges.
The third section contains a discussion of the Load and Save module, an important feature in DOM level 3 which is not yet supported in Xerces.
The last section wraps up this series by explaining the scenarios in which you will most likely use DOM for XML authoring and processing requirements in your web service applications.
Xerces is part of the Apache XML project. It is available for Java and C++. In this article I only cover Xerces for Java, which is commonly called Xerces-J. The most recent version of Xerces-J available at the time of writing is 2.6.
Note that W3C DOM is not the only XML API that Xerces supports. Xerces also supports SAX and a proprietary interface called Xerces Native Interface (XNI). Complete documentation about Xerces is available from the Xerces site. I will discuss only the W3C DOM features of Xerces.
Also note that the Java Web Services Developer Pack (JWSDP) from Sun includes standard XML processing Java APIs, including the Java API for XML Processing (JAXP). The current reference implementation of JAXP uses Xerces as its default XML processing engine. If you download JWSDP from Sun's site, you will get Xerces, and you won't need to download it separately.
However, if you are using JDK1.4, you have a small problem to take care
of before starting to use Xerces. JDK1.4 ships with an older version of
Xerces. Even if you include Xerces jars in your classpath, the Java
runtime will use the older version of Xerces and not the one that comes
with JWSDP. The instructions for solving this problem come with the JWSDP
installation. When you install JWSDP under Windows (the latest release
for now is version 1.3), you will see instructions for JDK1.4 users saying
"Create the directory: <JAVA_HOME>\jre\lib\endorsed and
then copy the files in the following directory to the newly created
directory: C:\jwsdp-1.3\jaxp\lib\endorsed".
The files in the C:\jwsdp-1.3\jaxp\lib\endorsed directory
of JWSDP include a Xerces-J jar file named xercesImpl.jar. When you create
the new <JAVA_HOME>\jre\lib\endorsed directory and copy
the files from the C:\jwsdp-1.3\jaxp\lib\endorsed directory
to the newly created location, you are telling the Java runtime to use the
new version of Xerces instead of the old Xerces implementation that comes
as part of JDK1.4.
However, if you don't want to download and install JWSDP, you can
download Xerces
and copy the xercesImpl.jar file into the
<JAVA_HOME>\jre\lib\endorsed directory.
Once you have the xercesImpl.jar file at its correct
place, you will not need to include anything in your classpath to compile
and run the samples of this article. The source
code download contains source and compiled form of all the samples
that we are going to use for demonstration in this article.
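To confirm which parser implementation the Java runtime actually picked up after the copy, a small check along the following lines may help. This is only a sketch: the class name ParserCheck is hypothetical, and the printed class name depends entirely on your JDK and on what sits in the endorsed directory.

```java
import javax.xml.parsers.DocumentBuilderFactory;

public class ParserCheck {
    // Returns the class name of the JAXP factory that the runtime selected.
    static String factoryClassName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The package of the printed class shows which implementation is in use.
        System.out.println("DocumentBuilderFactory: " + factoryClassName());
    }
}
```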
Look at Listing
3 of the first article of this series, which was a SOAP message that
we used to describe the usage model of web services. Notice that the SOAP
message contains elements belonging to two XML namespaces. The first is
the SOAP namespace and the second is an application specific namespace
(http://www.cityportal.com).
The use of these namespaces demonstrates that XML and SOAP
specifications allow building layered applications, where the
application-specific layer works on top of the SOAP layer. The SOAP
specification defines the Envelope, Header, and
Body elements and allows applications to define their own
namespaces to fill in the header and the body of a SOAP envelope.
This layered architecture is a great strength of XML web
services. It allows vendors to develop off-the-shelf standard solutions
(e.g. a SOAP client or a SOAP server) and application developers to add
only the application-specific bit of the layered framework. For example,
if you consider the SOAP message of Listing
3 of the first article, you will see that the only application
specific elements are GetCityWeatherReport and
CityName. The rest of the markup is standard SOAP.
We are going to use the same idea of layering application bits. Our sample DOM-based SOAP engine has two classes, DataWrapper and SOAPMessage.
The DataWrapper class creates the application-specific
data that go along with the SOAP method call (e.g. the
CityName element in Listing
3 of the first article). The SOAPMessage class creates
the SOAP Envelope along with the SOAP Body. Because
a SOAP request usually contains the name of a web service method, the
same SOAPMessage class also authors the method element
(usually the immediate child of the SOAP Body).
But how do these classes use DOM to create XML?
Look at the add() method in Listing
1, which takes three parameters. The first parameter is the name of
the data element (e.g. CityName in Listing
3 of the first article). The second parameter specifies the namespace
to which the data element belongs. The third parameter specifies the
contents of the data element (e.g. "Karachi" in Listing
3 of the first article). The add() method simply stores
these parameters in a list. An application can call this method any number
of times. Every time an application calls this method, a new set of data
will be added to the items already stored in the list.
The appendAsChildren() method in Listing
1 takes just one parameter named parentElement, which is
a DOM element. The appendAsChildren() method takes all the
entries in the list one by one and adds them as child nodes to the
parentElement.
Notice from Listing
1 that the appendAsChildren() method first calls the
getOwnerDocument() method of the parentElement
object. The getOwnerDocument() method belongs to the DOM
Node interface. It returns the Document object
to which a DOM node belongs. We need to know the owner document whenever
we want to add a child element to an existing element.
After getting the owner Document object, the
appendAsChildren() method performs the following operations
for every entry in the list:
1. Create a new element by calling the createElementNS() method
of the owner Document object. The
createElementNS() method takes two parameters. The first
parameter is the namespace URI string for the element that you want to
create. The second parameter is the qualified name of the element. The
createElementNS() method returns the newly created
Element object, which represents the name of a parameter that
goes along with a SOAP method invocation request
(e.g. CityName in Listing
3 of the first article).
2. Append the newly created Element as a child to
parentElement by calling the appendChild() method of
parentElement.
3. Create a text node by calling the createTextNode()
method of the owner document object and append the text node as a child to
the newly created Element node. This text node represents the
value of the parameter that goes with a SOAP message call (e.g. "Karachi"
in Listing
3 of the first article).

Just for the sake of demonstration, we have written a simple
main() method in Listing
1. The main() method demonstrates how an application will
use the functionality of the add() and
appendAsChildren() methods.
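Since Listing 1 is not reproduced here, the following is a minimal sketch of how a class with the described add() and appendAsChildren() methods could look. The class name DataWrapperSketch, the newDocument() helper, and the use of a list of string arrays are assumptions made for illustration, not the article's actual code; the sketch also uses generics, which postdate JDK1.4.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DataWrapperSketch {
    // Each entry holds {element name, namespace URI, text value}.
    private final List<String[]> entries = new ArrayList<>();

    // Stores one data item; may be called any number of times.
    public void add(String name, String namespaceUri, String value) {
        entries.add(new String[] { name, namespaceUri, value });
    }

    // Appends every stored item as a child element of parentElement.
    public void appendAsChildren(Element parentElement) {
        Document owner = parentElement.getOwnerDocument();
        for (String[] entry : entries) {
            Element child = owner.createElementNS(entry[1], entry[0]);
            parentElement.appendChild(child);
            child.appendChild(owner.createTextNode(entry[2]));
        }
    }

    // Hypothetical helper: creates an empty document for the demo below.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element method = doc.createElementNS("http://www.cityportal.com", "GetCityWeatherReport");
        doc.appendChild(method);

        DataWrapperSketch data = new DataWrapperSketch();
        data.add("CityName", "http://www.cityportal.com", "Karachi");
        data.appendAsChildren(method);

        System.out.println(method.getFirstChild().getNodeName());
    }
}
```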
Now have a look at the SOAPMessage constructor in Listing
2. It takes three parameters: methodName,
methodNamespace, and parameters. The
methodName parameter represents the name of the SOAP method
that the SOAP message will invoke on a remote server
(e.g. GetCityWeatherReport in Listing
3 of the first article). The methodNamespace parameter
represents the namespace to which the methodName element belongs
(e.g. "http://www.cityportal.com" in Listing
3 of the first article). The parameters parameter is a
DataWrapper object which wraps all the data that goes with
the SOAP method invocation request.
The SOAPMessage constructor creates a SOAP message. So
you first have to create a new empty XML document. Creating a new XML DOM
document in Xerces takes three steps. You first instantiate a
DocumentBuilderFactory, then you create a
DocumentBuilder, and then using the
newDocument() method of the DocumentBuilder, you
create a DOM Document object. You will use the
newDocument() method whenever you want to create a new empty
XML DOM document containing no data. The Document object that
the newDocument() returns exposes the DOM
Document interface.
Once you have the DOM document, you can author the root
Envelope element by using the createElementNS()
method discussed earlier.
After creating the Envelope element, you need to attach the
element to its parent. Because Envelope is the root element,
the Document object is its parent. Therefore, you will call
the appendChild() method of the Document object
to attach the Envelope element to the document.
Note that an XML document can have only one root element. That's why you
can attach only one element node to a Document object. If you
try to attach more than one element node, you will get an exception at
runtime.
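The three creation steps and the single-root rule can be tried out with a sketch like the one below. The class name SingleRootSketch is hypothetical, and the envelope namespace shown is the SOAP 1.1 one, which is an assumption because this article does not state which SOAP version Listing 2 targets.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SingleRootSketch {
    // Assumption: SOAP 1.1 envelope namespace.
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    static Document newEmptyDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); // step 1
            DocumentBuilder builder = factory.newDocumentBuilder();                // step 2
            return builder.newDocument();                                          // step 3
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the DOM implementation rejects a second root element.
    static boolean secondRootRejected() {
        Document doc = newEmptyDocument();
        Element envelope = doc.createElementNS(SOAP_NS, "env:Envelope");
        doc.appendChild(envelope); // the first root element is accepted
        try {
            doc.appendChild(doc.createElementNS(SOAP_NS, "env:Envelope"));
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.HIERARCHY_REQUEST_ERR;
        }
    }

    public static void main(String[] args) {
        System.out.println("Second root rejected: " + secondRootRejected());
    }
}
```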
In a similar manner we have created the Body element (the
bodyElement object), attached it to the Envelope
element, created the SOAP method name element (the
methodElement object), and attached it to the
Body element.
Finally we have to author the elements that represent parameters
associated with the SOAP method invocation request. This is the job of the
appendAsChildren() method of the DataWrapper
class that we have already explained. You will call the
appendAsChildren() method of the parameters object and pass
the methodElement object along with the method call. This
will automatically append the parameters data to the SOAP method call.
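Putting these steps together, a constructor like the one described might be sketched as follows. To keep the example self-contained, a plain Map stands in for the DataWrapper object; the class name SoapMessageSketch, the build() method, and the SOAP 1.1 namespace are assumptions and are not taken from Listing 2.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SoapMessageSketch {
    // Assumption: SOAP 1.1 envelope namespace.
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    static Document build(String methodName, String methodNamespace,
                          Map<String, String> parameters) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // Envelope is the root, so its parent is the Document itself.
        Element envelope = doc.createElementNS(SOAP_NS, "env:Envelope");
        doc.appendChild(envelope);

        Element body = doc.createElementNS(SOAP_NS, "env:Body");
        envelope.appendChild(body);

        // The method element is the immediate child of Body.
        Element method = doc.createElementNS(methodNamespace, methodName);
        body.appendChild(method);

        // Stand-in for DataWrapper.appendAsChildren(methodElement).
        for (Map.Entry<String, String> parameter : parameters.entrySet()) {
            Element child = doc.createElementNS(methodNamespace, parameter.getKey());
            child.appendChild(doc.createTextNode(parameter.getValue()));
            method.appendChild(child);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> parameters = new LinkedHashMap<>();
        parameters.put("CityName", "Karachi");
        Document doc = build("GetCityWeatherReport", "http://www.cityportal.com", parameters);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```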
Also look at the getSOAPRequestText() method in Listing
2, which was written to demonstrate XML processing in Xerces. It takes
a Document object and returns its XML data in string form. It
uses a method called getElementAsText(), which is recursive
and is responsible for creating the XML data corresponding to the root
element and all its children.
The following points are worth noting from the
getElementAsText() method in Listing
2:
- We have used the getTagName() method of the
Element object to read the tag name of the element. The tag
name consists of both the prefix and the local name (e.g. if the prefix is
"env" and the local name is "Envelope", the tag
name will be "env:Envelope").
- We have used the getAttributes() method of the
Element object to read all the attributes of an element into
a NamedNodeMap object. A NamedNodeMap object is
used to hold a number of nodes, where each node is accessible by name or
index number. We have used the getLength() and
item() methods of the NamedNodeMap interface to
fetch all attribute nodes. The getLength() method returns
the total number of nodes in a NamedNodeMap and the
item() method returns the node at a particular index.
- We have used the getNamespaceURI() method to get the
namespace URI of each element. Recall from earlier discussion that the
createElementNS() method creates an element with a namespace
URI and a tag name. The getNamespaceURI() method returns the
same URI.
- We have used the getPrefix() method to fetch the
namespace prefix of all elements.
- The Node.getNodeType() method tells the type of a node
(e.g. whether a node is a text node or an element node). We have used this
method to differentiate text nodes from element nodes.
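A recursive serializer built from the methods listed above might look roughly like this. It is a sketch under stated assumptions rather than the getElementAsText() method of Listing 2: the class and method names are hypothetical, a namespace declaration is written on every namespaced element for simplicity, and character escaping is left out.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class ElementTextSketch {
    static String elementAsText(Element element) {
        StringBuilder out = new StringBuilder();
        String tagName = element.getTagName(); // prefix plus local name
        out.append('<').append(tagName);

        // Declare the element's own namespace (redundant on nested elements, but well-formed).
        String uri = element.getNamespaceURI();
        if (uri != null) {
            String prefix = element.getPrefix();
            out.append(prefix == null ? " xmlns" : " xmlns:" + prefix)
               .append("=\"").append(uri).append('"');
        }

        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            if (attribute.getNodeName().startsWith("xmlns")) {
                continue; // declarations are written above; avoid duplicates
            }
            out.append(' ').append(attribute.getNodeName())
               .append("=\"").append(attribute.getNodeValue()).append('"');
        }
        out.append('>');

        // Recurse into element children and copy text nodes as they are (no escaping).
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                out.append(elementAsText((Element) child));
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                out.append(child.getNodeValue());
            }
        }
        return out.append("</").append(tagName).append('>').toString();
    }

    // Hypothetical helper used by the demo.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element city = doc.createElementNS("http://www.cityportal.com", "c:CityName");
        city.appendChild(doc.createTextNode("Karachi"));
        doc.appendChild(city);
        System.out.println(elementAsText(doc.getDocumentElement()));
    }
}
```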
The main() method in Listing
2 simulates a simple SOAP application. We have instantiated a
DataWrapper class and called its add() method
once to add one parameter. We have then instantiated a
SOAPMessage object and passed the DataWrapper
object to the SOAPMessage constructor. Listing
3 shows the resulting SOAP message.
This section demonstrates some important DOM features of Xerces that are not covered in the sample SOAP application of the previous section.
Have a look at Listing
4, which is a simple Java class named DOMCopySample.java. The
main() method of this class demonstrates how to copy DOM
nodes from one document into another.
Notice from Listing
4 that we have used the parse() method of the
DocumentBuilder object to load an XML file into the DOM
Document object named sourceDoc. The name of the
file that the parse() method will parse is "inputXML.xml". We
have shown the "inputXML.xml" file in Listing
5, which contains several invoice elements.
Recall that when we were creating the SOAP message document in Listing
2, we used the newDocument() method of the
DocumentBuilder class to create an empty DOM document with no
XML data. You will use the parse() method (instead of the
newDocument() method) when you want to create a DOM document
from an existing XML file or an input data stream containing XML data. The
parse() method parses the input XML data, loads the data into
a DOM Document object, and returns the Document
object.
After loading the XML file into the sourceDoc object, we have
called the getElementsByTagName() method of the
Document object and passed "invoice" as a parameter. The
getElementsByTagName() method belongs to the DOM
Document interface. It takes the name of an element as a
parameter and returns a NodeList object, which contains a
list of all elements in the DOM document that have names matching the
input parameter to the getElementsByTagName() method call.
NodeList is a DOM interface, which exposes the abstract
functionality of a list of nodes. It contains just two methods,
getLength() and item(int index). The
getLength() method returns the number of nodes in the
NodeList and the item(int index) method returns
the node at a particular index.
Some readers may want to compare the NodeList interface with
the NamedNodeMap interface discussed earlier. The main
difference is that you cannot access individual nodes in a
NodeList by names of nodes, while you can do this in a
NamedNodeMap.
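The difference can be seen in a short sketch such as the following, which parses a small in-memory document instead of the inputXML.xml file. The id attribute on invoice and the class name ListVersusMapSketch are hypothetical, since Listing 5 is not reproduced here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ListVersusMapSketch {
    // Returns "<number of invoices>:<id of the first invoice>".
    static String describe(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
        // A NodeList only offers positional access.
        NodeList invoices = doc.getElementsByTagName("invoice");
        Element first = (Element) invoices.item(0);

        // A NamedNodeMap offers positional access and lookup by name.
        NamedNodeMap attributes = first.getAttributes();
        Node id = attributes.getNamedItem("id");
        return invoices.getLength() + ":" + id.getNodeValue();
    }

    public static void main(String[] args) {
        System.out.println(describe("<invoices><invoice id='A1'/><invoice id='B2'/></invoices>"));
    }
}
```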
After getting the NodeList object in Listing
4, we have created a new empty DOM document object named
targetDoc. We have then created an
invoiceWrapper element, which serves as the root element of
the newly created targetDoc object.
Next we have taken each element in the NodeList and passed it
to the importNode() method of the targetDoc
object. The importNode() method imports a node from one
document into another document. It takes two parameters. The first
parameter is a node which you want to import from some other DOM
document. The second parameter is of boolean type. If the second parameter
is true, the importNode() method will import the node along
with all its child nodes (i.e. the complete tree of nodes whose root
starts at the node being imported). If the second parameter is false, the
importNode() method only imports the node without any of its
children.
After importing the invoice elements from
sourceDoc to targetDoc, we have appended the
imported elements as children of the invoiceWrapper
element. Listing
6 shows what targetDoc looks like after importing all the
invoice nodes of the sourceDoc (the inputXML.xml
file of Listing
5).
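The copy procedure described above can be sketched as follows. To stay self-contained, the sketch parses XML from a string rather than from the inputXML.xml file; the class name DomCopySketch, the helper methods, and the sample invoice content are assumptions and not the article's DOMCopySample code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomCopySketch {
    static DocumentBuilder builder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static Document copyInvoices(String sourceXml) {
        Document sourceDoc;
        try {
            // parse() loads existing XML; newDocument() would create an empty document.
            sourceDoc = builder().parse(new InputSource(new StringReader(sourceXml)));
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
        NodeList invoices = sourceDoc.getElementsByTagName("invoice");

        Document targetDoc = builder().newDocument();
        Element wrapper = targetDoc.createElement("invoiceWrapper");
        targetDoc.appendChild(wrapper);

        for (int i = 0; i < invoices.getLength(); i++) {
            // true = deep import: the invoice element and its whole subtree.
            Node imported = targetDoc.importNode(invoices.item(i), true);
            wrapper.appendChild(imported);
        }
        return targetDoc;
    }

    public static void main(String[] args) {
        Document copy = copyInvoices(
            "<invoices><invoice><total>10</total></invoice><invoice><total>20</total></invoice></invoices>");
        System.out.println(copy.getDocumentElement().getChildNodes().getLength());
    }
}
```

Note that importNode() only creates a copy owned by the target document; the copy does not appear in the tree until it is appended.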
DOM Level 2 includes a separate specification for events, which can be very helpful in developing XML processing applications. This section demonstrates how to generate and handle DOM events in a Xerces application.
If you want to use DOM events in your Xerces applications, you need to
follow the DOM events architecture. The important components of the DOM
events architecture are three interfaces named EventTarget,
EventListener, and Event.
If a particular DOM implementation supports DOM events, all its nodes will
implement the EventTarget interface. Thus, you can cast any
object that implements the Node interface (or any interface
that extends the Node interface, such as the
Document interface) as an EventTarget object.
If a DOM implementation does not support DOM events and you try to cast
its nodes as EventTarget objects, your application will throw
exceptions at runtime. So you need a mechanism to verify that the DOM
implementation you are using supports DOM events before trying to cast a
node as an EventTarget object.
DOM Level 2 has an interface called DOMImplementation, whose
hasFeature() method can help you check whether a particular
DOM implementation supports a particular DOM feature. You can call the
getDOMImplementation() method of the DocumentBuilder object to get a
DOMImplementation object. You can then call the
hasFeature() method of the DOMImplementation
object to check whether it supports a particular DOM feature.
The hasFeature() method takes two parameters. The first
parameter specifies the name of the feature that you want to verify. The
DOM Level 2 Core specification defines the names of different DOM
features. The name of the events feature is "Events". The second parameter
defines a version number of the feature. For all DOM Level 2 features, you
will pass "2.0" as the second parameter.
Have a look at Listing
7, where we have tested the Events feature by calling the
DOMImplementation.hasFeature("Events","2.0") method, which
returns true (meaning Xerces supports the DOM Level 2 Events feature).
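A feature check of this kind can be sketched as below. The class name FeatureCheckSketch and its supports() helper are hypothetical; the results for "Events" depend on the DOM implementation that your runtime provides.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;

public class FeatureCheckSketch {
    // Asks the DOM implementation behind the default DocumentBuilder about a feature.
    static boolean supports(String feature, String version) {
        try {
            DOMImplementation implementation = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
            return implementation.hasFeature(feature, version);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Events 2.0: " + supports("Events", "2.0"));
        System.out.println("MutationEvents 2.0: " + supports("MutationEvents", "2.0"));
    }
}
```

Only cast a node to EventTarget after such a check has returned true.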
There are various types of events in DOM Level 2, e.g. mutation events, user interface events, mouse events, etc. The current Xerces implementation only supports mutation events. Mutation events are generated whenever a node gets mutated. For example, a mutation event can be generated whenever the value of an attribute in a DOM tree is changed.
Now let's see how you will use mutation events in Xerces. Have a look at Listing 7 and observe the following sequence:
1. We have created a new
Document object and added one element node to the newly
created document. We have then cast the Document object as an
EventTarget object. As we have used the Document
node as an event target, any node in the document can generate an event
for this target.
2. We have called the addEventListener() method of the
EventTarget interface, which adds a listener to the event
target (the Document object). The
addEventListener() method takes three parameters. The first
parameter specifies the type of event you want to listen for. There can be
several types of mutation events. For example, the
DOMAttrModified event occurs whenever a DOM attribute value
gets modified. (For a complete list of the possible types of DOM Level 2
mutation events, consult section 1.6.4 of the DOM Level 2 events
specification.) The second parameter specifies an event handler object,
and the third parameter specifies whether the user wants to initiate
capture of an event. We don't want to use this feature, so we have passed
"false" as the third parameter.
3. We have written an event handler (the second parameter to the
addEventListener() method). In order to write an event handler, you have
to implement the
DOM's EventListener interface, which contains just one method
named handleEvent(). The handleEvent() method of
the event handler object will receive control whenever a mutation event
occurs on the Document object that you registered as an event
target. Notice from Listing
7 that we have written an inner class named
MyEventListener, which implements the
handleEvent() method. The handleEvent() method
takes just one parameter, which is an object that implements the
Event interface. The Event object carries
information about the event that occurred and which needs to be handled.
The handleEvent() method will normally call the
Event.getType() method to know the type of event that
occurred. You can check the type of mutation event that occurred and then
take an appropriate event handling action according to the event type that
occurred.
Notice that in the main() method of Listing
7, we have added two new attributes to the DOM Document
object after registering the event listener. Therefore, if you run the
class of Listing
7, the handleEvent() method of the
MyEventListener class will receive control twice (once for
each attribute added).
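The sequence above can be sketched as follows. This is an illustration under assumptions, not the code of Listing 7: the class name MutationEventSketch, the element and attribute names, and the use of an anonymous listener that records attribute names are all hypothetical, and the sketch assumes a DOM implementation whose documents implement EventTarget.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.events.Event;
import org.w3c.dom.events.EventListener;
import org.w3c.dom.events.EventTarget;
import org.w3c.dom.events.MutationEvent;

public class MutationEventSketch {
    // Returns the names of the attributes reported through DOMAttrModified events.
    static List<String> modifiedAttributes() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        if (!(doc instanceof EventTarget)) {
            throw new UnsupportedOperationException("DOM events are not supported");
        }
        Element report = doc.createElement("report");
        doc.appendChild(report);

        final List<String> seen = new ArrayList<>();
        EventListener listener = new EventListener() {
            public void handleEvent(Event event) {
                // Check the event type before treating it as a mutation event.
                if ("DOMAttrModified".equals(event.getType())) {
                    seen.add(((MutationEvent) event).getAttrName());
                }
            }
        };
        // Register on the Document so that events from any node in it are received.
        ((EventTarget) doc).addEventListener("DOMAttrModified", listener, false);

        report.setAttribute("city", "Karachi");
        report.setAttribute("units", "metric");
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(modifiedAttributes());
    }
}
```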
The concept of events is especially useful when you have a comprehensive XML application in which there are several DOM Documents with many nodes and each node has a possibility of being edited at several places in your business logic. In such cases, you can use the DOM events architecture. You will only have to write the event handling logic without worrying about calling the event handlers yourself. The DOM events framework will take care of calling the event handlers for you at appropriate time.
The concept of having a DOM range allows you to select a number of DOM nodes into a single range of nodes and then process the full range of nodes together in one go. For example, you can select a number of DOM nodes of a DOM document into a range and then import the range into another DOM document. This will import the whole range of nodes into the new document. Let's see how.
Have a look at Listing
8, which first checks whether the DOM implementation being used
supports the "Range" feature and then casts a Document object
as a DocumentRange object. This process is similar to what we
did earlier while trying to use the "Events" feature.
The DocumentRange interface exposes the
createRange() method that you can use to create a new
Range object. The Range interface exposes
methods that you need to use the DOM range feature.
A Range object represents a range of DOM nodes, which starts
at a starting point and ends at an ending point. You can move the starting
and ending points to position your range over the set of nodes of your
choice. In order to move the two points, you will need to use the
different methods of the Range interface.
When a range is initially created, both its starting and ending points are
positioned at the beginning of the document with which the range is
associated. Notice from Listing
8 that after creating a new range, we have created six elements i.e. a
wrapper (the root element) and its five children named e1,
e2, e3, e4, and e5.
Next we have called the setEndBefore() method of the
Range object, passing the e5 element as a
parameter along with the method call. This sets the end of the range
before the e5 element, which means now the range ends at the
e4 element. We have also called the setStartAfter() method of
the Range object and passed the e1 element as a
parameter. This sets the start of the range just after the e1 element,
which means the range now starts at the e2 element. Our range
now covers the e2, e3, and e4
elements.
We can now process the nodes in our range together in one go. For example,
we have called the cloneContents() method of the
Range object, which returns a DocumentFragment
object. The DocumentFragment interface is also part of DOM and
extends the Node interface. It is like a lightweight
document, similar to the Document interface, but with limited
features.
The DocumentFragment object that the
cloneContents() method returns contains a copy of each of the
nodes covered by our Range object. You can directly import
the DocumentFragment object into a new DOM document, just
like importing any other type of node. This will result in importing all
the nodes present in the DocumentFragment into the new
document.
For example, in Listing
8, after calling the cloneContents() method, we have
created a new DOM document, added a wrapper root element, and imported the
DocumentFragment into the newly created document. The newly
created document now contains copies of the e2,
e3, and e4 elements and looks like the XML file
of Listing
9.
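The range procedure can be sketched as follows. The class name RangeCopySketch and its helpers are hypothetical and this is not the code of Listing 8; the sketch assumes a DOM implementation whose documents implement DocumentRange.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.ranges.DocumentRange;
import org.w3c.dom.ranges.Range;

public class RangeCopySketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static Document copyMiddleElements() {
        Document sourceDoc = newDocument();
        if (!(sourceDoc instanceof DocumentRange)) {
            throw new UnsupportedOperationException("DOM ranges are not supported");
        }
        // A new range starts and ends at the beginning of its document.
        Range range = ((DocumentRange) sourceDoc).createRange();

        Element wrapper = sourceDoc.createElement("wrapper");
        sourceDoc.appendChild(wrapper);
        Element[] children = new Element[5];
        for (int i = 0; i < children.length; i++) {
            children[i] = sourceDoc.createElement("e" + (i + 1));
            wrapper.appendChild(children[i]);
        }

        range.setEndBefore(children[4]);  // range ends just before e5
        range.setStartAfter(children[0]); // range starts just after e1

        // Copies of e2, e3, and e4, held in a fragment owned by sourceDoc.
        DocumentFragment fragment = range.cloneContents();

        Document targetDoc = newDocument();
        Element targetWrapper = targetDoc.createElement("wrapper");
        targetDoc.appendChild(targetWrapper);
        // Appending a fragment appends its children, not the fragment node itself.
        targetWrapper.appendChild(targetDoc.importNode(fragment, true));
        return targetDoc;
    }

    public static void main(String[] args) {
        Element root = copyMiddleElements().getDocumentElement();
        System.out.println(root.getChildNodes().getLength());
    }
}
```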
Therefore, you can use the concept of DOM Range by first selecting the start and end positions of the range and then performing the operation of your choice on the range that you have selected.
DOM Level 3 includes a load and save module which provides a mechanism
for loading XML data into DOM Document objects and for
serializing DOM Document objects as XML data. Before the DOM
Level 3, there was no such mechanism in DOM. Therefore DOM implementations
used to build proprietary mechanisms for loading and saving.
The DOM Level 3 load and save module is currently under development for Xerces. It is not yet part of the standard Xerces download, so we will not demonstrate how to use the load and save module in Xerces. Instead we'll just describe the important interfaces in the load and save module of DOM Level 3.
The primary interface in the DOM load and save module is
DOMImplementationLS, which is meant to extend the features of
the DOMImplementation interface that we saw earlier. You can
check whether a particular DOM implementation supports load and save by
calling the hasFeature("LS","3.0") method of any
DOMImplementation instance. In case the
DOMImplementation.hasFeature() method returns true, you can
cast the DOMImplementation object as a
DOMImplementationLS instance.
The DOMImplementationLS interface contains a method named
createLSInput(), which creates and returns an instance of the
LSInput interface. The LSInput object is capable
of encapsulating XML data in different forms, such as a textual string, a
character stream, or a byte stream. After creating the
LSInput object, you can set XML data in one of the data
fields of the LSInput interface.
The load and save module also contains an LSParser interface,
which you can instantiate using the createLSParser() method
of the DOMImplementationLS interface. You can call the
parse() method of the LSParser object and pass
on the LSInput object to the parse() method. The
parse() method will return the DOM document representation of
the XML data that you set in the LSInput object, thus
completing the process of loading XML data into a DOM
Document object.
When you want to serialize a DOM document as XML data, you will use the
LSSerializer interface, which you can instantiate using the
createLSSerializer() method of the DOMImplementationLS
instance. You can then call the writeToString() method of the
LSSerializer object, which takes a DOM Node
(e.g. a Document node) and returns the string representation
of the input DOM Node.
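Although the module is not part of the Xerces release discussed in this article, later Java runtimes ship the org.w3c.dom.ls package. Assuming such a runtime, the interfaces described above could be used roughly as in the following sketch; the class name LoadSaveSketch and the roundTrip() method are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSSerializer;

public class LoadSaveSketch {
    // Loads XML text into a DOM Document and serializes it back to a string.
    static String roundTrip(String xml) {
        DOMImplementation implementation;
        try {
            implementation = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // Assumption: the runtime's DOM implementation supports Load and Save.
        if (!implementation.hasFeature("LS", "3.0")) {
            throw new UnsupportedOperationException("Load and Save is not supported");
        }
        DOMImplementationLS loadSave = (DOMImplementationLS) implementation;

        // Load: wrap the text in an LSInput and hand it to an LSParser.
        LSInput input = loadSave.createLSInput();
        input.setStringData(xml);
        LSParser parser = loadSave.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        Document doc = parser.parse(input);

        // Save: serialize the Document node back to a string.
        LSSerializer serializer = loadSave.createLSSerializer();
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<CityName>Karachi</CityName>"));
    }
}
```

The serialized string may begin with an XML declaration, so compare on the element content rather than on the whole string.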
We have discussed many DOM features in this series of articles. We have demonstrated that DOM is a powerful API for low level XML authoring and processing. As web services have gained popularity, many higher level XML processing engines have emerged. These higher level engines normally target specific XML-based languages and schemas. For example, the Microsoft .NET framework contains easy to use features that enable WSDL and SOAP processing in web service applications. Similarly, the JWSDP also contains APIs for XML-based Remote Procedure Calls (RPC). Therefore, it is expected that many developers will prefer using higher-level schema-specific APIs rather than using DOM for low level XML processing.
The following are the two common scenarios where you will likely use DOM in your XML applications:
In this series of articles, I have explained the DOM architecture and demonstrated the use of DOM features in web service applications. I considered MSXML and Xerces, the two popular DOM implementations. I showed DOM working inside a JavaScript page on the client side and inside an ASP.NET page on the server side. And I discussed how to use the DOM features of Xerces in Java-XML applications.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.