DOM for Web Services, Part 3
January 6, 2004
In the first article of this series I discussed the XML authoring and processing requirements in web services, explained the DOM architecture along with the features in the three DOM levels, and introduced MSXML and Xerces, two popular DOM implementations.
In the second article I showed readers how to use MSXML, especially how to process WSDL files and develop web service user interfaces on the client side using MSXML inside JavaScript code. I also showed the use of MSXML on the server via an ASP.NET page.
In this third and final article of this series I demonstrate the use of Xerces, which is the most popular Java-based implementation of DOM. In this article's first section I develop a couple of Java classes that can create and process SOAP messages. This will demonstrate the basic DOM features of Xerces. In the second section I demonstrate the use of some other important features, including:
- Working with multiple XML documents in which you need to import XML nodes from one document into another.
- The use of Xerces to generate DOM events, and writing your own event handlers to handle the events generated.
- The use of DOM range and DOM document fragments. The DOM range specification provides an easy-to-use method for grouped processing of several XML nodes.
The third section contains a discussion of the Load and Save module, an important feature in DOM level 3 which is not yet supported in Xerces.
The last section wraps up this series by explaining the scenarios in which you will most likely use DOM for XML authoring and processing requirements in your web service applications.
W3C DOM and Xerces
Xerces is part of the Apache XML project. It is available for Java and C++. In this article I only cover Xerces for Java, which is commonly called Xerces-J. The most recent version of Xerces-J available at the time of writing is 2.6.
Note that W3C DOM is not the only XML API that Xerces supports. Xerces also supports SAX and a proprietary interface called Xerces Native Interface (XNI). Complete documentation about Xerces is available from the Xerces site. I will discuss only the W3C DOM features of Xerces.
Also note that the Java Web Services Developer Pack (JWSDP) from Sun includes standard XML processing Java APIs, including the Java API for XML Processing (JAXP). The current reference implementation of JAXP uses Xerces as its default XML processing engine. If you download JWSDP from Sun's site, you will get Xerces, and you won't need to download it separately.
However, if you are using JDK 1.4, you have a small problem to take care of before you start using Xerces. JDK 1.4 ships with its own, older XML processing classes (its bundled JAXP parser is Apache Crimson), and these built-in classes take precedence over the jars in your classpath. So even if you include the Xerces jars in your classpath, the Java runtime will not use the version of Xerces that comes with JWSDP. The instructions for solving this problem come with the JWSDP installation. When you install JWSDP under Windows (the latest release for now is version 1.3), you will see instructions for JDK 1.4 users saying "Create the directory: <JAVA_HOME>\jre\lib\endorsed and then copy the files in the following directory to the newly created directory: C:\jwsdp-1.3\jaxp\lib\endorsed".
The files in the C:\jwsdp-1.3\jaxp\lib\endorsed directory of JWSDP include a Xerces-J jar file named xercesImpl.jar. When you create the new <JAVA_HOME>\jre\lib\endorsed directory and copy the files from the C:\jwsdp-1.3\jaxp\lib\endorsed directory to the newly created location, you are telling the Java runtime to use the new version of Xerces instead of the older XML processing implementation that comes as part of JDK 1.4.
Alternatively, if you don't want to download and install JWSDP, you can download Xerces and copy the xercesImpl.jar file into the <JAVA_HOME>\jre\lib\endorsed directory.
Once you have the xercesImpl.jar file in its correct place, you will not need to include anything in your classpath to compile and run the samples of this article.
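A quick way to confirm which DOM implementation the runtime actually picked up is to print the class names behind the JAXP factories. The sketch below is our own diagnostic, not one of the article's samples, and the class name WhichParser is hypothetical. The exact names it prints depend on your JDK and on which Xerces jar is installed: a standalone Xerces typically reports classes under org.apache.xerces, while later JDKs report their internal, repackaged copy. Note also that the endorsed-directory mechanism described above applies to older JDKs; it was removed in Java 9.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Diagnostic sketch: report which JAXP/DOM implementation classes are in use.
class WhichParser {
    static String describe() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return "factory=" + factory.getClass().getName()
                    + ", builder=" + builder.getClass().getName()
                    + ", domImpl=" + builder.getDOMImplementation().getClass().getName();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A standalone Xerces install usually shows org.apache.xerces.* classes here.
        System.out.println(describe());
    }
}
```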
The source code download contains the source and compiled forms of all the samples that we are going to use for demonstration in this article.
Xerces for SOAP authoring
Look at Listing 3 of the first article of this series, which was a SOAP message that we used to describe the usage model of web services. Notice that the SOAP message contains elements belonging to two XML namespaces. The first is the SOAP namespace and the second is an application-specific namespace (http://www.cityportal.com).
The use of these namespaces demonstrates that the XML and SOAP specifications allow building layered applications, where the application-specific layer works on top of the SOAP layer.
The SOAP specification defines the Envelope, Header, and Body elements and allows applications to define their own namespaces to fill in the header and the body of a SOAP envelope.
This layered architecture is a great strength of XML web services. It allows vendors to develop off-the-shelf standard solutions (e.g. a SOAP client or a SOAP server) and application developers to add only the application-specific bit of the layered framework.
For example, if you consider the SOAP message of Listing 3 of the first article, you will see that the only application-specific elements are GetCityWeatherReport and CityName. The rest of the markup is standard SOAP.
We are going to use the same idea of layering application bits. We will have two classes in our sample DOM-based SOAP engine:
- The DataWrapper class creates the application-specific data that goes along with the SOAP method call (e.g. the CityName element in Listing 3 of the first article).
- The SOAPMessage class creates the SOAP Envelope along with the SOAP Body. Because a SOAP request usually contains the name of a web service method, the same SOAPMessage class will also author the method element (usually the immediate child of the SOAP Body).
But how do these classes use DOM to create XML?
Look at the add() method in Listing 1, which takes three parameters. The first parameter is the name of the data element (e.g. CityName in Listing 3 of the first article). The second parameter specifies the namespace to which the data element belongs. The third parameter specifies the contents of the data element (e.g. "Karachi" in Listing 3 of the first article). The add() method simply stores these parameters in a list. An application can call this method any number of times. Every time an application calls this method, a new set of data will be added to the items already stored in the list.
The appendAsChildren() method in Listing 1 takes just one parameter named parentElement, which is a DOM element. The appendAsChildren() method takes all the entries in the list one by one and adds them as child nodes to the parentElement.
Notice from Listing 1 that the appendAsChildren() method first calls the getOwnerDocument() method of the parentElement object. The getOwnerDocument() method belongs to the DOM Node interface. It returns the Document object to which a DOM node belongs. We need to know the owner document whenever we want to add a child element to an existing element, because in DOM new nodes are created through the factory methods of a Document object (such as createElementNS() and createTextNode()) and belong to that document.
After getting the owner Document object, the appendAsChildren() method performs the following operations for every entry in the list:
- Create a new element using the createElementNS() method of the ownerDocument object. The createElementNS() method takes two parameters. The first parameter is the namespace URI string for the element that you want to create. The second parameter is the qualified name of the element (optionally including a prefix). The createElementNS() method returns the newly created Element object, which represents the name of a parameter that goes along with a SOAP method invocation request (e.g. CityName in Listing 3 of the first article).
- Append the newly created DOM Element as a child to parentElement by calling the appendChild() method of parentElement.
- Create a new text node by calling the createTextNode() method of the owner document object and append the text node as a child to the newly created Element node. This text node represents the value of the parameter that goes with a SOAP message call (e.g. "Karachi" in Listing 3 of the first article).
Just for the sake of demonstration, we have written a simple main() method in Listing 1. The main() method demonstrates how an application will use the functionality of the add() and appendAsChildren() methods.
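The following is a minimal sketch of the add()/appendAsChildren() idea described above, written against the standard JAXP and W3C DOM interfaces. It is not Listing 1 itself: the class name DataWrapperSketch, the nested Item holder, and the demo() helper are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch of the DataWrapper idea: remember (name, namespace, value) triples,
// then write them out as namespaced child elements of a parent element.
class DataWrapperSketch {
    // One stored data item (hypothetical holder class).
    private static final class Item {
        final String name, namespace, value;
        Item(String name, String namespace, String value) {
            this.name = name;
            this.namespace = namespace;
            this.value = value;
        }
    }

    private final List<Item> items = new ArrayList<>();

    // Only records the data; no DOM nodes are created here.
    void add(String name, String namespace, String value) {
        items.add(new Item(name, namespace, value));
    }

    // Appends one element, containing one text node, per stored item.
    void appendAsChildren(Element parentElement) {
        // New nodes must be created by the document that will own them.
        Document ownerDocument = parentElement.getOwnerDocument();
        for (Item item : items) {
            Element child = ownerDocument.createElementNS(item.namespace, item.name);
            parentElement.appendChild(child);
            child.appendChild(ownerDocument.createTextNode(item.value));
        }
    }

    // Hypothetical helper: builds a parent element and fills it with one item.
    static Element demo() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element parent = doc.createElementNS("http://www.cityportal.com", "GetCityWeatherReport");
            doc.appendChild(parent);

            DataWrapperSketch wrapper = new DataWrapperSketch();
            wrapper.add("CityName", "http://www.cityportal.com", "Karachi");
            wrapper.appendAsChildren(parent);
            return parent;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element city = (Element) demo().getFirstChild();
        System.out.println(city.getLocalName() + " = " + city.getTextContent()); // CityName = Karachi
    }
}
```

Storing the triples first and touching the DOM only in appendAsChildren() keeps the wrapper independent of any particular document, which is what lets the same object be attached under whatever method element the SOAP layer creates.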
Now have a look at the SOAPMessage constructor in Listing 2. It takes three parameters: methodName, methodNamespace, and parameters. The methodName parameter represents the name of the SOAP method that the SOAP message will invoke on a remote server (e.g. GetCityWeatherReport in Listing 3 of the first article). The methodNamespace parameter represents the namespace to which the methodName element belongs (e.g. "http://www.cityportal.com" in Listing 3 of the first article). The parameters parameter is a DataWrapper object which wraps all the data that goes with the SOAP method invocation request.
The SOAPMessage constructor creates a SOAP message, so you first have to create a new empty XML document. Creating a new XML DOM document in Xerces takes three steps. You first instantiate a DocumentBuilderFactory, then you create a DocumentBuilder, and then, using the newDocument() method of the DocumentBuilder, you create a DOM Document object. You will use the newDocument() method whenever you want to create a new empty XML DOM document containing no data. The Document object that newDocument() returns exposes the DOM Document interface.
Once you have the DOM document, you can author the root Envelope element by using the createElementNS() method discussed earlier.
After creating the Envelope element, you need to attach the element to its parent. Because Envelope is the root element, the Document object is its parent. Therefore, you will call the appendChild() method of the Document object to attach the Envelope element to the document.
Note that an XML document can have only one root element. That's why you can attach only one element node to a Document object. If you try to attach more than one element node, you will get a DOMException (with the HIERARCHY_REQUEST_ERR code) at runtime.
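The one-root rule is easy to observe. This small sketch (the class and method names are ours) appends a second element directly to a Document and reports the DOMException code it gets back; with Xerces-based implementations we expect HIERARCHY_REQUEST_ERR.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

// Shows that a Document accepts only one element child (its root element).
class SingleRootDemo {
    // Returns the DOMException code raised by the second appendChild() call,
    // or -1 if no exception was thrown.
    static int secondRootErrorCode() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("Envelope"));     // first root: accepted
            try {
                doc.appendChild(doc.createElement("Another"));  // second root: rejected
                return -1;
            } catch (DOMException e) {
                return e.code;
            }
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int code = secondRootErrorCode();
        System.out.println(code == DOMException.HIERARCHY_REQUEST_ERR
                ? "HIERARCHY_REQUEST_ERR" : "code " + code);
    }
}
```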
In a similar manner we have created the Body element (the bodyElement object), attached it to the Envelope element, created the SOAP method name element (the methodElement object), and attached it to the Body element.
Finally, we have to author the elements that represent parameters associated with the SOAP method invocation request. This is the job of the appendAsChildren() method of the DataWrapper class that we have already explained. You will call the appendAsChildren() method of the parameters object and pass the methodElement object as its argument. This will automatically append the parameters data to the SOAP method call.
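Putting these construction steps together, here is a compact sketch that builds Envelope, Body, the method element, and one parameter with createElementNS(). It assumes the SOAP 1.1 envelope namespace and the env prefix (the namespace actually used in Listing 3 of the first article may differ), and it inlines the parameter instead of using a DataWrapper object, so it is an illustration rather than Listing 2. The class name SoapMessageSketch is hypothetical.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch of the SOAPMessage steps: Envelope -> Body -> method element -> parameter.
class SoapMessageSketch {
    // Assumption: SOAP 1.1 envelope namespace, used with the "env" prefix.
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    static Document build(String methodName, String methodNamespace,
                          String paramName, String paramValue) {
        try {
            // Steps 1-3: factory -> builder -> empty document.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.newDocument();

            // Envelope is the root element, so its parent is the Document itself.
            Element envelope = doc.createElementNS(SOAP_NS, "env:Envelope");
            doc.appendChild(envelope);

            Element body = doc.createElementNS(SOAP_NS, "env:Body");
            envelope.appendChild(body);

            // The method element is the immediate child of Body.
            Element method = doc.createElementNS(methodNamespace, methodName);
            body.appendChild(method);

            // One parameter, standing in for DataWrapper.appendAsChildren().
            Element param = doc.createElementNS(methodNamespace, paramName);
            param.appendChild(doc.createTextNode(paramValue));
            method.appendChild(param);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = build("GetCityWeatherReport", "http://www.cityportal.com",
                             "CityName", "Karachi");
        Element root = doc.getDocumentElement();
        System.out.println(root.getTagName() + " in " + root.getNamespaceURI());
    }
}
```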
Also look at the getSOAPRequestText() method in Listing 2, which was written to demonstrate XML processing in Xerces. It takes a Document object and returns its XML data in string form. It uses a method called getElementAsText(), which is recursive and is responsible for creating the XML data corresponding to the root element and all its children.
The following points are worth noting from the getElementAsText() method in Listing 2:
- We have used the getTagName() method of the Element object to read the tag name of the element. The tag name consists of both the prefix and the local name (e.g. if the prefix is "env" and the local name is "Envelope", the tag name will be "env:Envelope").
- We have used the getAttributes() method of the Element object to read all the attributes of an element into a NamedNodeMap object. A NamedNodeMap object holds a number of nodes, where each node is accessible by name or by index number. We have used the getLength() and item() methods of the NamedNodeMap interface to fetch all attribute nodes. The getLength() method returns the total number of nodes in a NamedNodeMap and the item() method returns the node at a particular index.
- We have used the getNamespaceURI() method to get the namespace URI of each element. Recall from the earlier discussion that the createElementNS() method creates an element with a namespace URI and a tag name. The getNamespaceURI() method returns the same URI.
- We have used the getPrefix() method to fetch the namespace prefix of all elements.
- The Node.getNodeType() method tells the type of a node (e.g. whether a node is a text node or an element node). We have used this method to differentiate text nodes from element nodes.
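A recursive serializer in the spirit of getElementAsText() can be sketched as follows. This version is our own: it handles only element and text nodes (comments, CDATA sections, and processing instructions are skipped), escapes the characters that must be escaped, and writes namespace declarations simply because a namespace-aware parser exposes them as ordinary attributes. Attribute order follows the NamedNodeMap, which need not match the source order. The names ElementAsText and roundTrip() are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ElementAsText {
    // Recursively writes an element, its attributes, and its element/text children.
    static String getElementAsText(Element element) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(element.getTagName()); // prefix:localName when prefixed
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            sb.append(' ').append(attr.getNodeName())
              .append("=\"").append(escape(attr.getNodeValue())).append('"');
        }
        sb.append('>');
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                sb.append(getElementAsText((Element) child));
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(escape(child.getNodeValue()));
            }
            // Other node types (comments, CDATA, PIs) are ignored in this sketch.
        }
        return sb.append("</").append(element.getTagName()).append('>').toString();
    }

    // Escapes characters that cannot appear literally in text or attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Hypothetical helper: parse a string, then serialize its root element again.
    static String roundTrip(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return getElementAsText(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(
            "<env:Envelope xmlns:env=\"urn:example\"><env:Body>hi</env:Body></env:Envelope>"));
    }
}
```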
The main() method in Listing 2 simulates a simple SOAP application. We have instantiated the DataWrapper class and called its add() method once to add one parameter. We have then instantiated a SOAPMessage object and passed the DataWrapper object to the SOAPMessage constructor. Listing 3 shows the resulting SOAP message.
Some Important DOM Features
This section demonstrates some important DOM features of Xerces that are not covered in the sample SOAP application of the previous section.
Copying DOM Nodes from one document into another
Have a look at Listing 4, which is a simple Java class named DOMCopySample.java. The main() method of this class demonstrates how to copy DOM nodes from one document into another.
Notice from Listing 4 that we have used the parse() method of the DocumentBuilder object to load an XML file into the DOM Document object named sourceDoc. The name of the file that the parse() method will parse is "inputXML.xml". We have shown the "inputXML.xml" file in Listing 5, which contains several invoice elements.
Recall that when we were creating the SOAP message document in Listing 2, we used the newDocument() method of the DocumentBuilder class to create an empty DOM document with no XML data. You will use the parse() method (instead of the newDocument() method) when you want to create a DOM document from an existing XML file or an input data stream containing XML data. The parse() method parses the input XML data, loads the data into a DOM Document object, and returns the Document object.
After loading the XML file into the sourceDoc object, we have called the getElementsByTagName() method of the Document object and passed "invoice" as a parameter. The getElementsByTagName() method belongs to the DOM Document interface. It takes the name of an element as a parameter and returns a NodeList object, which contains a list of all elements in the DOM document that have names matching the input parameter to the getElementsByTagName() method call.
NodeList is a DOM interface, which exposes the abstract functionality of a list of nodes. It contains just two methods, getLength() and item(int index). The getLength() method returns the number of nodes in the NodeList and the item(int index) method returns the node at a particular index.
Some readers may want to compare the NodeList interface with the NamedNodeMap interface discussed earlier. The main difference is that you cannot access individual nodes in a NodeList by names of nodes, while you can do this in a NamedNodeMap.
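The difference shows up directly in code. In this sketch (the class name ListVsMap and the inline sample XML are made up), the NodeList returned by getElementsByTagName() is read by index, while the NamedNodeMap of attributes is queried by name with getNamedItem().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ListVsMap {
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(
                    new InputSource(new StringReader(
                        "<invoice id=\"7\" currency=\"USD\"><item/><item/></invoice>")));

            // NodeList: positional access only (getLength() / item(int)).
            NodeList items = doc.getElementsByTagName("item");

            // NamedNodeMap: positional access plus lookup by name.
            NamedNodeMap attrs = doc.getDocumentElement().getAttributes();
            Node currency = attrs.getNamedItem("currency");

            return items.getLength() + " items, currency=" + currency.getNodeValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2 items, currency=USD
    }
}
```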
After getting the NodeList object in Listing 4, we have created a new empty DOM document object named targetDoc. We have then created an invoiceWrapper element, which serves as the root element of the newly created targetDoc object.
Next we have taken each element in the NodeList and passed it to the importNode() method of the targetDoc object. The importNode() method imports a node from one document into another document. It takes two parameters. The first parameter is a node which you want to import from some other DOM document. The second parameter is of boolean type. If the second parameter is true, the importNode() method will import the node along with all its child nodes (i.e. the complete tree of nodes whose root starts at the node being imported). If the second parameter is false, the importNode() method only imports the node without any of its children.
After importing the invoice elements from sourceDoc to targetDoc, we have appended the imported elements as children of the invoiceWrapper element. Listing 6 shows what targetDoc looks like after importing all the invoice nodes of the sourceDoc (the inputXML.xml file of Listing 5).
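Here is a sketch of the copy procedure described above. Because Listing 5's inputXML.xml is not reproduced here, the sketch parses an inline string with two made-up invoice elements; the class name DomCopySketch and the sample data are assumptions. It shows parse(), getElementsByTagName(), and importNode() with the deep flag set to true.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomCopySketch {
    // Made-up stand-in for inputXML.xml (Listing 5 is not reproduced here).
    static final String SAMPLE =
        "<invoices>"
        + "<invoice id=\"1\"><amount>10</amount></invoice>"
        + "<invoice id=\"2\"><amount>25</amount></invoice>"
        + "</invoices>";

    static Document copyInvoices(String sourceXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // parse(): build a DOM tree from existing XML data.
            Document sourceDoc = builder.parse(new InputSource(new StringReader(sourceXml)));
            NodeList invoices = sourceDoc.getElementsByTagName("invoice");

            // newDocument(): start an empty tree with its own root element.
            Document targetDoc = builder.newDocument();
            Element wrapper = targetDoc.createElement("invoiceWrapper");
            targetDoc.appendChild(wrapper);

            for (int i = 0; i < invoices.getLength(); i++) {
                // deep = true copies the invoice element together with its subtree.
                Node copy = targetDoc.importNode(invoices.item(i), true);
                wrapper.appendChild(copy);
            }
            return targetDoc;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document target = copyInvoices(SAMPLE);
        System.out.println(target.getElementsByTagName("invoice").getLength() + " invoices copied");
    }
}
```

Note that importNode() leaves sourceDoc untouched: it returns a copy owned by targetDoc, which is why the copy still has to be appended explicitly.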
Working with DOM events
DOM Level 2 contains a separate specification for events, which can be very helpful in developing XML processing applications. This section demonstrates how to generate and handle DOM events in a Xerces application.
If you want to use DOM events in your Xerces applications, you need to follow the DOM events architecture. The important components of the DOM events architecture are three interfaces named EventTarget, EventListener, and Event.
If a particular DOM implementation supports DOM events, all its nodes will implement the EventTarget interface. Thus, you can cast any object that implements the Node interface (or any interface that extends the Node interface, such as the Document interface) as an EventTarget object.
If a DOM implementation does not support DOM events and you try to cast its nodes as EventTarget objects, your application will throw a ClassCastException at runtime. So you need a mechanism to verify that the DOM implementation you are using supports DOM events before trying to cast a node as an EventTarget object.
DOM Level 2 has an interface called DOMImplementation, whose hasFeature() method can help you check whether a particular DOM implementation supports a particular DOM feature. You can call the getDOMImplementation() method of the DocumentBuilder object to get a DOMImplementation object. You can then call the hasFeature() method of the DOMImplementation object to check whether it supports a particular DOM feature.
The hasFeature() method takes two parameters. The first parameter specifies the name of the feature that you want to verify. The DOM Level 2 Core specification defines the names of different DOM features. The name of the events feature is "Events". The second parameter defines a version number of the feature. For all DOM Level 2 features, you will pass "2.0" as the second parameter.
Have a look at Listing 7, where we have tested the Events feature by calling the DOMImplementation.hasFeature("Events","2.0") method, which returns true (meaning Xerces supports the DOM Level 2 Events feature).
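Feature checks of this kind take only a few lines. The sketch below (the class name FeatureCheck is hypothetical) asks the DOMImplementation obtained from a DocumentBuilder about a few DOM Level 2 features; the results depend on the implementation you run it against.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;

class FeatureCheck {
    // Asks the DOMImplementation behind JAXP whether it claims a feature/version pair.
    static boolean supports(String feature, String version) {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
            return impl.hasFeature(feature, version);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] features = {"Core", "Events", "MutationEvents", "Range", "Traversal"};
        for (String f : features) {
            System.out.println(f + " 2.0: " + supports(f, "2.0"));
        }
    }
}
```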
There are various types of events in DOM Level 2, e.g. mutation events, user interface events, mouse events, etc. The current Xerces implementation only supports mutation events. Mutation events are generated whenever a node gets mutated. For example, a mutation event can be generated whenever the value of an attribute in a DOM tree is changed.
Now let's see how you will use mutation events in Xerces. Have a look at Listing 7 and observe the following sequence:
- After verifying the support of the Events feature, we have created a new Document object and added one element node to the newly created document. We have then cast the Document object as an EventTarget object. Because we have used the Document node as an event target, events generated by any node in the document can reach this target.
- Next we have called the addEventListener() method of the EventTarget interface, which adds a listener to the event target (the Document object). The addEventListener() method takes three parameters. The first parameter specifies the type of event you want to listen for. There can be several types of mutation events. For example, the DOMAttrModified event occurs whenever a DOM attribute value gets modified. (For a complete list of the possible types of DOM Level 2 mutation events, consult section 1.6.4 of the DOM Level 2 Events specification.) The second parameter specifies an event handler object, and the third parameter specifies whether the listener should handle the event during the capture phase. We don't want to use this feature, so we have passed false as the third parameter.
- Finally, we have to write an event handler class (whose instance we passed as the second parameter to the addEventListener() method). In order to write an event handler, you have to implement the DOM's EventListener interface, which contains just one method named handleEvent(). The handleEvent() method of the event handler object will receive control whenever a mutation event occurs on the Document object that you registered as an event target.
Notice from Listing 7 that we have written an inner class named MyEventListener, which implements the handleEvent() method. The handleEvent() method takes just one parameter, which is an object that implements the Event interface. The Event object carries information about the event that occurred and which needs to be handled.
The handleEvent() method will normally call the Event.getType() method to find out the type of event that occurred. You can check the type of mutation event that occurred and then take an appropriate event-handling action according to that type.
Notice that in the main() method of Listing 7, we have added two new attributes to the element in the DOM document after registering the event listener. Therefore, if you run the class of Listing 7, the handleEvent() method of the MyEventListener class will receive control twice (once for each attribute added).
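The steps above can be sketched as follows. This is not Listing 7: the element and attribute names are invented, and the listener records the attribute name from each DOMAttrModified event (via the MutationEvent interface) instead of printing a message, so that the result can be checked. The class name MutationEventSketch is hypothetical. Run against a Xerces-based DOM, we expect one event per newly added attribute.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.events.Event;
import org.w3c.dom.events.EventListener;
import org.w3c.dom.events.EventTarget;
import org.w3c.dom.events.MutationEvent;

class MutationEventSketch {
    // Event handler: remembers which attribute each DOMAttrModified event reports.
    static class MyEventListener implements EventListener {
        final List<String> attrNames = new ArrayList<>();

        @Override
        public void handleEvent(Event evt) {
            if ("DOMAttrModified".equals(evt.getType())) {
                attrNames.add(((MutationEvent) evt).getAttrName());
            }
        }
    }

    static List<String> run() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            if (!builder.getDOMImplementation().hasFeature("Events", "2.0")) {
                throw new IllegalStateException("DOM Level 2 Events not supported");
            }
            Document doc = builder.newDocument();
            Element invoice = doc.createElement("invoice");
            doc.appendChild(invoice);

            // The Document is the event target; DOMAttrModified events bubble up to it.
            MyEventListener listener = new MyEventListener();
            ((EventTarget) doc).addEventListener("DOMAttrModified", listener, false);

            // Two new attributes, so two DOMAttrModified events are expected.
            invoice.setAttribute("id", "42");
            invoice.setAttribute("currency", "USD");
            return listener.attrNames;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```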
The concept of events is especially useful when you have a complex XML application with several DOM documents containing many nodes, any of which may be edited at several places in your business logic. In such cases, you can use the DOM events architecture. You only have to write the event-handling logic, without worrying about calling the event handlers yourself; the DOM events framework takes care of calling them for you at the appropriate time.
A range of DOM nodes and DOM document fragments
The concept of having a DOM range allows you to select a number of DOM nodes into a single range of nodes and then process the full range of nodes together in one go. For example, you can select a number of DOM nodes of a DOM document into a range and then import the range into another DOM document. This will import the whole range of nodes into the new document. Let's see how.
Have a look at Listing 8, which first checks whether the DOM implementation being used supports the "Range" feature and then casts a Document object as a DocumentRange object. This process is similar to what we did earlier while trying to use the "Events" feature.
The DocumentRange interface exposes the createRange() method that you can use to create a new Range object. The Range interface exposes the methods that you need to use the DOM range feature.
A Range object represents a range of DOM nodes, which starts at a starting point and ends at an ending point. You can move the starting and ending points to position your range over the set of nodes of your choice. In order to move the two points, you will need to use the different methods of the Range interface.
When a range is initially created, both its starting and ending points are positioned at the beginning of the document with which the range is associated. Notice from Listing 8 that after creating a new range, we have created six elements, i.e. a wrapper (the root element) and its five children named e1, e2, e3, e4, and e5.
Next we have called the setEndBefore() method of the Range object, passing the e5 element as a parameter along with the method call. This sets the end of the range before the e5 element, which means the range now ends at the e4 element. We have also called the setStartAfter() method of the Range object and passed the e1 element as a parameter. This sets the start of the range just after the e1 element, which means the range now starts at the e2 element. Our range now covers the e2, e3, and e4 elements.
We can now process the nodes in our range together in one go. For example, we have called the cloneContents() method of the Range object, which returns a DocumentFragment object. The DocumentFragment interface is also part of DOM and extends the Node interface. It is like a lightweight document, similar to the Document interface, but with limited features.
The DocumentFragment object that the cloneContents() method returns contains a copy of each of the nodes covered by our Range object. You can directly import the DocumentFragment object into a new DOM document, just like importing any other type of node. This will result in importing all the nodes present in the DocumentFragment into the new document.
For example, in Listing 8, after calling the cloneContents() method, we have created a new DOM document, added a wrapper root element, and imported the DocumentFragment into the newly created document. The newly created document now contains copies of the e2, e3, and e4 elements and looks like the XML file of Listing 9.
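The range example can be sketched like this. Unlike Listing 8, the sketch creates the range after building the elements (which avoids any interaction between a live range and later insertions); the class name RangeSketch and the childNames() helper are our own.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ranges.DocumentRange;
import org.w3c.dom.ranges.Range;

class RangeSketch {
    static Document copyMiddle() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            if (!builder.getDOMImplementation().hasFeature("Range", "2.0")) {
                throw new IllegalStateException("DOM Level 2 Range not supported");
            }
            Document doc = builder.newDocument();
            Element wrapper = doc.createElement("wrapper");
            doc.appendChild(wrapper);
            Element[] e = new Element[5];
            for (int i = 0; i < e.length; i++) {
                e[i] = doc.createElement("e" + (i + 1));   // e1 .. e5
                wrapper.appendChild(e[i]);
            }

            Range range = ((DocumentRange) doc).createRange();
            range.setEndBefore(e[4]);    // end just before e5
            range.setStartAfter(e[0]);   // start just after e1
            DocumentFragment fragment = range.cloneContents(); // copies of e2, e3, e4

            // Import the fragment into a fresh document under a new wrapper root.
            Document target = builder.newDocument();
            Element newWrapper = target.createElement("wrapper");
            target.appendChild(newWrapper);
            newWrapper.appendChild(target.importNode(fragment, true)); // appends the fragment's children
            return target;
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Comma-separated names of the root element's children.
    static String childNames(Document doc) {
        StringBuilder sb = new StringBuilder();
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(n.getNodeName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(childNames(copyMiddle())); // e2,e3,e4
    }
}
```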
Therefore, you can use the concept of DOM Range by first selecting the start and end positions of the range and then performing the operation of your choice on the range that you have selected.
The Load and Save module in DOM Level 3
DOM Level 3 includes a load and save module, which provides a mechanism for loading XML data into DOM Document objects and for serializing DOM Document objects as XML data. Before DOM Level 3, there was no such mechanism in DOM; therefore, DOM implementations used to build proprietary mechanisms for loading and saving.
Xerces support for the DOM Level 3 load and save module is currently under development, and it is not yet part of the standard Xerces download. So we will not demonstrate how to use the load and save module in Xerces. Instead, we'll just describe the important interfaces in the load and save module of DOM Level 3.
The primary interface in the DOM load and save module is DOMImplementationLS, which is meant to extend the features of the DOMImplementation interface that we saw earlier. You can check whether a particular DOM implementation supports load and save by calling the hasFeature("LS","3.0") method of any DOMImplementation instance. If the DOMImplementation.hasFeature() method returns true, you can cast the DOMImplementation object as a DOMImplementationLS instance.
The DOMImplementationLS interface contains a method named createLSInput(), which creates and returns an instance of the LSInput interface. The LSInput object is capable of encapsulating XML data in different forms, such as a textual string, a character stream, or a byte stream. After creating the LSInput object, you can set XML data in one of its data fields (for example with setStringData(), setCharacterStream(), or setByteStream()).
The load and save module also contains an LSParser interface, which you can instantiate using the createLSParser() method of the DOMImplementationLS interface. You can call the parse() method of the LSParser object and pass the LSInput object to the parse() method. The parse() method will return the DOM document representation of the XML data that you set in the LSInput object, thus completing the process of loading XML data into a DOM Document object.
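Although Xerces 2.6 does not yet include this module, later DOM implementations do; for example, the DOM implementation bundled with current JDKs exposes DOMImplementationLS. Assuming such an implementation, the loading steps look roughly like this (the class name LsLoadSketch and the sample XML are hypothetical):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

class LsLoadSketch {
    static Document load(String xml) {
        DOMImplementation impl;
        try {
            impl = DocumentBuilderFactory.newInstance().newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // Check for Load and Save before casting, as with the Events feature.
        if (!impl.hasFeature("LS", "3.0")) {
            throw new IllegalStateException("DOM Level 3 Load and Save not supported");
        }
        DOMImplementationLS ls = (DOMImplementationLS) impl;

        // LSInput wraps the XML data; here the data is a string.
        LSInput input = ls.createLSInput();
        input.setStringData(xml);

        // A synchronous parser; null means no particular schema language.
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        return parser.parse(input);
    }

    public static void main(String[] args) {
        Document doc = load("<order id=\"5\"><qty>3</qty></order>");
        System.out.println(doc.getDocumentElement().getTagName()); // order
    }
}
```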
When you want to serialize a DOM document as XML data, you will use the LSSerializer interface, which you can instantiate using the createLSSerializer() method of the DOMImplementationLS instance. You can then call the writeToString() method of the LSSerializer object, which takes a DOM Node (e.g. a Document node) and returns the string representation of the input DOM Node.
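The serializing side can be sketched the same way, again assuming a DOM implementation that supports Load and Save, such as the one in current JDKs. The sketch turns off the XML declaration through the serializer's DOMConfiguration (the standard xml-declaration parameter) so that the output is just the markup. The class name LsSaveSketch and the report data are made up.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

class LsSaveSketch {
    static String save() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element report = doc.createElement("report");
        report.setAttribute("city", "Karachi");
        report.appendChild(doc.createTextNode("sunny"));
        doc.appendChild(report);

        DOMImplementation impl = doc.getImplementation();
        if (!impl.hasFeature("LS", "3.0")) {
            throw new IllegalStateException("DOM Level 3 Load and Save not supported");
        }
        LSSerializer serializer = ((DOMImplementationLS) impl).createLSSerializer();
        // Standard DOMConfiguration parameter: omit the <?xml ...?> declaration.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) {
        System.out.println(save());
    }
}
```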
When should I use DOM?
We have discussed many DOM features in this series of articles. We have demonstrated that DOM is a powerful API for low-level XML authoring and processing. As web services have gained popularity, many higher-level XML processing engines have emerged. These higher-level engines normally target specific XML-based languages and schemas. For example, the Microsoft .NET framework contains easy-to-use features that enable WSDL and SOAP processing in web service applications. Similarly, the JWSDP also contains APIs for XML-based Remote Procedure Calls (RPC). Therefore, it is expected that many developers will prefer using higher-level schema-specific APIs rather than using DOM for low-level XML processing.
The following are the two common scenarios where you will likely use DOM in your XML applications:
- New XML-based protocols are currently under development. Protocols for XML-based transactions are an example. As new protocols emerge, it will take a bit of time for corresponding higher level APIs and processing engines to appear and mature. DOM will help you in protocol-specific XML authoring and processing during this transient phase of XML-protocol development.
- In addition to protocol-specific processing, you will also need DOM for the processing of application-specific XML data in SOAP applications. We have already discussed application-specific namespaces in SOAP messages at the start of the "Xerces for SOAP authoring" section. Many industry-specific XML schemas for different requirements (such as invoices, work orders, purchase orders, shipping documentation, payment information, product catalogs, etc.) are expected to emerge and be layered over the SOAP framework. Therefore, it is likely that you will be using high-level protocol-specific engines for the processing of standard markup and low-level XML APIs like DOM for the processing of industry-specific XML namespaces.
Summary
In this series of articles, I have explained the DOM architecture and demonstrated the use of DOM features in web service applications. I considered MSXML and Xerces, the two popular DOM implementations. I showed DOM working inside a JavaScript page on the client side and inside an ASP.NET page on the server side. And I discussed how to use the DOM features of Xerces in Java-XML applications.