Why XML is Meant for Java (Part 2)
June 16, 1999
Application Models
Now that we've concluded that Java and XML are a perfect match, how do you write code? There are two basic approaches to building an XML application, both easily available through Java:
- tree-walking: The document is parsed into a tree, and the application walks around the tree looking for interesting nodes.
- event-based: Each time the parser detects an interesting event, it calls some method on one or more application objects. This can be further subdivided into APIs using simple callbacks and APIs using event objects.
The Tree-Walking Technique
The de facto and de jure standard for tree walking is the Document Object Model (DOM) API. The DOM is an official W3C Recommendation. It is officially language independent, but some of the first implementations were in Java. The DOM is designed to cover both XML and HTML, and it corresponds roughly to the in-memory parse tree of the XML (or HTML) document.
As an in-memory representation, a DOM tree consists of a tree of Node objects. The root Node is of type Document. This Document has an Element child, which itself can have several Element and Text children. The various other things one finds in an XML document, such as processing instructions and entities, also can be children of Document and Element nodes. Programs can walk over this tree asking elements about their attributes, children, and so on. DOM trees, however, are extremely fine-grained. The DOM API doesn't know anything about any particular document type, so it makes no effort to simplify access to information in documents of that document type.
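As a rough illustration of tree walking, the sketch below parses a small document and visits every Element node recursively. It assumes the JAXP DocumentBuilderFactory API that ships with current JDKs, which postdates this article; the class name DomWalk and the sample document are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class: walks a DOM tree and records one line per element.
class DomWalk {

    /** Parses the XML text and returns one line per element, indented by depth. */
    static List<String> outline(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            List<String> lines = new ArrayList<>();
            walk(doc.getDocumentElement(), 0, lines);
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Visits one element, then each of its Element children in document order. */
    private static void walk(Element element, int depth, List<String> lines) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        line.append(element.getTagName())
            .append(" attrs=")
            .append(element.getAttributes().getLength());
        lines.add(line.toString());

        // Text, comment, and processing-instruction children are skipped here.
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, depth + 1, lines);
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<order id=\"7\"><item>pen</item><item qty=\"2\">ink</item></order>";
        outline(xml).forEach(System.out::println);
    }
}
```

Note how generic the walk is: the code knows about elements and attributes, but nothing about orders or items, which is the fine-grained quality described above.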
Tree-walking is a very popular technique for processing information. Its use with XML is not limited to the DOM. One common approach, which precedes the DOM and was already found in Jumbo, is to attach a Java class to each element type appearing in the document. As each element is parsed, an instance of the appropriate Java class is instantiated. These form a tree corresponding to the parse tree, but with additional information and behaviors. These objects are naturally expressed as JavaBeans. In the current world, however, it is best to make these classes also DOM classes, so they can be processed by code expecting a DOM tree as well as code that understands the new classes.
A more recent twist on this approach is to build more complex objects based on a document type definition (DTD). Examining the DTD reveals how elements relate to each other. These relationships can then be used to generate classes with methods for getting and setting the children of an element directly.
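To make the idea concrete, here is a hand-written sketch of the kind of class such a generator might produce. It assumes a hypothetical DTD declaration, <!ELEMENT order (shipTo, item+)>, and wraps a DOM Element; the class Order and its methods are invented for illustration and are not the output of any particular tool.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class for the assumed declaration <!ELEMENT order (shipTo, item+)>.
class Order {
    private final Element element;

    Order(Element element) {
        this.element = element;
    }

    /** The DTD says there is exactly one shipTo child, so a single value is returned. */
    String getShipTo() {
        return children("shipTo").get(0).getTextContent();
    }

    void setShipTo(String address) {
        children("shipTo").get(0).setTextContent(address);
    }

    /** The DTD says item may repeat, so a list is returned. */
    List<String> getItems() {
        List<String> items = new ArrayList<>();
        for (Element item : children("item")) {
            items.add(item.getTextContent());
        }
        return items;
    }

    /** Collects the direct Element children with the given name. */
    private List<Element> children(String name) {
        List<Element> found = new ArrayList<>();
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                found.add((Element) n);
            }
        }
        return found;
    }

    /** Convenience for the example: parse a string and wrap its root element. */
    static Order parse(String xml) {
        try {
            return new Order(DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Order order = Order.parse("<order><shipTo>Oslo</shipTo><item>pen</item><item>ink</item></order>");
        System.out.println(order.getShipTo() + " " + order.getItems());
    }
}
```

Callers ask for the shipping address or the items directly, without walking generic nodes themselves.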
The Event-Stream Model
The event approach is immediately clear to anyone who has done much GUI programming. In the GUI world, everything hangs on user input. The whole application is built around a tight loop, called the "event loop," which gets each user event, analyzes it, and calls the right piece of code. The code does some computation and returns to the event loop. Java itself has gone through two generations of the event model.
The first used a simple callback mechanism -- when an event occurred, an application method was called with some number of parameters giving information about the event. The de facto standard for implementing this style of application in the world of XML is SAX. In SAX, there are two basic types of objects, Parsers and DocumentHandlers. Your application must implement DocumentHandler. Before parsing starts, you call setDocumentHandler on your parser, passing in a reference to your application. During the parse of the document, the parser will call the methods of DocumentHandler that your application implements, such as startElement and endElement. It will pass information like the element name and attribute values.
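The callback style can be sketched as follows. One caveat: this example uses the later SAX2 API (DefaultHandler, which implements ContentHandler, the successor to DocumentHandler) together with JAXP's SAXParserFactory, because those are what current JDKs favor. The SAX2 callbacks take extra namespace parameters that the original DocumentHandler methods did not have. The class name SaxTrace is made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: the application object whose methods the parser calls.
class SaxTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    // Called by the parser for every open tag; the information arrives as parameters.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start " + qName + " (" + atts.getLength() + " attributes)");
    }

    // Called by the parser for every close tag.
    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end " + qName);
    }

    /** Parses the XML text and returns the callbacks received, in order. */
    static List<String> trace(String xml) {
        SaxTrace handler = new SaxTrace();
        try {
            // Handing the handler to the parser plays the role of setDocumentHandler.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        trace("<a x=\"1\"><b/></a>").forEach(System.out::println);
    }
}
```

No tree is built here; the application sees each element exactly once, as the parser passes over it.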
The "real" event-based processing model, and what I consider the most Java-like of the models, is the one introduced with the Java 1.1 JDK for the Abstract Window Toolkit (the old Java GUI toolkit). Instead of relying on callbacks with loose parameters, as SAX does, this model encapsulates all interesting information in a variety of event objects. Certain classes of objects, such as parsers, generate events. Other classes of objects that are interested in events, such as GUI constructors or pattern matchers, register themselves as listeners. When an event occurs, an object encapsulating information on the event is sent to all registered listeners. In the case of XML parsing, when an open tag is encountered, a "start element" event is created and sent to any registered listeners. This model is used extensively in the 1.1 AWT, throughout the Swing GUI toolkit, and in the architecture of JavaBeans.
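A minimal sketch of this listener pattern applied to XML might look like the following. Only java.util.EventObject and java.util.EventListener are real JDK types; StartElementEvent, ElementListener, and ElementEventSource are hypothetical names invented here, and the "parser" is a stand-in that fires a single event by hand.

```java
import java.util.ArrayList;
import java.util.EventListener;
import java.util.EventObject;
import java.util.List;

// Hypothetical event object carrying what is known about an open tag.
class StartElementEvent extends EventObject {
    private final String name;

    StartElementEvent(Object source, String name) {
        super(source);
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Hypothetical listener interface, following the Java 1.1 naming pattern.
interface ElementListener extends EventListener {
    void startElement(StartElementEvent event);
}

// Stand-in for a parser: it only registers listeners and fires events.
class ElementEventSource {
    private final List<ElementListener> listeners = new ArrayList<>();

    void addElementListener(ElementListener listener) {
        listeners.add(listener);
    }

    void removeElementListener(ElementListener listener) {
        listeners.remove(listener);
    }

    void fireStartElement(String name) {
        StartElementEvent event = new StartElementEvent(this, name);
        // Iterate over a copy so listeners may unregister themselves while handling.
        for (ElementListener listener : new ArrayList<>(listeners)) {
            listener.startElement(event);
        }
    }
}

class ListenerDemo {
    /** Registers two independent listeners and fires one event at both. */
    static List<String> run() {
        ElementEventSource source = new ElementEventSource();
        List<String> log = new ArrayList<>();
        source.addElementListener(e -> log.add("address checker saw " + e.getName()));
        source.addElementListener(e -> log.add("formatter saw " + e.getName()));
        source.fireStartElement("shipTo");
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The source never learns what its listeners do with the event, which is what keeps generators and receivers decoupled.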
As with the callback model, using events to build a DOM-style tree is a straightforward process. However, the more interesting aspect of this paradigm is the break it maintains between objects generating events and those receiving them. Events easily pass asynchronously from thread to thread. It's simple to configure multiple listeners to "share the burden" of processing a document. For example, one listener can be in charge of validating addresses in a purchase order, while another deals with verifying quantities for line items, and a third is in charge of formatting. Each of these listeners is in charge of a specific function and each can be developed separately.
By decoupling the transfer of information from function calling, the use of event objects also simplifies building complex applications. Different parts of an application can proceed at different speeds in a multithreaded way; where producers and consumers of events can't proceed synchronously, event objects can be placed in queues that each party can access at its own rate. So, in the example, waiting for an address to be verified or a quantity to be reserved doesn't slow down the process of interpreting the style sheet. All three "agents" access the information at their desired rates. Event objects persist until garbage collection removes them.
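The queueing idea can be sketched with a standard java.util.concurrent.BlockingQueue, a library that arrived well after this article was written. For brevity the "events" are plain element-name strings rather than full event objects, and the class name QueuedEvents and the end-of-document marker are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical example class: a producer and a consumer joined only by a queue.
class QueuedEvents {
    // Hypothetical end-of-document marker; '#' cannot start an XML name,
    // so it cannot collide with a real element name.
    private static final String END = "#end";

    /** Queues one event per element name and returns what the consumer handled. */
    static List<String> run(List<String> elementNames) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> handled = new ArrayList<>();

        // The consumer drains the queue at its own rate, blocking when it is empty.
        Thread consumer = new Thread(() -> {
            try {
                for (String name = queue.take(); !name.equals(END); name = queue.take()) {
                    handled.add("verified " + name);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // The producer (standing in for the parser) never waits for the consumer.
        for (String name : elementNames) {
            queue.add(name);
        }
        queue.add(END);

        try {
            consumer.join(); // after join, the consumer's writes to 'handled' are visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled;
    }

    public static void main(String[] args) {
        run(Arrays.asList("shipTo", "item")).forEach(System.out::println);
    }
}
```

A slow consumer, such as one waiting on an address lookup, only lets the queue grow; it does not stall the producer.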
This model demonstrates its greatest strength when combined with a variety of filters and dispatchers that sit between the components of an application. These filters and dispatchers are both event listeners and event generators. A filter receives an event, potentially performs some operation on it (such as transforming it or even ignoring it), and passes the changed event on to its recipients. Filters can be chained together to be the XML equivalent of UNIX pipes. One very straightforward filter is an Architectural Forms filter, which takes startElement and endElement events and replaces the element name with the value of a particular attribute.
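A filter of this kind might be sketched as below, assuming the SAX2 helper class org.xml.sax.helpers.XMLFilterImpl, which appeared after this article was written. It is a deliberate simplification of Architectural Forms processing: it only renames elements that carry the designated attribute and passes everything else through untouched. The class name FormFilter and the attribute name "form" are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: both a receiver of parser events and a generator of new ones.
class FormFilter extends XMLFilterImpl {
    private final String formAttribute;
    // Remembers the name emitted for each open element so end tags match start tags.
    private final Deque<String> openNames = new ArrayDeque<>();

    FormFilter(XMLReader parent, String formAttribute) {
        super(parent);
        this.formAttribute = formAttribute;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        String form = atts.getValue(formAttribute);
        String name = (form != null) ? form : qName;
        openNames.push(name);
        // Only the qualified name is rewritten; this sketch ignores namespaces.
        super.startElement(uri, localName, name, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, openNames.pop());
    }

    /** Runs the filter over the XML text and returns the element names seen downstream. */
    static List<String> names(String xml, String formAttribute) {
        List<String> seen = new ArrayList<>();
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            FormFilter filter = new FormFilter(parser, formAttribute);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String u, String l, String q, Attributes a) {
                    seen.add(q);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(names("<po><billTo form=\"address\"/><note/></po>", "form"));
    }
}
```

Because a filter is itself an XMLReader, another filter can take it as its parent, which is how chains in the spirit of UNIX pipes are assembled.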
Future Developments
Two new developments will positively impact the relationship between Java and XML: Sun's decision to standardize a set of Java APIs for manipulating XML, and new work at the W3C on a schema language for XML.
The relationship between Java and XML has become sufficiently strong that Sun has decided to develop a set of standard Java APIs for XML, a Java Platform Standard Extension, using the Java Community Process. In the first stage, JavaSoft queries the community as to the need for standardizing an extension in a particular area. Having completed this stage, JavaSoft is now pressing on with the next stage, assembling a group of XML experts to determine the shape of the APIs. Although this is truly in the early stages, it's a good guess that both SAX and the DOM will make the cut.
The largest impediment to a closer relationship between Java and XML is the lack of any standard inheritance mechanism for describing classes of XML elements, but this might not be far off. The next step in the development of XML will be a schema language to replace DTDs. This work is underway in the W3C's Schema Working Group. While it is far too early to predict the exact form of the schema language, the group has recently published its requirements, which explicitly call for an inheritance mechanism. In addition, two of the public submissions, XML-Data and the Schema for Object-Oriented XML (SOX), have inheritance mechanisms, while the Document Content Description for XML (DCD) submission talks about eventual extensions for inheritance. So it looks likely some kind of inheritance will become officially part of XML before the millennium.
Conclusions
Java and XML are partners for the long term. They have reciprocal duties in enabling the Web of the future. Java will become the brains of the Internet; XML documents will be how they speak to each other.
Despite the apparent premise of this article, and possibly because I know many sane people who manipulate XML using Perl and Python, I do not wish to convey the impression that Java is the only language to use for XML. While it was an early front-runner, Perl and Python, traditional stalwarts of the old SGML crowd, have recaptured some lost ground; both are vying to become the XML "scripting language of choice."
Nevertheless, when it comes to building distributed systems around the Net, the languages of choice will be XML and Java.
This article was originally published in Web Techniques magazine, July 1999 issue, which covered XML and Java. The URL for the original article on the Web Techniques site is: http://www.webtechniques/1999/06/fuchs/.
Additional Web Techniques articles on XML:
XML Development in Java by Maneesh Sahu. | JavaBeans makes it easy to write XML applications. Maneesh shows you how to build one, and what's required to process XML documents in a Java program. |
Anatomy of an XML Server by Bob Bickel | You're probably quite familiar with the design of a Web server. Bob acquaints you with XML servers and how they make it easier to write XML applications. |
SQL-Based XML Structured Data Access by Michael M. David | XML's strengths are in representing hierarchical information while SQL is better at processing data in rows and columns. Michael helps you understand how to combine the two. |