
Why XML is Meant for Java (Part 2)

June 16, 1999

Matthew Fuchs

Application Models

Now that we've concluded that Java and XML are a perfect match, how do you write code? There are two basic approaches to building an XML application, both easily available through Java:

  • tree-walking: The document is parsed into a tree, and the application walks around the tree looking for interesting nodes.
  • event-based: Each time the parser detects an interesting event, it calls some method on one or more application objects. This can be further subdivided into APIs using simple callbacks and APIs using event objects.

The Tree-Walking Technique

The de facto and de jure standard for tree walking is the Document Object Model (DOM) API. The DOM is an official W3C Recommendation. It is officially language independent, but some of the first implementations were in Java. The DOM is designed to cover both XML and HTML, and it corresponds roughly to the in-memory parse tree of the XML (or HTML) document.

As an in-memory representation, a DOM tree consists of a tree of Node objects. The root Node is of type Document. This Document has an Element child, which itself can have several Element and Text children. The various other things one finds in an XML document, such as processing instructions and entities, also can be children of Document and Element nodes. Programs can walk over this tree asking elements about their attributes, children, and so on. DOM trees, however, are extremely fine-grained. The DOM API doesn't know anything about any particular document type, so it makes no effort to simplify access to information in documents of that document type.
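To make the walk concrete, here is a minimal sketch. It assumes the JAXP DocumentBuilder API found in later Java platforms, which this article does not cover; DomWalkDemo and its method names are illustrative only. Note how generic the code is: it sees only nodes, names, and values, never an "order" or an "item" as such.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomWalkDemo {

    // Parse an XML string into a DOM Document (using JAXP is an assumption here).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // Depth-first walk that records one line per element or non-blank text node.
    static void walk(Node node, int depth, List<String> out) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            line.append(node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                line.append(' ').append(attr.getNodeName())
                    .append('=').append(attr.getNodeValue());
            }
            out.add(line.toString());
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.add(line.append('"').append(text).append('"').toString());
            }
        }
        // Visit the children in document order.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        Document doc = parse("<order id=\"7\"><item>pen</item></order>");
        walk(doc.getDocumentElement(), 0, out);
        out.forEach(System.out::println);
    }
}
```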

Tree-walking is a very popular technique for processing information. Its use with XML is not limited to the DOM. One common approach, which precedes the DOM and was already found in Jumbo, is to attach a Java class to each element type appearing in the document. As each element is parsed, an instance of the appropriate Java class is instantiated. These form a tree corresponding to the parse tree, but with additional information and behaviors. These objects are naturally expressed as JavaBeans. In the current world, however, it is best to make these classes also DOM classes, so they can be processed by code expecting a DOM tree as well as code that understands the new classes.

A more recent twist on this approach is to build more complex objects based on a document type definition (DTD). Examining the DTD reveals how elements relate to each other. These relationships can then be used to generate classes with methods for getting and setting the children of an element directly.
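As a rough illustration, suppose a DTD declared a purchaseOrder element containing one address followed by one or more item elements. A generator might then emit classes along the following lines. Every name here (PurchaseOrder, Address, Item, and their methods) is hypothetical and is not taken from any particular tool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class for an "address" element holding character data.
class Address {
    private String text;

    Address(String text) { this.text = text; }

    String getText() { return text; }
    void setText(String text) { this.text = text; }
}

// Hypothetical class for an "item" element with a "quantity" attribute.
class Item {
    private final String name;
    private int quantity;

    Item(String name, int quantity) {
        this.name = name;
        this.quantity = quantity;
    }

    String getName() { return name; }
    int getQuantity() { return quantity; }
    void setQuantity(int quantity) { this.quantity = quantity; }
}

// Hypothetical class for "purchaseOrder", whose content model is (address, item+).
// The content model becomes typed accessors instead of generic child lists.
class PurchaseOrder {
    private Address address;
    private final List<Item> items = new ArrayList<>();

    Address getAddress() { return address; }
    void setAddress(Address address) { this.address = address; }

    // Read-only view; callers add items through addItem.
    List<Item> getItems() { return Collections.unmodifiableList(items); }
    void addItem(Item item) { items.add(item); }
}
```

With classes like these, application code asks for order.getAddress() directly rather than searching a list of generic child nodes for an element named "address".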

The Event-Stream Model

The event approach is immediately clear to anyone who has done much GUI programming. In the GUI world, everything hangs on user input. The whole application is built around a tight loop, called the "event loop," which gets each user event, analyzes it, and calls the right piece of code. The code does some computation and returns to the event loop. Java itself has gone through two generations of the event model.

The first used a simple callback mechanism -- when an event occurred, an application method was called with some number of parameters giving information about the event. The de facto standard for implementing this style of application in the world of XML is SAX. In SAX, there are two basic types of objects, Parsers and DocumentHandlers. Your application must implement DocumentHandler. Before parsing starts, you call setDocumentHandler on your parser, passing in a reference to your application. During the parse of the document, the parser will call the methods of DocumentHandler that your application implements, such as startElement and endElement. It will pass information like the element name and attribute values.
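The callback style can be sketched as follows. One caveat: this sketch uses the later SAX 2 names available through JAXP (DefaultHandler, whose startElement and endElement belong to ContentHandler, the SAX 2 successor to DocumentHandler) rather than the SAX 1 interfaces described above, so the method signatures differ from what the article discusses. ElementLogger is an illustrative name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// The application object: the parser calls back into it as it reads the document.
class ElementLogger extends DefaultHandler {
    final List<String> log = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        StringBuilder line = new StringBuilder("start ").append(qName);
        for (int i = 0; i < attrs.getLength(); i++) {
            line.append(' ').append(attrs.getQName(i)).append('=').append(attrs.getValue(i));
        }
        log.add(line.toString());
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        log.add("end " + qName);
    }

    // Hand the handler to a parser and collect the callbacks it receives.
    static List<String> run(String xml) {
        try {
            ElementLogger handler = new ElementLogger();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.log;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        run("<order id=\"7\"><item/></order>").forEach(System.out::println);
    }
}
```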

The "real" event-based processing model, and what I consider the most Java-like of the models, is the one introduced with the Java 1.1 JDK for the Abstract Window Toolkit (the old Java GUI toolkit). Instead of using a callback mechanism like the one in SAX, all interesting information is encapsulated in a variety of event objects. Certain classes of objects, such as parsers, generate events. Other classes of objects that are interested in events, such as GUI constructors or pattern matchers, register themselves as listeners. When an event occurs, an object encapsulating information on the event is sent to all registered listeners. In the case of XML parsing, when an event such as an open tag is encountered, the "start element" event is created and sent to any registered listeners. This model is used extensively in the 1.1 AWT, throughout the Swing GUI toolkit, and in the architecture of JavaBeans.

As with the callback model, using events to build a DOM-style tree is a straightforward process. However, the more interesting aspect of this paradigm is the break it maintains between objects generating events and those receiving them. Events easily pass asynchronously from thread to thread. It's simple to configure multiple listeners to "share the burden" of processing a document. For example, one listener can be in charge of validating addresses in a purchase order, while another deals with verifying quantities for line items, and a third is in charge of formatting. Each of these listeners is in charge of a specific function and each can be developed separately.
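A minimal sketch of this listener arrangement, following the JDK 1.1 conventions of EventObject and EventListener, might look like this. StartElementEvent, ElementListener, and XmlEventSource are hypothetical names invented for the illustration; XmlEventSource simply stands in for a parser that generates events.

```java
import java.util.ArrayList;
import java.util.EventListener;
import java.util.EventObject;
import java.util.List;

// Hypothetical event object carrying what a callback would pass as parameters.
class StartElementEvent extends EventObject {
    private final String name;

    StartElementEvent(Object source, String name) {
        super(source);
        this.name = name;
    }

    String getName() { return name; }
}

// Hypothetical listener interface, in the style of the AWT listeners.
interface ElementListener extends EventListener {
    void elementStarted(StartElementEvent event);
}

// Stands in for a parser: it knows nothing about what its listeners do.
class XmlEventSource {
    private final List<ElementListener> listeners = new ArrayList<>();

    void addElementListener(ElementListener listener) {
        listeners.add(listener);
    }

    // Create one event object and deliver it to every registered listener.
    void fireStartElement(String name) {
        StartElementEvent event = new StartElementEvent(this, name);
        for (ElementListener listener : listeners) {
            listener.elementStarted(event);
        }
    }
}

class ListenerDemo {
    public static void main(String[] args) {
        XmlEventSource source = new XmlEventSource();
        // Two independent listeners, each in charge of one concern.
        source.addElementListener(e -> {
            if (e.getName().equals("address")) System.out.println("validating address");
        });
        source.addElementListener(e -> {
            if (e.getName().equals("item")) System.out.println("checking quantity");
        });
        source.fireStartElement("address");
        source.fireStartElement("item");
    }
}
```

Each listener can be written and tested on its own, because the only thing it shares with the others is the event type.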

By decoupling the transfer of information from function calling, the use of event objects also simplifies building complex applications. Different parts of an application can proceed at different speeds in a multithreaded way; where producers and consumers of events can't proceed synchronously, event objects can be placed in queues that each party can access at its own rate. So, in the example, waiting for an address to be verified or a quantity to be reserved doesn't slow down the process of interpreting the style sheet. All three "agents" access the information at their desired rates. Event objects persist until garbage collection removes them.
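The queueing idea can be sketched with a blocking queue between a producer and a consumer thread. This assumes the java.util.concurrent package from later Java releases, and plain strings stand in for event objects to keep the example short; QueuedEvents, END, and process are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueuedEvents {
    // Marker telling the consumer that no more events will arrive.
    static final String END = "<<end>>";

    // The calling thread plays the parser; a second thread plays a slow "agent".
    static List<String> process(List<String> events) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> handled = new ArrayList<>();

        Thread consumer = new Thread(() -> {
            try {
                // take() blocks until an event is available, so the consumer
                // works through the queue at its own rate.
                for (String event = queue.take(); !event.equals(END); event = queue.take()) {
                    handled.add("verified " + event);
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // The producer enqueues and moves on; it never waits for verification.
        for (String name : events) {
            queue.add(name);
        }
        queue.add(END);

        try {
            consumer.join(); // wait here only so the results can be returned
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList("address", "item", "item")));
    }
}
```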

This model demonstrates its greatest strength when combined with a variety of filters and dispatchers that sit between the components of an application. These filters and dispatchers are both event listeners and event generators. A filter receives an event, potentially performs some operation on it (such as transforming it or even ignoring it), and passes the changed event on to its recipients. Filters can be chained together to be the XML equivalent of UNIX pipes. One very straightforward filter is an Architectural Forms filter, which takes startElement and endElement events and replaces the element name with the value of a particular attribute.
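A much-simplified filter of this kind can be sketched with the SAX 2 XMLFilterImpl helper class, which postdates the SAX 1 API discussed earlier. The sketch only renames an element when a chosen attribute is present and is not a full Architectural Forms processor; FormFilter, FormFilterDemo, and the "form" attribute name are assumptions made for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Both a listener (of its parent reader) and a generator (for its own handler).
class FormFilter extends XMLFilterImpl {
    private final String formAttribute;
    // Names actually emitted, so each end event matches its start event.
    private final Deque<String> openNames = new ArrayDeque<>();

    FormFilter(XMLReader parent, String formAttribute) {
        super(parent);
        this.formAttribute = formAttribute;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        String form = atts.getValue(formAttribute);
        String name = (form != null) ? form : qName; // unchanged if attribute is absent
        openNames.push(name);
        super.startElement(uri, name, name, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        String name = openNames.pop();
        super.endElement(uri, name, name);
    }
}

class FormFilterDemo {
    // Parse through the filter and return the element names seen downstream.
    static List<String> run(String xml) {
        List<String> seen = new ArrayList<>();
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            FormFilter filter = new FormFilter(parser, "form");
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName,
                                         Attributes atts) {
                    seen.add(qName);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run("<po><addr form=\"address\"/><qty/></po>"));
    }
}
```

Because a filter is itself an XMLReader, a second filter can take the first as its parent, which gives the pipe-like chaining described above.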

Future Developments

Two new developments will strengthen the relationship between Java and XML: Sun's decision to standardize a set of Java APIs for manipulating XML, and new work at the W3C on a schema language for XML.

The relationship between Java and XML has become sufficiently strong that Sun has decided to develop a set of standard Java APIs for XML, a Java Platform Standard Extension, using the Java Community Process. In the first stage, JavaSoft queries the community as to the need for standardizing an extension in a particular area. Having completed this stage, JavaSoft is now pressing on with the next stage, assembling a group of XML experts to determine the shape of the APIs. Although this is truly in the early stages, it's a good guess that both SAX and the DOM will make the cut.

The largest impediment to a closer relationship between Java and XML is the lack of any standard inheritance mechanism for describing classes of XML elements, but this might not be far off. The next step in the development of XML will be a schema language to replace DTDs. This work is underway in the W3C's Schema Working Group. While it is far too early to predict the exact form of the schema language, the group has recently published its requirements, which explicitly call for an inheritance mechanism. In addition, two of the public submissions, XML-Data and the Schema for Object-Oriented XML (SOX), have inheritance mechanisms, while the Document Content Description for XML (DCD) submission talks about eventual extensions for inheritance. So it looks likely some kind of inheritance will become officially part of XML before the millennium.

Conclusions

Java and XML are partners for the long term. They have reciprocal duties in enabling the Web of the future. Java will become the brains of the Internet; XML documents will be how they speak to each other.

Despite the apparent premise of this article, and possibly because I know many sane people who manipulate XML using Perl and Python, I do not wish to convey the impression that Java is the only language to use for XML. While Java was an early front-runner, Perl and Python, traditional stalwarts of the old SGML crowd, have recaptured some lost ground; both are vying to become the XML "scripting language of choice."

Nevertheless, when it comes to building distributed systems around the Net, the languages of choice will be XML and Java.


This article was originally published in Web Techniques magazine, July 1999 issue, which covered XML and Java. The URL for the original article on the Web Techniques site is: http://www.webtechniques/1999/06/fuchs/.

Additional Web Techniques articles on XML:

XML Development in Java by Maneesh Sahu. JavaBeans makes it easy to write XML applications. Maneesh shows you how to build one, and what's required to process XML documents in a Java program.
Anatomy of an XML Server by Bob Bickel. You're probably quite familiar with the design of a Web server. Bob acquaints you with XML servers and how they make it easier to write XML applications.
SQL-Based XML Structured Data Access by Michael M. David. XML's strengths are in representing hierarchical information while SQL is better at processing data in rows and columns. Michael helps you understand how to combine the two.