XML.com: XML From the Inside Out


Why XML is Meant for Java?

June 16, 1999

Exploring the XML/Java Connection

This article was originally published in the July 1999 issue of Web Techniques magazine, which covered XML and Java.

A close relationship between XML and Java has existed since the early days of the XML effort. One of the first public statements about this relationship came from Sun Microsystems' Jon Bosak, chair of the XML Working Group. He said, "XML gives Java something to do." But it works the other way as well: Java lets XML do something useful.

XML by itself is just a lot of text; you need a program to manipulate that text and make things happen. Up until now, Java has been the language of choice for writing those programs. But has this just been a marriage of convenience? Will Java soon be supplanted by other languages, such as Perl and Python, traditionally used for text manipulation? Or will Java remain the language of choice, even as the other two compete for attention?

I'll examine three important aspects of the Java/XML relationship. First, I'll look at the actual reasons for Java's success with XML, and whether there is some hidden affinity between them. Second, I'll briefly discuss two popular programming models for manipulating XML in Java; if you use Java for manipulating XML, you'll probably choose one of these. Third, I'll cover a couple of significant developments on the horizon: the decision to provide a standard Java API for manipulating XML and the current work on developing a next-generation schema language for XML -- both of which bolster the Java/XML relationship. Finally, I'll say a word or two on the future of this alliance.

Let Us Count the Ways

There are several reasons for XML's success with Java, some of which may translate into long-term advantages, and some of which may not. There's a philosophical connection, a common sensibility that makes these two languages natural allies. Some of the reasons are social, a set of characteristics shared by the people who first started working with XML that made them also choose Java. Then, perhaps most importantly in the long term, some of the reasons are architectural -- common language traits that make the two languages work well together.

A Shared Philosophy

Any examination of Java and XML needs to mention two very important shared characteristics. First, both languages were explicitly designed to be used in distributed systems. XML was conceived as SGML for the Web, a sort of reverse takeover from HTML. The SGML crowd had generally been dismissive of HTML from the start (the feeling was mutual) and developers were anxious to apply the lessons of HTML's success to SGML as a whole. Java was not initially intended for the Web but for delivery over networks to a variety of devices. Its feature set was easily adapted to the Web.

Second, both languages are simplifications of powerful beasts whose complexity had gotten out of control. In the case of Java, the beast was C++, which grew from a relatively straightforward object-oriented extension of C to a complex language with support for several complex types of inheritance and templates. In the case of XML, the beast was SGML. XML can be viewed as a subset of SGML; it eliminates several syntactic and lexical quirks in SGML that complicate processing. Many of these SGML features were intended to be used as shortcuts for writing documents. In designing XML, many of these were deliberately thrown away.

If nothing else, these two points would make it natural to consider using the two languages together. Both languages could be seen as refining proven ideas.

A Common Social Context

When the early history of XML is written, right after Jon Bosak and the XML SIG will come the stalwarts of the XML-DEV email list. XML-DEV was started by Peter Murray-Rust, a British professor of chemistry, to jump-start the development of XML software. Murray-Rust is also the developer of Jumbo, probably the world's first well-known application to use XML. Jumbo displays and edits documents written in the Chemical Markup Language (CML).

XML-DEV rapidly became the center of a growing community of people interested in doing something useful with XML. Naturally, these people wanted to write programs and there was a need to share code. There are obvious advantages when developers agree to use one language, and Java was a logical choice. Moreover, many of the people attracted to XML were predisposed to trying new things, and when XML arrived, they were still trying Java. It seemed obvious to apply Java (already billed as the programming language for the Web) to XML (now billed as the markup language of the Web). In my own case, I deliberately learned both at the same time.

After converging on Java, XML-DEV led the development of the Simple API for XML (SAX), the most widely used API for XML. SAX, in fact, started with a request from Murray-Rust, who was growing tired of adapting Jumbo to work with all the available parsers. David Megginson, who had written the Aelfred parser, took over the lead in developing the actual interfaces. Now, of course, all publicly available Java parsers, even those from the largest vendors, support the SAX API, even when they have a different API of their own.

Other Java software has also come out of the XML-DEV group, including Michael Kay's SAXON, an XSL-type application engine for manipulating XML, and John Cowan's SAXDOM, an application that will walk a DOM tree and behave like a SAX parser. Given the role of the XML-DEV community, developers trying to popularize their XML software will want to attract the support of this group, and doing so means using Java.

Architectural Affinities

Although XML was designed to be completely independent of any development language, several of Java's features make it a particularly good choice. One such feature is Unicode support. Most programming languages use ASCII to represent strings, but ASCII is hopelessly anglocentric. Unicode, on the other hand, is truly the alphabet of all alphabets, with some 39,000 built-in letters and plenty of room for expansion.

Choosing Unicode for XML was the right thing to do, but only one popular programming language was designed to use Unicode from the bottom up, and that was Java. This, at least initially, decreased the utility of Perl as an XML language of choice. Much of the attraction of Perl is its powerful regular expressions, which exploit the features of ASCII, and which were initially helpless in the face of Unicode. To make Perl a suitable XML processing language, Larry Wall extended Perl's regular expressions to work with Unicode encoded as UTF-8 (the 8-bit Unicode Transformation Format).

Java also supports a number of features that make it easy to share the code needed to build XML software. The most important of these are the package structure, dynamic class loading, and the JavaBeans API.

The package structure is perhaps the one feature of Java that makes it easiest to share code with others. All Java classes fit into a very regular structure that follows the typical UNIX or Windows file system, so the class com.myPackage.MyClass is the file com/myPackage/MyClass.java. If you accept this structure, and map it directly into your file system, you can easily receive code from another party, place it into your file system in the appropriate location, and run the Java compiler on it. This is a significant simplification over mechanisms provided by previous languages in the C/C++ family, which often required a fair amount of arcane Makefile skill to compile. Having downloaded and compiled many different pieces of software from the Internet over the last decade or so (including Linux when it required only 10 floppies), I can attest to how difficult it can be to get a stand-alone product running. Even more difficult is producing software that depends upon work from other developers. Java solves these kinds of problems if you follow its rules. It also provides a place for every Java class on the Internet without name clashes. It's a little loss of freedom for greater flexibility and safety -- kind of like traffic lights.

Dynamic class loading refers to the ability of a Java class to be loaded by request at run time. This can be done either implicitly by having the runtime system load a class file the first time it creates an object of that class, or explicitly by a call to Class.forName() with the name of the desired class. Java applications with access to the network, such as applets, can be shipped with a minimum configuration and additional components retrieved as desired.
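The two loading styles described above can be sketched as follows. This is a minimal illustration, not code from the original article: the class name DynamicLoadingSketch and the helper newListByName are hypothetical, and the syntax (generics, multi-catch-style exception types) is that of current Java rather than the Java of 1999. Only Class.forName() comes from the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: implicit vs. explicit class loading.
class DynamicLoadingSketch {

    // Explicit loading: the class name is only a string until run time.
    // The caller knows the List interface, not the implementation.
    public static List<?> newListByName(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Object instance = cls.getDeclaredConstructor().newInstance();
            return (List<?>) instance;
        } catch (ReflectiveOperationException e) {
            // Wrapped so callers do not have to handle checked exceptions.
            throw new IllegalArgumentException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // Implicit loading: the runtime loads ArrayList the first time
        // this expression is evaluated.
        List<String> implicit = new ArrayList<>();

        // Explicit loading: the name could just as well come from a
        // configuration file or an incoming message.
        List<?> explicit = newListByName("java.util.LinkedList");

        System.out.println(implicit.getClass().getName());
        System.out.println(explicit.getClass().getName());
    }
}
```

Class.forName() resolves names through the class loader of the calling class. Fetching classes over the network, as an applet would, depends on a class loader that knows where to look (for example, java.net.URLClassLoader), which this sketch leaves out.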

A third item that ties XML and Java together is JavaBeans technology. Beans are the Java equivalent of just plain old data structures in C. Each JavaBean has a set of properties that clients can get or set. Properties can be either single objects or arrays. Beans are useful for expressing XML because they can have a straightforward data model, but can be subclassed to exhibit specific behaviors. We'll see JavaBeans again shortly when we consider XML tree-walking.
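To make the idea of bean properties concrete, here is a minimal sketch. The LineItem bean, its sku and quantity properties, and the propertyNames helper are assumptions made up for illustration. The java.beans.Introspector call is the standard way a tool discovers properties from the get/set naming convention.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a bean and of a tool discovering its properties.
class BeanSketch {

    // A bean: public no-argument constructor plus get/set pairs.
    public static class LineItem implements Serializable {
        private String sku;
        private int quantity;

        public LineItem() {}

        public String getSku() { return sku; }
        public void setSku(String sku) { this.sku = sku; }

        public int getQuantity() { return quantity; }
        public void setQuantity(int quantity) { this.quantity = quantity; }
    }

    // Lists the property names a bean-aware tool would see.
    // Object.class is the stop class, so the inherited "class"
    // property is left out.
    public static List<String> propertyNames(Class<?> beanClass) {
        try {
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        LineItem item = new LineItem();
        item.setSku("A-100");
        item.setQuantity(3);
        System.out.println(propertyNames(LineItem.class));
    }
}
```

A bean like this maps naturally onto an XML element whose attributes or children carry the same names, which is one reason the element-to-class mapping discussed in this article is so common.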

To tie this all together, consider the following scenario. Suppose you're developing a distributed game in which each participant has a local client, and they communicate by sending XML messages to each other. By creating the messages in XML, you have effectively freed the clients from being tied to one programming language or the other. One client could be in Perl, the next in Java, and the next in your favorite XML editor. Although the underlying messaging architecture -- XML and HTTP -- doesn't require one language or the other, building your clients with Java has definite advantages.

You can build your client as an applet, ensuring the ability to display it easily in a browser. There is no inherent reason other languages (such as Perl or Smalltalk) couldn't be the language for applets, but they're not. Using the language that's the standard gives your code access to far more desktops. Furthermore, designing your application as a set of beans allows it to be used with any tool that understands the JavaBeans API.

The client can be loaded in parts, depending on what functionality is required, based on the XML messages the client receives. At startup, some minimal functionality is downloaded, especially the GUI. Then the XML messages start flying. If we assume a simple one-to-one mapping between elements and Java classes, as is quite common, then for each element we instantiate an object of the corresponding type. With a traditional system, we'd need to have all the classes for all the element types linked in before we started the application, but with Java we can start with only the code to locate the appropriate class. Each time a new element appears, we find the appropriate class, load it over the Net if necessary, and create an instance. We don't even need to know the name of the class in advance.

Suppose I own "myshop.com" and have a document type called purchaseOrder that has an element LineItem. I can put the LineItem class in the package com.myshop.purchaseOrder, so the class itself is globally visible as com.myshop.purchaseOrder.LineItem. When my applet sees a LineItem element, it creates the text string for the class from its knowledge of my package structure and the element name, loads the class com.myshop.purchaseOrder.LineItem from my server, and then creates a LineItem object from it. The client didn't need to know anything about LineItems in particular to get started.
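The element-to-class lookup described here might look roughly like the following. It is a sketch under stated assumptions: ElementMapperSketch, classNameFor, and instantiate are hypothetical names, and because the com.myshop.purchaseOrder classes do not exist on the reader's machine, the runnable part of the demo loads a JDK class (java.lang.StringBuilder) as a stand-in for an element class.

```java
// Hypothetical sketch: derive a class name from a package and an element
// name, then load and instantiate that class at run time.
class ElementMapperSketch {

    // One-to-one mapping: element name == simple class name.
    public static String classNameFor(String packageName, String elementName) {
        return packageName + "." + elementName;
    }

    // Loads the class for an element and creates an instance through its
    // no-argument constructor.
    public static Object instantiate(String packageName, String elementName) {
        String className = classNameFor(packageName, elementName);
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                "No usable class for element <" + elementName + ">: " + className, e);
        }
    }

    public static void main(String[] args) {
        // The name the client would build for the article's example.
        System.out.println(classNameFor("com.myshop.purchaseOrder", "LineItem"));

        // Stand-in demo: a JDK class plays the role of the element class,
        // since the com.myshop classes are not available here.
        Object element = instantiate("java.lang", "StringBuilder");
        System.out.println(element.getClass().getName());
    }
}
```

The client code never mentions LineItem by name. Everything it needs is the package for the document type and the element name taken from the incoming message.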

The final wrinkle is that I can always add a new document type with a totally new mapping mechanism by having a second level of mapping from a document type to a class that knows how to load classes specifically for the element types in that document.
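One way to picture this second level of mapping is a registry keyed by document type, where each entry knows how to turn element names into class names for its own documents. The sketch below is an assumption-laden simplification: DoctypeRegistrySketch, ElementResolver, the "invoice" document type, and the com.myshop.billing package are all invented for illustration, and the resolvers are registered directly as objects rather than being loaded by name themselves.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: document type -> resolver -> class name.
class DoctypeRegistrySketch {

    // Knows how to map element names to class names for one document type.
    public interface ElementResolver {
        String classNameFor(String elementName);
    }

    private final Map<String, ElementResolver> resolvers = new HashMap<>();

    public void register(String documentType, ElementResolver resolver) {
        resolvers.put(documentType, resolver);
    }

    public String resolve(String documentType, String elementName) {
        ElementResolver resolver = resolvers.get(documentType);
        if (resolver == null) {
            throw new IllegalArgumentException("Unknown document type: " + documentType);
        }
        return resolver.classNameFor(elementName);
    }

    public static void main(String[] args) {
        DoctypeRegistrySketch registry = new DoctypeRegistrySketch();

        // Default convention: package name plus element name.
        registry.register("purchaseOrder",
            element -> "com.myshop.purchaseOrder." + element);

        // A different, made-up convention for another document type.
        registry.register("invoice",
            element -> "com.myshop.billing." + element + "Element");

        System.out.println(registry.resolve("purchaseOrder", "LineItem"));
        System.out.println(registry.resolve("invoice", "Total"));
    }
}
```

Adding a new document type then means registering one more resolver; the code that reacts to incoming elements stays unchanged.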
