Getting Started with XOM
Elliotte Rusty Harold's new XML Object Model (XOM) is a simple, tree-based API for XML, written in Java. XOM attempts to build on good ideas from other Java XML APIs (SAX, DOM, and JDOM) and to leave behind some of their frustrations. The result is a high-level open-source API that is easy to learn and use, assuming that you are already familiar with Java and XML.
Unlike SAX, XOM is written with classes instead of interfaces, making
it more straightforward to use. With SAX you must first implement
interfaces before you can get it to work. This work is eased somewhat by
helper classes like DefaultHandler; but overall, interfaces
make programming in SAX somewhat more complex, even though they also make
SAX uniform and flexible. XOM's classes provide some flexibility by
offering a number of check methods that may be overridden in
subclasses.
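For comparison, here is a minimal sketch of the SAX style described above, using only the parser bundled with the JDK rather than XOM. The class name SaxElementCounter and the countElements() helper are hypothetical names chosen for illustration. The point is that even a trivial task means extending DefaultHandler and overriding a callback:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts elements with a SAX handler.
class SaxElementCounter extends DefaultHandler {

    private int count = 0;

    // DefaultHandler supplies do-nothing versions of every callback,
    // so only the callback of interest is overridden here.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        count++;
    }

    // Parses the XML text and returns the number of elements seen.
    // Checked exceptions are wrapped to keep the call site short.
    static int countElements(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            SaxElementCounter handler = new SaxElementCounter();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.count;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Three elements: instant, date, and time.
        System.out.println(countElements("<instant><date/><time/></instant>"));
    }
}
```

The XOM programs later in this article do comparable work without a handler class.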
XOM does not stand by itself. It depends on an underlying SAX parser, such as a recent version of Xerces, to handle well-formedness checking and validation. XOM provides a simple interface to that parser, in effect hiding the parser setup code without much of a performance hit.
I like XOM for the same reasons I like RELAX NG: you can pick it up in a snap if you already have a reasonable familiarity with Java idioms. And, like RELAX NG, the more I use XOM, the more I like it. It is well considered and doesn't try to do everything or please everybody. For more information on XOM's relationship to other XML APIs, you can read a presentation that Elliotte gave at the New York XML SIG meeting on 17 September 2002.
On the other hand, if you are underwhelmed by XOM's simplicity, you can go back to your favorite old API or mix APIs, taking what you like from each. But if simplicity, openness, and ready availability are keys to the wide adoption of software, XOM has little problem measuring up to that standard.
Bear in mind that XOM is still a work in progress. This article only walks through part of the interface, but it should give you enough example code to get you well on your way.
The sample programs and documents discussed in this article are
available for download in ZIP
archive form. And you can read the Javadocs for nu.xom.*
online.
To run the examples, your system must have:
- Java version 1.2 or later. I have tested the examples with Java version 1.4 in a Windows 2000 environment.
- Xerces version 2.1 or later. I have tested appropriate examples with Xerces 2.2.
- The latest XOM JAR file. The latest version at this writing is xom-1.0d8.jar.
Parsing a Document with XOM
Create a working directory and unzip the
program archive there. Copy the
Xerces and XOM JAR files there, too. The program Wf.java
checks a document for XML 1.0 well-formedness:
import java.io.IOException;
import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.ParseException;

public class Wf {

    public static void main(String[] args)
            throws IOException, ParseException {
        // Read and parse the document named on the command line.
        Builder builder = new Builder();
        Document doc = builder.build(args[0]);
        // Echo the parsed document as XML.
        System.out.println(doc.toXML());
    }
}
To compile the program, type the command:
javac -classpath xercesImpl.jar;xom.jar Wf.java
Use colons instead of semicolons to separate the JAR files if you are working on a UNIX platform. The command line explicitly places the Xerces and XOM JARs on the classpath, making it evident what is going on. I've renamed the latest XOM JAR from "xom-1.0d8.jar" to "xom.jar" for simplicity. After you successfully compile the program, you can run it by typing:
java -cp .;xercesImpl.jar;xom.jar Wf file:///wrk/inst.xml
The fully qualified file path for the argument to Wf may
work more reliably than the filename alone, depending on your platform. If
the program runs successfully and inst.xml proves to be
well-formed (it should), the program will echo the input and add an XML
declaration:
<?xml version="1.0"?>
<instant>
<date month="December" day="1" year="2002"/>
<time hour="10" minute="17" second="33" zone="PST"/>
</instant>
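If you would rather not type the file: URL by hand, the JDK can derive one from an ordinary file name. This is a small sketch, not part of the article's download; the class name FileUrl and the toFileUrl() helper are hypothetical. It assumes that the single-slash form produced by File.toURI() (file:/...) is accepted wherever the triple-slash form is, which you should confirm on your platform:

```java
import java.io.File;

// Hypothetical helper: turns a file name into an absolute file: URL.
class FileUrl {

    // Resolves the name against the current directory and converts
    // the result to a URI string beginning with "file:/".
    static String toFileUrl(String fileName) {
        return new File(fileName).getAbsoluteFile().toURI().toString();
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "inst.xml";
        // The exact output depends on the platform and working directory.
        System.out.println(toFileUrl(name));
    }
}
```

The string it prints can then be passed as the argument to Wf.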
Wf.java imports three XOM classes:
nu.xom.Builder, nu.xom.Document, and
nu.xom.ParseException. Builder creates a
document object by reading an XML document. It can pick up the document
from a file (as shown), a URL, or an input stream. Builder's
build() method actually reads the
document. Document represents the document, including its
document element and prolog. XML output is delivered by
Document's toXML() method, with help from
System.out.println(). The entire document is echoed using
this mechanism, with an XML declaration thrown in as part of the
parcel.
The IOException and ParseException classes are checked exceptions,
so the program must either declare or catch them. They are declared
the easy way in Wf.java, that is, with a throws clause. Wf2.java uses
a try/catch statement instead.
Validating a Document with XOM
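With just a few changes, you can add validation support. The
following program (Val.java) makes three additions to Wf.java: it
imports nu.xom.ValidityException, declares that exception in the
throws clause of main(), and passes true to the Builder constructor.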
import java.io.IOException;
import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.ParseException;
import nu.xom.ValidityException;   // addition 1

public class Val {

    public static void main(String[] args)
            throws IOException, ParseException, ValidityException { // addition 2
        // Passing true asks the builder to validate as it parses.
        Builder builder = new Builder(true);   // addition 3
        Document doc = builder.build(args[0]);
        System.out.println(doc.toXML());
    }
}
When you add true as an argument to the Builder constructor, you
create a builder that validates each document it reads. When this is
the case, you also need to handle validity exceptions by importing
nu.xom.ValidityException and either declaring it on main() or
catching it in a try/catch statement (see Val2.java).
Compile this program and then run it against instant.xml
with this command:
java -cp .;xercesImpl.jar;xom.jar Val file:///wrk/instant.xml
The document is validated against the DTD asserted in the document type
declaration, instant.dtd:
<!ELEMENT instant (date, time)>
<!ELEMENT date EMPTY>
<!ATTLIST date month  NMTOKEN #REQUIRED
               day    NMTOKEN #REQUIRED
               year   NMTOKEN #REQUIRED>
<!ELEMENT time EMPTY>
<!ATTLIST time hour   NMTOKEN #REQUIRED
               minute NMTOKEN #REQUIRED
               second NMTOKEN #REQUIRED
               zone   NMTOKEN #REQUIRED>
When you run Val, a successful validation is indicated by the
program echoing its input:
<?xml version="1.0"?>
<!DOCTYPE instant SYSTEM "instant.dtd">
<instant>
<date month="December" day="1" year="2002"/>
<time hour="10" minute="17" second="33" zone="PST"/>
</instant>
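XOM hands validation to the underlying SAX parser, so it can help to see roughly what happens at that layer. The following is a sketch that uses only the JDK's bundled JAXP parser, not XOM. The class name DtdCheck, the isValid() helper, and the trimmed-down DTD (carried in the internal subset, with the attribute declarations omitted) are assumptions made for illustration:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: DTD validation with a plain SAX parser.
class DtdCheck {

    // A cut-down instant.dtd placed in the internal subset so the
    // example needs no external files.
    static final String DOCTYPE =
          "<!DOCTYPE instant [\n"
        + "<!ELEMENT instant (date, time)>\n"
        + "<!ELEMENT date EMPTY>\n"
        + "<!ELEMENT time EMPTY>\n"
        + "]>\n";

    // Returns true if the text parses and is valid against its DTD;
    // returns false for invalid or malformed input.
    static boolean isValid(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    // Validity errors are recoverable in SAX and are
                    // ignored by DefaultHandler, so rethrow them here.
                    @Override
                    public void error(SAXParseException e) throws SAXException {
                        throw e;
                    }
                });
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Matches the content model (date, time).
        System.out.println(isValid(DOCTYPE + "<instant><date/><time/></instant>"));
        // Missing the required time element.
        System.out.println(isValid(DOCTYPE + "<instant><date/></instant>"));
    }
}
```

Compared with this, Val.java needs only the true argument and one more exception type.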
Adding Elements and Attributes
Suppose you picked up a copy of inst.xml and wanted to add an
element with an attribute to it. The program AddUtc.java does just
that, importing two more classes (Attribute and Element) and making
a handful of calls on the root element:
import java.io.IOException;
import nu.xom.Attribute;
import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.ParseException;

public class AddUtc {

    public static void main(String[] args)
            throws IOException, ParseException {
        Builder builder = new Builder();
        Document doc = builder.build("inst.xml");
        Element root = doc.getRootElement();

        // Build <utc offset="-08:00"/>.
        Element utc = new Element("utc");
        Attribute att = new Attribute("offset", "-08:00");
        utc.addAttribute(att);

        // Insert whitespace and the new element at the start of the root.
        root.insertChild(0, "\n ");
        root.insertChild(1, utc);

        // Remove the whitespace before time, then time itself.
        root.removeChild(4);
        root.removeChild(4);

        System.out.println(doc.toXML());
    }
}
This program imports the Element and
Attribute classes from the nu.xom
package. Instead of using a command line argument to pick up a file to
parse, it is hardcoded to grab inst.xml. It uses the
getRootElement() method from the Document class
to determine the document element of inst.xml.
A utc element is created with the Element class, along with an
offset attribute created with the Attribute class. The
addAttribute() method from Element adds this attribute to the utc
element. Calling insertChild() inserts a text child (a newline and
a space) at position 0, immediately after the start-tag of the root
element. Following that, insertChild() places the utc element at
position 1.
The code also removes the time element (and preceding
whitespace) by using the removeChild() method twice with the
same argument value. (The argument represents a node position.) After XOM
removes the first node (two contiguous whitespace characters), the
following node (the time element) moves up in the tree to the
position previously occupied by the whitespace.
The result looks like this (utc.xml):
<?xml version="1.0"?>
<instant>
<utc offset="-08:00" />
<date month="December" day="1" year="2002" />
</instant>
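The position arithmetic is easy to get wrong, so here is a small model of it using a plain java.util.List in place of XOM's child nodes. This is only an analogy, not XOM code: the class name ChildPositions is hypothetical, "ws" stands for a whitespace text node, and the starting layout of inst.xml (whitespace, date, whitespace, time, whitespace) is an assumption inferred from the listing shown earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the child positions manipulated by AddUtc.
class ChildPositions {

    static List<String> editChildren() {
        // Assumed children of <instant> in inst.xml.
        List<String> children = new ArrayList<>(
            Arrays.asList("ws", "date", "ws", "time", "ws"));

        children.add(0, "ws");   // like root.insertChild(0, "\n ")
        children.add(1, "utc");  // like root.insertChild(1, utc)

        // Now: ws, utc, ws, date, ws, time, ws
        children.remove(4);      // removes the whitespace before time
        children.remove(4);      // time has moved up to position 4
        return children;
    }

    public static void main(String[] args) {
        // Prints [ws, utc, ws, date, ws]
        System.out.println(editChildren());
    }
}
```

Because each removal shifts the later children up by one, the same position removes two different nodes in turn.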