The JAXB API

January 8, 2003

Kohsuke Kawaguchi

Introduction

Sun has recently released version 0.75 of the Java Architecture for XML Binding (JAXB), as well as its reference implementation.

JAXB consists of two parts. First, JAXB contains a compiler that reads a schema and produces the equivalent Java object model. This generated object model captures the structure of XML better than general-purpose APIs like DOM or SAX, making it a lot easier to manipulate XML content.

The second part is an API, through which applications communicate with generated code. This API hides provider-specific implementation code from applications and also provides a uniform way to do basic operations, such as marshalling or unmarshalling.

[Figure: JAXB diagram]

The compiler has caused some concern (see JAXB or W3C XML Schema "a la carte"? and Cafe con Leche) as to how it does or doesn't handle W3C XML Schema, but the API hasn't attracted much attention. Thus, in this article, I'll introduce the JAXB API.

We'll examine the design of the API and discuss its shortcomings; consider JAXB in the context of the Java-XML universe; and, finally, we'll learn if JAXB can evolve to meet expected future needs.

Why JAXB API?

While you can use JAXB without knowing how a schema is mapped to Java or how to use customizations, you cannot use JAXB if you don't know the API. Further, the API is here to stay. By contrast, the way a JAXB binding compiler maps W3C XML Schema to Java can change far more drastically and easily, through the version attribute and the extensibility framework.

Thanks to this extensibility framework, vendors are free to go beyond the baseline functionality in the specification. Similarly, the JAXB specification can extend the supported subset of W3C XML Schema in the future or even support other schema languages. But to do this, the API needs to be sufficiently solid and flexible now. For these reasons, the API deserves thorough consideration.

API Overview

The JAXB API, defined in the javax.xml.bind package, is a set of interfaces through which client applications communicate with code generated from a schema. The center of the JAXB API is JAXBContext, the client's entry point. It provides an abstraction for managing the XML-Java binding information necessary to implement the JAXB binding framework operations: unmarshal, marshal and validate.

These three aspects of JAXB are covered by three separate interfaces. Instances of those interfaces can be created from a JAXBContext object:

  • Unmarshaller: governs the process of deserializing XML data into Java content trees, optionally validating the XML data as it is unmarshalled;
  • Marshaller: governs the process of serializing Java content trees back into XML data;
  • Validator: performs the validation on an in-memory object graph.

JAXBContext is an abstract class defined in the API, so its actual implementation is vendor-dependent. To create a new instance of JAXBContext, you use the static newInstance method. This method takes a colon-separated list of package names as a parameter. Each schema is compiled into a single package, which means that you can assemble several schemas at run-time by providing multiple package names.

    JAXBContext context = JAXBContext.newInstance("org.acme.foo:org.acme.bar");

In this way, the unmarshaller will look at a document and figure out which package to use. This makes it easy to read in different types of documents without knowing their type in advance.

Unmarshalling

An unmarshaller is used to read XML and build an object tree from classes generated by the compiler. To read an XML file, you would simply do

    Unmarshaller unmarshaller = context.createUnmarshaller();

    MyObject o = (MyObject)unmarshaller.unmarshal(new File("foo.xml"));

There are other overloaded versions that take different types of input, such as InputStream or InputSource. You can even unmarshal a javax.xml.transform.Source object. All in all, it's similar to the way DOM trees are parsed.

In the previous version of JAXB, this functionality was provided as a method on the generated class. Since one cannot add methods to existing classes, that design made it impossible to unmarshal into classes that were not generated by the compiler. By moving unmarshalling into a separate Unmarshaller interface, the new API makes it possible to support such classes in a future version.

JAXB also supports unmarshalling via a SAX ContentHandler. You can send SAX events to the unmarshaller and have it unmarshal objects. This enhances the connectivity of the Unmarshaller considerably. For example, you can parse the header of a message by using JAXB and send the body of the message to another component. With ContentHandler support, this can be done efficiently.
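The header/body split described above can be sketched with plain SAX. This is only an illustration under stated assumptions: the element names header and body, the class EnvelopeSplitter, and the Recorder stand-in are all invented here, and namespace prefix mappings are not forwarded. In a JAXB setup, the header sink would presumably be the handler obtained from Unmarshaller.getUnmarshallerHandler().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Routes the <header> subtree to one ContentHandler and the <body> subtree to another. */
final class EnvelopeSplitter extends DefaultHandler {
    private final ContentHandler headerSink;
    private final ContentHandler bodySink;
    private ContentHandler current; // sink for the subtree being read, or null
    private int depth;              // nesting depth inside that subtree

    EnvelopeSplitter(ContentHandler headerSink, ContentHandler bodySink) {
        this.headerSink = headerSink;
        this.bodySink = bodySink;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (current == null) {
            if (local.equals("header")) current = headerSink;
            else if (local.equals("body")) current = bodySink;
            // each subtree is presented to its sink as a complete document
            if (current != null) current.startDocument();
        }
        if (current != null) {
            depth++;
            current.startElement(uri, local, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (current == null) return;
        current.endElement(uri, local, qName);
        if (--depth == 0) {
            current.endDocument();
            current = null;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (current != null) current.characters(ch, start, length);
    }

    /** Hypothetical consumer that only records element names. */
    static final class Recorder extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            names.add(local);
        }
    }

    /** Parses xml and returns the element names seen by the header and body sinks. */
    static List<List<String>> split(String xml) throws Exception {
        Recorder header = new Recorder();
        Recorder body = new Recorder();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new EnvelopeSplitter(header, body));
        reader.parse(new InputSource(new StringReader(xml)));
        List<List<String>> result = new ArrayList<>();
        result.add(header.names);
        result.add(body.names);
        return result;
    }
}
```

Because each sink sees only its own subtree, the body never has to be materialized as JAXB objects at all.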

By default, Unmarshaller is very forgiving. Even if a document is invalid, it tries to recover from errors. If the document is so broken that it cannot be read, an UnmarshalException will be thrown.

It's often desirable to get more information about errors or to reject documents with errors. The first step is to register a ValidationEventHandler with the Unmarshaller. A ValidationEventHandler can explicitly tell a JAXB implementation whether it should reject a document or try to recover from errors. It also gives you more information about errors, such as line numbers.

An Unmarshaller can validate a document with the schema while unmarshalling. With this option turned on, it rejects anything short of a valid document. However, W3C XML Schema validation can be very costly.

Another possibility is to set up a SAX pipeline in such a way that your XML parser does the validation; alternatively, you could install a stand-alone validator in the pipeline (such as JARV: validation API). In this way, for example, you can modify the schema you feed to the compiler to change the code it generates, while still checking documents against the original schema.
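A minimal sketch of a validator sitting in a SAX pipeline, under some assumptions: it uses the javax.xml.validation package (which joined the Java platform after this article was written) as the stand-alone validator instead of JARV, and the schema, the element names, and the class name ValidatingPipeline are invented for illustration. The downstream handler only records element names; in a JAXB setup it would presumably be the unmarshaller's own ContentHandler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.Attributes;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class ValidatingPipeline {
    // Hypothetical schema: <order> containing exactly one <item> of type xs:int
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='order'><xs:complexType><xs:sequence>"
      + "<xs:element name='item' type='xs:int'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Parses xml through the validator; returns element names seen downstream. */
    static List<String> parse(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));

        final List<String> seen = new ArrayList<>();
        DefaultHandler downstream = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                seen.add(local);
            }
        };

        // parser -> validator -> downstream consumer
        ValidatorHandler validator = schema.newValidatorHandler();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        validator.setContentHandler(downstream);

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // schema validation needs namespace-aware events
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(validator);
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```

The schema used by the validator is independent of whatever schema produced the downstream consumer, which is the point of the arrangement.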

Marshalling

A Marshaller is used to write an object graph into XML. To write an object o to a file, you would do

    Marshaller marshaller = context.createMarshaller();

    try (OutputStream out = new FileOutputStream("foo.xml")) {
        marshaller.marshal(o, out); // the stream is closed even if marshalling fails
    }

There are other overloaded versions, which let you write to a Writer or produce XML as a DOM tree or as SAX events. For example, by using a StringWriter, you can marshal an object into a string. You can also marshal an object graph to a javax.xml.transform.Result object.

In the previous version of JAXB, this functionality was provided as a method on the generated class, and the number of formats was limited. Making it a separate interface enables mapping of existing classes to XML in the future.

In the previous version of JAXB, there was no provision for controlling the formatting. In the new API, you can control the behavior of marshalling by setting Marshaller properties. For example, you can toggle indentation of the XML. More importantly, the mechanism is extensible, which means JAXB providers can expose advanced features.

Although you can customize the behavior of marshalling to some degree by using this mechanism, there's a good chance it doesn't fill your needs completely. For example, some people might want to use tabs for indentation and others might prefer spaces. In some schemas, certain elements cannot be indented without changing the meaning. You might want to control the namespace prefixes. Or you might want to add a processing instruction to the document. In addition, relying on vendor-specific properties compromises portability.

The new version of JAXB can produce XML as SAX events. That is, you can pass ContentHandler and have it receive SAX events from a JAXB object. This gives client apps plenty of chances to modify XML. For example, you can add and remove elements or attributes, use one of the freely available serializers ( XMLWriter by David Megginson, org.apache.xml.serialize.XMLSerializer) for better output, or write your own XML serializer that prints XML in your preferred way.
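As a rough sketch of plugging in your own serializer, the following uses the JDK's TransformerHandler as the ContentHandler that turns SAX events into text. The events are fired by hand from a hypothetical fireEvents method so that the example stays self-contained; with JAXB, the same handler would presumably be passed to Marshaller.marshal(Object, ContentHandler). The class name, element name, and stylesheet reference are invented.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

final class SaxSerializerSketch {
    /** Stand-in for a marshaller: fires the events for <greeting>hello</greeting>. */
    static void fireEvents(ContentHandler out) throws SAXException {
        out.startDocument();
        // something the standard marshaller properties cannot add
        out.processingInstruction("xml-stylesheet", "type=\"text/xsl\" href=\"style.xsl\"");
        out.startElement("", "greeting", "greeting", new AttributesImpl());
        char[] text = "hello".toCharArray();
        out.characters(text, 0, text.length);
        out.endElement("", "greeting", "greeting");
        out.endDocument();
    }

    /** Serializes the events to a string using the JDK's identity TransformerHandler. */
    static String serialize() throws Exception {
        SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = factory.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter buffer = new StringWriter();
        serializer.setResult(new StreamResult(buffer)); // must be set before startDocument
        fireEvents(serializer);
        return buffer.toString();
    }
}
```

Any other ContentHandler implementation could take the serializer's place, including one that formats output exactly the way you prefer.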

Finally, you can ask a Marshaller to marshal an invalid object graph by setting a ValidationEventHandler. If a provider supports error recovery, you can tell it to write XML even if it's incomplete.

Validation

JAXB also has the capability to validate an object graph in memory without actually writing it to XML. This allows client apps to check if a graph is okay and ready to process; if not, validation will identify objects that contain errors so that, for example, client apps can ask users to fix those.

The following code validates the object "o".

    Validator v = context.createValidator();

    if(!v.validate(o))    System.err.println("error");

To receive detailed information about errors, you need to register a ValidationEventHandler with the Validator, just as you did with the Unmarshaller and Marshaller. This is analogous to registering an ErrorHandler for a SAX parser.

You can also marshal an object graph first and then validate the resulting XML (for example, with a Java API for validators). But doing so makes it much harder to associate errors with their sources, which makes debugging harder for humans. Validation after marshalling will give you errors like "missing <foo> element," but you can hardly tell what is actually wrong in the object graph.

Validity is not enforced while you are modifying an object graph; you always have to explicitly validate it. To edit a valid object graph into another valid object graph, you may need to go through invalid intermediate states. If validity is enforced on every step of mutation, this becomes impossible.
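To see why intermediate states matter, consider a small illustration that does not use JAXB at all. The Contact class below is hypothetical; it assumes a schema in which a contact must carry exactly one of email or phone (an xs:choice). Switching from one to the other passes through an invalid state whichever field is changed first.

```java
import java.util.ArrayList;
import java.util.List;

final class DeferredValidationSketch {
    /** Hypothetical content class; the schema is assumed to require exactly one field. */
    static final class Contact {
        String email;
        String phone;

        /** Explicit check, in the spirit of calling a validator on demand. */
        boolean isValid() {
            return (email == null) != (phone == null);
        }
    }

    /** Switches a contact from email to phone, recording validity after each step. */
    static List<Boolean> switchToPhone() {
        Contact c = new Contact();
        c.email = "a@example.org";

        List<Boolean> validity = new ArrayList<>();
        validity.add(c.isValid()); // valid: only email is set
        c.email = null;
        validity.add(c.isValid()); // invalid: neither field is set
        c.phone = "555-0100";
        validity.add(c.isValid()); // valid again: only phone is set
        return validity;
    }
}
```

If the setters rejected any change that left the object invalid, neither ordering of the two assignments could succeed.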

Combining JAXB with other XML technologies

JAXB, as I've described, can be combined with any XML technology that can produce or consume SAX events. This enables JAXB to talk to XSLT, DOM, dom4j, XML-aware databases, and many existing libraries. Also, you can easily plug XMLFilters into this process. XMLFilters are particularly useful for making a small modification efficiently (see Tip: Use a SAX filter to manipulate data, The Collected Works of SAX).
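Here is a small sketch of such a filter, built on org.xml.sax.helpers.XMLFilterImpl. It drops one element and everything inside it before the events reach the next stage. The element name internal and the class name DropElementFilter are made up for this example, and the final stage is the JDK's serializer rather than a JAXB unmarshaller.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/** Removes every element with the given local name, together with its content. */
final class DropElementFilter extends XMLFilterImpl {
    private final String dropped;
    private int depth; // greater than zero while inside a dropped element

    DropElementFilter(String dropped) {
        this.dropped = dropped;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (depth > 0 || local.equals(dropped)) {
            depth++;
            return;
        }
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (depth > 0) {
            depth--;
            return;
        }
        super.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) super.characters(ch, start, length);
    }

    /** Pipeline: parser -> filter -> serializer. Returns the filtered XML as text. */
    static String run(String xml) throws Exception {
        SAXParserFactory parsers = SAXParserFactory.newInstance();
        parsers.setNamespaceAware(true);
        DropElementFilter filter = new DropElementFilter("internal");
        filter.setParent(parsers.newSAXParser().getXMLReader());

        TransformerHandler serializer =
            ((SAXTransformerFactory) TransformerFactory.newInstance()).newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter buffer = new StringWriter();
        serializer.setResult(new StreamResult(buffer));

        filter.setContentHandler(serializer);
        filter.parse(new InputSource(new StringReader(xml)));
        return buffer.toString();
    }
}
```

The filter never builds a tree, so the cost of the modification stays proportional to the size of the event stream.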

Working with SAX can require advanced knowledge of XML and can also be messy. As a cleaner, high-level abstraction, JAXB also supports javax.xml.transform.Source/Result classes. Just like XSLT can read from any Source object and write to any Result object, JAXB can unmarshal from any Source object and marshal to any Result object. If another XML technology supports them (like dom4j, JDOM, and XSLT), this makes it a snap to combine them with JAXB.

JAXB has two utility classes: the first, javax.xml.bind.util.JAXBSource, presents a JAXB object graph as an XML source; the second, JAXBResult, allows the results of processing to be retrieved directly as a JAXB object. While JAXB is primarily intended for XML beginners, these features make JAXB attractive even for seasoned developers.
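The Source/Result plumbing itself can be sketched with the JDK alone. In the example below, a DOM tree plays the role that a JAXB object graph would play: DOMResult and DOMSource sit where JAXBResult and JAXBSource would presumably go. The class name, the document content, and the checked attribute are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class SourceResultSketch {
    /** Reads text into a tree, edits the tree, and writes it back out as text. */
    static String roundTrip(String xml) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();

        // Source -> Result: text in, in-memory tree out
        DOMResult tree = new DOMResult();
        factory.newTransformer().transform(new StreamSource(new StringReader(xml)), tree);

        // the in-memory representation can be inspected or edited here
        Element root = ((Document) tree.getNode()).getDocumentElement();
        root.setAttribute("checked", "true");

        // Source -> Result: in-memory tree in, text out
        Transformer serializer = factory.newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter buffer = new StringWriter();
        serializer.transform(new DOMSource(root.getOwnerDocument()), new StreamResult(buffer));
        return buffer.toString();
    }
}
```

Since both ends of a transformation are just Source and Result objects, swapping in a different in-memory representation does not change the surrounding code.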

Shortcomings of API

JAXB has some shortcomings. Client apps can't customize the marshalling behavior very much. The only standard options are (1) setting the encoding to be used, (2) turning indentation on or off, and (3) adding xsi:schemaLocation/xsi:noNamespaceSchemaLocation. You can't even specify how a document should be indented. While you can always plug in some other XML writer, doing so is not as easy as the rest of JAXB. It would have been easier if Java had a reusable XML writer, like the one in Microsoft .NET. Then the JAXB API could have been designed to work with it.

Further, the ValidationEventHandler interface handles errors during unmarshalling, validation, and marshalling, and ValidationEvent objects represent error information. The problem is that you can't throw a ValidationEvent, because it is not a Throwable. If you compare this with SAX, where you often throw SAXParseException from your ErrorHandler, this is a big difference. I don't claim that the SAX way is better, but you have to learn yet another way of handling errors.

Future Needs

One of the concerns raised in the community is that JAXB can't be used to bind existing code to XML, since it can only produce Java classes from a schema, not the other way around. While this is indeed true for the current version of JAXB, we have seen that the API itself doesn't have this constraint. Nothing prevents tools from using this API to map existing Java classes to XML. In fact, the interface used by Castor to do this Java-centric mapping is very close to the JAXB API, so it shouldn't be hard to change Castor to use this API.

Similarly, the current version of JAXB only deals with W3C XML Schema. But, again, the API has virtually no dependency on any particular schema language. Therefore, a tool can hijack the JAXB API for its choice of schema language. It would be good to see a RELAX NG-aware databinding tool (like Relaxer) use the JAXB API.

Resources

Sun's JAXB page contains pointers to most of the resources. If you are interested in JAXB's API, the online javadoc has detailed descriptions about how it works. If you are interested in using JAXB, have a look at the online user's guide and download the reference implementation. Also, the specification is available to everyone.

Many enhancements in the current release of the JAXB technology are owed to feedback from our user community. I encourage you to join the jaxb-interest mailing list, where users discuss issues and desired features and exchange solutions on how to get the most from JAXB.