
Comparing Java Data Binding Tools

September 3, 2003

Mette Hedin

Many W3C XML Schema (WXS) data binding tools for Java are now emerging. These tools generate Java code from WXS schemas in order to represent the structures defined therein. The generated code can convert XML documents to Java objects and vice versa. This gives the user a compile-time Java API customized for the specific schema used, which saves a lot of time and effort compared to using generic interfaces such as DOM and JDOM. It also enables Java developers with little or no XML knowledge to both consume and produce valid XML documents. Tools of this type can provide substantial benefits for many development efforts.
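To make the contrast with generic interfaces concrete, here is a minimal sketch. The `Item` class is a hand-written stand-in for what a binding tool might generate; it is not the output of any of the tools discussed in this article, and the `item`, `name`, and `quantity` element names are invented for illustration. The DOM code needed to populate it is the kind of work that generated code takes over.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import java.io.StringReader;

/** Contrasts generic DOM access with a typed, schema-specific API. */
class DomVersusBinding {

    /** Hand-written stand-in for a class a binding tool might generate for an "item" element. */
    static final class Item {
        private final String name;
        private final int quantity;

        Item(String name, int quantity) {
            this.name = name;
            this.quantity = quantity;
        }

        String getName() { return name; }
        int getQuantity() { return quantity; }
    }

    /** Generic route: element names are plain strings and every value is converted by hand. */
    static Item unmarshal(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            String name = root.getElementsByTagName("name").item(0).getTextContent();
            String quantity = root.getElementsByTagName("quantity").item(0).getTextContent();
            return new Item(name, Integer.parseInt(quantity.trim()));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read item document", e);
        }
    }

    /** Reverse direction; escaping covers only the characters that matter in element content. */
    static String marshal(Item item) {
        String safeName = item.getName().replace("&", "&amp;").replace("<", "&lt;");
        return "<item><name>" + safeName + "</name><quantity>" + item.getQuantity() + "</quantity></item>";
    }

    public static void main(String[] args) {
        Item item = unmarshal("<item><name>Widget</name><quantity>3</quantity></item>");
        // Application code works with typed getters instead of DOM nodes.
        System.out.println(item.getName() + " x " + item.getQuantity());
        System.out.println(marshal(item));
    }
}
```

With a binding tool, the `unmarshal` and `marshal` methods would be generated from the schema rather than written and maintained by hand.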

There are currently a number of interesting tools available that have this functionality. The problem is that with so many choices, and with the technology being relatively new, it is hard for a prospective user to choose the most beneficial tool. The functionality and the level of adherence to the WXS standard become very important in considering which tool to choose. What use is saving resources in the development of an application if the chosen tool is ultimately unable to support the features used in the schemas, or if it cannot produce valid output? In that case the handy tool has become an unexpected bottleneck that can be hard to get past.

This is why a comparison between some of the currently available tools is vital. It's often hard to determine from the outside looking in which WXS features each tool supports and how stable it is. This article is a first attempt at such a comparison between some of the first available tools. As part of the comparison, a standard set of test cases has been created to test the conformance and functionality of any WXS data binding tool.

The Tools

The tools that have been tested in this first comparison are

  • Breeze XML Binder, a tool produced by Breeze Factor
  • Castor, an open source project under ExoLab
  • JAXB Reference Implementation, created by Sun
  • XGen, a tool produced by Commerce One.

All tools were tested with their latest version at the time of this article.

Note that I designed and developed the XGen tool.

Breeze XML Binder

Breeze is a commercial tool produced by Breeze Factor. It's available as an evaluation copy from http://www.breezefactor.com/. However, in order to use the software and the generated code, a license fee must be paid. The version tested in this comparison is 3.0.

Castor

Castor is an open source project under ExoLab and is available free of charge in binary or source form at http://www.castor.org. The version tested in this comparison is 0.9.5. The code generation tool has a command line interface and a programmatic interface.

JAXB Reference Implementation

JAXB is a standard mapping developed by Sun in cooperation with a number of partners. JAXB only specifies the intended behavior of a data binding tool and is not a tool in itself. However, the mapping is accompanied by a reference implementation. The JAXB jars and the reference implementation are both part of the Sun Java Web Services Developer Pack 1.2, which is available for free at http://java.sun.com/xml/jaxb/.

XGen

XGen is an open source tool produced by Commerce One. The tool has been developed with the Castor code as a basis, but with a substantial number of changes in the mapping and the functionality. The tool is included in the Commerce One Conductor DocSOAP XML Developer's Kit available free of charge at http://www.commerceone.com/developers/docsoapxdk. The version tested in this comparison is 6.0.

The Comparison Test

The testing consists of a code generation test and a runtime test, both using a set of test cases developed specifically for this comparison.

The Test Case Suite

The test case suite is available at http://www.commerceone.com/developers/docs/testsuite.zip. This first version of the test case suite is relatively basic, consisting of 116 schemas testing various features of WXS and 111 instances of the schemas.

The test cases in the test suite have been chosen to test a number of WXS features. The features selected, and the test cases created, have not been chosen with any particular tool in mind. The features that have been selected for inclusion in the test suite are features that have been determined to be basic or to be frequently used by users of the language. The test cases are therefore intended to test the over-all basic feature coverage of the various tools. The test cases have not been designed to test the more obscure features of the schema language, or to construct the most backbreaking feature combinations, as this is not the most common use of the language.

The Code Generation Test

The code-generation test exercises two main aspects of the code generation:

  • The XML Schema feature support. Any tool may have weaknesses in that features may be unsupported, or poorly supported. The schema is run through the code generation tool, to ensure no errors occur.

  • Flaws in the generated Java code. Once the code has been generated, it is compiled with a Java compiler. The code must compile to be usable.
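The compile step of such a test can be automated with the standard `javax.tools` API. The following is a minimal sketch, not the harness used for this comparison: the `CompileCheck` and `StringSource` names are invented here, the source is held in memory instead of being read from generated files, and class files are written to the system temporary directory as a simplifying assumption. It needs a JDK, since a plain JRE has no system compiler.

```java
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.List;

/** Checks whether a piece of generated Java source compiles. */
class CompileCheck {

    /** Wraps source text held in memory so the compiler can read it. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Returns true if the given source compiles without errors. */
    static boolean compiles(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler available; run on a JDK");
        }
        // Collect diagnostics quietly instead of printing them to stderr.
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        // Assumption: class files may be dropped in the temp directory and ignored.
        List<String> options = List.of("-d", System.getProperty("java.io.tmpdir"));
        Boolean ok = compiler.getTask(null, null, diagnostics, options, null,
                List.of(new StringSource(className, source))).call();
        return Boolean.TRUE.equals(ok);
    }

    public static void main(String[] args) {
        System.out.println(compiles("Good", "class Good { int x = 1; }"));
        System.out.println(compiles("Bad", "class Bad { int x = ; }"));
    }
}
```

A fuller harness would also inspect the collected diagnostics to report which construct in the generated code failed.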

The Runtime Test

The runtime test consists of a "roundtrip": converting the XML document to populated Java objects, converting those Java objects back to an XML instance, and comparing the input with the output. This is a good test of two important aspects of the ultimate functionality of the generated code:

  • How well the conversion from XML to Java works. The XML instance is converted to instances of the corresponding Java classes. This tests that the data and structures in the document can be successfully transferred to the Java object.

  • How well the conversion from Java to XML works. The generated code must be able to convert itself to XML, and this XML document must be valid according to the corresponding schema. In addition, no loss of data may occur in the conversion.

In addition, the comparison between the input and output checks that no significant data or structure loss occurred in either step. Significant differences are defined as changes in the data or differences in the structure of the document. If the document has been modified in any significant way, this is a serious flaw in an XML data binding tool. This could mean that an application does not get all the data or gets incorrect data, performing badly or incorrectly as a consequence. Another possible outcome is that the output XML document is invalid, which may cause a failure at some later point.
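The comparison step can be sketched with the DOM API alone. This is an illustration under stated assumptions, not the comparison code used for these tests: the `RoundtripCheck` name is invented, whitespace-only text nodes and comments are treated as insignificant, and `Node.isEqualNode` also compares namespace prefixes, so a tool that merely renames a prefix would be reported as different.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import java.io.StringReader;

/** Compares the input and output of a roundtrip for significant differences. */
class RoundtripCheck {

    /** Parses a document and strips whitespace-only text so indentation is ignored. */
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setIgnoringComments(true);
            factory.setCoalescing(true); // merge CDATA sections into ordinary text
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            dropBlankText(root);
            return root;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    /** Removes text nodes that contain only whitespace, recursively. */
    static void dropBlankText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                dropBlankText(child);
            }
            child = next;
        }
    }

    /** True if both documents carry the same elements, attributes, and text. */
    static boolean sameContent(String inputXml, String outputXml) {
        return parse(inputXml).isEqualNode(parse(outputXml));
    }

    public static void main(String[] args) {
        String input = "<a x='1' y='2'><b>t</b></a>";
        String output = "<a y=\"2\" x=\"1\">\n  <b>t</b>\n</a>";
        System.out.println(sameContent(input, output));
    }
}
```

Attribute order and quoting style do not affect the result, while a changed value or a missing element does. Stripping blank text is too aggressive for schemas with mixed content, where whitespace can be significant.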

The test does not test the user-friendliness and ease of use of the generated APIs. It may be that a tool that performs well in a comparison is hard to use. This is beyond the scope of the test suite, as it entails a more detailed analysis. One good example of this is JAXB, which at any slight complication of a content model defaults to a poorly typed List interface. This means that a large number of content models are supported and roundtrip well, but it makes the code very difficult to use: the user must know the appropriate content model, which to a large degree defeats the purpose of a generated type-safe API.

It should also be noted that this comparison only compares the basic functionality, without any customization of the generated code. Some tools allow customization of the generated code, which can in some cases fix mapping problems. Customization can end up being a hidden cost for generated code that is used by more than a handful of developers. Any customization has to be communicated to account for unexpected APIs. It also adds development time for each schema that requires customization and introduces added risk. In addition, as it is likely that many users would assume the code to work without it, customization has not been utilized for this comparison.

Test Suite Performance Comparison

The following graph shows the results of the runtime test for each of the tools. As the runtime test also reflects any failures in the code generation test, the runtime test result is what best reflects the total feature support performance for each tool.

Figure: Over-all test results for all four tools.

The test results for each tool in the various test categories break down as follows:

Figure: Test results broken down by test case category.

Feature Comparison

When comparing tools, it is also important to take an over-all view of some of the behaviors and features of the tools. Many of the tools have differences in mapping philosophies as well as other features, which may influence the choice of tool.

Comparison Overview

| Feature                                      | Breeze | Castor  | JAXB RI | XGen |
|----------------------------------------------|--------|---------|---------|------|
| Customization Disallowed                     | no     | no      | no      | yes  |
| Package Name Mapped from Namespace           | no     | no      | yes [1] | yes  |
| Automatic Unmarshaling                       | no     | no      | no      | yes  |
| Open Source                                  | no     | yes     | no [2]  | yes  |
| Free of Charge                               | no     | yes     | yes     | yes  |
| Schema Location Roundtrippable               | no     | no      | no      | yes  |
| Schema Location Settable                     | no     | yes [3] | yes [3] | yes  |
| Schemas without a Target Namespace Supported | yes    | yes     | yes     | no   |
| Instance Validation on Unmarshal             | no     | yes     | yes [4] | yes  |
| Constraint Check of Values on Set            | [5]    | no      | no      | yes  |
| Value Validation when Marshaling             | no     | yes     | no      | [5]  |
| Code Generation Command Line Tool Interface  | yes    | yes     | yes     | yes  |
| Code Generation Programmatic Interface       | no     | yes     | no      | yes  |
| Code Generation GUI Interface                | yes    | no      | no      | no   |
| Generated Code Implements Generic Interfaces | yes    | no      | yes [6] | yes  |

1 -- Mapped if no other package name is specified.
2 -- JAXB will be available as an open source project in the near future. More information can be found at http://jaxb.dev.java.net/.
3 -- Only on marshaller class.
4 -- Off by default.
5 -- Some values validated.
6 -- Interfaces based on functionality instead of type of XML Schema construct.

Feature Explanation

Customization Disallowed
In large systems where many developers use the generated code, it is desirable to not allow any customization of the generated code. This shortens the development time and makes the use of the code safer for several reasons. No time needs to be spent on customizing. The generated APIs are reliable. It's not possible to accidentally change the APIs by using the wrong binding file or options when regenerating code. Hence while customization may initially sound like a good idea, it can often introduce more problems than it solves. Some tools in fact require customization to compensate for mapping flaws and require the use of a binding file if certain unsupported features have been used.

Package Name Mapped from Namespace
Many tools allow or require one piece of configuration: the package name of the generated code. With a configurable package name, duplicate classes are more likely if several users generate code from the same schema. In addition, without a class name that is predictable from the namespace, it becomes impossible to locate the relevant classes automatically, which rules out dynamic handling of documents. A mapped package name also enables better support for polymorphic document content.
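As an illustration of what a predictable mapping could look like, the sketch below derives a package name from an http-style namespace URI by reversing the host name and appending the path segments. This is an assumed, simplified scheme and not the mapping used by any of the tested tools; it does not handle URN namespaces or Java keywords, and the `NamespaceToPackage` name is invented here.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Derives a Java package name from a target namespace (illustrative scheme only). */
class NamespaceToPackage {

    static String packageFor(String namespaceUri) {
        URI uri = URI.create(namespaceUri);
        List<String> parts = new ArrayList<>();

        // Reverse the host labels: www.example.com becomes com.example.
        String host = uri.getHost();
        if (host != null) {
            List<String> labels = new ArrayList<>(List.of(host.split("\\.")));
            if (labels.size() > 2 && labels.get(0).equalsIgnoreCase("www")) {
                labels.remove(0);
            }
            Collections.reverse(labels);
            parts.addAll(labels);
        }

        // Append each non-empty path segment in order.
        String path = uri.getPath();
        if (path != null) {
            for (String segment : path.split("/")) {
                if (!segment.isEmpty()) {
                    parts.add(segment);
                }
            }
        }

        List<String> cleaned = new ArrayList<>();
        for (String part : parts) {
            cleaned.add(toIdentifier(part));
        }
        return String.join(".", cleaned);
    }

    /** Lowercases, replaces characters that are illegal in identifiers, and guards leading digits. */
    static String toIdentifier(String raw) {
        String id = raw.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_]", "_");
        return Character.isDigit(id.charAt(0)) ? "_" + id : id;
    }

    public static void main(String[] args) {
        System.out.println(packageFor("http://www.example.com/orders/v1"));
        System.out.println(packageFor("http://example.org/2003/po-schema"));
    }
}
```

Because the result depends only on the namespace, two users generating code from the same schema end up with the same package, and a runtime can compute where to look for the classes bound to a given document.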

Automatic Unmarshaling
Many tools require that the user know the root element of a document in advance. At runtime the user must either specify a package name or instantiate certain classes before a document can be converted from XML to Java objects, which means the user must know what document is to be unmarshaled before unmarshaling can take place. By contrast, a tool that can handle automatic unmarshaling does not require the user to know ahead of time which kind of document is being unmarshaled. This in turn enables dynamic loading and processing of documents, as well as better support for polymorphic document content.
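One way such dispatching could work is to peek at the root element's qualified name and look up the class bound to it. The sketch below uses StAX for the peek and a plain map as the lookup; the `RootDispatch`, `PurchaseOrder`, and `Invoice` names and the `urn:example:*` namespaces are all hypothetical, and none of the tested tools is claimed to work this way internally.

```java
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

/** Picks the bound class for a document by inspecting its root element. */
class RootDispatch {

    /** Empty stand-ins for generated classes. */
    static final class PurchaseOrder { }
    static final class Invoice { }

    private final Map<QName, Class<?>> bindings = new HashMap<>();

    void register(String namespace, String localName, Class<?> boundClass) {
        bindings.put(new QName(namespace, localName), boundClass);
    }

    /** Reads only as far as the first start tag and returns its qualified name. */
    static QName rootName(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getName();
                    }
                }
                throw new IllegalArgumentException("Document has no root element");
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Document is not well-formed", e);
        }
    }

    /** Returns the class registered for the document's root element. */
    Class<?> classFor(String xml) {
        QName root = rootName(xml);
        Class<?> bound = bindings.get(root);
        if (bound == null) {
            throw new IllegalArgumentException("No binding registered for " + root);
        }
        return bound;
    }

    public static void main(String[] args) {
        RootDispatch dispatch = new RootDispatch();
        dispatch.register("urn:example:po", "purchaseOrder", PurchaseOrder.class);
        dispatch.register("urn:example:inv", "invoice", Invoice.class);
        System.out.println(dispatch.classFor("<inv:invoice xmlns:inv='urn:example:inv'/>").getSimpleName());
    }
}
```

With a package name that is predictable from the namespace, the explicit registry could be replaced by computing the class name from the root element, which is what makes fully automatic unmarshaling possible.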

Open Source

Open source allows users to view, modify, and distribute the software.

Free of Charge

Immediate cost saving.

Schema Location Roundtrippable
Many tools drop the schema location attribute when converting from XML to Java. For tools not utilizing validation, this may not seem to be such a big problem, as the schema will not be loaded at unmarshaling. If validation is used, the tool may be unable to load the instance it itself produced. A greater issue is that other applications that consume the created instances may be unable to process them unless a custom entity resolution scheme has been set up, which can be a prohibitive cost for simpler applications. Some tools do add the ability to set the schema location manually, although this requires that the user extract the schema location before the instance is unmarshaled and that it is maintained elsewhere until the instance is again marshaled. However, a schema location can be very important information, if provided, and should not be dropped.
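The manual workaround described above amounts to the following, shown here with plain DOM as a hedged sketch. The `SchemaLocationKeeper` name and the sample documents are invented; with a real binding tool the second document would be the tool's marshaled output, and the value would have to be carried alongside the Java objects in between.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import java.io.StringReader;

/** Carries xsi:schemaLocation from an input document over to an output document. */
class SchemaLocationKeeper {

    static final String XSI = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI;

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    /** Returns the schema location hint on the root element, or null if there is none. */
    static String readSchemaLocation(Document doc) {
        Element root = doc.getDocumentElement();
        return root.hasAttributeNS(XSI, "schemaLocation")
                ? root.getAttributeNS(XSI, "schemaLocation")
                : null;
    }

    /** Sets the schema location hint, declaring the xsi prefix so the result serializes cleanly. */
    static void writeSchemaLocation(Document doc, String location) {
        Element root = doc.getDocumentElement();
        root.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:xsi", XSI);
        root.setAttributeNS(XSI, "xsi:schemaLocation", location);
    }

    public static void main(String[] args) {
        Document input = parse("<po xmlns='urn:example:po' "
                + "xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' "
                + "xsi:schemaLocation='urn:example:po po.xsd'/>");
        String location = readSchemaLocation(input);

        // Stand-in for the document a tool produced after dropping the attribute.
        Document output = parse("<po xmlns='urn:example:po'/>");
        writeSchemaLocation(output, location);
        System.out.println(readSchemaLocation(output));
    }
}
```

A tool that roundtrips the schema location does this bookkeeping itself, so the application never has to touch the raw document.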

Schema Location Settable
In addition to unmarshaling a schema location it may be desirable to be able to set one. A consuming application may require a schema location to be present.

Schemas without a Target Namespace Supported
Some tools do not support schemas without a target namespace. If the package name is required to be generated from the namespace or the unmarshaling is automatic, then non-namespace schemas will likely not be supported.

Instance Validation on Unmarshal
One of the major benefits of having a schema is that the parser does a lot of the data checking automatically if validation is turned on. To a large degree this relieves the user of having to check the data. If the tool does not use validation, it must either duplicate parser validity checking, or it must assume that the instance is valid. If the instance is not valid, unexpected errors may occur, or the resulting objects may contain an invalid document.
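For reference, this is what schema validation of an instance looks like with the standard `javax.xml.validation` API, independent of any binding tool. The one-element schema and the `InstanceValidation` class are made up for the example; a binding tool that validates on unmarshal would perform an equivalent check before populating its objects.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Validates an XML instance against a W3C XML Schema. */
class InstanceValidation {

    /** Tiny example schema: a single element holding a positive integer. */
    static final String SCHEMA =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='quantity' type='xs:positiveInteger'/>"
          + "</xs:schema>";

    static Schema compile(String schemaXml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(schemaXml)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Schema itself is not valid", e);
        }
    }

    /** Returns true if the instance is valid; the default error handling throws on the first error. */
    static boolean isValid(Schema schema, String instanceXml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(instanceXml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compile(SCHEMA);
        System.out.println(isValid(schema, "<quantity>3</quantity>"));
        System.out.println(isValid(schema, "<quantity>-1</quantity>"));
    }
}
```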

Constraint Check of Values on Set
Many tools do not check the data values when set. This means that invalid values may be set, thus invalidating the entire instance without the user knowing about it. Some tools do perform validation when marshaling the instance, but that may not be until much later, when a lot of unnecessary processing has been performed. It is better to have the error reported when it happens. Beware of tools that do not validate values either on set or marshal. They do not automatically guarantee any sort of validity of the produced instances and require the user to validate the populated Java object.
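A setter that enforces schema facets might look like the sketch below. The `OrderLine` class, its fields, and the facet values (a quantity between 1 and 100, a SKU pattern) are assumptions chosen for illustration and are not taken from the test suite or from any tool's generated code.

```java
import java.util.regex.Pattern;

/** Hand-written stand-in for a generated class that checks constraints when values are set. */
class OrderLine {

    // Assumed facets: minInclusive=1 and maxInclusive=100 on quantity, a pattern on sku.
    private static final int MIN_QUANTITY = 1;
    private static final int MAX_QUANTITY = 100;
    private static final Pattern SKU_PATTERN = Pattern.compile("[A-Z]{3}-\\d{3}");

    private int quantity = MIN_QUANTITY;
    private String sku = "AAA-000";

    int getQuantity() { return quantity; }
    String getSku() { return sku; }

    /** Rejects out-of-range values immediately, leaving the previous value in place. */
    void setQuantity(int quantity) {
        if (quantity < MIN_QUANTITY || quantity > MAX_QUANTITY) {
            throw new IllegalArgumentException(
                    "quantity must be between " + MIN_QUANTITY + " and " + MAX_QUANTITY + ": " + quantity);
        }
        this.quantity = quantity;
    }

    /** Rejects values that do not match the pattern facet. */
    void setSku(String sku) {
        if (sku == null || !SKU_PATTERN.matcher(sku).matches()) {
            throw new IllegalArgumentException("sku does not match " + SKU_PATTERN + ": " + sku);
        }
        this.sku = sku;
    }

    public static void main(String[] args) {
        OrderLine line = new OrderLine();
        line.setQuantity(5);
        try {
            line.setQuantity(0); // fails here, at the point of the mistake
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        System.out.println(line.getQuantity());
    }
}
```

The error surfaces at the line that set the bad value, while the object keeps its last valid state. With validation deferred to marshaling, the same mistake would only be reported once the whole instance had been built.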

Value Validation when Marshaling
Some tools automatically validate values on marshal. This means the values are checked for validity against the constraints when output. Beware of tools that do not validate values either on set or marshal. This deficiency can be remedied by an application manually calling the validation API available in the generated code, but the user needs to be aware that this does not occur automatically.

Code Generation Command Line Tool Interface
Most tools have a command line interface. This means that the code can be generated on the command line.

Code Generation Programmatic Interface
Some tools have a programmatic interface. This means that the code can be generated dynamically at runtime, and also that it can be included in other tools. This is desirable for some users.

Code Generation GUI Interface
A few tools, most commonly the commercial tools, have a GUI interface. This is nice, but if a tool has only a graphic interface, it is hard to do any sort of automated or dynamic code generation, which may be restricting for some users.

Generated Code Implements Generic Interfaces
Some tools generate classes to extend generic interfaces, based on the type of the structure resulting in a class. This is useful if any dynamic processing of the classes is desired. For example, all global element declarations may have certain functionality in common, and so it may be useful if all classes generated from global elements implement a generic global element interface. This enables dynamic detection that an object represents a global element, which in turn enables dynamic interaction with the object. It should be noted that certain tools allow the user to specify a class that all generated classes should extend. This is not the same thing, as in this case all classes extend the same base class, and no difference can be made between classes generated from different types of objects based on the super class. In addition, classes generated from different types of structures have different functionality and different methods, and so it is hard to write one generic super class that is useful for all the generated classes.
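The idea can be sketched as follows. The `GlobalElement` and `ComplexTypeBinding` interfaces and the two bound classes are hypothetical names invented for this example and do not correspond to the interfaces any of the tested tools actually generates.

```java
/** Shows how marker-style generic interfaces let code treat bound objects by schema construct. */
class GenericInterfaces {

    /** Implemented by every class generated from a global element declaration. */
    interface GlobalElement {
        String elementName();
        String namespace();
    }

    /** Implemented by every class generated from a complex type definition. */
    interface ComplexTypeBinding {
        String typeName();
    }

    /** Generated from a global element, so it can serve as a document root. */
    static final class PurchaseOrder implements GlobalElement, ComplexTypeBinding {
        public String elementName() { return "purchaseOrder"; }
        public String namespace() { return "urn:example:po"; }
        public String typeName() { return "PurchaseOrderType"; }
    }

    /** Generated from a named complex type only; it cannot appear as a root on its own. */
    static final class Address implements ComplexTypeBinding {
        public String typeName() { return "AddressType"; }
    }

    /** Dynamic processing: decide what to do with an object without knowing its concrete class. */
    static String describe(Object bound) {
        if (bound instanceof GlobalElement) {
            GlobalElement element = (GlobalElement) bound;
            return "root element {" + element.namespace() + "}" + element.elementName();
        }
        if (bound instanceof ComplexTypeBinding) {
            return "complex type " + ((ComplexTypeBinding) bound).typeName();
        }
        return "not a bound object";
    }

    public static void main(String[] args) {
        System.out.println(describe(new PurchaseOrder()));
        System.out.println(describe(new Address()));
    }
}
```

A single user-specified base class could not make this distinction, since `PurchaseOrder` and `Address` would share the same super class and nothing else.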

Conclusion

WXS is a very complex specification, and so it is not surprising that all of the tested tools have some flaws in one area or another of functionality. This is likely due to these tools belonging to a new and still developing generation of XML tools, as compared to older, more tested tools such as DOM and SAX.

However, these new XML data binding tools provide much more advanced functionality than their predecessors, and giant strides are being made in increasing stability and functionality. All the data binding tools tested here have the potential to offer substantial resource savings for a great number of diverse projects. When choosing a data binding tool to suit your needs, a surface comparison may help whittle the choice down to a smaller subset of candidates. Hopefully the test cases and initial comparison provided here can help accomplish that.

Resources

XML Data Binding for Java Test Suite

XML Schema Specification, part 0: Primer

XML Schema Specification, part 1: Structures

XML Schema Specification, part 2: Datatypes

Breeze Factor

Castor

JAXB

XGen

JAXB open source information