XML Data-Binding: Comparing Castor to .NET

July 24, 2002

After the second article in this series was published, several readers said that they would like to learn the .NET way to map data from XML to a relational database management system. I'd like to show you that, but first I've got to lay some groundwork. In this article, I will show how .NET XML data binding works, while investigating the equivalent Java functionality. Java and .NET both have excellent support for data binding, and although they work in slightly different ways, each is just as valid and useful as the other. In my next article, I'll complete the exercise by mapping XML files to an RDBMS.

To begin by sounding my usual theme, there are several ways to do it in Java. In fact, Brett McLaughlin has written an entire book about several of the many ways to do XML data binding. Appropriately titled Java & XML Data Binding, its main topic is mapping XML to Java objects with the Java™ Architecture for XML Binding (JAXB). It also includes a chapter on Castor, an open source data binding framework that provides XML data binding through its own non-JAXB interface, as well as database mapping through Castor JDO (which, despite the name, is Castor's own persistence API rather than an implementation of Sun's Java Data Objects specification). Castor effectively serves as a bridge between XML and a database, with Java objects as an intermediary layer.

Go, Bind Thou Up Young Dangling Apricots

The first half of Castor is its interface for binding XML documents to Java objects. There's not enough room here to go into all the details, but we will walk through the following two steps:

1. Define the mappings from XML to Java, either procedurally or through configuration files (which may themselves be either W3C XML Schema documents or in Castor's own format).
2. Either generate Marshaller and Unmarshaller classes specific to our data (for best performance) or allow Castor to manage the marshaling and unmarshaling at runtime (for most flexibility).

There are many options to customize your Castor project; for the most balanced comparison to .NET, we'll stick to W3C XML Schema and Castor's built-in runtime marshaling. Unlike many other Java data binding frameworks, Castor includes excellent W3C XML Schema support.

Our example in this article will be a dog show. The dog show's domain objects are the show itself, the dogs, dog breeds, judges, and show rings. Each breed is evaluated by a particular judge in a particular show ring at a particular time, and each breed contains dogs to be judged, so we've got a nice model with just enough objects and relationships to be interesting.

We'll be using W3C XML Schema to define our mapping, so let's get right down to the model. Here it is as a W3C XML Schema document called DogShow.xsd:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema elementFormDefault="qualified" 
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:complexType name="ShowType">
    <xs:sequence>
      <xs:element minOccurs="1" maxOccurs="1" name="id" 
        type="xs:long" />
      <xs:element minOccurs="1" maxOccurs="1" name="name" 
        type="xs:string" />
      <xs:element minOccurs="1" maxOccurs="unbounded" 
        name="judging" type="JudgingType" />
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="BreedType">
    <xs:sequence>
      <xs:element minOccurs="1" maxOccurs="1" name="id" 
        type="xs:long" />
      <xs:element minOccurs="1" maxOccurs="1" name="name" 
        type="xs:string" />
      <xs:element minOccurs="1" maxOccurs="unbounded" 
        name="dog" type="DogType" />
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="DogType">
    <xs:sequence>
      <xs:element minOccurs="1" maxOccurs="1" name="id" 
        type="xs:long" />
      <xs:element minOccurs="1" maxOccurs="1" name="name" 
        type="xs:string" />
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="JudgeType">
    <xs:sequence>
      <xs:element minOccurs="1" maxOccurs="1" name="id" 
        type="xs:long" />
      <xs:element minOccurs="1" maxOccurs="1" 
        name="firstName" type="xs:string" />
      <xs:element minOccurs="1" maxOccurs="1" 
        name="lastName" type="xs:string" />
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="ShowRingType">
    <xs:sequence>
      <xs:element minOccurs="1" maxOccurs="1" name="id" 
        type="xs:long" />
      <xs:element minOccurs="1" maxOccurs="1" name="name" 
        type="xs:string" />
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="JudgingType">
    <xs:sequence>
      <xs:element minOccurs="1" maxOccurs="1" 
        name="breed" type="BreedType" />
      <xs:element minOccurs="1" maxOccurs="1" 
        name="judge" type="JudgeType" />
      <xs:element minOccurs="1" maxOccurs="1" 
        name="showRing" type="ShowRingType" />
      <xs:element minOccurs="1" maxOccurs="1" 
        name="dateTime" type="xs:dateTime" />
    </xs:sequence>
  </xs:complexType>

  <xs:element name="Show" nillable="false" 
        type="ShowType" />
  <xs:element name="Breed" nillable="true" 
        type="BreedType" />
  <xs:element name="Dog" nillable="true" 
        type="DogType" />
  <xs:element name="Judge" nillable="true" 
        type="JudgeType" />
  <xs:element name="ShowRing" nillable="true" 
        type="ShowRingType" />
  <xs:element name="Judging" nillable="true" 
        type="JudgingType" />

</xs:schema>
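Before generating any code, it can be worth confirming that an instance document really conforms to DogShow.xsd. The sketch below uses the JAXP validation API (javax.xml.validation, added in Java 5, so it postdates this article) and collects every recoverable error rather than stopping at the first. The class name SchemaCheck and the default file names are assumptions for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class SchemaCheck {

    /** Validates an instance document against a schema; an empty list means valid. */
    public static List<String> validate(Source xsd, Source xml)
            throws SAXException, IOException {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(xsd);          // compile the schema once
        Validator validator = schema.newValidator();
        final List<String> errors = new ArrayList<String>();
        // Record recoverable errors instead of aborting on the first one.
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) {
                errors.add(e.getLineNumber() + ": " + e.getMessage());
            }
            public void fatalError(SAXParseException e) throws SAXException {
                throw e;                                 // malformed XML: give up
            }
        });
        validator.validate(xml);
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String xsd = args.length > 0 ? args[0] : "DogShow.xsd";
        String xml = args.length > 1 ? args[1] : "show.xml";
        List<String> errors = validate(new StreamSource(new File(xsd)),
                                       new StreamSource(new File(xml)));
        System.out.println(errors.isEmpty() ? xml + " is valid" : "Errors: " + errors);
    }
}
```

Run against either program's show.xml, this should report the document as valid, since neither output uses a target namespace and the schema declares none.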

Given this schema, we can use Castor's source generator to build Java source with the following command (you will need the Castor jar file and an XML parser in your CLASSPATH; I'm using Xerces, from the Apache XML project):

java org.exolab.castor.builder.SourceGenerator -i DogShow.xsd -package org.dogshow

(Answer "A" to replace any files that get created during the code generation process.)

And we can compile the generated sources with this command line:

javac org\dogshow\*.java

You'll see that Castor has generated two Java source files for each element in the schema. For the Dog element, for example, it has generated Dog.java and DogDescriptor.java. And because the schema also defines named types, Castor has generated classes for those too: DogType.java and DogTypeDescriptor.java.

Dog is the class we will use to manipulate actual dog objects. DogDescriptor and DogTypeDescriptor are used internally by Castor to marshal and unmarshal our objects to and from XML. Dog extends DogType, which contains the instance variables and accessor methods we're interested in; Dog itself supplies the concrete marshaling methods for the Dog element.
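To make the split between an element class and its type class concrete without pulling in Castor, here is a hypothetical hand-written imitation of the pattern: an abstract type class that holds the data and accessors, and a concrete element class that knows its own element name and writes itself out. Castor's generated classes hand this work to Castor's runtime marshaling framework rather than building strings; the class names below are invented for the sketch.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class ElementTypeSplit {

    // Plays the role of DogType: data plus JavaBeans accessors, no element name.
    public abstract static class DogTypeSketch {
        private long id;
        private String name;
        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public abstract void marshal(Writer out) throws IOException;
    }

    // Plays the role of Dog: knows it is the <Dog> element and serializes itself.
    public static class DogSketch extends DogTypeSketch {
        @Override
        public void marshal(Writer out) throws IOException {
            out.write("<Dog><id>" + getId() + "</id><name>"
                      + escape(getName()) + "</name></Dog>");
        }
    }

    // Escapes characters that may not appear literally in element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        DogSketch dog = new DogSketch();
        dog.setId(2);
        dog.setName("LenLear's Webmaster");
        StringWriter out = new StringWriter();
        dog.marshal(out);
        System.out.println(out); // <Dog><id>2</id><name>LenLear's Webmaster</name></Dog>
    }
}
```

The same DogTypeSketch could back a differently named element (for instance the local dog element inside a breed) simply by adding another subclass, which is the reason for keeping the type separate from the element.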

Here's the generated code for DogType.java, which shows how Castor structures the generated business classes.

/*
 * This class was automatically generated with 
 * <a href="http://castor.exolab.org">Castor 0.9.3.9+</a>, using an
 * XML Schema.
 * $Id$
 */

package org.dogshow;

  //---------------------------------/
 //- Imported classes and packages -/
//---------------------------------/

import java.io.IOException;
import java.io.Reader;
import java.io.Serializable;
import java.io.Writer;
import org.exolab.castor.xml.*;
import org.exolab.castor.xml.MarshalException;
import org.exolab.castor.xml.ValidationException;
import org.xml.sax.ContentHandler;

/**
 * 
 * 
 * @version $Revision$ $Date$
**/
public abstract class DogType implements java.io.Serializable {


      //--------------------------/
     //- Class/Member Variables -/
    //--------------------------/

    private long _id;

    /**
     * keeps track of state for field: _id
    **/
    private boolean _has_id;

    private java.lang.String _name;


      //----------------/
     //- Constructors -/
    //----------------/

    public DogType() {
        super();
    } //-- org.dogshow.DogType()


      //-----------/
     //- Methods -/
    //-----------/

    /**
     * Returns the value of field 'id'.
     * 
     * @return the value of field 'id'.
    **/
    public long getId()
    {
        return this._id;
    } //-- long getId() 

    /**
     * Returns the value of field 'name'.
     * 
     * @return the value of field 'name'.
    **/
    public java.lang.String getName()
    {
        return this._name;
    } //-- java.lang.String getName() 

    /**
    **/
    public boolean hasId()
    {
        return this._has_id;
    } //-- boolean hasId() 

    /**
    **/
    public boolean isValid()
    {
        try {
            validate();
        }
        catch (org.exolab.castor.xml.ValidationException vex) {
            return false;
        }
        return true;
    } //-- boolean isValid() 

    /**
     * 
     * 
     * @param out
    **/
    public abstract void marshal(java.io.Writer out)
        throws org.exolab.castor.xml.MarshalException, 
        org.exolab.castor.xml.ValidationException;

    /**
     * 
     * 
     * @param handler
    **/
    public abstract void marshal(org.xml.sax.ContentHandler handler)
        throws java.io.IOException, 
        org.exolab.castor.xml.MarshalException, 
        org.exolab.castor.xml.ValidationException;

    /**
     * Sets the value of field 'id'.
     * 
     * @param id the value of field 'id'.
    **/
    public void setId(long id)
    {
        this._id = id;
        this._has_id = true;
    } //-- void setId(long) 

    /**
     * Sets the value of field 'name'.
     * 
     * @param name the value of field 'name'.
    **/
    public void setName(java.lang.String name)
    {
        this._name = name;
    } //-- void setName(java.lang.String) 

    /**
    **/
    public void validate()
        throws org.exolab.castor.xml.ValidationException
    {
        org.exolab.castor.xml.Validator validator = 
            new org.exolab.castor.xml.Validator();
        validator.validate(this);
    } //-- void validate() 

}

As you can see, the generated code gives us a JavaBeans interface for all the instance variables, with getXXX() and setXXX() methods. It also includes a helper instance variable, _has_id, to track whether _id has actually been set: because _id is a primitive long, it always holds some value (0 by default), so the value alone can't tell us. Castor has ways to generate the code using wrapper objects, which would make this unnecessary; we won't go into how to do this, but if you're interested in learning more about Castor, look at Brett's book or Dion Almaer's OnJava.com article XML Data Binding with Castor.
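The two ways of tracking "has this field been set?" can be compared side by side. This is a minimal sketch with hypothetical class names: one class uses a primitive plus a flag, as the generated DogType does, and the other uses a java.lang.Long wrapper where null means "not set".

```java
public class PresenceTracking {

    // Primitive plus flag: the approach in the generated DogType.
    public static class WithFlag {
        private long id;          // defaults to 0, indistinguishable from "set to 0"
        private boolean hasId;    // so presence is tracked separately
        public long getId() { return id; }
        public boolean hasId() { return hasId; }
        public void setId(long id) { this.id = id; this.hasId = true; }
    }

    // Wrapper object: null itself means "not set", so no flag is needed.
    public static class WithWrapper {
        private Long id;
        public Long getId() { return id; }
        public boolean hasId() { return id != null; }
        public void setId(Long id) { this.id = id; }
    }

    public static void main(String[] args) {
        WithFlag a = new WithFlag();
        System.out.println(a.getId() + " " + a.hasId()); // 0 false
        a.setId(0);
        System.out.println(a.getId() + " " + a.hasId()); // 0 true

        WithWrapper b = new WithWrapper();
        System.out.println(b.getId() + " " + b.hasId()); // null false
        b.setId(0L);              // an int literal would not autobox to Long
        System.out.println(b.getId() + " " + b.hasId()); // 0 true
    }
}
```

The flag version avoids an object allocation per field; the wrapper version is simpler to read but forces callers to deal with null.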

Castor also generated isValid() and validate() methods, which check an object's data against the constraints taken from the schema; the same checks are applied when the data is marshaled. And it has created marshal() and unmarshal() methods.
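The isValid()/validate() pairing is a common pattern: validate() throws on the first violated constraint, and isValid() turns that into a boolean. Here is a minimal sketch of the pattern without Castor. The ValidationException class is a hypothetical stand-in for org.exolab.castor.xml.ValidationException, and the checks are hand-written assumptions mirroring DogShow.xsd (id and name both required); Castor derives its checks from the generated descriptor classes instead.

```java
public class ValidationPattern {

    // Hypothetical stand-in for org.exolab.castor.xml.ValidationException.
    public static class ValidationException extends Exception {
        public ValidationException(String message) { super(message); }
    }

    public static class Dog {
        private long id;
        private boolean hasId;
        private String name;
        public long getId() { return id; }
        public void setId(long id) { this.id = id; this.hasId = true; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        // Throws on the first constraint that fails (minOccurs="1" for both fields).
        public void validate() throws ValidationException {
            if (!hasId) throw new ValidationException("id is required");
            if (name == null) throw new ValidationException("name is required");
        }

        // Same shape as the generated isValid(): call validate(), report a boolean.
        public boolean isValid() {
            try {
                validate();
            } catch (ValidationException vex) {
                return false;
            }
            return true;
        }
    }

    public static void main(String[] args) {
        Dog dog = new Dog();
        dog.setId(1);
        System.out.println(dog.isValid()); // false: name is missing
        dog.setName("Wil-Orion's Angus Highlander");
        System.out.println(dog.isValid()); // true
    }
}
```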

Now that we've generated the code, it's a simple matter to create data. Here's a short program that instantiates a few of the relevant objects and serializes them to an XML file.

package org.dogshow;

import java.io.FileWriter;
import java.util.Date;

public class MakeDogShow {
    public static void main(String [] args) {
	try {
	    Dog [] dog = new Dog [2];
	    dog[0] = new Dog();
	    dog[0].setId(1);
	    dog[0].setName("Wil-Orion's Angus Highlander");
	    
	    dog[1] = new Dog();
	    dog[1].setId(2);
	    dog[1].setName("LenLear's Webmaster");
	    
	    Breed breed = new Breed();
	    breed.setId(1);
	    breed.setName("English Springer Spaniel");
	    breed.setDog(dog);

	    Judge judge = new Judge();
	    judge.setId(1);
	    judge.setFirstName("John");
	    judge.setLastName("Smith");
	    
	    ShowRing showRing = new ShowRing();
	    showRing.setId(1);
	    showRing.setName("1");
	    
	    Judging [] judging = new Judging [] {new Judging()};
	    judging[0].setJudge(judge);
	    judging[0].setBreed(breed);
	    judging[0].setShowRing(showRing);
	    judging[0].setDateTime(new Date());
	    
	    Show show = new Show();
	    show.setId(1);
	    show.setName("O'Reilly Invitational Dog Show");
	    show.setJudging(judging);
	    
	    FileWriter writer = new FileWriter("show.xml");
	    show.marshal(writer);
	    writer.close();
	} catch (Exception e) {
	    e.printStackTrace(System.err);
	}
    }
}

We can compile all our code and run MakeDogShow to produce the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<Show><id>1</id><name>O'Reilly Invitational Dog Show</name>
<judging><breed><id>1</id><name>English Springer Spaniel</name>
<dog><id>1</id><name>Wil-Orion's Angus Highlander</name></dog>
<dog><id>2</id><name>LenLear's Webmaster</name></dog></breed>
<judge><id>1</id><firstName>John</firstName><lastName>Smith</lastName></judge>
<showRing><id>1</id><name>1</name></showRing>
<dateTime>2002-07-13T13:51:37.112-04:00</dateTime></judging></Show>
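The dateTime value above is in the xs:dateTime lexical form: an ISO 8601 date and time plus a UTC offset, with as many fractional-second digits as the writer chooses (three here; the .NET output later in this article has seven). As a sketch, here is one way to produce and read that form with java.time (Java 8+). This is an illustration only, not how Castor formats dates, and the class name is hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class XsDateTime {
    // Date, time with milliseconds, then the offset as +hh:mm (XXX prints "Z" for UTC).
    static final DateTimeFormatter MILLIS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");

    public static String format(OffsetDateTime t) {
        return MILLIS.format(t);
    }

    public static void main(String[] args) {
        OffsetDateTime t = OffsetDateTime.of(2002, 7, 13, 13, 51, 37, 112_000_000,
                                             ZoneOffset.ofHours(-4));
        System.out.println(format(t)); // 2002-07-13T13:51:37.112-04:00

        // The ISO parser accepts up to nine fractional digits, so the
        // seven-digit value written by .NET reads back as well.
        OffsetDateTime net = OffsetDateTime.parse("2002-07-13T13:52:21.6667232-04:00");
        System.out.println(net.getNano()); // 666723200
    }
}
```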

They Thought of That

The .NET Framework SDK ships with a handy little tool called xsd, the W3C XML Schema Definition Tool, which does for .NET what Castor's SourceGenerator does for Java. The following command line generates the C# source code for our DogShow schema:

xsd /c /l:cs DogShow.xsd

And here's the single generated source file:

//---------------------------------------------------------
// <autogenerated>
//     This code was generated by a tool.
//     Runtime Version: 1.0.3705.209
//
//     Changes to this file may cause incorrect behavior and 
//     will be lost if the code is regenerated.
// </autogenerated>
//---------------------------------------------------------

// 
// This source code was auto-generated by xsd, Version=1.0.3705.209.
// 
using System.Xml.Serialization;


/// <remarks/>
[System.Xml.Serialization.XmlRootAttribute("Show", 
    Namespace="", IsNullable=false)]
public class ShowType {
    
    /// <remarks/>
    public long id;
    
    /// <remarks/>
    public string name;
    
    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("judging")]
    public JudgingType[] judging;
}

/// <remarks/>
[System.Xml.Serialization.XmlRootAttribute("Judging", 
    Namespace="", IsNullable=true)]
public class JudgingType {
    
    /// <remarks/>
    public BreedType breed;
    
    /// <remarks/>
    public JudgeType judge;
    
    /// <remarks/>
    public ShowRingType showRing;
    
    /// <remarks/>
    public System.DateTime dateTime;
}

/// <remarks/>
[System.Xml.Serialization.XmlRootAttribute("Breed", 
    Namespace="", IsNullable=true)]
public class BreedType {
    
    /// <remarks/>
    public long id;
    
    /// <remarks/>
    public string name;
    
    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("dog")]
    public DogType[] dog;
}

/// <remarks/>
[System.Xml.Serialization.XmlRootAttribute("Dog", 
    Namespace="", IsNullable=true)]
public class DogType {
    
    /// <remarks/>
    public long id;
    
    /// <remarks/>
    public string name;
}

/// <remarks/>
[System.Xml.Serialization.XmlRootAttribute("ShowRing", 
    Namespace="", IsNullable=true)]
public class ShowRingType {
    
    /// <remarks/>
    public long id;
    
    /// <remarks/>
    public string name;
}

/// <remarks/>
[System.Xml.Serialization.XmlRootAttribute("Judge", 
    Namespace="", IsNullable=true)]
public class JudgeType {
    
    /// <remarks/>
    public long id;
    
    /// <remarks/>
    public string firstName;
    
    /// <remarks/>
    public string lastName;
}

We should note that the generated code contains attributes, which we haven't seen before in this series. These are not the same thing as XML attributes; in C#, attributes are special constructs that decorate sections of code, such as assemblies, modules, types, members, return values, and parameters. They attach additional information to the section they decorate, which other code can read at runtime. For example, this attribute is attached to the ShowType class:

[System.Xml.Serialization.XmlRootAttribute("Show", Namespace="",
                        IsNullable=false)]

XmlRootAttribute indicates that when a ShowType object is serialized as the root of a document, its element is called "Show", that it has no namespace, and that it may not be null (nillable="false", in W3C XML Schema terms).
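Java's closest counterpart to C# attributes is the annotation, added in Java 5 (JAXB 2.0 later defined a real @XmlRootElement annotation along these lines). The sketch below defines a hypothetical @XmlRootName annotation carrying the same information as XmlRootAttribute and reads it back through reflection, which is how a serializer would consult it.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class AttributeAnalogy {

    @Retention(RetentionPolicy.RUNTIME) // keep it visible to reflection at runtime
    @Target(ElementType.TYPE)           // may only decorate classes and interfaces
    public @interface XmlRootName {
        String value();                 // element name, like "Show"
        String namespace() default "";  // like Namespace=""
        boolean isNullable() default true;
    }

    @XmlRootName(value = "Show", isNullable = false)
    public static class ShowType {
        public long id;
        public String name;
    }

    /** Returns the declared root element name, falling back to the class name. */
    public static String rootName(Class<?> type) {
        XmlRootName root = type.getAnnotation(XmlRootName.class);
        return root != null ? root.value() : type.getSimpleName();
    }

    public static void main(String[] args) {
        XmlRootName root = ShowType.class.getAnnotation(XmlRootName.class);
        System.out.println(rootName(ShowType.class) + " nullable=" + root.isNullable());
        // prints: Show nullable=false
    }
}
```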

You'll also notice that the generated C# code is much smaller than the Java code Castor generated. This is because xsd generates public fields rather than JavaBeans-style accessor methods, and because no per-class marshal and unmarshal methods (or descriptor classes) are needed; as we shall see shortly, a general-purpose XmlSerializer does that work.
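Why can a general-purpose serializer do without per-class marshal() methods? Because it can discover an object's public fields by reflection. The following is a heavily simplified, hypothetical Java sketch of that idea, not how XmlSerializer is implemented: it writes one element per public field, repeats the element for arrays, and omits null fields. It relies on Class.getFields() returning fields in declaration order, which HotSpot does in practice but the Java specification does not guarantee.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class FieldSerializer {

    public static String serialize(Object obj, String elementName)
            throws IllegalAccessException {
        StringBuilder sb = new StringBuilder();
        write(sb, elementName, obj);
        return sb.toString();
    }

    private static void write(StringBuilder sb, String name, Object value)
            throws IllegalAccessException {
        if (value == null) return;                   // unset field: no element
        if (value.getClass().isArray()) {            // array: one element per item
            for (int i = 0; i < Array.getLength(value); i++) {
                write(sb, name, Array.get(value, i));
            }
            return;
        }
        sb.append('<').append(name).append('>');
        if (value instanceof Number || value instanceof String || value instanceof Boolean) {
            sb.append(escape(value.toString()));     // simple value: text content
        } else {
            for (Field f : value.getClass().getFields()) {   // public fields only
                if (Modifier.isStatic(f.getModifiers())) continue;
                write(sb, f.getName(), f.get(value));
            }
        }
        sb.append("</").append(name).append('>');
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Plain data classes shaped like the xsd-generated C# types.
    public static class DogType { public long id; public String name; }
    public static class BreedType { public long id; public String name; public DogType[] dog; }

    public static void main(String[] args) throws IllegalAccessException {
        DogType d = new DogType();
        d.id = 1;
        d.name = "Wil-Orion's Angus Highlander";
        BreedType b = new BreedType();
        b.id = 1;
        b.name = "English Springer Spaniel";
        b.dog = new DogType[] { d };
        System.out.println(serialize(b, "breed"));
    }
}
```

The design trade-off is the one the article describes: the data classes stay trivial, and all knowledge of XML lives in one reusable serializer.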

Now that we've generated the source and understand it, we can start porting our client code. The code in MakeDogShow.cs will be very similar to MakeDogShow.java; besides our usual Java-to-C# porting issues, there are three differences in xsd's generated code we need to deal with.

First, as we noted earlier, xsd does not generate getXXX() and setXXX() methods. Instead, we'll have to change all those method calls to access the public fields directly; the field names match the element names in the schema file exactly.

Second, while in Java we dealt directly with the element classes Dog, Show, and so on, xsd generates classes only for the W3C XML Schema types, using the same names: DogType, ShowType, and so on. So instead of instantiating a Dog, we'll instantiate a DogType.

Finally, the generated ShowType class has no marshal() method (or unmarshal() method, for that matter). Instead, we'll use the XmlSerializer class from System.Xml.Serialization to serialize our objects to XML.

Here's our C# version of MakeDogShow:

using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

public class MakeDogShow {
    public static void Main(string [] args) {
	try {
	    DogType [] dog = new DogType [2];
	    dog[0] = new DogType();
	    dog[0].id = 1;
	    dog[0].name = "Wil-Orion's Angus Highlander";
	    
	    dog[1] = new DogType();
	    dog[1].id = 2;
	    dog[1].name = "LenLear's Webmaster";
	    
	    BreedType breed = new BreedType();
	    breed.id = 1;
	    breed.name = "English Springer Spaniel";
	    breed.dog = dog;

	    JudgeType judge = new JudgeType();
	    judge.id = 1;
	    judge.firstName = "John";
	    judge.lastName = "Smith";
	    
	    ShowRingType showRing = new ShowRingType();
	    showRing.id = 1;
	    showRing.name = "1";
	    
	    JudgingType [] judging = new JudgingType [] 
	        {new JudgingType()};
	    judging[0].judge = judge;
	    judging[0].breed = breed;
	    judging[0].showRing = showRing;
	    judging[0].dateTime = DateTime.Now;
	    
	    ShowType show = new ShowType();
	    show.id = 1;
	    show.name = "O'Reilly Invitational Dog Show";
	    show.judging = judging;

	    XmlSerializer serializer = new XmlSerializer(
	        show.GetType());	    
	    TextWriter writer = new StreamWriter("show.xml");
	    serializer.Serialize(writer,show);
	    writer.Close();

	} catch (Exception e) {
	    Console.Error.Write(e);
	}
    }
}

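It generates the following XML file, which differs from the Java version only superficially (indentation, the unused xsd and xsi namespace declarations, and the precision of the timestamp) and has exactly the same structure.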

<?xml version="1.0" encoding="utf-8"?>
<Show xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <id>1</id>
  <name>O'Reilly Invitational Dog Show</name>
  <judging>
    <breed>
      <id>1</id>
      <name>English Springer Spaniel</name>
      <dog>
        <id>1</id>
        <name>Wil-Orion's Angus Highlander</name>
      </dog>
      <dog>
        <id>2</id>
        <name>LenLear's Webmaster</name>
      </dog>
    </breed>
    <judge>
      <id>1</id>
      <firstName>John</firstName>
      <lastName>Smith</lastName>
    </judge>
    <showRing>
      <id>1</id>
      <name>1</name>
    </showRing>
    <dateTime>2002-07-13T13:52:21.6667232-04:00</dateTime>
  </judging>
</Show>
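One way to check the claim that the two files share the same structure is to reduce each document to its elements and trimmed text, ignoring indentation, attributes (including namespace declarations), and comments. This is a hypothetical sketch using the JDK's DOM parser; it is an informal comparison, not W3C Canonical XML, and trimming would hide meaningful leading or trailing spaces in text. Note that the two show.xml files in this article will still differ on the dateTime value, because each program stamps the current time.

```java
import java.io.FileInputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XmlShape {

    /** Reduces a document to its element names and non-blank, trimmed text. */
    public static String shape(InputSource source) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);   // so getLocalName() is populated
        factory.setCoalescing(true);       // fold CDATA sections into text nodes
        Node root = factory.newDocumentBuilder().parse(source).getDocumentElement();
        StringBuilder sb = new StringBuilder();
        append(sb, root);
        return sb.toString();
    }

    public static String shapeOf(String xml) throws Exception {
        return shape(new InputSource(new StringReader(xml)));
    }

    private static void append(StringBuilder sb, Node node) {
        switch (node.getNodeType()) {
            case Node.TEXT_NODE:
                String text = node.getNodeValue().trim();   // drop indentation
                if (text.length() > 0) sb.append('"').append(text).append('"');
                break;
            case Node.ELEMENT_NODE:                         // attributes are not children
                sb.append('<').append(node.getLocalName()).append('>');
                for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                    append(sb, c);
                }
                sb.append("</").append(node.getLocalName()).append('>');
                break;
            default:
                break;                                      // comments, PIs: ignored
        }
    }

    public static void main(String[] args) throws Exception {
        // Byte streams let the parser honor each file's own encoding declaration.
        try (FileInputStream a = new FileInputStream(args[0]);
             FileInputStream b = new FileInputStream(args[1])) {
            String sa = shape(new InputSource(a));
            String sb = shape(new InputSource(b));
            System.out.println(sa.equals(sb) ? "same structure and text"
                                             : "different:\n" + sa + "\n" + sb);
        }
    }
}
```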

There and Back Again

Now that we've generated XML files from both Java and C#, the neat trick will be to see whether we can load the data from one into the other. In Java, that's done with the generated static Show.unmarshal() method, as demonstrated below:

package org.dogshow;

import java.io.FileReader;

public class LoadDogShow {
    public static void main(String [] args) {
	try {
	    FileReader reader = new FileReader("show.xml");
	    Show show = Show.unmarshal(reader);
	    
	    System.out.println(show.getJudging()[0].
                getBreed().getDog()[0].getName());

	    reader.close();
	} catch (Exception e) {
	    e.printStackTrace(System.err);
	}
    }
}

And in C#, it's done with the XmlSerializer.Deserialize() method, as shown here:

using System;
using System.IO;
using System.Xml.Serialization;

public class LoadDogShow {
    public static void Main(string [] args) {
	try {
	    StreamReader reader = new StreamReader("show.xml");
	    XmlSerializer serializer = new XmlSerializer(
	        typeof(ShowType));
	    ShowType show = (ShowType)serializer.Deserialize(reader);

	    Console.WriteLine(show.judging[0].breed.dog[0].name);

	    reader.Close();
	} catch (Exception e) {
	    Console.Error.Write(e);
	}
    }
}
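For comparison, the same value can be read from either show.xml with no generated classes at all, using XPath (javax.xml.xpath, Java 5+). This is a sketch; the class and method names are hypothetical. Because neither document puts its elements in a namespace, unprefixed element names in the path are enough. XPath positions start at 1, unlike the [0] array indexes in the two programs above.

```java
import java.io.FileInputStream;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class ReadWithXPath {

    /** Returns the first dog's name, or "" if the path matches nothing. */
    public static String firstDogName(InputSource source) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Equivalent to show.judging[0].breed.dog[0].name in the code above.
        return xpath.evaluate("/Show/judging[1]/breed/dog[1]/name", source);
    }

    public static void main(String[] args) throws Exception {
        String file = args.length > 0 ? args[0] : "show.xml";
        try (FileInputStream in = new FileInputStream(file)) {
            System.out.println(firstDogName(new InputSource(in)));
        }
    }
}
```

Data binding earns its keep once you need typed objects and many fields; for pulling out a value or two, a path expression like this can be simpler.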

Summing Up

So, what have we learned this time? First, that given a W3C XML Schema, we can easily generate classes that create XML and load existing XML, in both Java and C#. Second, that given those classes, it's relatively easy to write code that uses them to produce portable data files in XML. The fact that we used the same schema and data files in Java and C# shows XML doing real work as an interoperability language.

But Wait, There's More!

The xsd tool can do some other things, too. It can generate source code in other languages (Visual Basic .NET and JScript .NET, in addition to C#). It can generate a W3C XML Schema from the types in a compiled .NET assembly, or infer one from a sample XML document. Finally, it can generate a DataSet subclass, suitable for use in XML-to-RDBMS mapping. And that's where we'll pick up next time, with a comparison of JDO to ADO.NET.