Menu

XML and Database Mapping in .NET

October 23, 2002

Niel Bornstein

The designers of the .NET runtime put a lot of thought into the issue of binding XML to a database. The path from XML to database and back again starts with ADO.NET, .NET's database persistence layer, which can be thought of as analogous to Java's JDBC.

The System.Data.SqlClient namespace has all the classes you'd expect to see, like the SqlConnection, SqlCommand, and DataReader (analogous to the Java Connection, SqlCommand, and ResultSet respectively). The classes in this namespace, whose names all begin with Sql, are meant to work with Microsoft's database of choice, SQL Server. There is an entire suite of classes in the namespace System.Data.OleDb, whose names start with OleDb, which work with any OLE DB provider. If you've used JDBC, you already know basically how to use these classes.

Along with these classes, ADO.NET also introduces the DataSet. A DataSet instance represents an entire database, including the ability to track changes made to individual data elements and to persist them to the underlying database when necessary. The DataSet can maintain its state while disconnected from the actual database.

The DataSet can be used to build a data model without writing any SQL. It can also be used to persist the data model to an XML Schema or to read an existing XML Schema and dynamically build the data model. The DataSet can also read and write its data to XML.

On With the Show

You'll recall that in previous articles I used a dog show to demonstrate generating classes in Java and C# from an XML schema. This time, since we're building a database, the database schema for the dog show will require the following schema:

DB Schema

Normally if you were building a database like this, you would use SQL or have access to a graphical entity relationship modeling tool. With the .NET DataSet class, however, you can build a database schema dynamically. The following code demonstrates one way to do that:

using System;

using System.Data;



public class MakeDataSet {

    public static void Main(string [] args) {

        DataSet dataSet = new DataSet("DogShow");



        // create Show table

        DataTable showTable = dataSet.Tables.Add("Show");

        DataColumn showIdColumn = 

            showTable.Columns.Add("Id", typeof(Int32));

        showTable.Columns.Add("Name", typeof(String));

        showTable.PrimaryKey = 

            new DataColumn [] {showIdColumn};



        // create Breed table

        DataTable breedTable = dataSet.Tables.Add("Breed");

        DataColumn breedIdColumn = 

            breedTable.Columns.Add("Id", typeof(Int32));

        breedTable.Columns.Add("Name", typeof(String));

        breedTable.PrimaryKey = 

            new DataColumn [] {breedIdColumn};



        //  create Dog table

        DataTable dogTable = dataSet.Tables.Add("Dog");

        DataColumn dogIdColumn = 

            dogTable.Columns.Add("Id", typeof(Int32));

        dogTable.Columns.Add("Name", typeof(String));

        DataColumn dogBreedIdColumn = 

            dogTable.Columns.Add("BreedId", typeof(Int32));

        dogTable.PrimaryKey = 

            new DataColumn [] {dogIdColumn};



        // create foreign key relationship

        dataSet.Relations.Add("DogBreed", breedIdColumn, 

            dogBreedIdColumn);



        // create Judge table

        DataTable judgeTable = dataSet.Tables.Add("Judge");

        DataColumn judgeIdColumn = 

            judgeTable.Columns.Add("Id", typeof(Int32));

        judgeTable.Columns.Add("FirstName", typeof(String));

        judgeTable.Columns.Add("LastName", typeof(String));

        judgeTable.PrimaryKey = 

            new DataColumn [] {judgeIdColumn};



        // create ShowRing table

        DataTable showRingTable = 

            dataSet.Tables.Add("ShowRing");

        DataColumn showRingIdColumn = 

            showRingTable.Columns.Add("Id", typeof(Int32));

        showRingTable.Columns.Add("Name", typeof(String));

        showRingTable.PrimaryKey = 

            new DataColumn [] {showRingIdColumn};



        // create Judging table

        DataTable judgingTable = 

            dataSet.Tables.Add("Judging");

        judgingTable.Columns.Add("ShowTime", 

            typeof(DateTime));

        DataColumn judgingBreedIdColumn = 

            judgingTable.Columns.Add("BreedId", 

                typeof(Int32));

        DataColumn judgingJudgeIdColumn = 

            judgingTable.Columns.Add("JudgeId", 

                typeof(Int32));

        DataColumn judgingShowRingIdColumn = 

            judgingTable.Columns.Add("ShowRingId", 

                typeof(Int32));

        DataColumn judgingShowIdColumn = 

            judgingTable.Columns.Add("ShowId", 

                typeof(Int32));



        // create foreign key relationships

        dataSet.Relations.Add("JudgingBreed", 

            breedIdColumn, judgingBreedIdColumn);

        dataSet.Relations.Add("JudgingJudge", 

            judgeIdColumn, judgingJudgeIdColumn);

        dataSet.Relations.Add("JudgingShowRing", 

            showRingIdColumn, judgingShowRingIdColumn);

        dataSet.Relations.Add("JudgingShow", 

            showIdColumn, judgingShowIdColumn);

    }

}

Where's the DB?

I mentioned that the DataSet class can represent a disconnected view of a database. In fact, as in this case, a DataSet doesn't really need a database backing it at all. You can even think of the XML output we're producing as the database, although it lacks all the usual ACID properties of a relational database. Still, it makes for a convenient demonstration of some of the XML functionality.

Having created the DataSet for the dog show, you could now go on and populate it with data using code something like this:

DataRow show = showTable.NewRow();
show["Id"] = 1;
show["Name"] = "O'Reilly Invitational Dog Show";
showTable.Rows.Add(show);

That works well enough, although it's a lot of work, and there's nothing being done at compile time to guarantee that you actually have a column named "Id" in the DataTable. It would be much better if you could access tables and columns in a more type-safe way. Writing the code to do that could be a lot of work, but you'd only have to do it once. Actually, the .NET framework can do a lot of the heavy lifting for you. In addition to the work you saw it do in the last article, the xsd tool can generate a subclass of DataSet for a specific W3C XML Schema.

But first you need a schema. The DataSet class has a WriteXmlSchema() method which creates an W3C XML Schema from the DataSet. You simply need to add a line to the end of the MakeDataSet program to write the schema to a file:

dataSet.WriteXmlSchema("DogShow.xsd");

Now you can generate a DogShow class with the following command line:

xsd /dataset DogShow.xsd

The resulting code, placed by default in the file DogShow.cs, is quite lengthy, and I won't include the source listing here (you can read it here). Suffice it to say that xsd has generated a class which is a subclass of DataSet, which you can use to directly access tables and columns. It's quite convenient.

Here's the source code for a program which uses the generated class to create an instance of DogShow and add data to it. After that, it writes the data to an XML file.

using System;

using System.Data;



public class DogShowWriter {

    public static void Main(string [] args) {

        DogShow dogShow = new DogShow();



        // insert a show row

        DogShow.ShowRow show = dogShow.Show.AddShowRow(1, 

            "O'Reilly Invitational");



        // insert a breed row

        DogShow.BreedRow breed = dogShow.Breed.AddBreedRow(

            1, "English Springer Spaniel");



        // insert a couple of dog rows

        dogShow.Dog.AddDogRow(

            1, "Wil-Orion's Angus Highlander", breed);

        dogShow.Dog.AddDogRow(

            2, "Len-Lear's Webmaster", breed);

        dogShow.Dog.AddDogRow(

            3, "Ch. Sallylyn's Condor", breed);

        

        // insert a judge row

        DogShow.JudgeRow judge = 

            dogShow.Judge.AddJudgeRow(1, "John", "Smith");



        // insert a show ring row

        DogShow.ShowRingRow showRing = 

            dogShow.ShowRing.AddShowRingRow(1, "Ring 1");



        // insert a judging row

        DateTime judgingTime = 

            new DateTime(2002,10,20,14,00,00);

        dogShow.Judging.AddJudgingRow(judgingTime, breed, 

            judge, showRing, show);



        // write the data

        dogShow.WriteXml("DogShow.xml");

    }

}

You can see that with the generated DogShow class, it's much easier to write and debug your code, for a few reasons. First, since the compiler will warn you when you reference a table or column that doesn't exist, you're much less likely to run into runtime errors. Second, xsd has generated constructors for each table's rows, so you don't need to go through the tedious effort of referencing each column through the row's indexer property.

You should note the last line of code, where I've called dogShow.WriteXml("DogShow.xml");. It creates an XML file representing the DataSet that I just created and filled with data. Here's the file:

<?xml version="1.0" standalone="yes"?>

<DogShow>

  <Show>

    <Id>1</Id>

    <Name>O'Reilly Invitational</Name>

  </Show>

  <Breed>

    <Id>1</Id>

    <Name>English Springer Spaniel</Name>

  </Breed>

  <Dog>

    <Id>1</Id>

    <Name>Wil-Orion's Angus Highlander</Name>

    <BreedId>1</BreedId>

  </Dog>

  <Dog>

    <Id>2</Id>

    <Name>Len-Lear's Webmaster</Name>

    <BreedId>1</BreedId>

  </Dog>

  <Dog>

    <Id>3</Id>

    <Name>Ch. Sallylyn's Condor</Name>

    <BreedId>1</BreedId>

  </Dog>

  <Judge>

    <Id>1</Id>

    <FirstName>John</FirstName>

    <LastName>Smith</LastName>

  </Judge>

  <ShowRing>

    <Id>1</Id>

    <Name>Ring 1</Name>

  </ShowRing>

  <Judging>

    <ShowTime>2002-10-20T14:00:00.0000000-04:00</ShowTime>

    <BreedId>1</BreedId>

    <JudgeId>1</JudgeId>

    <ShowRingId>1</ShowRingId>

    <ShowId>1</ShowId>

  </Judging>

</DogShow>

This file can also be read back into a DataSet -- or into an instance of the generated DogShow class -- with the ReadXml() method.

Making the Data Fit

So now we've got an XML file which represents the O'Reilly Invitational Dog Show. Unfortunately, it's not the same as the one in the last article, which was more tree-oriented. The XML generated here is more table-oriented, which is entirely appropriate, given that we are talking about a relational database. It would be nice to be able to transform the original tree-oriented XML into the new table-oriented schema. Seems like there ought to be a tool to do that.

Thanks to the fact that all of .NET's XML classes are tightly integrated, you can use XSLT to transform a document on disk into an XmlDocument in memory, and then pass that to another class. So, given an appropriate XSLT stylesheet named "DogShow.xsl", the following code will read it right into the DataSet:

using System;

using System.Data;

using System.Xml;

using System.Xml.Xsl;



public class DogShowTransformer {

    public static void Main(string [] args) {



        // Create a DataSet instance

        DogShow dogShow = new DogShow();



        // Read the data

        XmlDocument document = new XmlDocument();

        document.Load("DogShow.xml");

        

        // Read the stylesheet

        XslTransform xslt = new XslTransform();

        xslt.Load("DogShow.xsl");

        

        // Transform the data

        XmlReader reader = xslt.Transform(document, null);

        

        // Load the DataSet

        dogShow.ReadXml(reader);

    }

}

Now What?

DataSet does quite a bit more than I can cover here. For example, I mentioned its ability to keep track of changes while disconnected from the database. The DiffGram is responsible for this ability. The DiffGram is itself an XML document, and it is used to serialize DataSets across a web service. The DiffGram consists of three sections: one containing the current data, <DataInstance>; one containing any changes to the DataSet, <diffgr:before>; and one listing any errors, <diffgr:errors>. Any changes in the <diffgr:before> section and errors in the <diffgr:errors> section are related back to the data in the <DataInstance> section by the diffgr:id attribute.

I haven't touched on the XmlDataDocument class yet, either. This class enables you to synchronize multiple of views of the same DataSet, much as I demonstrated with XSLT earlier. (The main reason I did not use XmlDataDocument is that it depends on the names of elements being the same in both schemas, so that the elements can be mapped to each other properly. As you will recall, in the previous article, the XML schema had different element names.)

There's still a whole lot more to be covered, including the SqlCommand's ExecuteXmlReader() method, which returns an XmlReader instance you can use to read data from a database with the regular .NET XML tools.

I continue to be impressed with the extent of .NET's XML integration. All the major technologies are represented, and Microsoft has use XML in innovative ways throughout the framework. I believe that the Java XML community can certainly learn a thing or two from .NET.

Resources

Postscript

One of the challenges of comparing the handling of XML in Java and C# is the fact that XML processing is not built into the Java class library. When deciding what features to compare, I've often resorted to searching the Web to find comparable Java tools for features that are a standard part of the .NET Common Language Runtime.

In my last article, I wanted to compare XML databinding in Java and C#. I knew that the .NET CLR contained excellent support for generating classes from an XML Schema, and I knew that Castor also supported databinding. As I examined Castor, I saw that it also supported binding Java objects to relational databases using something called JDO, and I put this fact aside to investigate in a future article.

Unfortunately, I forgot to read the fine print:

Does Castor JDO comply with the SUN JSR-000012 specification?

No, Castor JDO doesn't comply with the SUN's JDO specification.

Although Castor JDO carries very similar goals as SUN's JDO, it has been developed independently from the JSR.

And this pretty much tells the story when it comes to support for XML in Java. There are many implementations; some predate the applicable standards, and some conform to the standards.