XML and Database Mapping in .NET
The designers of the .NET runtime put a lot of thought into the issue of binding XML to a database. The path from XML to database and back again starts with ADO.NET, .NET's database persistence layer, which can be thought of as analogous to Java's JDBC.
The System.Data.SqlClient namespace has all the
classes you'd expect to see, like the SqlConnection,
SqlCommand, and DataReader (analogous to
the Java Connection, SqlCommand, and
ResultSet respectively). The classes in this
namespace, whose names all begin with Sql, are meant
to work with Microsoft's database of choice, SQL Server. There is
an entire suite of classes in the namespace
System.Data.OleDb, whose names start with
OleDb, which work with any OLE DB provider. If you've
used JDBC, you already know basically how to use these classes.
Along with these classes, ADO.NET also introduces the
DataSet. A DataSet instance represents
an entire database, including the ability to track changes made to
individual data elements and to persist them to the underlying
database when necessary. The DataSet can maintain its
state while disconnected from the actual database.
The DataSet can be used to build a data model
without writing any SQL. It can also be used to persist the data
model to an XML Schema or to read an existing XML Schema and
dynamically build the data model. The DataSet can
also read and write its data to XML.
You'll recall that in previous articles I used a dog show to demonstrate generating classes in Java and C# from an XML schema. This time, since we're building a database, the database schema for the dog show will require the following schema:
Normally if you were building a database like this, you would
use SQL or have access to a graphical entity relationship
modeling tool. With the .NET DataSet class,
however, you can build a database schema dynamically. The
following code demonstrates one way to do that:
using System;
using System.Data;
public class MakeDataSet {
public static void Main(string [] args) {
DataSet dataSet = new DataSet("DogShow");
// create Show table
DataTable showTable = dataSet.Tables.Add("Show");
DataColumn showIdColumn =
showTable.Columns.Add("Id", typeof(Int32));
showTable.Columns.Add("Name", typeof(String));
showTable.PrimaryKey =
new DataColumn [] {showIdColumn};
// create Breed table
DataTable breedTable = dataSet.Tables.Add("Breed");
DataColumn breedIdColumn =
breedTable.Columns.Add("Id", typeof(Int32));
breedTable.Columns.Add("Name", typeof(String));
breedTable.PrimaryKey =
new DataColumn [] {breedIdColumn};
// create Dog table
DataTable dogTable = dataSet.Tables.Add("Dog");
DataColumn dogIdColumn =
dogTable.Columns.Add("Id", typeof(Int32));
dogTable.Columns.Add("Name", typeof(String));
DataColumn dogBreedIdColumn =
dogTable.Columns.Add("BreedId", typeof(Int32));
dogTable.PrimaryKey =
new DataColumn [] {dogIdColumn};
// create foreign key relationship
dataSet.Relations.Add("DogBreed", breedIdColumn,
dogBreedIdColumn);
// create Judge table
DataTable judgeTable = dataSet.Tables.Add("Judge");
DataColumn judgeIdColumn =
judgeTable.Columns.Add("Id", typeof(Int32));
judgeTable.Columns.Add("FirstName", typeof(String));
judgeTable.Columns.Add("LastName", typeof(String));
judgeTable.PrimaryKey =
new DataColumn [] {judgeIdColumn};
// create ShowRing table
DataTable showRingTable =
dataSet.Tables.Add("ShowRing");
DataColumn showRingIdColumn =
showRingTable.Columns.Add("Id", typeof(Int32));
showRingTable.Columns.Add("Name", typeof(String));
showRingTable.PrimaryKey =
new DataColumn [] {showRingIdColumn};
// create Judging table
DataTable judgingTable =
dataSet.Tables.Add("Judging");
judgingTable.Columns.Add("ShowTime",
typeof(DateTime));
DataColumn judgingBreedIdColumn =
judgingTable.Columns.Add("BreedId",
typeof(Int32));
DataColumn judgingJudgeIdColumn =
judgingTable.Columns.Add("JudgeId",
typeof(Int32));
DataColumn judgingShowRingIdColumn =
judgingTable.Columns.Add("ShowRingId",
typeof(Int32));
DataColumn judgingShowIdColumn =
judgingTable.Columns.Add("ShowId",
typeof(Int32));
// create foreign key relationships
dataSet.Relations.Add("JudgingBreed",
breedIdColumn, judgingBreedIdColumn);
dataSet.Relations.Add("JudgingJudge",
judgeIdColumn, judgingJudgeIdColumn);
dataSet.Relations.Add("JudgingShowRing",
showRingIdColumn, judgingShowRingIdColumn);
dataSet.Relations.Add("JudgingShow",
showIdColumn, judgingShowIdColumn);
}
}
|
I mentioned that the DataSet class can represent a
disconnected view of a database. In fact, as in this case, a
DataSet doesn't really need a database backing it
at all. You can even think of the XML output we're producing as
the database, although it lacks all the usual ACID properties of
a relational database. Still, it makes for a convenient
demonstration of some of the XML functionality.
Having created the DataSet for the dog show, you
could now go on and populate it with data using code something
like this:
DataRow show = showTable.NewRow();
show["Id"] = 1;
show["Name"] = "O'Reilly Invitational Dog Show";
showTable.Rows.Add(show);
That works well enough, although it's a lot of work, and there's
nothing being done at compile time to guarantee that you
actually have a column named "Id" in the
DataTable. It would be much better if you could
access tables and columns in a more type-safe way. Writing the
code to do that could be a lot of work, but you'd only have to
do it once. Actually, the .NET framework can do a lot of the
heavy lifting for you. In addition to the work you saw it do in
the last article, the xsd tool can generate a
subclass of DataSet for a specific W3C XML Schema.
But first you need a schema. The DataSet class has
a WriteXmlSchema() method which creates an W3C XML
Schema from the DataSet. You simply need to add a
line to the end of the MakeDataSet program to write
the schema to a file:
dataSet.WriteXmlSchema("DogShow.xsd");
Now you can generate a DogShow class with the
following command line:
xsd /dataset DogShow.xsd
The resulting code, placed by default in the file
DogShow.cs, is quite lengthy, and I won't include
the source listing here (you can read it here). Suffice it to say that
xsd has generated a class which is a subclass of
DataSet, which you can use to directly access
tables and columns. It's quite convenient.
Here's the source code for a program which uses the generated
class to create an instance of DogShow and add data
to it. After that, it writes the data to an XML file.
using System;
using System.Data;
public class DogShowWriter {
public static void Main(string [] args) {
DogShow dogShow = new DogShow();
// insert a show row
DogShow.ShowRow show = dogShow.Show.AddShowRow(1,
"O'Reilly Invitational");
// insert a breed row
DogShow.BreedRow breed = dogShow.Breed.AddBreedRow(
1, "English Springer Spaniel");
// insert a couple of dog rows
dogShow.Dog.AddDogRow(
1, "Wil-Orion's Angus Highlander", breed);
dogShow.Dog.AddDogRow(
2, "Len-Lear's Webmaster", breed);
dogShow.Dog.AddDogRow(
3, "Ch. Sallylyn's Condor", breed);
// insert a judge row
DogShow.JudgeRow judge =
dogShow.Judge.AddJudgeRow(1, "John", "Smith");
// insert a show ring row
DogShow.ShowRingRow showRing =
dogShow.ShowRing.AddShowRingRow(1, "Ring 1");
// insert a judging row
DateTime judgingTime =
new DateTime(2002,10,20,14,00,00);
dogShow.Judging.AddJudgingRow(judgingTime, breed,
judge, showRing, show);
// write the data
dogShow.WriteXml("DogShow.xml");
}
}
You can see that with the generated DogShow class,
it's much easier to write and debug your code, for a few
reasons. First, since the compiler will warn you when you
reference a table or column that doesn't exist, you're much less
likely to run into runtime errors. Second, xsd has
generated constructors for each table's rows, so you don't need
to go through the tedious effort of referencing each column
through the row's indexer property.
You should note the last line of code, where I've called
dogShow.WriteXml("DogShow.xml");. It creates an XML
file representing the DataSet that I just created
and filled with data. Here's the file:
<?xml version="1.0" standalone="yes"?>
<DogShow>
<Show>
<Id>1</Id>
<Name>O'Reilly Invitational</Name>
</Show>
<Breed>
<Id>1</Id>
<Name>English Springer Spaniel</Name>
</Breed>
<Dog>
<Id>1</Id>
<Name>Wil-Orion's Angus Highlander</Name>
<BreedId>1</BreedId>
</Dog>
<Dog>
<Id>2</Id>
<Name>Len-Lear's Webmaster</Name>
<BreedId>1</BreedId>
</Dog>
<Dog>
<Id>3</Id>
<Name>Ch. Sallylyn's Condor</Name>
<BreedId>1</BreedId>
</Dog>
<Judge>
<Id>1</Id>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
</Judge>
<ShowRing>
<Id>1</Id>
<Name>Ring 1</Name>
</ShowRing>
<Judging>
<ShowTime>2002-10-20T14:00:00.0000000-04:00</ShowTime>
<BreedId>1</BreedId>
<JudgeId>1</JudgeId>
<ShowRingId>1</ShowRingId>
<ShowId>1</ShowId>
</Judging>
</DogShow>
This file can also be read back into a DataSet --
or into an instance of the generated DogShow class
-- with the ReadXml() method.
So now we've got an XML file which represents the O'Reilly Invitational Dog Show. Unfortunately, it's not the same as the one in the last article, which was more tree-oriented. The XML generated here is more table-oriented, which is entirely appropriate, given that we are talking about a relational database. It would be nice to be able to transform the original tree-oriented XML into the new table-oriented schema. Seems like there ought to be a tool to do that.
Thanks to the fact that all of .NET's XML classes are tightly
integrated, you can use XSLT to transform a document on disk
into an XmlDocument in memory, and then pass that
to another class. So, given an appropriate XSLT stylesheet
named "DogShow.xsl", the following code will read it right into
the DataSet:
using System;
using System.Data;
using System.Xml;
using System.Xml.Xsl;
public class DogShowTransformer {
public static void Main(string [] args) {
// Create a DataSet instance
DogShow dogShow = new DogShow();
// Read the data
XmlDocument document = new XmlDocument();
document.Load("DogShow.xml");
// Read the stylesheet
XslTransform xslt = new XslTransform();
xslt.Load("DogShow.xsl");
// Transform the data
XmlReader reader = xslt.Transform(document, null);
// Load the DataSet
dogShow.ReadXml(reader);
}
}
DataSet does quite a bit more than I can cover
here. For example, I mentioned its ability to keep track of
changes while disconnected from the database. The
DiffGram is responsible for this ability. The
DiffGram is itself an XML document, and it is used
to serialize DataSets across a web service. The
DiffGram consists of three sections: one containing
the current data, <DataInstance>; one containing
any changes to the DataSet,
<diffgr:before>; and one listing any errors,
<diffgr:errors>. Any changes in the
<diffgr:before> section and errors in the
<diffgr:errors> section are related back to the
data in the <DataInstance> section by the
diffgr:id attribute.
I haven't touched on the XmlDataDocument class yet,
either. This class enables you to synchronize multiple of views
of the same DataSet, much as I demonstrated with
XSLT earlier. (The main reason I did not use
XmlDataDocument is that it depends on the names of
elements being the same in both schemas, so that the elements
can be mapped to each other properly. As you will recall, in the
previous article, the XML schema had different element names.)
There's still a whole lot more to be covered, including the
SqlCommand's ExecuteXmlReader()
method, which returns an XmlReader instance you can
use to read data from a database with the regular .NET XML
tools.
I continue to be impressed with the extent of .NET's XML integration. All the major technologies are represented, and Microsoft has use XML in innovative ways throughout the framework. I believe that the Java XML community can certainly learn a thing or two from .NET.
Resources |
One of the challenges of comparing the handling of XML in Java and C# is the fact that XML processing is not built into the Java class library. When deciding what features to compare, I've often resorted to searching the Web to find comparable Java tools for features that are a standard part of the .NET Common Language Runtime.
In my last article, I wanted to compare XML databinding in Java and C#. I knew that the .NET CLR contained excellent support for generating classes from an XML Schema, and I knew that Castor also supported databinding. As I examined Castor, I saw that it also supported binding Java objects to relational databases using something called JDO, and I put this fact aside to investigate in a future article.
Unfortunately, I forgot to read the fine print:
Does Castor JDO comply with the SUN JSR-000012 specification?
No, Castor JDO doesn't comply with the SUN's JDO specification.
Although Castor JDO carries very similar goals as SUN's JDO, it has been developed independently from the JSR.
And this pretty much tells the story when it comes to support for XML in Java. There are many implementations; some predate the applicable standards, and some conform to the standards.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.