XML.com: XML From the Inside Out
oreilly.comSafari Bookshelf.Conferences.


Mapping DTDs to Databases

May 09, 2001

1. Overview

A common question in the XML community is how to map XML to databases. This article discusses two mappings: a table-based mapping and an object-relational (object-based) mapping. Both mappings model the data in XML documents rather than the documents themselves. This makes the mappings a good choice for data-centric documents and a poor choice for document-centric documents. The table-based mapping can't handle mixed content at all, and the object-relational mapping of mixed content is extremely inefficient.

Both mappings are commonly used as the basis for software that transfers data between XML documents and databases, especially relational databases. An important characteristic in this respect is that they are bidirectional. That is, they can be used to transfer data both from XML documents to the database and from the database to XML documents. One consequence is that they are likely to be used as canonical mappings on top of which XML query languages can be built over non-XML databases. The canonical mappings will define virtual XML documents that can be queried with something like XQuery.

In addition to being used to transfer data between XML documents and databases, the first part of the object-relational mapping is used in "data binding", the marshalling and unmarshalling of data between XML documents and objects.

Table of Contents

Table-based Mapping
Object-Relational Mapping
The Basic Mapping
Mapping Complex Content Models
Mapping Mixed Content
Mapping Order
Mapping Attributes
Alternate Mappings
Generating Schema
Generating Relational Database Schema from DTDs
Generating DTDs from Database Schema
Mapping XML Schemas to Databases
Related Topics

2. Table-based Mapping

There is an obvious mapping between the following XML document and table:

         <C>ccc</C>             Table A
         <D>ddd</D>             -------
         <E>eee</E>          C     D     E
      </B>                  ---   ---   ---
      <B>             <=>   ...   ...   ...
         <C>fff</C>         ccc   ddd   eee
         <D>ggg</D>         fff   ggg   hhh
         <E>hhh</E>         ...   ...   ...

It's called a table-based mapping. It views the document as a single table or a set of tables. The structure of the document must be either




with the caveat that column data can be represented either as PCDATA-only elements (shown) or attributes.

The obvious advantage of this mapping is its simplicity. Because it matches the structure of tables and result sets in a relational database, it is easy to write code based on this mapping; code which is fast, scales well, and is quite useful for certain applications, such as transferring data between databases one table at a time.

The mapping has several disadvantages; primarily, it only works with a very small subset of XML documents. In addition, it does not preserve physical structure (such as character and entity references, CDATA sections, character encodings, or the standalone declaration) or document information (such as the document type or DTD), comments, or processing instructions.

The table-based mapping is commmonly used by middleware to transfer data between XML documents and relational databases. It is also used in some Web application servers to return result set data as XML.

3. Object-Relational Mapping

Because table-based mappings only work with a limited subset of XML documents, some middleware tools, most XML-enabled relational databases, and most XML-enabled object servers use a more sophisticated mapping, called an object-relational mapping. This models the XML document as a tree of objects that are specific to the data in the document, then maps these objects to the database.

(The name "object-relational" is actually a misnomer -- a better name would be object-based mapping. This is because the objects can be mapped to non-relational databases, such as object-oriented or hierarchical databases, or simply left alone, as is done in data binding systems. However, because object-relational is a well-known term and this mapping is most commonly used with relational databases, that term will be used here. In addition, all examples use relational tables.)

To understand the object-relational mapping, it is easiest to first look at some simple examples. To start, notice that there is an obvious mapping between the following XML document, object, and row in a table:

        XML                  Objects                  Tables
   =============           ============           ===============

                                                     Table A
   <A>                     object A {                 -------
      <B>bbb</B>              B = "bbb"            B     C     D
      <C>ccc</C>    <=>       C = "ccc"    <=>    ---   ---   ---
      <D>ddd</D>              D = "ddd"           ...   ...   ...
   </A>                    }                      bbb   ccc   ddd
                                                  ...   ...   ...

Similarly, there is an obvious mapping between the following element type definition, class, and table schema:

            DTD                        Classes                   Table Schema
   ======================           ============           ==========================

                                    class A {              CREATE TABLE A
                                       String B;              B VARCHAR(10) NOT NULL,
   <!ELEMENT A (B, C, D)>    <=>       String C;    <=>       C VARCHAR(10) NOT NULL,
                                       String D;              D VARCHAR(10) NOT NULL
                                    }                      )

As a more complex example, consider the following XML document:

      <Customer>Gallagher Industries</Customer>
      <Item Number="1">
      <Item Number="2">

which maps to the following objects:

      object SalesOrder {
         number = 1234;
         customer = "Gallagher Industries";
         date = 29.10.00;
         items = {ptrs to Item objects};
      }           /   \
                 /     \
                /       \
   object Item {       object Item {
      number = 1;         number = 2;
      part = "A-10";      part = "B-43";
      quantity = 12;      quantity = 600;
      price = 10.95;      price = 3.95;
   }                   }

and then to rows in the following tables:

   Number   Customer                Date
   ------   --------------------    --------
   1234     Gallagher Industries    29.10.00
   ...      ...                     ...
   ...      ...                     ...
   SONumber   Item   Part   Quantity   Price
   --------   ----   ----   --------   -----
   1234       1      A-10   12         10.95
   1234       2      B-43   600        3.99
   ...        ...    ...    ...        ...

All of these are examples of object-relational mappings from DTDs to relational databases.

Pages: 1, 2, 3, 4, 5

Next Pagearrow