XML.com: XML From the Inside Out
oreilly.comSafari Bookshelf.Conferences.

advertisement

Mapping DTDs to Databases
by Ronald Bourret | Pages: 1, 2, 3, 4, 5

4.2. Generating DTDs from Database Schema

DTDs are generated by starting from a single "root" table or set of root tables and processing each:

  • Each root table generates an element type with element content in the form of a single sequence.
  • Each data (non-key) column in the table generates an element type with PCDATA-only content and a reference in the sequence; nullable columns generate optional references.

Primary and foreign keys are generated as follows:

  • The remote table is processed in the same manner as a root table.
  • A reference to the element type for the remote table is added to the sequence.
  • If the key is the primary key, the reference is optional and repeated (* operator). This is because there is no guarantee that a row will exist in the foreign table, nor is there a guarantee that only one row will exist.
  • If the key is the primary key, PCDATA-only element types are optionally generated for each column in the key. If these are generated, references to these element types are added to the sequence. This is useful only if primary keys contain data.
  • If the key is a foreign key and is nullable, the reference is optional (? operator).

During this process, the key used to reach the table (if any) is not processed. This saves the algorithm from generating element types that duplicate those created in the parent table.

The following example shows how this process works. Consider the following database schema:

   Table Orders
      Column OrderNum
      Column Date
      Column CustNum
 
   Table Items
      Column OrderNum
      Column ItemNum
      Column Quantity
      Column PartNum

   Table Parts
      Column PartNum
      Column Price

In our first step, we generate an element type for the root table (Orders):

        Tables                                            DTD
   ==================               ===================================================

   Table Orders            ==>      <!ELEMENT Orders ()>
      Column OrderNum
      Column Date
      Column CustNum
 
   Table Items
      Column OrderNum
      Column ItemNum
      Column Quantity
      Column PartNum

   Table Parts
      Column PartNum
      Column Price

Next, we generate PCDATA-only elements for the data columns (Date and CustNum) and add references to these elements to the content model of the Orders element:

        Tables                                            DTD
   ==================               ===================================================

   Table Orders            ==>      <!ELEMENT Orders (Date, CustNum)>
      Column OrderNum
      Column Date                   <!ELEMENT Date (#PCDATA)>
      Column CustNum                <!ELEMENT CustNum (#PCDATA)>
 
   Table Items
      Column OrderNum
      Column ItemNum
      Column Quantity
      Column PartNum

   Table Parts
      Column PartNum
      Column Price

Now we generate a PCDATA-only element for the primary key (OrderNum) and add a reference to it to the content model:

        Tables                                            DTD
   ==================               ===================================================

   Table Orders            ==>      <!ELEMENT Orders (Date, CustNum, OrderNum)>
      Column OrderNum               <!ELEMENT OrderNum (#PCDATA)>
      Column Date                   <!ELEMENT Date (#PCDATA)>
      Column CustNum                <!ELEMENT CustNum (#PCDATA)>
 
   Table Items
      Column OrderNum
      Column ItemNum
      Column Quantity
      Column PartNum

   Table Parts
      Column PartNum
      Column Price

And then add an element for the table (Items) to which the primary key is exported, as well as a reference to it in the content model:

        Tables                                            DTD
   ==================               ===================================================

   Table Orders                     <!ELEMENT Orders (Date, CustNum, OrderNum, Items*)>
      Column OrderNum               <!ELEMENT OrderNum (#PCDATA)>
      Column Date                   <!ELEMENT Date (#PCDATA)>
      Column CustNum                <!ELEMENT CustNum (#PCDATA)>
 
   Table Items             ==>      <!ELEMENT Items()>
      Column OrderNum
      Column ItemNum
      Column Quantity
      Column PartNum

   Table Parts
      Column PartNum
      Column Price

We process the data and primary key columns in the remote (Items) table in the same way:

        Tables                                            DTD
   ==================               ===================================================

   Table Orders                     <!ELEMENT Orders (Date, CustNum, OrderNum, Items*)>
      Column OrderNum               <!ELEMENT OrderNum (#PCDATA)>
      Column Date                   <!ELEMENT Date (#PCDATA)>
      Column CustNum                <!ELEMENT CustNum (#PCDATA)>
 
   Table Items             ==>      <!ELEMENT Items(ItemNum, Quantity)>
      Column OrderNum
      Column ItemNum                <!ELEMENT ItemNum (#PCDATA)>
      Column Quantity               <!ELEMENT Quantity (#PCDATA)>
      Column PartNum

   Table Parts
      Column PartNum
      Column Price

And then add an element for the table (Parts) to which the foreign key corresponds:

        Tables                                            DTD
   ==================               ===================================================

   Table Orders                     <!ELEMENT Orders (Date, CustNum, OrderNum, Items*)>
      Column OrderNum               <!ELEMENT OrderNum (#PCDATA)>
      Column Date                   <!ELEMENT Date (#PCDATA)>
      Column CustNum                <!ELEMENT CustNum (#PCDATA)>
 
   Table Items                      <!ELEMENT Items(ItemNum, Quantity, Parts)>
      Column OrderNum
      Column ItemNum                <!ELEMENT ItemNum (#PCDATA)>
      Column Quantity               <!ELEMENT Quantity (#PCDATA)>
      Column PartNum

   Table Parts             ==>      <!ELEMENT Parts()>
      Column PartNum
      Column Price

Finally, we process the foreign key table (Parts):

        Tables                                            DTD
   ==================               ===================================================

   Table Orders                     <!ELEMENT Orders (Date, CustNum, OrderNum, Items*)>
      Column OrderNum               <!ELEMENT OrderNum (#PCDATA)>
      Column Date                   <!ELEMENT Date (#PCDATA)>
      Column CustNum                <!ELEMENT CustNum (#PCDATA)>
 
   Table Items                      <!ELEMENT Items (ItemNum, Quantity, Parts)>
      Column OrderNum
      Column ItemNum                <!ELEMENT ItemNum (#PCDATA)>
      Column Quantity               <!ELEMENT Quantity (#PCDATA)>
      Column PartNum

   Table Parts             ==>      <!ELEMENT Parts(PartNum, Price)>
      Column PartNum                <!ELEMENT PartNum (#PCDATA)>
      Column Price                  <!ELEMENT Price (#PCDATA)>

As was the case in the previous section, the generated DTD is not what a human would have created. Although the only problems here are with names, the algorithm cannot recognize order columns or property tables.

5. Mapping XML Schemas to Databases

Most XML schema languages can be mapped to databases with an object-relational mapping. The exact mappings depend on the language. DDML, DCD, and XML Data Reduced schemas can be mapped in a manner almost identical to DTDs. The mappings for W3C Schemas, Relax, TREX, and SOX appear to be somewhat more complex. It is not clear to me that Schematron can be mapped.

In the case of W3C Schemas, a complete mapping to object schemas and then to database schemas is available. Briefly, this maps complex types to classes (with complex type extension mapped to inheritance) and maps simple types to scalar data types (although many facets are lost). "All" groups are treated like unordered sequences and substitution groups are treated like choices. Finally, most identity constraints are mapped to keys. For complete details, see http://www.rpbourret.com/xml/SchemaMap.htm.

6. Related Topics

For a wider discussion of XML and databases, see:

  XML and Databases (http://www.rpbourret.com/xml/XMLAndDatabases.htm)

For a reasonably updated list of XML database products, see:

  XML Database Products (http://www.rpbourret.com/xml/XMLDatabaseProds.htm)

If you have comments or questions about this paper, you can reach me at:

  rpbourret@rpbourret.com