Storing XML in Relational Databases
|
Table of Contents Oracle XML-SQL Utility (XSU) Related Articles |
There are various ways to solve the problem of effective, automatic conversion of XML data into and out of relational databases. Database vendors such as IBM, Microsoft, Oracle, and Sybase have developed tools to assist in converting XML documents into relational tables. The various solutions are as follows.
Oracle XML SQL Utility models XML document elements as a collection of nested tables. Enclosed elements are modeled by employing the Oracle Object datatype. The "SQL-to-XML" conversion constructs an XML document by using a one-to-one association between a table, referenced by Object datatype, and a nested element. "XML-to-SQL" might require either data model amending (converting it from relational into object-relational) or restructuring the original XML document.
IBM DB2 XML Extender allows storing XML documents either as BLOB-like objects or as decomposed into a set of tables. The latter transformation, known as XML collection, is defined in XML 1.0 syntax.
Microsoft approaches the problem by extending SQL-92 and by introducing OPENXML row set.
Sybase Adaptive Server introduces the ResultSetXml Java class as a base for processing XML documents in both directions.
In this article, we will explore in detail these vendor's solutions. Thereafter, we will attempt to answer the questions:
I'll use the following vocabulary as an example.
<!-- Primitive Types -->
<!ELEMENT CURRENCY1 (#PCDATA)>
<!ATTLIST CURRENCY1 e-dtype NMTOKEN #FIXED "string"
e-dsize NMTOKEN #FIXED "3">
<!ELEMENT CURRENCY2 (#PCDATA)>
<!ATTLIST CURRENCY2 e-dtype NMTOKEN #FIXED "string"
e-dsize NMTOKEN #FIXED "3">
<!ELEMENT AMOUNT (#PCDATA)>
<!ATTLIST AMOUNT e-dtype NMTOKEN #FIXED "decimal">
<!ELEMENT SETTLEMENT (#PCDATA)>
<!ATTLIST SETTLEMENT e-dtype NMTOKEN #FIXED "date">
<!ELEMENT BANKCODE (#PCDATA)>
<!ATTLIST BANKCODE e-dtype NMTOKEN #FIXED "string">
<!ELEMENT BANKACCT (#PCDATA)>
<!ATTLIST BANKACCT e-dtype NMTOKEN #FIXED "string">
<!-- Derived Types -->
<!ELEMENT ACCOUNT (BANKCODE, BANKACCT)>
<!ELEMENT FXTRADE (CURRENCY1, CURRENCY2, AMOUNT, SETTLEMENT, ACCOUNT)>
|
Oracle translates the chain of object references from the database into the hierarchical structure of XML elements. In an object-relational database, the field ACCOUNT in the table FXTRADE is modeled as an object reference of type AccountType:
CREATE TABLE FXTRADE
{
CURRENCY1 CHAR (3),
CURRENCY2 CHAR (3),
AMOUNT NUMERIC (18,2),
SETTLEMENT DATE,
ACCOUNT AccountType // object reference
}
CREATE TYPE AccountType as OBJECT
{
BANKCODE VARCHAR (100),
BANKACCT VARCHAR (100)
}
|
A corresponding XML document generated from the given object-relational model (using ";SELECT * FROM FXTRADE") looks like
<?xml version="1.0"?>
<ROWSET>
<ROW num="1">
<CURRENCY1>GBP</CURRENCY1>
<CURRENCY2>JPY</CURRENCY2>
<AMOUNT>10000</AMOUNT>
<SETTLEMENT>20010325</SETTLEMENT>
<ACCOUNT>
<BANKCODE>812</BANKCODE>
<BANKACCT>00365888</BANKACCT>
</ACCOUNT>
</ROW>
<!-- additional rows ... -->
</ROWSET>
|
The example below is taken from Oracle's XSU documentation with appropriate substitution of SQL statements and using Oracle's pure Java JDBC thin driver.
First, an instance of OracleXMLQuery is created;
second, a query is executed, and the result is represented in the form
of an XML document as above. Similarly, an XML document can be
extracted in the form of DOM; in this case
qry.getXMLDOM() would be invoked instead of
getXMLString().
import oracle.jdbc.driver.*;
import oracle.xml.sql.query.OracleXMLQuery;
import java.lang.*;
import java.sql.*;
// class to test XML document generation as String
class testXMLSQL {
public static void main(String[] args)
{
try {
// Create the connection
Connection conn = getConnection("scott","tiger");
// Create the query class
OracleXMLQuery qry = new OracleXMLQuery(conn,
"SELECT * FROM FXTRADE");
// Get the XML string
String str = qry.getXMLString();
// Print the XML output
System.out.println("The XML output is:\n"+str);
// Always close the query to get rid of any resources..
qry.close();
} catch(SQLException e) {
System.out.println(e.toString());
}
}
// Get the connection given the user name and password.!
private static Connection getConnection(String username,
String password)
throws SQLException
{
// register the JDBC driver..
DriverManager.registerDriver(new
oracle.jdbc.driver.OracleDriver());
// Create the connection using the OCI8 driver
Connection conn =
DriverManager.getConnection(
"jdbc:oracle:thin:@dlsun489:1521:ORCL",
username,password);
return conn;
}
}
|
In this example OracleXMLSave is employed to store our
XML document in an object-relational model; the insertXML
method performs the actual insertion of the data.
import java.sql.*;
import oracle.xml.sql.dml.OracleXMLSave;
public class testXMLInsert
{
public static void main(String args[])
throws SQLException
{
Connection conn = getConnection("scott","tiger");
OracleXMLSave sav = new OracleXMLSave(conn, "scott. FXTRADE");
// Assume that the user passes in this document as 0-arg
sav.insertXML(args[0]);
sav.close();
}
...
}
|
That's fine if the XML and object-relational model in the database are synchronized, but what if they aren't? You have two options in that case.
|
|
| Post your comments |
XSU does not permit storage of attribute values; it's recommended that you transform attributes into elements.
An XML to SQL mapping is modeled by an object-relational model construction rule as follows: each nested XML element is mapped onto an object reference of appropriate type. Mapping rules are implicitly embedded in the database model.
The Java API consists of the classes OracleXMLQuery
and OracleXMLSave.
|
IBM's XML Extender provides two access and storage methods for using DB2 as an XML repository:
DTDs are stored in the DTD repository, a DB2 table called DTD_REF; it's schema name is "db2xml". Each DTD in the DTD_REF table has a unique ID. The mapping between the database tables and the structure of the XML document is defined by means of a Data Access Definition (DAD) file. DAD refers to a processed document DTD, thus providing a bridge between an XML document, its DTD, and mapping rules onto database tables.
Here's an example DAD.
<?xml version="1.0"?>
<!DOCTYPE DAD SYSTEM "dad.dtd">
<DAD>
<dtdid>FXTRADE.DTD</dtdid>
<validation>YES</validation>
<Xcollection>
<prolog>?xml version="1.0"?</prolog>
<doctype>!DOCTYPE FXTRADE FXTRADE.DTD </doctype>
<root_node>
<element_node name="FXTRADE">
<RDB_node>
<table name="FXTRADE"/>
<table name="ACCOUNT" key="ID"/>
<condition>
FXTRADE.ACCOUNT=ACCOUNT.ID
</condition>
</RDB_node>
<element_node name="CURRENCY1">
<text_node>
<RDB_node>
<table name="FXTRADE"/>
<column name="CURRENCY1" type="CHAR(3)"/>
</RDB_node>
</text_node>
</element_node>
<element_node name="CURRENCY2">
<text_node>
<RDB_node>
<table name="FXTRADE"/>
<column name="CURRENCY2" type="CHAR(3)"/>
</RDB_node>
</text_node>
</element_node>
<element_node name="AMOUNT">
<text_node>
<RDB_node>
<table name="FXTRADE"/>
<column name="AMOUNT" type="DECIMAL(18,2)"/>
</RDB_node>
</text_node>
</element_node>
<element_node name="SETTLEMENT">
<text_node>
<RDB_node>
<table name="FXTRADE"/>
<column name="SETTLEMENT" type="DATE"/>
</RDB_node>
</text_node>
</element_node>
<element_node name="ACCOUNT">
<element_node name="BANKCODE">
<text_node>
<RDB_node>
<table name="ACCOUNT"/>
<column name="BANKCODE"
type="VARCHAR(100)"/>
</RDB_node>
</text_node>
</element_node>
<element_node name="BANKACCT">
<text_node>
<RDB_node>
<table name="ACCOUNT"/>
<column name="BANKACCT"
type="VARCHAR(100)"/>
</RDB_node>
</text_node>
</element_node>
</element_node> <!--end of Account element-->
</element_node> <!-- end of FxTrade element -->
</root_node>
</Xcollection>
</DAD>
|
DAD defines a mapping between XML elements and relational database columns by employing element_node to RDB_node associations. The top-level element_node FXTRADE is defined as a join between tables FXTRADE and ACCOUNT, with field ID in ACCOUNT table as a primary key. Child element CURRENCY1 is mapped onto field CURRENCY1 in the table FXTRADE, and so on. Note that the ACCOUNT element does not include any RDB_node -- this is not required since the link between ACCOUNT and FXTRADE tables has been defined earlier. Child elements of ACCOUNT, BANCCODE, and BANKACCT respectively are defined in the corresponding columns of ACCOUNT table. Atomic XML elements are tagged in DAD as text_node. In the example above all elements, except FXTRADE and ACCOUNT, are atomic.
XML document composition and decomposition are handled by stored
procedures: stored procedure dxxGenXML() extracts XML
documents from a database; stored procedure dxxShredXML()
stores XML documents in a database.
The main input parameters of dxxGenXML() are
Other input parameters specify the maximum number of rows returned, and <RDB_node> <condition> element override options. The output parameters include the actual number of returned rows, return code, and return message.
An extensive example of stored procedures within a C program can be found in "IBM DB2 Universal Database XML Extender Administration and Programming, Version 7".
Putting an XML document into a database is accomplished by
dxxShredXML(), a stored procedure.
The input parameters of dxxShredXML() are
Its output parameters are a return code and return message.
The XML-SQL mapping rules are specified by means of a Data Access Definition (DAD) file, which is an XML document. The DB2 XML Extender administration facilities include a means to construct DAD files for each persistent DTD.
Future enhancements will include a new XML-SQL transformation syntax, which will employ an XML Transforms language, a subset of W3C XSLT.
|
SQL Server's SQL-to-XML and XML-to-SQL mapping rules employ different syntaxes. Mapping details are discussed in the sections on extraction and storing.
Mapping between database columns and XML elements or attributes is
defined by means of AS aliases in a
SELECT:
<database column> AS [Element Name! Nesting Level! Attribute Name! Directive]
The document top level is assigned level 1, as below. By default, column data is mapped onto attribute values. The directive "element" may be used to change this default setting.
The process of generating an XML document from the database has two steps.
Step 1. Create As-aliases to atomic elements in your
desired output XML; the alias defines the parent/child relationships
between elements. The table below shows the aliases for our example
document.
FXTRADE /* LEVEL=1 */
CURRENCY1 [FXTRADE!1!CURRENCY1]
CURRENCY2 [FXTRADE!1!CURRENCY2]
AMOUNT [FXTRADE!1!AMOUNT]
SETTLEMENT [FXTRADE!1!SETTLEMENT]
ACCOUNT /* LEVEL=2 */
BANKCODE [ACCOUNT!2!BANKCODE]
BANKACCT [ACCOUNT!2!BANKACCT]
|
Step 2. Define the output tree structure in SQL. Each level of the
tree is defined through a SELECT statement, thereafter
the levels are combined together into the tree by means of a
UNION ALL statement. The level-1 SELECT
statement introduces the names of atomic elements on all levels. Each
SELECT statement introduces a tree level tag and its
parent tag. There is a single record in the result set corresponding
to the tree root, as defined in the first SELECT
statement below.
SELECT
1 AS Tag,
NULL AS Parent,
NULL AS [FXTRADE!1!CURRENCY1],
NULL AS [FXTRADE!1!CURRENCY2],
NULL AS [FXTRADE!1!AMOUNT],
NULL AS [FXTRADE!1!SETTLEMENT],
NULL AS [ACCOUNT!2!BANKCODE],
NULL AS [ACCOUNT!2!BANKACCT]
FROM
FXTRADE
UNION ALL
SELECT
2,
1,
FXTRADE.CURRENCY1,
FXTRADE.CURRENCY2,
FXTRADE.AMOUNT,
FXTRADE.SETTLEMENT,
ACCOUNT.BANKCODE,
ACCOUNT.BANKACCT
FROM
FXTRADE, ACCOUNT
WHERE
FXTRADE.ACCOUNT = ACCOUNT.ID
ORDER BY [ACCOUNT!2!BANKCODE],
[ACCOUNT!2!BANKACCT]
FOR XML EXPLICIT, ELEMENTS
|
The FOR XML constructs an XML document by analyzing
tags and AS-aliases in the combined row set. The keyword
EXPLICIT selects the most flexible, user-defined mode of
constructing XML document. Another mode, AUTO, constructs
XML document by applying default rules. The keyword
ELEMENTS models SQL columns at element level; otherwise,
the default is that columns are modeled on attribute level.
Storing XML documents employs OPENXML, a new row set
function, similar to table or view. OPENXML can be used
for inserts or updates, or for SELECT INTO target tables.
OPENXML simplified syntax is shown below:
OPENXML (<XML document handler>, <path pattern>, <flags>) WITH (Schema | Table)
The process of storing an XML document has three steps.
Obtain an XML document handler by compiling XML document
into internal DOM representation, using the stored procedure
sp_xml_preparedocument.
Construct a schema by associating schema fields with atomic XML elements.
XML elements are defined by a path pattern (absolute base path) plus a relative element path. Element-centric mapping is indicated by a flags value of 2. Existing tables can be used instead of a schema, with field names equal to XML names.
Remove the compiled XML document from memory by using the
stored procedure sp_xml_removedocument.
These steps are illustrated by the following example.
DECLARE @idoc int
DECLARE @doc varchar(1000)
SET @doc ='
<FXTRADE>
<CURRENCY1>GBP</CURRENCY1>
<CURRENCY2>JPY</CURRENCY2>
<AMOUNT>10000</AMOUNT>
<SETTLEMENT>20010325</SETTLEMENT>
<ACCOUNT>
<BANKCODE>812</BANKCODE>
<BANKACCT>00365888</BANKACCT>
</ACCOUNT>
</FXTRADE>'
-- Create internal DOM representation of the XML document.
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc
-- Execute a SELECT statement using OPENXML row set provider.
SELECT *
FROM OPENXML (@idoc, '/FXTRADE/ACCOUNT', 2)
WITH (
CURRENCY1 CHAR (3), '../@CURRENCY1',
CURRENCY2 CHAR (3), '../@CURRENCY2',
AMOUNT NUMERIC (18,2), '../@AMOUNT',
SETTLEMENT DATETIME, '../@SETTLEMENT',
BANKCODE VARCHAR (100), '@BANKCODE',
BANKACCT VARCHAR (100), '@BANKACCT' )
EXEC sp_xml_removedocument @idoc
|
With Microsoft SQL Server 2000, the extraction and storage of XML
documents does not use symmetrical syntax. Extraction extends a
SELECT-clause by use of the FOR XML
construct. Storing introduces a row set function OPENXML,
analogous to a table or view. Extraction mapping rules are based on
(a) introducing tags for specifying tree level and
(b) associating table fields with parent/child relationships between
XML document elements. Storing restructures an XML document into a
flat schema or table; "field-element" association is defined using
XPath notation.
|
Sybase employs an XML document type, ResultSet, to describe both XML document metadata (element name, type, size, etc.) and actual row data. Here is excerpt from a hypothetical FxTradeSet.xml:
<?xml version="1.0"?>
<!DOCTYPE ResultSet SYSTEM "ResultSet.dtd">
<ResultSet>
<ResultSetMetaData>
<ColumnMetaData
...
getColumnLabel="CURRENCY1"
getColumnName="CURRENCY1"
getColumnType="12"
... />
...
</ResultSetMetaData>
<ResultSetData>
<Row>
<Column name="CURRENCY1">GBP</Column>
...
</Row>
</ResultSetData>
</ResultSet>
|
The ResultSet DTD does not appear to permit definition of the nested elements.
The Java class ResultSetXml has a constructor which
takes an SQL query as an argument, thereafter the
getXmlLText method extracts an XML document from result
set:
jcs.xml.resultset.ResultSetXml rsx = new jcs.xml.resultset.ResultSetXml
("Select * from FxTrade", <other parameters>);
FileUtil.string2File ("FxTradeSet.xml", rsx.getXmlText());
|
The ResultSetXml class constructor can also take an
XML document as an argument. Then the method toSqlScript
generates sequences of SQL statements for insert/update into a
specified table from a result set.
String xmlString = FileUtil.file2string ("FxTradeSet.xml");
jcs.xml.resultset.ResultSetXml rsx = new jcs.xml.resultset.ResultSetXml
(xmlString);
String sqlString = rsx.toSqlScript ("FxTrade", <other parameters>)
|
The extraction and storage of XML documents are symmetrical in nature. Storing does not seem to permit modifying more than one table. Extraction transforms the results of an SQL query into a document with a flat structure.
|
Vendor |
Mapping rules |
Single table / Multiple tables |
Means of transformation |
Symmetrical extraction / storing |
|
Oracle |
Implicitly; by constructing object-relational data model |
Multiple |
Designated Java classes |
Symmetrical, if XML document and object-relational model match |
|
IBM |
Data Access Definition file |
Multiple |
Designated stored procedures |
Symmetrical |
|
Microsoft |
SQL extension; row set function |
Multiple for extraction; Single for storing |
By using SQL construct FOR XML and row set OPENXML |
Asymmetrical |
|
Sybase |
Result Set DTD |
Single; query may encompass multiple tables |
By using Java classes |
Symmetrical |
Common features between vendors:
Can the task of XML document storage in a database be split into stages? First, store XML in generic database structures without applying any mapping rules; and, second, applying a framework for subsequent data processing? The advantage of this approach would be to have a universal acquisition agent for virtually any XML document. The remainder of this article proposes just such a solution.
In this strategy, XML documents are saved as normalized tree structures -- raw tree data -- for further application processing. The framework for further data processing is then a set of database objects (stored procedures), similar to DOM APIs, for navigating data by employing traditional SQL processing.
There are advantages to using a framework approach as opposed to full automation: the population of database tables is normally controlled by application logic distributed between "pure" application code, transaction processing, database layer, and stored procedures, not to mention that some tables may reside in remote databases.
Additionally, in a heterogeneous database environment, having a unified tool to perform the same task for different databases has obvious advantages.
The author's solution, ObjectCentric Solutions x-Persistence Utility, implements the strategy discussed here.
Raw tree structures are defined by means of the following tables.
1) Tree Node Definition CREATE TABLE NodeDef ( nodeID udd_id NOT NULL, // unique node id dimID udd_id NOT NULL, // tree category: XML name udd_name NOT NULL, // node name value udd_paramvalue NULL, // node value value_type udd_type NULL, // value type power udd_power NULL, // node tree level isRoot udd_logical NOT NULL, // top node flag descriptor udd_descriptor NULL, // DOM Node type color udd_color NULL // informal data ) 2) Parent-Child relationships between nodes CREATE TABLE NodeLinks ( parentID udd_id NOT NULL, // parent node childID udd_id NOT NULL // child node ) |
Storing an XML document in a database is a matter of invoking the XML2SQL program:
XML2SQL <XML document filename>
Extracting an XML document from a database as a tree structure is implemented using the following stored procedures:
get_tree_roots <tree category> -- extract all
document root ids for a given forest get_tree <root id> -- extract all nodes for a
given root (document)get_children <node, level> -- extract all
children of a certain level for a given node Implementation details:
The x-Persistence Utility is currently used as the core of an XML-based data acquisition agent in Open Finance Laboratory, a customizable integration platform for a range of financial applications such as portfolio management, asset-liability management, risk management, etc. Open Finance Laboratory adopted FpML as a descriptive language for Interest Rate Derivatives products. x-Persistence Utility is currently used as a FpML/FRA data acquisition agent.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.