XML.com: XML From the Inside Out
oreilly.comSafari Bookshelf.Conferences.

advertisement

Under the Hood: Oracle Berkeley DB XML

May 07, 2008

Storing an XML document in a relational database has its limitations. XML's hierarchical structures of elements and element attributes do not necessarily map well to relational database structures, which is where an embeddable (non-relational) XML database has its advantages over a relational database.

Oracle databases (since Oracle 9i database R2) provide the Oracle XML DB feature to store and query XML documents in an XMLType data-type column, but you still need a DBA to manage the Oracle database. On the other hand, Oracle Berkley DB XML is an embeddable XML database for storing and retrieving XML documents, one which provides efficient querying of XML documents using XQuery. Oracle Berkley DB XML is built on the embeddable Oracle Berkley DB database and inherits all the features of the database. For a comparison of Oracle Berkeley DB and relational database systems refer to A Comparison of Oracle Berkeley DB and Relational Database Management Systems.

Oracle Berkeley DB XML database stores XML data in a container. A container is managed with a XmlManager object. XML documents may be stored in a Oracle Berkeley DB XML database as whole documents or a set of nodes. It's recommended to store whole documents if the documents are relatively small and loading performance is more important than query performance. Storing in nodes is recommended if query performance is more important than loading performance and documents are relatively large. BDB XML supports XQuery 1.0 and XPath 2.0 specifications to query an XML document in the database. XML documents in a BDB XML database may be modified, updated and deleted.

Installing Oracle Berkley DB XML

You need to download Oracle Berkley DB XML 2.3.10 or a later edition. Download the Oracle Berkeley DB XML Windows Installer .msi file. To install Oracle BDB XML, double-click on the dbxml-2.3.10.msi file. The Installer gets started. Click on Next. Accept the Open Source License and click on Next. Select the default installation folder, C:\Program Files\Oracle\Berkeley DB XML 2.3.10, and click on Next. Select the default Berkley DB XML features and click on Next.

Berkeley DB XML requires some environment variables (CLASSPATH and PATH) to be modified. Select the "Set the values in the environment variables" checkbox and click on Next. Click on the Install button to install Berkeley DB XML. Berkeley DB XML 2.3.10 gets installed. Click on the Finish button. The Oracle Berkeley DB XML jar files get added to the CLASSPATH environment variable. The following bin directory gets added to the PATH environment variable.

C:\Program Files\Oracle\Berkeley DB XML 2.3.10\bin

Berkeley DB XML binaries are not available for UNIX/POSIX systems. To build Berkeley DB XML on UNIX/POSIX refer Building Berkeley DB XML for UNIX/POSIX. Berkley DB XML may be accessed from a command line shell or with a Java API. You will learn about each of these methods in the following sections.

Using the Command Shell

The dbxml shell may be started with the command shown below from a command line.

C:/>dbxml

The command shell gets started and the following prompt gets displayed.

dbxml>

Berkley DB XML commands may be specified at the dbxml command prompt. Some of the commonly used commands are discussed in Table 1.

Command Description
append Appends an object of type element, attribute, comment, or processing instruction to node/s selected by a query expression.
createContainer Creates a container. The first argument specifies the container name. The second argument specifies the container type. Default value is 'in', which creates a node storage container with node indexes. Value 'd' creates a Wholedoc container. Value 'n' creates a node storage container. Value 'id' creates a document storage container with node indexes. The third argument specifies if documents are to be validated on insertion.
getDocuments Gets all the documents in the default container, or, if the docName is specified, gets the specified document.
insertAfter Inserts new content after the nodes specified by the query expression. The new content may be element, attribute, comment, or PI. Content type is specified with <objectType> argument and content value is specified with <objectContent> argument.
insertBefore Similar to insertAfter except that the content is inserted before the specified nodes.
openContainer Opens a container and sets the container as the default container. The validate/novalidate option specifies if documents are to be validated on insertion.
print Prints the most recent results to stdout. Results may be printed to a file by specifying path to a file. A specified number of results may be output with the 'n' argument.
putDocument Adds an XML document to the default container. The <namePrefix> argument specifies the document name. The <string> parameter specifies the XML document string. The third argument specifies the content of the second argument. If the value is 'f' the string specifies a file name. If value is 's' (the default), the string is the XML document to be added. If value is 'q', the string specifies an XQuery expression.
query Evaluates an XQuery expression.
removeContainer Removes a container. The .dbxml file corresponding to the container gets deleted.
removeDocument Removes a document from the default container.
removeNodes Removes the nodes specified by the XQuery expression.
renameNodes Renames the nodes specified by the XQuery expression.
run Runs the specified file as a script.
updateNodes Replaces the content of the nodes specified by query expression with new content.
Table 1. Berkeley DB XML Commands

The first thing that you will need to do once the database itself is set up is to create a container to store the XML documents. For instance, to create a container called catalog for storing catalog entries for journals, use the command:

dbxml>createContainer catalog.dbxml

A node storage container with nodes indexed gets created and opened as shown in Figure 1.

Creating a Container
Figure 1. Creating a Container

The BDB XML database is just a .dbxml file created in the directory in which the dbxml command was issued.

Adding XML Documents

Next, you’ll want to add XML documents to the BDB XML database using the putDocument command. For example, to add an XML document that represents a catalog entry to the database, use the following:

dbxml>putDocument catalog1 '<catalog 
title="Oracle Magazine" publisher="Oracle Publishing"> 
 <journal date="March-April 2006"> 
  <article>
   <title>Using Bind Variables</title>
   <author> Steve Muench </author>
  </article>
 </journal>
</catalog>' s

Single quotes are used for any command parameters that span multiple lines or contain spaces. The character 's' used to terminate the command indicates that the XML document is added using a string. The XML document gets added; the output from the command is shown in Figure 2.

Adding a Document
Figure 2. Adding a Document

Similarly add catalog 'catalog2'.

dbxml>putDocument catalog2 '<catalog 
title="Oracle Magazine" publisher="Oracle Publishing"> 
 <journal date="May-June 2006">
  <article> 
   <title>From Application Express to XE</title>
   <author>David A. Kelly </author>
  </article>
 </journal>
</catalog>' s

The XML documents added may be retrieved with the following command.

dbxml>getDocuments

The output from the getDocuments command is as follows.

2  documents found.

The documents retrieved may be output to stdout with the print command.

dbxml>print

Querying XML Documents with XQuery

Once you have created the appropriate documents, you’ll be able to query them using XQuery, an XPath based SQL-like language for XML. XQuery queries retrieve subsets of data similar to SELECT statement in SQL. Each query has two parts; the first part identifies the set of documents to be queried using an XQuery navigation function such as collection() or doc(). The collection() function is used to navigate to a container. For example, container 'container1.dbxml' may be navigated to with the following XQuery expression.

collection("container1.dbxml") 

Multiple containers may be queried using the '|' operator. For example, containers container1.dbxml and container2.dbxml may be queried using the following expression.

collection(("collection1.dbxml")|
("collection2.dbxml"))

The doc() function may be used to navigate to a specified XML document in a container. The doc() function takes a base URI (dbxml: by default), a container name and an XML document name in the XQuery expression. For example, XML document xmldocument1.xml in container container1.dbxml may be navigated to with the following expression.

doc("dbxml:/container1.dbxml/xmldocument1.xml")

As an example of a query select all the article titles in the catalog.dbxml container. If Berkeley DB XML 2.3.8 is used, before you may query an XML document in a container you need to set the base URI.

dbxml> setBaseUri dbxml:/

The XQuery Expression to select all the titles is as follows.

dbxml>query 'collection("catalog.dbxml")/
catalog/journal/article/title/text()'

To output the results of the query use the print command.

dbxml>print

The results of the query are as follows.

Using Bind Variables
From Application Express to XE

As another example retrieve the author of the article with title 'Using Bind Variables'. The XQuery expression for retrieving the author is as follows.

dbxml>query 'collection
("catalog.dbxml")/catalog/journal/article[title=
"Using Bind Variables"]/author/text()'

Output the results with the print command. The output is as follows.

Steve Muench

Modifying XML Documents

The Oracle BDB XML command shell provides various commands such as append, insertAfter and insertBefore to modify an XML document in the database. For example, to append the 'section' attribute to the 'article' elements in the XML documents in the catalog.dbxml container, use the append command:

dbxml>append './catalog/journal/article' 
'attribute' 'section' 'Developer'

The append command in the previous example won't append an attribute if the context has not been set. The query expression for modifying nodes should be relative; it should navigate from the context item (".") rather than evaluating an expression with the collection() or doc() function. The context is set prior to the modifications with the query command. For example, set the context to the catalog.dbxml container. You will use the catalog1 container context for the subsequent modification examples.

dbxml> query 'collection("catalog.dbxml")'

Having set the context you may run an append command. As an example add after the article node in catalog1 XML document another article node. The dbxml shell command to append an article node is as follows.

dbxml> append  './catalog/journal[article/title=
"Using Bind Variables"]'  'element' 'article' 
'<title>Commanding ASM</title>
<author>Arup Nanda</author>'

Output the modified catalog1 document with the getDocuments command.

dbxml>getDocuments catalog1

The output from the print command is as follows:

<catalog title="Oracle Magazine" 
publisher="Oracle Publishing"> 
 <journal date="March-April 2006"> 
  <article>
   <title>Using Bind Variables </title>
   <author> Steve Muench </author>
  </article>
  <article> 
   <title> Commanding ASM</title>
   <author> Arup Nanda</author>
  </article>
 </journal>
</catalog>

Replacing/Deleting XML Documents

Using the Oracle BDB XML command shell, nodes in an XML document in the database may be replaced, renamed, and removed. For example replace the article title 'Using Bind Variables' with 'Introduction to Bind Variables'. The shell command to replace the title is as follows.

dbxml>updateNodes  
'./catalog/journal/article[title=
"Using Bind Variables"]/title'   
'Introduction to Bind Variables'

As another example, rename the journal nodes to magazine. The shell command to rename journal nodes to magazine is as follows.

dbxml>renameNodes  './catalog/journal'   
'magazine'

Next, remove the magazine node with date attribute March-April 2006. The shell command to remove the magazine node is as follows.

dbxml>removeNodes './catalog/magazine
[@date="March-April 2006"]'

An XML document may be deleted from the database with the removeDocument command. For example, remove the catalog1 document.

dbxml>removeDocument catalog1

A database container may be removed with the removeContainer command. For example, remove the catalog.dbxml container.

dbxml>removeContainer catalog.dbxml

Using the Berkley DB XML API

In BDB XML documents are stored in containers. Containers are of two types; Wholedoc container and Node container. A Wholedoc container stores the complete XML document without any modifications to line breaks or whitespaces. In a Node container XML documents are stored as nodes. Each element in an XML document is stored as a node in the database along with the element attributes, attribute values and text nodes. BDB XML also stores information about reassembling an XML document from the nodes stored in the database. The Node container is the preferred type and is the default type. A comparison of Node container and Wholedoc container is discussed in Table 2.

Container Type Storage Mode Query Performance Load Performance Application
Node Container XML document stored in nodes. Faster to query Lower load performance. Use Node container if faster query performance is required. Use Node container if document size is more than 1MB.
Wholedoc Container Whole XML document stored. Lower query performance, because complete document has to be navigated. Faster document loading, because an XML document does not have to be deconstructed into nodes. Use Wholedoc container if load performance is more important than query performance. Use Wholedoc container if document is relatively small and requires to be frequently retrieved.
Table 2. A Comparison of Containers

A container is represented with class com.sleepycat.dbxml.XmlContainer. Configuration settings for a container are specified in the XmlContainerConfig object. An XmlManager object is used to manage many of the objects used in a BDB XML application, including managing a XmlContainer and preparing and running XQuery queries. XmlManager is the main class in a BDB XML application. In this section you will create a container, add XML documents to the container, query the XML documents with XQuery, modify the XML documents, and update the XML documents, all with the BDB XML API in the com.sleepycat.dbxml package. First, create a XmlManager object and set the default container type to be Node container.

XmlManager xmlManager = new XmlManager();
xmlManager.setDefaultContainerType
(XmlContainer.NodeContainer);

Next, create a container, catalog.dbxml. The container is the BDB XML database.

XmlContainer xmlContainer = 
xmlManager.createContainer("catalog.dbxml");

Pages: 1, 2

Next Pagearrow







close