XML.com: XML From the Inside Out
oreilly.comSafari Bookshelf.Conferences.


XML and Databases? Follow Your Nose
by Leigh Dodds | Pages: 1, 2, 3, 4

Varying Structure

Relational databases are particularly good for storing highly structured information, and not particularly good at managing semi-structured data. Document-oriented XML usually has a varying structure to allow for the flexibility inherent in prose. Not all semi-structured data is document-oriented however, e.g., bills of materials. While it is is possible to map semi-structured information into a relational model there can be performance overheads, particularly with querying. An NXD or XED with XML-aware text indexing facilities is a good fit for this kind of information.

Semi-structured data is data that has some structure, but is not rigidly structured. An example of semi-structured data is a health record. For example, one patient might have a list of vaccinations, another might have height and weight, another might have a list of operations they have had. Other examples of semi-structured data are legal documents and genealogical records...

Semi-structured data is difficult to store in a relational database because it means you either have many different tables (which means many joins and slow retrieval time) or a single table with many null columns. Semi-structured data is very easy to store as XML and is a good fit for a native XML database.

(Ronald Bourret)

One of the main virtues of (data-centric) XML's data model is its handling of semi-structured data. Relational DBMS's do not handle semi-structures data very well... (Dan Weinreb)

Rigorous Structure

If you're storing XML which has a schema with a fixed structure (i.e. little or no use of mixed content or recursive content models) then a relational database may be the better solution.

Structured data is very rigid. That is, all records have the same fields. An example of structured data is a telephone book -- every entry has a name, an address, and a telephone number. (Ronald Bourret)

...Relational databases are generally a better way to store highly structured data. Then again, most native XML vendors I've talked to say the same thing, although nobody is quite sure where the line is going to be drawn. (Ronald Bourret)

Obvious Metadata

It is a fairly common design pattern in XML schemas to see a head-body structure, with metadata being collected into the head of a document, and the content in the body. In this situation it's often the metadata in the head which needs to be indexed and queried. This pattern is quite easy to map into a relational database or XED, particularly as the metadata rarely makes significant use of mixed content. If your schema(s) conform to this pattern, then this storage option is a quick solution. XML enabling features can be used if you need to manipulate or query the body content. However if you need to ask particularly complex questions of your document content then a native XML database may be the better option.

If you have "document" XML and the schema is easily separated into easily normalizeable metadata and arbitrarily structured text content, and most queries will be on the metadata, an XML-enabled RDBMS that extracts some elements into tables and other elements into CLOBs may be the most appropriate. If you anticipate frequent queries on the actual document structure and content, a native XML DBMS may be more appropriate. (Michael Champion)

Complex Questions

The ways in which information will be queried is an important factor when considering a database technology. If you have data-oriented XML, an RDBMS is likely to be the better option, similarly for XML documents that have obvious metadata. However once you need to use full-text searching, or manipulate a recursive content model, then a NXD or an XED with an appropriate XML query language looks like a better option.

If you need to do queries that exploit the recursive structure of your data, a native XML DBMS will tend to be more appropriate than a "raw" RDBMS. For example, if you need to know which products contain assemblies or sub-assemblies, or sub-sub-assemblies ... that contain a particular part, XPath-based query languages will handle it better than SQL-based languages. (The "bill of materials" query is a well-known challenge for SQL, but a piece of cake in XPath). I guess to be fair, I'd have to say "if you need to do queries that involve joins across metadata in different collections, an RDBMS-based approach will tend to be more appropriate than a native XML DBMS, at least until some flavor of XQuery is widely supported." (Michael Champion)

...SQL has a far richer set of data manipulation operators than XPath (XQuery addresses this to some extent), and is (loosely) based on a general theory of data, and XML is much more ad hoc...if you don't have relatively complex document-like XML, the *searching* features of a native XML DBMS don't buy you much over storing simple XML in an RDBMS. But if you do have "interesting" XML schema, e.g. with recursive elements, simple XPath queries can address problems that seriously challenge SQL experts. (Michael Champion)

While the most popular query [language] is probably XPath (usually with extensions for multi-document queries), numerous other query languages -- all of them proprietary -- are supported as well. This is true of XML-enabled relational databases as well as native XML databases.

There is good reason for this, as XPath is not rich enough to perform many of the queries needed by users and XQuery is not yet finished. I suspect that when XQuery is done, you will see many implementations of it. (Ronald Bourret)

Shared Repository

It's often the case that you have an existing data repository (with the accompanying investment) whose data you'll be manipulating along with your XML data. In these circumstances you may need to consider the benefits of enterprise integration using an NXD. There may also be data integrity issues. It's also a common requirement to be able to share the data with applications which are not XML-aware; this is often true for data-oriented XML, and commonly arises when one is adding an XML view onto an existing database or repository. In these circumstances an RDBMS or XED may offer the best of both worlds.

For many situations, you can get by with a relational database, especially if the majority of the applications drawing from that database may not need to manipulate or view the data as XML. If that is the situation, XML Middleware, or the mapping facilities that many RDBMSes are providing directly these days should suffice. (Tom Bradford)

Data in an XML-enabled relational database is accessible to both XML and non-XML applications. The same is not true of almost all native XML databases. That is, almost all native XML databases do not (yet) support ways to return the data in non-XML format. (Ronald Bourret)

Another issue is whether non-XML applications can use the data without seeing any XML. The answer is yes for XML that is mapped to tables and no for XML stored in a BLOB. (Ronald Bourret)

If you expose data to both existing RDBMS applications and XML applications, you're probably better off leaving it in an RDBMS; XML simply has no good notion of "referential integrity" and that could bite you hard. (Michael Champion)

Pages: 1, 2, 3, 4

Next Pagearrow