XML and Databases? Follow Your Nose
Following a recent XML-DEV discussion on how to choose the most appropriate database for your XML application, the XML-Deviant captures the indicators that will help bring you closer to a decision.
One of the most difficult aspects of selecting a technology is getting an unbiased view of the pros and cons. Anyone who has found themselves mired in the marketing gloss applied to many white papers while questing for concrete technical information will be familiar with this problem.
This is where high-quality user communities, like XML-DEV, can have their greatest benefit: providing a forum for objective discussion on the relative merits of particular technologies. Armed with facts, and aware of the trade-offs, the plucky developer is better able to make an informed decision. This process can be facilitated if others are willing to summarize these findings and report them to the community.
Faced with the trial of determining the appropriate database technology to use for an XML application, Brian Magick asked XML-DEV whether anyone had attempted to put together a relevant "decision tree". Magick later elaborated on his goal.
I just want to know in a general sense when to XML and when not to in regards to creating database applications or storing data in databases. Why would one want to keep data in a relational format versus converting it into XML to exploit an XML database (without getting into the specifics of any one product). For new applications what are the concerns a developer would need to consider when deciding to go the traditional route versus deciding to use this new "cool" XML technology. One of our concerns is that by adopting an XML database all developers will want to use this cool new tool when in fact it has specific purposes that are not relevant for all new databases/apps.
While no one had yet created such a beast, several XML-DEV members were happy to provide guidance on important and relevant factors. This prompted a lengthy thread with a great deal of useful information. Technology evaluations are hard to do in a vacuum: one must have some idea of the requirements, and a decision generally involves a trade-off between different factors. This makes it hard to synthesize this kind of discussion into a decision tree, which would involve rating one factor above another to build the tree structure (or end up in a complex interlinked mess). In summarizing this week's discussion, therefore, the Deviant has decided to use a different approach: smell.
The extreme programmers among you, and anyone familiar with the code refactoring work of Kent Beck and Martin Fowler, will be aware of the concept of code smells. These are code hints that indicate something is wrong or in need of a change; for example, too many method parameters. Code smells, like design patterns, are a condensation of developer experience. Therefore we might apply a similar technique, "requirement smells", to identify the indicators that will help guide you toward the right decision.
There are some caveats to add here. First, these indicators are those deemed important by the XML-DEV community; there may be other issues to consider, e.g. business or budgetary constraints, that may skew things one way or another. Second, some recommendations are based on the current state of the database market; as products develop, other considerations might become important and distinguishing factors may disappear. Third, specific products have not been considered. Not all database management systems are created equal, so there are sure to be exceptions to every rule. There may be roses amongst the manure, and well-perfumed products that hide something, well, nasty.
If you have any comments to add, then share them through the XML.com comment facility (see below) or post to XML-DEV, where they're likely to get a great deal of peer review.
Sniffing Out a Database
Each of the following sections covers a decision factor, including useful opinions and comments extracted from the recent discussion. Some of these overlap, and some may contradict others. Some are concerned with the type of data you're storing, others with how you intend to manipulate that data; some are specific while others are more general. As for terms of art, Native XML Database (NXD) and XML Enabled Database (XED) are defined in the XML:DB FAQ.
The document-data distinction is quite prevalent in XML circles. Data-oriented XML describes information such as purchase transactions or phone book entries, i.e. the kinds of information usually only of interest to an application. Conversely, document-oriented XML, including XHTML docs, is usually of interest to users. Ronald Bourret describes the distinction further in his XML and Databases paper. Data-oriented XML is generally easier to map into a relational database, although the complexity of the DTD is obviously a factor. This makes an RDBMS an obvious candidate for data storage. An XED may offer features to help automate the transformations to and from XML. Even so, some development effort will likely be required to define a relational mapping for your schema, particularly if it is non-trivial.
If you have XML "data" that is easily normalized into RDBMS tables, an RDBMS or XML-enabled RDBMS will probably do at least as good a job as a native XML DBMS. (Michael Champion)
I've seen DTDs with hundreds of elements and figuring out a useful mapping from them to the database is distinctly non-trivial, especially when a lot of the elements are wrappers that don't represent real structure in the database. (Ronald Bourret)
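To make the "easily normalized" case concrete, here is a minimal sketch in Python using only the standard library. The order document, the table layout, and the function names are all invented for illustration; nothing here comes from the XML-DEV thread or from any particular XED product. It shows a regular, data-oriented document being shredded into two relational tables, after which ordinary SQL does the querying.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical data-oriented document: regular structure, attributes only,
# no mixed content. Each <order> becomes a row, each <item> a child row.
ORDERS_XML = """
<orders>
  <order id="1001" customer="ACME">
    <item sku="A-1" qty="2" price="9.50"/>
    <item sku="B-7" qty="1" price="20.00"/>
  </order>
  <order id="1002" customer="Initech">
    <item sku="A-1" qty="5" price="9.50"/>
  </order>
</orders>
"""


def load_orders(xml_text, conn):
    """Shred the XML into two normalized tables (an assumed, hand-written mapping)."""
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute(
        "CREATE TABLE items ("
        "order_id INTEGER REFERENCES orders(id), sku TEXT, qty INTEGER, price REAL)"
    )
    root = ET.fromstring(xml_text)
    for order in root.findall("order"):
        order_id = int(order.get("id"))
        conn.execute(
            "INSERT INTO orders VALUES (?, ?)", (order_id, order.get("customer"))
        )
        for item in order.findall("item"):
            conn.execute(
                "INSERT INTO items VALUES (?, ?, ?, ?)",
                (order_id, item.get("sku"), int(item.get("qty")), float(item.get("price"))),
            )
    conn.commit()


def order_totals(conn):
    """Once the data is relational, a plain SQL join answers the question."""
    cursor = conn.execute(
        "SELECT o.id, o.customer, SUM(i.qty * i.price) "
        "FROM orders o JOIN items i ON i.order_id = o.id "
        "GROUP BY o.id ORDER BY o.id"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_orders(ORDERS_XML, connection)
    for row in order_totals(connection):
        print(row)
```

The mapping is trivial here because every element corresponds to a row or a column. Bourret's point about DTDs with hundreds of elements, many of them wrappers, is exactly where this hand-written approach stops scaling.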
If what you're storing is largely documents and not largely data, then an NXD may be the best option. However, it will depend on the kinds of questions you need to ask about the data. Some document-oriented schemas have obvious metadata which will make a relational mapping easier. It's difficult to cleanly map mixed content to a relational schema, and document-oriented XML often has a varying structure.
If you have XML "documents" with mixed content, recursive content models, a complex mix of elements and attributes, and you want to search on the XML structure *and* content, a native XML DBMS will almost certainly be superior. (Michael Champion)
...mixed content doesn't map well with an object-relational mapping. (I won't go into the details here. If you want to read more about this, see sections 3.3 and 3.4 of [Mapping DTDs to Databases].) (Ronald Bourret)
It's much easier to maintain collections of XML documents using a native XML DB than to map those documents into a relational database, or even to store them as blobs. (Tom Bradford)
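The trouble with mixed content can be seen in a few lines of Python. This is only an illustrative sketch: the sample paragraph and both function names are invented, and the "ordered node rows" workaround is one generic possibility, not the mapping described in Bourret's paper or used by any product. A naive element-per-column mapping keeps the child elements but has nowhere to put the text that sits between them; preserving it means storing every text run with an explicit position.

```python
import xml.etree.ElementTree as ET

# Hypothetical document-oriented fragment with mixed content:
# character data interleaved with child elements.
PARA_XML = (
    "<para>Use a <term>native XML database</term> when order and "
    "<em>mixed content</em> matter.</para>"
)


def naive_columns(elem):
    """One 'column' per child element name; the interleaved text is dropped."""
    return {child.tag: child.text for child in elem}


def ordered_nodes(elem):
    """Keep every text run and child element as (position, kind, value) rows."""
    rows = []
    if elem.text:
        rows.append(("text", elem.text))
    for child in elem:
        rows.append((child.tag, child.text or ""))
        # ElementTree stores the text following a child in that child's .tail
        if child.tail:
            rows.append(("text", child.tail))
    return [(position, kind, value) for position, (kind, value) in enumerate(rows)]


if __name__ == "__main__":
    para = ET.fromstring(PARA_XML)
    print(naive_columns(para))
    for row in ordered_nodes(para):
        print(row)
```

The second form round-trips the sentence, but the relational rows no longer look anything like the document, and every query has to reassemble it by position. That overhead is the smell that points toward an NXD.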