
Article:
 Going Native: Making the Case for XML Databases
Subject: XML data model vs. relational model
Date: 2005-04-01 02:41:31
From: gregoryjorgensen

From the article:


"While most real-world uses of native XML databases do not fit cleanly into any single category, it is possible to characterize them in terms of a limited number of use cases. The most popular of these are storing and querying document-centric XML, integrating data, and storing and querying semi-structured data. Native XML databases are used in these cases because the data involved does not easily fit the relational data model, while it does fit the XML data model."


What is the XML data model? Hierarchy expressed with markup? Does the XML model have any of the mathematical foundations of the relational model? On what basis can the XML and relational models be compared?


I've never seen an example of structured data that doesn't fit the relational model but does fit the XML model. Can you give an example and explain why it doesn't fit the relational model?


Greg Jorgensen
PDXperts LLC
Portland, Oregon USA




  • XML data model vs. relational model
    2005-04-01 06:33:34 mchampion [Reply]

    As I understand it, E. F. Codd demonstrated conclusively that any data schema can be normalized into the relational model. Thus I believe it is logically impossible to show an "example of structured data that doesn't fit the relational model but does fit the XML model".


    The use case issue comes down to pragmatics, not theory: XML and XDBs tend to be a more *practical* solution for data that is irregular, deeply hierarchical, and recursive ... like an awful lot of real-world documents, or data about real-world hierarchies (organizations, component-subcomponent assemblies, etc.). One can certainly build a pure relational model of these things, and might even be able to build a relatively portable SQL implementation, but one hits all sorts of practical limitations: multiway joins are usually required (which get pretty slow in practice with more than a few tables), queries get complex, and only a few geniuses can understand how it all fits together.
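To make the tradeoff concrete, here is a minimal sketch of a component-subcomponent assembly stored both ways. It assumes SQLite (through Python's standard sqlite3 module) with support for WITH RECURSIVE; the `component` table, the bicycle data, and the function names are all hypothetical and chosen only for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical assembly, expressed directly as nested XML.
ASSEMBLY_XML = (
    "<component name='bicycle'>"
    "<component name='wheel'>"
    "<component name='rim'/><component name='spoke'/>"
    "</component>"
    "<component name='frame'/>"
    "</component>"
)


def build_demo():
    """Create an in-memory adjacency-list table holding the same assembly."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE component ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES component(id),"
        " name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO component VALUES (?, ?, ?)",
        [
            (1, None, "bicycle"),
            (2, 1, "wheel"),
            (3, 2, "rim"),
            (4, 2, "spoke"),
            (5, 1, "frame"),
        ],
    )
    return conn


def subcomponents(conn, root_id):
    """Return (depth, name) rows for root_id and everything beneath it."""
    query = """
        WITH RECURSIVE tree(id, name, depth) AS (
            SELECT id, name, 0 FROM component WHERE id = ?
            UNION ALL
            SELECT c.id, c.name, t.depth + 1
            FROM component AS c JOIN tree AS t ON c.parent_id = t.id
        )
        SELECT depth, name FROM tree ORDER BY depth, name
    """
    return conn.execute(query, (root_id,)).fetchall()


def names_in_document_order(xml_text):
    """Walk the XML tree; iter() visits the root and all descendants in order."""
    return [e.get("name") for e in ET.fromstring(xml_text).iter("component")]
```

Both versions hold the same information, which is the theoretical point. The practical point is that the relational version needs a self-join that recurses once per level, while the XML version is a single tree walk that also preserves sibling order for free.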


    XML and XML-capable DBs allow ordinary mortals who've "got XML" to work with this sort of data in a standardized, portable, and reasonably efficient way. The theoretical links back to the relational model are being forged, albeit slowly. A couple of data points for that: Don Chamberlin, one of the inventors of SQL, is an editor of the XQuery spec; and some of the set theory that Codd cited in his original work has been shown to handle XML (see xsp.xegesis.org for a collection of papers).

  • XML data model vs. relational model
    2005-04-01 22:55:46 rpbourret [Reply]

    There are actually a number of XML data models. For example, DOM, SAX, XPath 1.0, XQuery 1.0 / XPath 2.0, and the Infoset all explicitly or implicitly define data models. What these all have in common is that they are ordered trees of nodes (elements, attributes, text, etc.) and scalar values. (The XQuery data model is actually an ordered forest.)


    Although it's not quite technically correct, it's easiest to think of the XML data model as a DOM tree. No markup in sight -- just an ordered hierarchy of nodes, with values at the leaves.
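As a rough illustration of "an ordered hierarchy of nodes, with values at the leaves", the sketch below parses a small fragment with Python's standard xml.dom.minidom and prints the resulting tree. The `outline` helper and the sample fragment are hypothetical, and minidom is only one of several DOM implementations.

```python
from xml.dom import minidom


def outline(node, depth=0):
    """Return one indented line per element, attribute, and text node."""
    indent = "  " * depth
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append(f"{indent}element {node.tagName}")
        # Attributes hang off the element but are not among its children.
        for i in range(node.attributes.length):
            attr = node.attributes.item(i)
            lines.append(f"{indent}  @{attr.name} = {attr.value!r}")
        # Children are kept in document order.
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    elif node.nodeType == node.TEXT_NODE:
        lines.append(f"{indent}text {node.data!r}")
    return lines


def outline_xml(xml_text):
    """Parse xml_text and outline its root element."""
    return outline(minidom.parseString(xml_text).documentElement)
```

Calling `outline_xml('<p class="x">Hello <b>world</b></p>')` yields the `p` element, its `class` attribute, the text node `'Hello '`, then the nested `b` element with its own text leaf. No angle brackets survive parsing, only nodes and their order.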


    There is no single XML data model used by all native XML databases. This is because most native XML databases predate XQuery and some even predate XPath. Consequently, many invented their own query languages and corresponding data models. Currently, the most common query language in native XML databases is XPath (with extensions for multi-document queries). In the future it will be XQuery. For information about the XQuery / XPath 2.0 data model, see http://www.w3.org/TR/xpath-datamodel/


    As Michael noted, there isn't anything you can model with the XML data model but not the relational data model. In fact, there is a generic mapping from DTDs to relational schemas (see http://www.xml.com/lpt/a/2001/05/09/dtdtodbs.html). The only problem will occur with schemaless XML, and you can always map that to a generic set of tables (Elements, Attributes, etc.).
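The generic-tables fallback for schemaless XML can be sketched as follows. This is a minimal illustration, not the mapping from the linked article: the `elements` and `attributes` table layouts, the column names, and the `shred` function are assumptions, and it deliberately ignores mixed-content tail text, namespaces, comments, and processing instructions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical generic schema: one row per element, one row per attribute.
SCHEMA = """
CREATE TABLE elements (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES elements(id),
    position INTEGER NOT NULL,   -- order among siblings
    tag TEXT NOT NULL,
    text TEXT                    -- leading text only; tail text is dropped
);
CREATE TABLE attributes (
    element_id INTEGER NOT NULL REFERENCES elements(id),
    name TEXT NOT NULL,
    value TEXT NOT NULL
);
"""


def shred(conn, xml_text):
    """Store every element and attribute of xml_text; return the root row id."""

    def store(node, parent_id, position):
        cur = conn.execute(
            "INSERT INTO elements (parent_id, position, tag, text)"
            " VALUES (?, ?, ?, ?)",
            (parent_id, position, node.tag, node.text),
        )
        element_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO attributes VALUES (?, ?, ?)",
            [(element_id, name, value) for name, value in node.attrib.items()],
        )
        # Recurse into children, recording sibling order explicitly.
        for index, child in enumerate(node):
            store(child, element_id, index)
        return element_id

    return store(ET.fromstring(xml_text), None, 0)
```

A typical use would be `conn = sqlite3.connect(":memory:")`, then `conn.executescript(SCHEMA)`, then `shred(conn, some_xml)`. Any well-formed document fits these two tables, but rebuilding it means walking `parent_id` and `position` row by row, which is where the practical cost shows up.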


    So to quote Michael again, the problem is practical, not theoretical. For example, if you map the XHTML schema to a relational schema, the p element alone is mapped to 36 different tables. As you can imagine, reconstructing an XHTML document requires an impractical number of joins.

    • XML data model vs. relational model
      2005-04-22 21:22:24 gtnicol [Reply]

      >So to quote Michael again, the problem is
      >practical, not theoretical. For example, if you map
      >the XHTML schema to a relational schema, the p
      >element alone is mapped to 36 different tables. As
      >you can imagine, reconstructing an XHTML document
      >requires an impractical number of joins.


      This is not strictly true... there are a number of ways of mapping XML onto a relational database that don't require joins... especially given that most real-world documents are not as deeply nested as people might think (typically 3-10 levels).


      Speaking as someone who has used XML/SGML databases under content management systems for years (we have 2 XML databases), one of the main reasons I've seen for choosing them is that real-world documents tend to be somewhat variable, even in very tightly constrained environments where DTD/Schema validation is required (such as the military). XML databases, for the most part, tend to handle such variability well (especially those that are schema-independent in their storage). Typical RDBMS-based systems tend to be much more rigorous, and hence less flexible (though that is not always the case).


      The other area where XML helps with documents is the combination of full-text and structural queries. Given those capabilities, you can build fairly sophisticated hypertext applications, and also enable fine-grained reuse of content in ways that are difficult in most RDBMS-based systems. The tradeoff here is much like the tradeoff between tightly and loosely coupled distributed systems.


      In fact, looking at the list of deployments, I would say that for all of these, reuse and flexibility would be two of the reasons people used XML. I bet other factors, such as management paradigm and ease of integration, played a significant role as well.


  • XML data model vs. relational model
    2005-04-07 16:53:04 Mike Trotman [Reply]

    I think the key difference between the relational and XML database models - which the article does point out - is that the native XML model stores / makes available information about the structure of the data - as well as just the data - through the same interface.


    This is what I have found to be the most important distinction and consideration when deciding to use XML for data representation - your data structure is 'naturally' also data of the same kind.


    An analogy is the difference between a functional programming language like LISP that can operate on itself as part of the design vs. a procedural one like C that either cannot - or requires tortuous code / hacking to achieve the same effect.


    This often means that queries, processing, etc. can easily be made much more generic / reusable, as they can be largely independent of any specific structure / syntax. And this also makes it much easier to mix information from many different structures.


    So - it's less a question of whether the data in the structure can be mapped between the two cases, more a question of what you can do with the data structures.

