
Against the Grain
This week the Deviant summarizes some of the comments made by XML-DEV members in response to a recent critical article on the relationship between XML and databases.
Poor Relations
It's been an interesting week on XML-DEV with one particular topic being hotly debated. Namespaces? Relative URIs? No. The debate revolved around a single question: "What is the correct plural form of 'schema'? Is it 'schemas' or 'schemata'?" This important topic involved many learned postings discussing the minutiae of the plural forms of endless Latin and Greek words. Not qualified to make a judgment on this vital question, not knowing one end of a neuter plural perfect passive participle from another, the Deviant this week takes a look at another debate entirely.
Ken North shared a pointer to a recent article by Fabian Pascal which attacked XML as a means of describing data, XML databases, and indeed pretty much anything (Java, object relational databases, even SQL) other than a pure relational model housed in a central database. Not surprisingly, this prompted some feedback from XML-DEV members.
As Jonathan Robie noted, Fabian appears to argue himself in circles:
..Fabian makes an argument that should lead to the conclusion that XML databases are an important thing to pursue - his central claim is that XML needs a database behind it!
Michael Champion was prompted to wonder why, if the relational model is so perfect, DBMS vendors are adding "post-relational" features to their products, and whether there's a sweet spot between the mathematical rigor of the relational model and the flexibility of XML.
My biggest question after reading his stuff is "If the pure relational model is so powerful, why have the RDBMS vendors, presumably driven by customer demand, supported 'post-relational' Object-Relational and XML features in their recent releases?" I personally doubt if "ignorance" is the answer.
I keep hoping that there is some middle ground where the rigorous mathematics of the relational model and the pragmatic usability of XML can meet and inform one another. In private correspondence, Mr. Pascal assured me that a truly mathematical model of XML is impossible, but I'm keeping an open mind.
Presumably these features are being added because customers are keen to use their data in different ways; for example, in closer conjunction with business objects or to store different kinds of data that don't fit cleanly into a relational system. Documents are an obvious example, and the Web is a gold mine of semi-structured data just begging to be usefully manipulated. Much of the XML database and query work is geared toward exploiting this information. And as Joshua Allen observed, while relational databases have been steadily optimized for many years, research on semi-structured data is only now becoming mainstream.
The only reason that RDBMS software dominates the market right now is because we are good at solving these problems, and RDBMS design has evolved to disallow users from asking questions that the database isn't good at answering. The fact that we ship databases that only permit things that we know how to answer efficiently does NOT imply that we will never be able to answer other questions more efficiently (in fact, RDBMS systems have evolved and gobbled up much of the research on data warehousing to include those techniques into the engines -- witness materialized views and bitmapped indexes). It is quite easy to see a trend in the industry that shows consistent continual progress at solving hard query problems. Of course some problems will always be hard (distributed cost-based query optimization is one), but I would point out that research on RDBMS optimizations has tapered off quite a bit and we have seen major increases in research geared toward semi-structured data in the past decade. So we are simply easing off on some of the traditional RDBMS constraints and beginning to allow things like recursive self-joins, ragged hierarchies, etc. and we are optimizing these things.
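To make the "recursive self-joins" and "ragged hierarchies" mentioned above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `node` table, its columns, and the sample rows are hypothetical and invented purely for illustration; nothing here comes from Allen's message. The branches have different depths (a ragged hierarchy), and a recursive common table expression joins the table to itself to walk them.

```python
import sqlite3


def build_demo():
    """Create an in-memory table holding a small ragged hierarchy (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES node(id),"
        " name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO node (id, parent_id, name) VALUES (?, ?, ?)",
        [
            (1, None, "catalog"),
            (2, 1, "books"),
            (3, 1, "music"),    # leaf at depth 1
            (4, 2, "fiction"),
            (5, 4, "novels"),   # leaf at depth 3: the branches are uneven
        ],
    )
    return conn


def descendants(conn, root_id):
    """Return (id, name, depth) for root_id and every node beneath it."""
    query = """
        WITH RECURSIVE subtree(id, name, depth) AS (
            SELECT id, name, 0 FROM node WHERE id = ?
            UNION ALL
            -- the self-join: each pass attaches the children of rows found so far
            SELECT n.id, n.name, s.depth + 1
            FROM node AS n
            JOIN subtree AS s ON n.parent_id = s.id
        )
        SELECT id, name, depth FROM subtree ORDER BY depth, id
    """
    return conn.execute(query, (root_id,)).fetchall()
```

Calling `descendants(build_demo(), 2)` walks only the "books" branch, however deep it happens to be, without the caller needing to know the depth in advance.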
Allen also seemed certain that the mathematics of graph theory, the underpinnings of semi-structured (hence XML) data, would bring dividends.
...I think that areas of discrete mathematics that deal with graphs are currently the most vibrant area of research in the industry. The web itself is one huge graph structure, and research on ways to index the web, optimize routing, etc. all feed directly into techniques for optimizing XML processing....
Indeed one may find it hard to criticize the current XML Query efforts, which are defining the algebraic underpinning for querying XML data sources. If this formal work were not being carried out, Mr Pascal's claims might make more sense. How else will advances happen if the basic research is not carried out? In a later message Joshua Allen painted an interesting picture of "the honkin' graph" that is the Internet.
...The web is a graph. XML is the web made just a bit less sloppy, but we still have key/keyref and XLink, XPointer, RDF -- all that stuff John mentions. Take the graph that is the web and make it more machine-readable. Take all of the services and data in silos at the edges of the web and expose it as XML documents (as appropriate of course). Now you have one big huge honkin' graph. What is more fun that that?
It's hard to reconcile this image with Fabian Pascal's vision of a centralized DBMS.
Snake Oil?
Not everyone was happy with the current state of XML databases; indeed one contributor called them "snake oil". Yet this perception seems to be more the fault of hype and marketing than of any technological shortcoming. There is still a great deal of work to be done. Unfortunately it appears that demand is outstripping supply. Speaking again to this topic, Joshua Allen predicted good things from native XML databases, but recommended fully understanding your requirements before committing to any single product.
...there are good reasons to use a pure native approach for XML. The "native XML" people will be able to show you blazingly fast queries over massive data stores that would make an RDBMS croak. The "XML-adaptor" people will show you queries against *their* XML that run blazingly fast but make a "native" engine croak. The moral is that there is no "one true way" at this point, and both models will converge. I think it's a teeny bit unfair to call XML databases "snake oil"; instead think of 2001-XMLDB as 1980-SQL. As "native" databases evolve to support traditional relational-type stuff better and relational-XML adapters evolve to support things that native implementations excel at, the distinction will become irrelevant and the code bases will be pretty much the same. In the meantime, using XML databases means having a good understanding of your use cases, needs, etc. and evaluating each product individually.
Nichola Lehuen agreed that understanding your application's data and choosing the appropriate modeling technology would bring benefits, although Lehuen was ultimately less effusive about potential benefits.
...for any given data to model, you can find a hierarchical (e.g. XML) representation, a network representation (the node-labeled graph model), a relational representation, an object representation, or more exotic representations (e.g. the Caché model). But depending on your data, one of these models will rise out as the "best" one, in terms of ease of implementation and of efficiency in queries and updates.
So I believe there is a whole set of problems that will benefit from XML databases...The storage, indexation and querying of a set of document-oriented data is a good example.
But XML databases isn't or (won't) be a revolution, blasting all other storage models. We could even say that the XML database model is just a come back of the hierarchical model that was supposedly "killed" by the relational model back in the 80s. I don't think XML databases are the "next thing".
Other messages in this thread picked up on the data modeling issue. Jeff Lowery observed that at the moment constructing an efficient relational model for an XML structure is more art than science.
I think object-relational databases have some promise. Knowing how to decompose an XML hierarchy just enough to result in an efficient relation model is more of an art than a science right now: I don't think you get much benefit if all parent-child relations are rigorously broken down into primary/FK pairs, for instance. Knowing how the data will be fetched is the main design criteria of an object-relational model, with performance gains for 'fixed' fetches coming at the cost of degrading some ad-hoc queries (adding an XPath-based index for complex XML elements stored in columns might speed up finding, but not fetching).
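The trade-off Lowery describes can be sketched in a few lines of Python using only the standard library (sqlite3 and xml.etree.ElementTree). Everything in this sketch is an assumption made for illustration: the sample orders document, the `orders` and `item` tables, and the function names are hypothetical and do not come from his message. Each order is stored once as a whole XML fragment in a column, so a "fixed" fetch needs no joins, while its items are also broken out into a child table with a foreign key so that ad-hoc queries stay cheap.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
ORDER_XML = """
<orders>
  <order id="1" customer="acme">
    <item sku="A1" qty="2"/>
    <item sku="B7" qty="1"/>
  </order>
  <order id="2" customer="globex">
    <item sku="A1" qty="5"/>
  </order>
</orders>
"""


def shred(xml_text):
    """Load the orders document into an in-memory database, partially decomposed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, order_xml TEXT)"
    )
    conn.execute(
        "CREATE TABLE item (order_id INTEGER REFERENCES orders(id), sku TEXT, qty INTEGER)"
    )
    for order in ET.fromstring(xml_text).findall("order"):
        order_id = int(order.get("id"))
        # Keep the whole fragment so a fixed fetch is a single-row lookup.
        fragment = ET.tostring(order, encoding="unicode")
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)",
            (order_id, order.get("customer"), fragment),
        )
        # Decompose only the part we expect to query ad hoc: one row per item.
        for item in order.findall("item"):
            conn.execute(
                "INSERT INTO item VALUES (?, ?, ?)",
                (order_id, item.get("sku"), int(item.get("qty"))),
            )
    return conn


def fetch_order_xml(conn, order_id):
    """Fixed fetch: return the stored XML fragment for one order, with no joins."""
    row = conn.execute(
        "SELECT order_xml FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    return row[0] if row else None


def qty_by_sku(conn):
    """Ad-hoc query: total quantity per SKU across all orders."""
    return conn.execute(
        "SELECT sku, SUM(qty) FROM item GROUP BY sku ORDER BY sku"
    ).fetchall()
```

Storing both forms costs extra space and means updates must keep the two in step, which is one reason the choice of how far to decompose remains a judgment call.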
Extending the capabilities of database management systems to facilitate the move from art to science can only be a good thing. At least, it is difficult to see how it could be a bad thing. It also seems obvious that building this work on formal models is smart, which is exactly what the XML Query work is doing. Only this time the model and the query syntax are being developed hand in hand. Unlike the situation with relational theory and SQL, we should hopefully have a standard XML query language very shortly. One might also hope that this would limit mismatches between the two, of the kind that seem to exist between pure relational models and those expressed by SQL. Promisingly, just this week two early implementations have been announced, which means developers can finally begin to come to grips with this new technology.
How do you use XML with databases? Share your experience in our forum.
- XML and RDBMS
2001-10-08 08:20:14 shailesh deshpande
I am doing a project using XML and ASP!
I am facing some problems with it! Please help me solve them!
1) When I write HTML tags in ASP code, some illegal characters like yp appear in the output in Internet Explorer! In Netscape there is no output at all! I need to write the HTML in ASP, not in a stylesheet, because a stylesheet cannot be opened in FrontPage if I need to make modifications!
What should I do if I want to write HTML in ASP and get output in Netscape without any errors?
2) In my project I am currently using SQL as a backend! Now I want to use XML! I have used JOINs to retrieve data from multiple tables! In XML, how can I retrieve data from multiple tables if they are related? Please advise whether to use a single document or multiple XML documents, especially for retrieval in a many-to-many relationship!
The help provided on XPointer and ID and IDREF is insufficient for me! Please guide me in detail as early as possible!
Please send reply to:
shail_arya@indiatimes.com
Thanks!
- data scope
2001-07-15 04:32:31 jim fuller
long term requirements are different to short term. in the past 5-10 years databases have emerged from middleware to being in front of users' eyeballs ( re EXCEL and ACCESS ), this has been a good and bad thing.
i say submerge the db back into middleware, and to a certain degree most non-enterprise db requirements should reuse OS db functionality, something cheap and cheerful.
in fact i am fairly certain that 80% of current small to medium scale database activities could easily be dealt with using a simple text file ( xml anyone.... ).
indirectly quoting S. Muench from the XSLT uk conference, 'use rdbms sql to get smallest slice of required data, then use xslt to manipulate from there'.
these statements all go out the window when talking about large scale implementations, which affect the top 1000-2000 companies worldwide; they will tend to build their own using larger application frameworks, such as oracle, etc.
it makes sense in actual use, but unfortunately doesn't feel clean, and requires the use of more tools.
i would expect convergence ( of xml and rdbms ) to occur over time, with xml repository being the name of choice.
jim fuller
- use both native xml and relational db's
2001-07-14 01:43:06 tim teebken
I work at Microsoft, and tend to hear a lot of these descriptions at work. Mostly there are the XML purists on one side, and the RDBMS fans on the other. It's already quite easy to integrate the two, if you think about the possibilities. First you could store XML documents in MS SQL 2000 Server as big "BLOBs" of XML, then output the blobs as needed on the basis of SQL or XPath queries. On, say, a web server, transform the XML document to appropriate XHTML output using XSL, and this can be handled by recent browsers.
This doesn't seem too hard, and seems to allow you some of the powerful advantages of both. On another thought, if you want to integrate data from almost any disparate systems, it may be easiest to output the data to XML, or to a form that another transform could convert to XML, and again with SQL 2000, you can pass valid XML directly into a table.
- Enabling Constraints
2001-07-06 08:24:39 Stephen Seymour
XML aggregation is a simple mechanical task when a constraint of hierarchical uniqueness is applied to all of the elements in the XML messages. I have found accepting this constraint to be enabling, and suspect that the same will hold true in the database world.
