The XQuery Chimera Takes Center Stage

January 3, 2007

Simon St. Laurent

For the first time in many years, I left an XML conference thinking that XML might actually finally change the Web significantly -- and soon.

XML still isn't likely to change the Web much on the client side, beyond the role it plays in Ajax and related technologies. (Even that role is likely to be reduced by JSON.) The dreams of XML hypertext are dead, or at least thoroughly dormant.

The changes I saw at XML 2006 that are driving XML deeper into the Web seem likely, for now, to operate mostly on the server side, as XQuery both brings XML databases to a wider audience and combines access to relational data and XML.

The Target

XML has never worked neatly with the heart of most web applications' architecture, the relational database. XML's hierarchical structures map poorly to relational database structures. You can, of course, create table- and record-like documents that fit easily with relational databases, but that's a fairly tiny if important subset of XML possibilities and documents.

Web applications built on relational databases can and do use XML, of course. Applications routinely generate XML from query results, and import XML documents by shredding them into pieces spread across tables. The more complicated the document, the more likely that multiple tables will be involved, or that it will prove easier to store the XML as a BLOB or a separate file. (There was discussion at XML 2006 of XML files that routinely get shredded across several hundred tables, with results that often make little sense when only a few of those tables are examined.)
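To make "shredding" concrete, here is a minimal sketch in Python, using the standard library's sqlite3 and ElementTree modules. Everything in it is an assumption for illustration: the order document, the two-table schema, and the function names are invented, not taken from any system discussed at the conference. It splits one small document across two tables, then rebuilds it from query results.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are invented for this sketch.
ORDER_XML = """
<order id="1001" customer="Acme">
  <item sku="A-1" qty="2">Widget</item>
  <item sku="B-7" qty="1">Gadget</item>
</order>
"""


def make_db():
    """Create an in-memory database with one table per level of the hierarchy."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute(
        "CREATE TABLE order_items ("
        "order_id INTEGER, position INTEGER, sku TEXT, qty INTEGER, description TEXT)"
    )
    return conn


def shred_order(conn, xml_text):
    """Shred one <order> document into rows spread across two tables."""
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    conn.execute(
        "INSERT INTO orders (id, customer) VALUES (?, ?)",
        (order_id, root.get("customer")),
    )
    # Document order is implicit in XML, so it has to be stored explicitly.
    for position, item in enumerate(root.findall("item")):
        conn.execute(
            "INSERT INTO order_items (order_id, position, sku, qty, description) "
            "VALUES (?, ?, ?, ?, ?)",
            (order_id, position, item.get("sku"), int(item.get("qty")), item.text),
        )
    return order_id


def rebuild_order(conn, order_id):
    """Generate XML from query results, reversing the shredding step."""
    (customer,) = conn.execute(
        "SELECT customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    root = ET.Element("order", {"id": str(order_id), "customer": customer})
    rows = conn.execute(
        "SELECT sku, qty, description FROM order_items "
        "WHERE order_id = ? ORDER BY position",
        (order_id,),
    )
    for sku, qty, description in rows:
        item = ET.SubElement(root, "item", {"sku": sku, "qty": str(qty)})
        item.text = description
    return ET.tostring(root, encoding="unicode")
```

Even this two-level document needs a second table and an explicit position column to preserve item order. Each additional level of nesting or repeating structure tends to add another table, which is how real documents end up spread across hundreds of them.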

Relational databases aren't likely to go away any time soon, however. They're far too good at storing structured data, scale better than the alternatives, and offer much more flexibility than most people know what to do with. XQuery can work with them; it just offers new options, making it easier to choose between relational databases for structured data and other kinds of storage for more loosely structured hierarchical data.

Middleware Possibilities

XQuery has pretty much always been about more than XML. For years, vendors have shown diagrams where XQuery provided a central cloud connecting all kinds of relational databases and XML databases -- and whatever else might be lying around -- into a single lovely XML stream.

Connections with relational databases have been a key justification for XQuery's support of the W3C XML Schema type system and its heavily typed processing model. XQuery isn't meant to replace SQL, but it can certainly complement it, especially when relational databases are already supporting reporting results as XML.

I suspect that there will be something of a culture shock as developers accustomed to SQL look at XQuery and XPath, but mixing and matching SQL for efficient queries on tables with XQuery for presenting and combining results seems to be an increasingly popular option. (There are lots of ways to mix XML and relational databases, of course, including SQL/XML, various proprietary extensions and tools, and the ever-popular "store XML fragments or documents as text or BLOBs.")

At XML 2006, the DataDirect folks and the traditional database vendors were definitely interested in this angle, and Roger Bamford's keynote suggested that XQuery could reduce the number of tiers needed to build a distributed application.

Document Management Possibilities

As more and more applications take advantage of XML to send documents from place to place, there's a growing appeal to keeping these documents in their original form, or something close to it. Add to this the organizations whose use of XML has resulted in a gigantic stack of potentially reusable documents, and there are a lot of people who'd like to put XML documents into a system and get them -- or parts of them -- back out.

After ten years of XML, a lot of organizations have enough XML documents to make keeping track of them a challenge, but a challenge with real benefits. Sometimes those benefits revolve around republishing the same information in different forms, aggregating statistical information, or building new uses for information, but other times just having quick access through structured search is itself a huge benefit.

Jason Hunter's talk on Web Publishing 2.0 drew oohs and aahs from the crowd as he showed off ways to use this collected information to create new products and re-energize old ones. Darrin McBeath's keynote included stories of how publishers were using this. In a refreshing change, he had actually asked the various publishing groups within Elsevier about how they use XML and how XQuery fits into that (mostly).

But isn't XQuery late, bloated, complicated, confusing, and missing parts?

Well, yes -- in my opinion. That doesn't seem to be stopping XQuery, however, now that it's finally reaching Recommendation. Unlike other XML specifications, this one seems to be coming to fruition at just about the time people are realizing they need it. That doesn't mean they'll use it immediately, of course; after eight years, XML is still percolating into business systems, and it's a relatively small and simple thing.

Until now, XQuery has been a chimera, a mythical creature made up of multiple parts. While XQuery still feels like it's made of multiple pieces, it's approaching the end of its mythical period, and those multiple pieces should be useful for various projects. (Update functionality would help, of course, but right now that seems to be further behind, with only vendor-specific implementations available.)

The chimera is finally ready to fly, and judging by the response at XML 2006, there are a lot of people who are ready to take it for a spin. The early adopters were showing off a lot of real work, which suggests that XQuery has grown beyond experimentation and is being deployed in a growing number of production environments.

And What Has This to Do with the Web?

XQuery itself isn't about the Web -- it's about collecting information from various sources. However, it also provides templating facilities like those of XSLT, and is perfectly capable of generating XML or HTML.

Where traditional scripting languages have split querying from the application and presentation logic, XQuery lets developers combine the query with the result generation. I don't expect to see hordes of PHP or Java developers discarding their tools in favor of XQuery. I do expect, however, that as developers start using some XQuery, they'll push more and more of the work into that layer, and look for extensions to make it seem more familiar. Eventually Roger Bamford's vision of a reduced number of layers may well come to pass and change the way we build the Web along the way.

This kind of XQuery use has another side benefit: cleaner XML than that produced by a lot of current scripts. XML well-formedness is a natural side-effect of using XQuery, and even with the mixing of presentation and query layers, converting XQuery that generates HTML to XQuery that generates XML is not particularly difficult. Perhaps this will accelerate the shift toward making data available without an HTML wrapper.
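The well-formedness point can be illustrated outside XQuery. The following Python sketch is an analogy rather than XQuery itself, and its function names are invented for the example. It contrasts script-style string concatenation with building a node tree and serializing it, which is roughly the guarantee a language that constructs XML nodes provides.

```python
import xml.etree.ElementTree as ET


def render_by_concatenation(title):
    """Script-style output: the text is pasted straight into the markup."""
    return "<entry><title>" + title + "</title></entry>"


def render_by_tree(title):
    """Build a node tree and let the serializer take care of escaping."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "title").text = title
    return ET.tostring(entry, encoding="unicode")


def is_well_formed(xml_text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

With a title such as "Cats & Dogs <3", the concatenated version produces markup that a parser rejects, while the tree-built version escapes the ampersand and angle bracket and round-trips cleanly. Scripts can get this right with careful escaping, of course; the difference is that node construction makes it the default rather than something to remember.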

XQuery's clean XML and XML-centric approach also offers the opportunity for developers to make the mental shift from thinking of XML as a serialization format to thinking of XML as something with its own features and benefits. (If you keep those serialized objects around, you eventually have a chance to look at them differently.) While the benefits of this are probably not very clear except perhaps to the XML-obsessed, I have hopes that more people spending time working with XQuery and XPath will yield at least the same understanding of XML that working with SQL produces for relational databases. That may not necessarily be a great understanding, but it would still be much more than many people have today.

Perhaps the most important feature XQuery has to offer is a further blurring of many kinds of information, as XML's capabilities for representing all kinds of structured data are applied to an ever-larger pile of information. XQuery can mix relational databases with spreadsheets with whitepapers with SVG graphics with... well, pretty much anything you want, even RDF to a limited degree. Sound, raster images, and video are still mostly separate categories, but XQuery can now mix and mash everything else.

That blurring goes beyond XML formats. XML 2006 had separate tracks for Web, Publishing, and the Enterprise. XML and XQuery seem likely to drive the Web and Publishing closer together as the different forms of distributing content become a secondary detail, not something that separates industries. That's something we'll be watching closely at O'Reilly, as well as at XML.com.