Does XML Query Reinvent the Wheel?

February 28, 2001

Leigh Dodds

Debates on the XML-DEV and XSL mailing lists over the last two weeks have concerned the future of XSLT, XPath, and the latest addition to the W3C XML toolkit, XML Query. There are no signs of these debates ending this week: discussion on XML-DEV about the design of XML Query rages on.

Reinventing the Wheel

The focus of last week's XML-Deviant was the concern expressed by several XML-DEV contributors that the interdependence of several W3C specifications may have exceeded the dictates of software reuse and become instead a tangled mess. Suggestions were floated for a refactoring of several standards in order to separate the component parts.

This debate has focused on XML Query in particular this week, following Evan Lenz's claim that the overlap between XML Query and XSLT is so great that they are not really separate languages.

After reviewing the XQuery spec, I'm concluding that the overlap between XQuery and XSLT is far too great for the W3C to reasonably recommend them both as separate languages. If XSLT (or XSLT 2.0) isn't considered adequate as an XML query language by itself, then the development of an XML query language should still build from the same semantic and syntactic base as XSLT.

Lenz has fully documented his opinion in a paper he'll be presenting at the XSLT-UK conference in April. The most obvious overlap between XML Query and XSLT is their shared use of XPath. Indeed, the XML Query and XSLT Working Groups are coordinating on the development of XPath 2.0. XPointer took a similar approach, layering itself on XPath 1.0. At first glance this seems a reasonable approach.

However, Lenz believes that the overlap goes deeper than a shared use of XPath.

The "navigation part" is only a small part of the overlap. The result construction mechanisms, the flow control mechanisms, the variable binding mechanisms -- these are all virtually indistinguishable (other than syntax) from XSLT's mechanisms for doing the same. I demonstrate all of this in my paper.

The introduction of datatypes is making its way not only into XQuery but the XPath 2.0 and XSLT 2.0 requirements. Regardless of whether datatypes are only part of query or are part of both query and transformations, there should be a common semantic and syntactic core for XSLT and XQuery, rather than an invention of an entirely new syntax.

Lenz characterized XML Query as a subset of XSLT (no template rules, only the abbreviated XPath axes) with the addition of data typing, and he claimed that this should be the model upon which XML Query is developed. Noting concerns over the optimizability of XSLT, Lenz pointed out that the XSLT 2.0 requirements refer to an XPath subset that could be used to develop XML Query.

If a subset of XPath or XSLT is necessary in order for query implementations to work, that's fine; define the subset....

Section 2.11 of the XSLT 2.0 Requirements WD states that XSLT 2.0 "Could Improve Efficiency of Transformations on Large Documents," specifically addressing the possibility of defining a subset of XPath "that does not require random access to the source tree." This would basically be XQL/XQuery's path language (without perhaps the requirement that expressions be abbreviated). I just discovered this section, and I think *it* should be the germ of the W3C XML Query Language, fully informed by the work of the XML Query WG on the query algebra. I feel quite strongly that there is a better way than to define an entirely new syntax and semantics, when so much of what is needed is already found in XSLT or is slated for a future version of XSLT.

Other contributors supported Lenz's position. Uche Ogbuji, who has been openly critical of the direction of XPath 2.0 and XSLT 2.0 developments, claims XML Query could be derived from existing specifications.

All I think would be needed in XQuery is a formalization of document collections and some unification of data models across the various facets of XML (XPath, DOM, SAX, XLink [note linkbases]), etc., where query proves useful. And finally a few primitives based on XSLT extensions for specialized and efficient query calculus.

XQuery could be a very small and very familiar spec.

Charles Reitzel doubts the need for XML Query as currently defined.

I see XML query processing (lower case q) as occurring in four stages:

  1. Identify the universe of documents that will be searched.
  2. Apply query conditions to include/exclude each document.
  3. Add specified fragments of each document to intermediate results.
  4. Present output.

XSLT does a good job on 2)-4) and an adequate job on 1). Without some more powerful notion of locating documents -- within a larger file system directory structure, meta-data query result set or other means -- I don't see the value add for XML Query.
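Reitzel's four stages can be sketched in a few lines of Python. This is only an illustration using the standard library's `xml.etree.ElementTree`, not anything proposed on the list: the in-memory `DOCUMENTS` dictionary, the `run_query` function, and the `year` attribute are all hypothetical stand-ins, and stage 1 in particular is reduced to "take every document we were handed," which is exactly the part Reitzel argues needs something more powerful.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory "universe" of documents; a real system would
# locate these in a file system or a metadata index.
DOCUMENTS = {
    "a.xml": "<book year='1999'><title>XSLT Basics</title></book>",
    "b.xml": "<book year='2001'><title>Query Notes</title></book>",
    "c.xml": "<memo><title>Not a book</title></memo>",
}

def run_query(documents, min_year):
    # Stage 1: identify the universe of documents to search.
    parsed = [ET.fromstring(text) for text in documents.values()]

    # Stage 2: apply query conditions to include or exclude each document.
    matches = [
        doc for doc in parsed
        if doc.tag == "book" and int(doc.get("year", "0")) >= min_year
    ]

    # Stage 3: add the requested fragment of each match to the
    # intermediate results.
    results = ET.Element("results")
    for doc in matches:
        title = doc.find("title")
        if title is not None:
            results.append(title)

    # Stage 4: present output.
    return ET.tostring(results, encoding="unicode")

print(run_query(DOCUMENTS, 2000))
# prints: <results><title>Query Notes</title></results>
```

In this sketch, stages 2 through 4 correspond to the work Reitzel says XSLT already does well; only the first line of `run_query` would change if documents had to be located by some richer mechanism.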

David Rosenborg expressed similar concerns, suggesting that an evolutionary approach would be more productive.

I think that there's definitely a need for XQuery, however, its problem domain is not peculiar enough to justify a completely new syntax...

It would have been much better if XQuery had taken an evolutionary path rather than restarting from scratch. My criticism is partial here though, since much has been borrowed from XPath which is good. By evolutionary I mean this: XSLT and XQuery share a common pattern:

  1. select nodes
  2. generate the result tree

In XSLT the first is handled by XPath and the second by template instructions. An evolutionary approach had been to extend, and subset (for optimization purposes) each of these to get a comprehensive and yet optimizable solution. As I said, XQuery does do this with XPath (modulo namespace handling and case-insensitive keywords), but for the generative part nothing is reused. Strange.

Less certain of the utility of applying XSLT, in its current incarnation, to querying, Kimbro Staken observed that using it as a basis for the XML Query development would leverage existing XSLT effort.

Clearly XSLT as it stands now is not fully sufficient to act as a query language. It seems though, that the changes required would be minor compared to having to deal with an entirely new language or even worse two new languages once the XQuery XML mapping is added. I certainly don't see current XSLT implementations suddenly becoming query engines, performance just isn't good enough. However, a specialized XSLT engine that implements a slightly modified spec and has an optimizer to utilize indexes might be a much simpler way to go. Regardless it would certainly be able to leverage the vast majority of the work already put into XSLT vs. starting from scratch on XQuery implementations. The more I think about it the more I find it a compelling idea but I just have to wonder if it can be made to perform well enough.

Defending XML Query

Not everyone's assessment was critical. Jonathan Robie, a member of the XML Query Working Group and co-editor of the specification, defended the current draft.

The phrase "reinventing the wheel" usually refers to reinventing something that already exists because you don't know about it. The editors of XQuery include a former member of the XSL Working Group who has written a fair number of stylesheets. They also include one of the inventors of SQL, one of the inventors of XML-QL, and one of the inventors of XQL, a precursor of XPath. We considered quite a few syntax approaches, including building on XSLT, before arriving at the approach we used.

Robie outlined the main influences on the current design of XML Query: the use cases defined in the requirements, optimizability, and strong data typing. Robie claims that the two specifications could coexist peaceably, comparing the current debate to a similar confusion over the relative merits of XSLT and DOM.

Nobody is throwing XSLT out. XSLT will live a long and happy life. So will XQuery. They were designed for different purposes.

Remember a few years ago when people were saying that the functionality of XSLT had too much overlap with the functionality of the DOM? Some were even calling for the W3C to stop work on XSLT. That would have been a stupid mistake, just as stupid as calling for XQuery to stop because it has overlap with XSLT.

Robie's position is that, while there is overlap between the specifications, it is not complete overlap, and trying to amalgamate them would cause problems. XPath is therefore the only shared component. The ability to compose queries was called out as a desirable feature of XML Query; in SQL terms, composed queries are nested queries.

XSLT uses one language for selecting nodes, and a different language for generating the result tree. XQuery uses one language that can do both. This allows XQuery expressions to be more compositional than is possible in XSLT.
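The idea of composition can be illustrated outside either language. The following is a minimal Python sketch, not XQuery or XSLT syntax, and the function names `recent_books` and `titles` and the sample `SOURCE` data are invented for the example. Each "query" both selects nodes and constructs a result tree, so the tree returned by one can be fed directly into another, much as a nested query works in SQL.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document for the illustration.
SOURCE = ET.fromstring(
    "<library>"
    "<book year='1999'><title>XSLT Basics</title></book>"
    "<book year='2001'><title>Query Notes</title></book>"
    "</library>"
)

def recent_books(root, min_year):
    """Inner query: select books by year and build a new <recent> tree."""
    out = ET.Element("recent")
    for book in root.findall("book"):
        if int(book.get("year")) >= min_year:
            out.append(book)
    return out

def titles(root):
    """Outer query: select titles from any tree of books, build <titles>."""
    out = ET.Element("titles")
    for title in root.findall("book/title"):
        out.append(title)
    return out

# Composition: the tree constructed by one query is the input to the next.
composed = titles(recent_books(SOURCE, 2000))
print(ET.tostring(composed, encoding="unicode"))
# prints: <titles><title>Query Notes</title></titles>
```

Because both functions take a tree and return a tree, they nest freely. This sketch makes no claim about how either XQuery or XSLT implementations achieve the same effect.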

Robie also compared the current discussion to the XSL-CSS discussion and, further, to a long history of technology debates.

...[W]ould you say that the W3C should not have supported both CSS and XSL? What differences do you see between that situation and this one?

I think it is a given that some people will never see a reason for XQuery, since XSLT exists. There is a sizable community of people who do see a reason for XQuery. I remember when people were really confused about why XSLT was needed. I remember long discussions in the SGML community about whether there was a need for XML. Object orientation was invented in 1967, but object orientation was not mainstream until the late 1980s. New technologies take time to understand and appreciate. Of course, hogwash takes time to recognize and discard.

Eric van der Vlist was sympathetic to XML Query. In his view if the new language is better fitted for some uses, then everyone may benefit.

If it happens that the new language is a better fit than XSLT to do some of the tasks I have to do it will save my time. If not I'll continue using XSLT. Where is the problem with trying something else?

Some competition between may be a good motivation and I would rather regret that XSLT has been lacking credible competitors and alternatives for such a long time.

I see competition as simulating a source of diversity and richness (if it's true for schemas, why not for XSLT?).

Throughout the debate the issue of optimization arose several times. Michael Rys was among those commenting that XML Query better lends itself to optimization than the rule-based approach of XSLT.

There has been over 20 years of research and product development in the area of both (relational) database systems and rule-based systems. Advances have been made in both areas. However rule-based systems have not achieved the level of performance and scalability of query-based systems, neither in the research area (as far as I know) nor in the commercial world.

Michael Champion also supported XML Query. He suggests that it's a useful tool that combines features of other languages.

I see XQuery as a single language within which to do all of what XPath does, some of what XSLT does, some of what SQL does, with some scripting and DOM-like functionality as well ... all within a common data model and processing paradigm. I see this as a "good thing" even if it is not absolutely needed given the existence of all this other stuff.

More philosophically, we need to be wary of carving existing practice in stone; there has to be room for evolution/markets to select the best practices. If XQuery moves that forward significantly, I personally find the cost in confusion to be tolerable.

Both sides of the debate have made convincing arguments. It's obviously desirable to factor out common features between specifications, as Evan Lenz has suggested. But having multiple tools available when tackling a job is often beneficial, which suggests that XML Query should not be dismissed out of hand. Additional lessons may also be learned from tackling similar problems from a different perspective, although to benefit in the long term, refactoring may still be required at a later date.

The common topics in the recent discussions demonstrate that the community has a number of concerns. Hopefully these can be adequately addressed if the XML Query and XSLT Working Groups further coordinate their efforts. In reality, these concerns are over early draft specifications, and experience has shown that a specification may be revised significantly as it moves from Working Draft to Recommendation.

All of that having been said, Marcus Carr is obviously betting on an alternative outcome.

My prediction is that Rick Jelliffe develops 'Queratron', combining the abbreviated syntax of XQuery and any existing XSLT engine. We all declare victory and go away happy...