XML.com: XML From the Inside Out

XQuery, XSLT, and OmniMark: Mixed Content Processing

December 06, 2006

Document-oriented XML usually has a highly irregular structure in which elements may be mixed in unknown ways. Processing such XML requires advanced data-driven facilities: push-style processing enriched with transformation rules, and side-effect-free updates. In this article we highlight these facilities in three XML-native languages (XQuery, XSLT, and OmniMark) and analyze the applicability of these languages, and of their combinations, to document-oriented XML processing. Because data in many practical applications comes from a database query, we also examine various approaches to combining XQuery with XSLT or OmniMark for document-oriented XML processing over a database system.

What is notable about processing document-oriented XML data is that a particular XML element can appear virtually anywhere in the content (i.e., at any level of the hierarchy of the XML document tree and intermixed with any other elements). When processing such elements, one usually wants to preserve their relative positions among other elements in the XML document tree. In other words, some elements are to be replaced while others are to be preserved. The replacement for an element may consist of nothing, another element, or a sequence of elements. Below we provide a number of particular examples of such replacements.

XQuery Versus XSLT and OmniMark

The primary approach to processing document-oriented XML data is data-driven transformation (where the order of the output is dictated by the order of the input) as opposed to code-driven transformation (where the order of the output is dictated by XSLT stylesheets, OmniMark rules, or XQuery queries).

Using data-driven transformation, it is very easy to preserve the relative position of elements being processed. In XSLT and OmniMark, data-driven transformations can be naturally expressed in push style using transformation rules.

Let us consider an example. Suppose we need to process a document-oriented XML document (doc.xml) as follows: replace every element named "a" with an element named "b" that contains the content of "a" wrapped in "*" symbols. This is how it looks in XSLT.

<xsl:stylesheet 
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
   version="1.0">

  <xsl:template match="a">
    <b>*<xsl:value-of select="text()"/>*</b>
  </xsl:template>

  <xsl:template match="*">
    <xsl:element name="{name()}">
       <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>

The same can be expressed in OmniMark as follows.

element a
    output "<b>*" || "%c" || "*</b>"

element #implied
    output "<%q>%c</%q>"

process
    do xml-parse
    scan file "doc.xml"
        output "%c"
    done
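
To make the effect of these rules concrete, here is a minimal Python sketch of the same push-style idea, using only the standard library's xml.sax module. It is not part of the XSLT or OmniMark tool chain; the names RuleHandler, apply_rules, and SAMPLE, as well as the sample document, are assumptions made for illustration. The parser drives the processing and the handler reacts to each element it is handed, so the output order follows the input order. Like the rules above, the sketch ignores attributes.

```python
import xml.sax
from xml.sax.saxutils import escape


class RuleHandler(xml.sax.ContentHandler):
    """Push-style transformer: the parser drives, the rules react."""

    def __init__(self):
        super().__init__()
        self.out = []  # output fragments, in input order

    def startElement(self, name, attrs):
        if name == "a":
            # Rule for "a": open a "b" element and emit the leading "*".
            self.out.append("<b>*")
        else:
            # Default rule: copy the start tag (attributes are dropped).
            self.out.append(f"<{name}>")

    def endElement(self, name):
        if name == "a":
            # Rule for "a": emit the trailing "*" and close "b".
            self.out.append("*</b>")
        else:
            self.out.append(f"</{name}>")

    def characters(self, content):
        # Text passes through unchanged (re-escaped for XML output).
        self.out.append(escape(content))


def apply_rules(xml_text):
    """Parse xml_text and return the transformed document as a string."""
    handler = RuleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.out)


# Hypothetical sample input: "a" appears at two different levels.
SAMPLE = "<doc><p>See <a>this</a> and <i>also <a>that</a></i>.</p></doc>"

print(apply_rules(SAMPLE))
# <doc><p>See <b>*this*</b> and <i>also <b>*that*</b></i>.</p></doc>
```

Both "a" elements are replaced where they stand, and everything around them keeps its relative position without the code ever naming "doc", "p", or "i".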

As XQuery has no support for push style--it is a pure pull-style language--the only way to express such a transformation in XQuery is to use a polymorphic recursive function. The function traverses the source document and reconstructs it, replacing only the required elements. The following recursive function implements the same transformation as the previous XSLT example.

declare function local:traverse-replace($n as node())
as node()
  {
     typeswitch($n)
       case $a as element(a)
       return
           <b>*{$a/text()}*</b>
       case $e as element()
       return element
         { fn:local-name($e) }
         { for $c in $e/(* | text())
           return local:traverse-replace($c) }
       case $d as document-node()
       return document
         { for $c in $d/* return local:traverse-replace($c) }
       default return $n
  };

The transformation can be applied to a whole document by invoking the local:traverse-replace function on the root node of the document, as follows:

local:traverse-replace(doc("doc.xml"))

Another way to accomplish such a transformation in XQuery was recently introduced by the W3C in "XQuery Update Facility." The facility extends XQuery with the transform operator, which allows data-driven XML transformations to be performed in a way that differs from all the previous approaches in two respects.

  1. In all previous examples, we had to express the reconstruction of the whole document including even those elements that remain unchanged. Using transform, you can avoid the reconstruction of the elements that remain unchanged.
  2. Another difference lies in the execution models. Both the XQuery recursive function and the push-style approach (XSLT and OmniMark) inherently imply an execution model based on a sequential scan: the executor scans the whole document to process it. The transform operator can instead be implemented using a random access execution model, which avoids sequentially scanning all the data and reaches the required data by other means (mainly via indices). The possibility of implementing transform via the random access execution model makes it suitable for efficient support in database systems.

The main idea of the XQuery transform operator is to employ traditional in-place updates for data transformations. The semantics of in-place updates are modified to avoid side effects. That is why we refer to transform as a side-effect-free update. Semantically, instead of modifying the document, updates are evaluated on a new copy of it. Operationally, transform can be implemented without actual data copying (e.g., using a shadow mechanism as proposed in [Rekouts2006]).

The above example can be expressed via XQuery transform as follows.

transform
 copy $new:=doc("doc.xml")
 modify
   for $a in $new//a
   do replace $a with <b>*{$a/text()}*</b>
 return $new
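
The copy/modify/return semantics can be illustrated with a small Python sketch built on the standard library's xml.etree.ElementTree and copy modules. This is only an analogy for the semantics described above, not an implementation of the XQuery operator: it really does copy the tree, whereas the text notes that an implementation may avoid copying. The names transform_copy and SAMPLE and the sample document are assumptions made for illustration.

```python
import copy
import xml.etree.ElementTree as ET


def transform_copy(doc):
    """Return a modified copy of doc; the original tree is not touched."""
    new = copy.deepcopy(doc)  # plays the role of: copy $new := ...
    # ElementTree has no parent pointers, so visit every potential parent
    # and replace its "a" children in place. A root element named "a" is
    # not handled by this sketch.
    for parent in list(new.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag == "a":
                # Collect the text-node children of "a", like $a/text().
                text = (child.text or "") + "".join(c.tail or "" for c in child)
                b = ET.Element("b")
                b.text = "*" + text + "*"
                b.tail = child.tail  # keep the text that followed "a"
                parent[i] = b        # plays the role of: replace $a with <b>...</b>
    return new                       # plays the role of: return $new


# Hypothetical sample input: "a" appears at two different levels.
SAMPLE = "<doc><p>See <a>this</a> and <i>also <a>that</a></i>.</p></doc>"

original = ET.fromstring(SAMPLE)
result = transform_copy(original)

print(ET.tostring(result, encoding="unicode"))
# <doc><p>See <b>*this*</b> and <i>also <b>*that*</b></i>.</p></doc>
print(ET.tostring(original, encoding="unicode"))
# <doc><p>See <a>this</a> and <i>also <a>that</a></i>.</p></doc>
```

The second line of output shows the point of the side-effect-free semantics: the update is expressed as an in-place replacement, yet the source document is unchanged.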

It is worth noting that currently we do not know of any implementations of the XQuery transform. Its efficient support is still an open research issue.

Comparing the approaches discussed above, we conclude that the push-style approach powered by transformation rules, as provided by XSLT and OmniMark, is a better choice for processing document-oriented XML data. The XQuery recursive function approach is usually harder to code and maintain than the push-style approach. As for XQuery transform, it remains to be seen how effective the approach is; its usability is especially questionable for complex transformations.

If XSLT and OmniMark seem to be well-suited for document-oriented XML data transformation, why not use them and forget about XQuery? Growing volumes of XML data (especially in an enterprise environment) often require a database for XML data management. For example, replacing XML elements that represent references (or placeholders) might require querying a database that maps those references (or placeholders) to their substitutes. This means that XQuery is still required as a database query language, and we need to find the right way to combine the transformation languages--XSLT and OmniMark--with a query language--XQuery. In the following sections we analyze two approaches to the combination.
