XQuery, XSLT, and OmniMark: Mixed Content Processing
Combining XQuery with XSLT or OmniMark
Before we analyze different ways of combining XQuery with XSLT and XQuery with OmniMark, it is worth noting that OmniMark and XQuery can be integrated directly, because OmniMark provides APIs to database systems that support XQuery (i.e., OmniMark plays the role of a host language for XQuery). XSLT/XQuery integration requires a third language to glue the two together. If we are using command-line XSLT and XQuery processors, a scripting language like Bash can play the role of the glue language. If we use XSLT and XQuery libraries, a general-purpose programming language like Java can glue them.
Tightly Coupled Solution
Let us consider the following example. Suppose that there is an XML document that includes placeholders, which refer to fragments of a book stored in an XML database. We need to publish the document, replacing each placeholder with the rendered content of the corresponding fragment. The document is shown below.
Example: document.xml
<page>
...
<fragmentref>378</fragmentref>
...
<fragmentref>835</fragmentref>
...
</page>
The simplest solution is to query the database each time we come across a reference to a fragment during the XSLT or OmniMark transformation. Below is an example in OmniMark and XQuery that uses the OmniMark API to the Sedna XML database.
Example: process.xom
import "omdb.xmd" prefixed by db.
global db.database moviedb
define string source function div-render (value string source s)
as
; rendering code is here
element fragmentref
local db.field result variable
db.query moviedb
statement "doc('book.xml')//div[@id='%c']"
into result
do when db.record-exists result
output div-render(db.reader of result)
done
element #implied
output "<%q>%c</%q>"
process
set moviedb to db.open-sedna "localhost" dbname "moviedb"
user "SYSTEM" password "MANAGER"
do xml-parse
scan file "document.xml"
output "%c"
done
db.close moviedb
This solution is tightly coupled due to the following properties:
- It requires an API to access a database from a transformation language. In the example above, we used the OmniMark API to the Sedna XML database system, which supports XQuery.
- A query is sent to the database for each reference. This may result in poor performance if the document contains a large number of references, because each query is a call from one execution environment (OmniMark) to another (the database system), which incurs significant overhead.
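The per-reference query pattern described above can be sketched in a few lines. The following is a minimal illustration in Python rather than OmniMark, and it makes several assumptions: the `CountingDatabase` class and the `FAKE_DB` table are hypothetical stand-ins for the XML database, and `publish` is an illustrative name, not part of any real API. The point is only to make the number of round trips visible.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<page>
<fragmentref>378</fragmentref>
<fragmentref>835</fragmentref>
</page>"""

# Hypothetical stand-in for the XML database: fragment id -> stored <div>.
FAKE_DB = {
    "378": '<div id="378">First fragment</div>',
    "835": '<div id="835">Second fragment</div>',
}


class CountingDatabase:
    """Hypothetical database handle that counts round trips."""

    def __init__(self, fragments):
        self._fragments = fragments
        self.calls = 0

    def query(self, fragment_id):
        # A real setup would send doc('book.xml')//div[@id='...'] here.
        self.calls += 1
        return self._fragments.get(fragment_id)


def publish(document_xml, db):
    """Replace every <fragmentref> child with the fragment it refers to."""
    page = ET.fromstring(document_xml)
    for index, child in enumerate(list(page)):
        if child.tag != "fragmentref":
            continue
        result = db.query(child.text.strip())  # one query per reference
        if result is None:
            continue  # this sketch leaves unresolved references in place
        fragment = ET.fromstring(result)
        fragment.tail = child.tail  # keep the text that followed the reference
        page[index] = fragment
    return ET.tostring(page, encoding="unicode")


db = CountingDatabase(FAKE_DB)
html = publish(DOCUMENT, db)
print(db.calls)
```

The call counter grows linearly with the number of references in the document, which is the cost this property refers to.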
While the size of the query result, which is a book fragment, is not known in advance, streaming processing of the fragment in OmniMark allows for processing it regardless of its size. As XSLT engines do not support streaming, the size of the query result that can be processed by XSLT engines is restricted by the size of available memory.
Another problem with the XSLT implementation of this solution is that XSLT engines usually do not support APIs to XML database systems. This means that an XSLT-based implementation has to call the database via an extension function, implemented in a programming language that has an XQuery API, which complicates the implementation.
To conclude, we would like to emphasize that while this solution can suffer from poor performance because of the many query calls, it does not impose any limitations on the size of the query result and allows the use of streaming transformation. The solution described in the next section has different properties.
Loosely Coupled Solution
Let us consider a popular example of document-oriented XML processing known as dynamic linking. The idea is that 1) the content is marked up with semantically meaningful XML elements that represent media-neutral links, and 2) the elements are then replaced with media-specific links at the time of content delivery (rendering). Dynamic linking is especially useful in the context of single-source publishing, where the author focuses on content creation and does not have to worry about how the content is delivered.
Consider a project to create a collection of movie reviews with associated information and to create output to various media. Movie reviews are full of references to other movies, actors, directors, places, times, and themes. All these references are good places to create links to other resources, such as biographies, maps, or histories. Instead of using direct HTML links, which are media-specific, the author marks up references with XML tags. These tags are named so that they describe the type of the reference (e.g. movie, actor, director). These tags have an attribute, name, which allows for the retrieval of information required to construct the media-specific links. When we publish reviews on the Web, we might link them to Wikipedia using HTML links. When we publish reviews on CD, we put a link to local resources.
Here is an example of a movie review with a reference to a director.
Example: reviews.xml
<reviews>
<review>
<title>Titanic</title>
<genre>romance</genre>
<text>
...
<p><director name="James Cameron">James Cameron's</director>
194-minute, $200 million film
of the tragic voyage is in the tradition of the great
Hollywood epics.</p>
...
</text>
</review>
...
</reviews>
Below is the corresponding fragment of the links mapping (people.xml). The document people.xml contains person elements, which have id attributes and contain biography elements with biography references for various media. The url element contains the URL of the director's biography, intended for publishing on the Web. The file element holds the path to the biography stored locally on the CD-ROM. The text element provides a brief biography for publishing in print media.
Example: people.xml
<people>
<person id="James Cameron">
<biography>
<url>http://en.wikipedia.org/wiki/James_Cameron</url>
<file>/biography/james_cameron.html</file>
<text>
James Francis Cameron (born August 16, 1954) is
a Canadian-born American film director noted for
his action/science fiction films, which are often
extremely successful financially...
</text>
</biography>
...
</person>
...
</people>
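To make the mapping concrete, here is a small Python sketch of how a media-neutral reference might be resolved against people.xml. It rests on assumptions that go beyond the article: the medium names `web`, `cd`, and `print`, the `MEDIUM_TO_ELEMENT` table, and the function `biography_for` are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

PEOPLE = """<people>
<person id="James Cameron">
<biography>
<url>http://en.wikipedia.org/wiki/James_Cameron</url>
<file>/biography/james_cameron.html</file>
<text>
James Francis Cameron (born August 16, 1954) is
a Canadian-born American film director noted for
his action/science fiction films, which are often
extremely successful financially...
</text>
</biography>
</person>
</people>"""

# Assumed mapping from output medium to the child of <biography> serving it.
MEDIUM_TO_ELEMENT = {"web": "url", "cd": "file", "print": "text"}


def biography_for(people_root, name, medium):
    """Return the biography reference for `name` in the given medium, or None."""
    tag = MEDIUM_TO_ELEMENT[medium]
    for person in people_root.iter("person"):
        if person.get("id") == name:
            node = person.find(f"biography/{tag}")
            if node is not None and node.text:
                return node.text.strip()
            return None
    return None


people = ET.fromstring(PEOPLE)
web_link = biography_for(people, "James Cameron", "web")
cd_link = biography_for(people, "James Cameron", "cd")
print(web_link)
print(cd_link)
```

The same name attribute in a review therefore yields a different target depending on the medium being published, which is what keeps the authored markup media-neutral.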
This application can be implemented using the tightly coupled approach, but we will try to improve the performance by minimizing the number of database queries. This can be achieved by decomposing the application into two separate tasks: database querying and reference processing. This approach reduces the inter-environment communication to just one data transmission and, as a pleasant side effect, does not require an API from the transformation language to the database. This is why we refer to this solution as loosely coupled.
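One possible reading of this decomposition is sketched below in Python. It is not the authors' implementation: the function names, the `QUERY_LOG` counter, and the in-memory lookup table standing in for the single database query are all assumptions made for illustration. The first phase collects every referenced name and resolves them in one round trip; the second phase rewrites the references without touching the database.

```python
import xml.etree.ElementTree as ET

REVIEWS = """<reviews><review><title>Titanic</title><text>
<p><director name="James Cameron">James Cameron's</director>
194-minute, $200 million film of the tragic voyage is in the
tradition of the great Hollywood epics.</p>
</text></review></reviews>"""

QUERY_LOG = []  # records each simulated round trip to the database


def collect_names(reviews_root):
    """Gather every distinct name referenced by a <director> element."""
    return sorted({d.get("name") for d in reviews_root.iter("director")})


def fetch_links(names):
    """Hypothetical single query returning a name -> URL mapping."""
    QUERY_LOG.append(list(names))
    # Stand-in for the database content (people.xml in the article).
    table = {"James Cameron": "http://en.wikipedia.org/wiki/James_Cameron"}
    return {name: table[name] for name in names if name in table}


def apply_links(reviews_root, links):
    """Turn each resolved <director> into an HTML link, with no database access."""
    for director in list(reviews_root.iter("director")):
        url = links.get(director.get("name"))
        if url is None:
            continue  # unresolved references are left as they are
        director.tag = "a"
        director.attrib.clear()
        director.set("href", url)
    return ET.tostring(reviews_root, encoding="unicode")


root = ET.fromstring(REVIEWS)
links = fetch_links(collect_names(root))  # task 1: one round trip
html = apply_links(root, links)           # task 2: pure transformation
print(len(QUERY_LOG))
```

Because the second phase only needs the mapping returned by the first, the two tasks could run in different tools, with a file or a pipe carrying the mapping between them.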