XML.com: XML From the Inside Out

Scaling Up with XQuery, Part 1
by Bob DuCharme

As I mentioned, each XQuery engine has its own way to load documents, and one MarkLogic method is the non-standard extension function xdmp:document-load. Typical MarkLogic queries treat the xdmp prefix for MarkLogic extensions as a predeclared namespace prefix, like xml, xs, xsi, and a few others that all XQuery engines are required to recognize even if they're not declared.

After I loaded the recipe files into MarkLogic, I went to the http://localhost:8000/use-cases/ screen that is installed with the administration server and tried a query from my earlier article listing the titles of all recipes with sugar mentioned as an ingredient. The original query didn't work verbatim, probably because of some confusion between the root of the document and the root of the collection (the same thing happened with eXist), but the following worked just fine:


collection('recipes')/recipeml/recipe/head/title[../../ingredients/ing/item[contains(.,'sugar')]]

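The following multiline query, which takes a more FLWOR-like approach to retrieve the same information instead of just being one big XPath expression, also worked from the use-cases form. One difference to keep in mind: the path expression returns each matching title node once, while the for loop returns a title for every matching item, so a recipe that lists two kinds of sugar shows up twice in the FLWOR result.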


for $ingredient in collection('recipes')//
                   ingredients/ing/item[contains(.,'sugar')]
  return $ingredient/../../../head/title
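
To see the two query shapes side by side outside of any XQuery engine, here is a minimal Python sketch that uses only the standard library's xml.etree.ElementTree. The three sample documents are invented for illustration; they only mimic the recipeml/recipe/head/title and ingredients/ing/item paths used in the queries above and are not taken from the recipe collection. ElementTree has no parent axis, so the second function remembers the enclosing recipe instead of climbing with `..`.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for documents in the 'recipes' collection.
DOCS = [
    "<recipeml><recipe><head><title>Sugar Cookies</title></head>"
    "<ingredients><ing><item>flour</item></ing>"
    "<ing><item>white sugar</item></ing>"
    "<ing><item>brown sugar</item></ing></ingredients></recipe></recipeml>",
    "<recipeml><recipe><head><title>Plain Toast</title></head>"
    "<ingredients><ing><item>bread</item></ing>"
    "<ing><item>butter</item></ing></ingredients></recipe></recipeml>",
    "<recipeml><recipe><head><title>Lemonade</title></head>"
    "<ingredients><ing><item>lemons</item></ing>"
    "<ing><item>sugar</item></ing></ingredients></recipe></recipeml>",
]


def item_text(item):
    """String value of an item element, like '.' inside contains()."""
    return "".join(item.itertext())


def titles_by_predicate(docs, word):
    """Path-with-predicate shape: keep a title if ANY item mentions word."""
    titles = []
    for text in docs:
        root = ET.fromstring(text)  # root is the <recipeml> element
        for recipe in root.findall("recipe"):
            items = recipe.findall("ingredients/ing/item")
            if any(word in item_text(i) for i in items):
                titles.extend(t.text for t in recipe.findall("head/title"))
    return titles


def titles_by_iteration(docs, word):
    """FLWOR shape: emit the recipe's title once PER matching item."""
    titles = []
    for text in docs:
        root = ET.fromstring(text)
        for recipe in root.iter("recipe"):
            for item in recipe.findall("ingredients/ing/item"):
                if word in item_text(item):
                    titles.extend(t.text for t in recipe.findall("head/title"))
    return titles
```

With this sample data the first function lists "Sugar Cookies" once and the second lists it twice, because that recipe has two items containing "sugar". Like contains(), the substring test here is case-sensitive.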

The last test was to take a query that I had stored in a file and run that against the database. In a way, I had already done this when I loaded the data, because the loadrecipes.xqy file is a query file, but I wanted to run a query from my earlier articles that extracted specific data from the database. So, I tried the Food for a Crowd query:


(: Create an HTML page linking to recipes  
   that serve more than 20 people.         :)

<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Food for a Crowd</title></head>
<body>

  <h1>Food for a Crowd</h1>
<div xmlns="">
  { 
  for $doc in collection('recipes')
    where $doc/recipeml/recipe/head/yield > 20 
      return <p><a href="getRecipe.xqy?recipeName={document-uri($doc)}">
        {$doc/recipeml/recipe/head/title/text()}</a>
      </p>
  }
</div>
</body></html>

Sending my browser to the URL http://localhost:8009/recipes/4acrowd.xqy properly displayed the created HTML page.

There were two important tweaks necessary to get it to work with all three XQuery engines:

  • I added the div wrapper. Without it, MarkLogic assumed that the unprefixed RecipeML element names in the path expressions of the where and return clauses were in the http://www.w3.org/1999/xhtml namespace, because the xmlns declaration on the html element makes XHTML the default element namespace for everything inside that element, embedded expressions included. The xmlns="" on the div element resets the default to no namespace, which keeps the div's contents out of any namespace without causing problems in the resulting HTML.

  • In the Saxon version of this query, the a/@href attribute pointed to the individual XML recipe files sitting on the hard disk. For the MarkLogic and eXist versions, this attribute holds a URL that calls another query file named getRecipe.xqy, passing it the identifier of that recipe's XML within the database. The getRecipe.xqy query file, which is included in this article's accompanying zip file, retrieves that XML and converts it to HTML before sending it to the browser. The MarkLogic and eXist versions of getRecipe.xqy are identical except for the first line of each, which calls a different extension function to get the value of the recipeName parameter passed from 4acrowd.xqy to getRecipe.xqy in the URL. This ability to pass parameters from one XQuery file to another in HTTP server XQuery implementations lets you combine individual query files into larger, more complex applications.
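
The namespace scoping behind the first tweak can be checked without an XQuery engine. The following minimal Python sketch parses a cut-down, hypothetical version of the generated page (the href value and link text are placeholders) and reports which namespace each element ends up in. It only shows how a default namespace declaration and xmlns="" scope in the XML itself; in an XQuery direct element constructor the same declaration also governs how unprefixed names in the enclosed path expressions are resolved, which is what affected the query.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed version of the page that 4acrowd.xqy produces.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Food for a Crowd</title></head>
<body>
  <h1>Food for a Crowd</h1>
  <div xmlns="">
    <p><a href="getRecipe.xqy?recipeName=example">Example Recipe</a></p>
  </div>
</body></html>"""


def split_name(tag):
    """Split ElementTree's '{namespace}local' tag form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def namespaces_by_element(xml_text):
    """Map each element's local name to the namespace it is in ('' for none)."""
    root = ET.fromstring(xml_text)
    result = {}
    for element in root.iter():
        namespace, local = split_name(element.tag)
        result[local] = namespace
    return result
```

Calling namespaces_by_element(PAGE) shows html, head, title, body, and h1 in the XHTML namespace, and div, p, and a in no namespace at all.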

What if you want to issue a command from a script that runs a query and saves the result in a file, instead of running your query by sending your web browser to a particular URL? A command-line utility such as wget or curl can request the result of a query from an HTTP server, including MarkLogic's, as described in this article on retrieving XML from a TiVo. In addition to the URL of the query, add the --http-user and --http-passwd parameters to your wget command line to tell the MarkLogic server that you are authorized to retrieve the data. Use the administrator username and password that you created when you set up the MarkLogic server. (The TiVo article describes these parameters in more detail, although in the version of wget that I'm currently using, the password parameter is --http-passwd and not --http-password as I described in that article. When in doubt, enter wget -h at your command line to check the spelling of the parameter names.)
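
If a scripting language is more convenient than wget or curl, the same authenticated request can be sketched with Python's standard library. This is only an outline under stated assumptions: the username, password, and output file name in the usage comment are placeholders, and because I'm not assuming which HTTP authentication scheme your server is configured for, the opener registers handlers for both basic and digest authentication.

```python
import urllib.request


def make_authenticated_opener(url, user, password):
    """Build an opener that answers basic or digest challenges for url."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None means "use these credentials for any realm at this URL".
    manager.add_password(None, url, user, password)
    return urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(manager),
        urllib.request.HTTPDigestAuthHandler(manager),
    )


def save_query_result(url, user, password, out_path):
    """Request the query's URL and write the response body to out_path."""
    opener = make_authenticated_opener(url, user, password)
    with opener.open(url, timeout=30) as response, open(out_path, "wb") as out:
        out.write(response.read())


# Example call (placeholder credentials and file name; needs a running server):
# save_query_result("http://localhost:8009/recipes/4acrowd.xqy",
#                   "admin-user", "admin-password", "4acrowd.html")
```

The credentials play the same role as the wget user and password parameters described above.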

You don't have to retrieve data via an HTTP request; the various products provide APIs that may be more efficient for your application to use.

More of the Same

Next week, we'll see how to perform the same tasks on the same data with eXist and Berkeley DB XML, and you'll be ready to explore multiple options for the right XQuery platform for your system.


