
Scaling Up with XQuery, Part 1
by Bob DuCharme

As I mentioned, each XQuery engine has its own way to load documents, and one MarkLogic method is the nonstandard extension function xdmp:document-load. Typical MarkLogic queries treat xdmp, the namespace prefix for MarkLogic extensions, as a predeclared prefix, like xml, xs, xsi, and a few others that all XQuery engines are required to recognize even if they're not declared.

After I loaded the recipe files into MarkLogic, I went to the http://localhost:8000/use-cases/ screen that is installed with the administration server and tried a query from my earlier article listing the titles of all recipes with sugar mentioned as an ingredient. The original query didn't work verbatim, probably because of some confusion between the root of the document and the root of the collection (the same thing happened with eXist), but the following worked just fine:


collection('recipes')/recipeml/recipe/head/title[../../ingredients/ing/item[contains(.,'sugar')]]

The following multiline query, which takes a more FLWOR-like approach to retrieve the same information instead of just being one big XPath expression, also worked from the use-cases form:


for $ingredient in collection('recipes')//
                   ingredients/ing/item[contains(.,'sugar')]
  return $ingredient/../../../head/title
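
The two forms are close but not strictly interchangeable. The path expression returns each matching title node once, while the FLWOR version returns a title once for every matching item, so a recipe that lists two kinds of sugar would show up twice. Here is a minimal sketch of the difference. The inline $recipes sequence and its sample data are made up for illustration and stand in for collection('recipes'); they are not taken from the RecipeML files.

```xquery
(: Hypothetical sample data standing in for collection('recipes'). :)
let $recipes := (
  document {
    <recipeml><recipe>
      <head><title>Fudge</title></head>
      <ingredients>
        <ing><item>brown sugar</item></ing>
        <ing><item>white sugar</item></ing>
      </ingredients>
    </recipe></recipeml>
  },
  document {
    <recipeml><recipe>
      <head><title>Chili</title></head>
      <ingredients><ing><item>beans</item></ing></ingredients>
    </recipe></recipeml>
  }
)
return
  <results>
    <xpath>{
      (: One big path expression: a title qualifies if any item matches. :)
      $recipes/recipeml/recipe/head/title[../../ingredients/ing/item[contains(., 'sugar')]]
    }</xpath>
    <flwor>{
      (: FLWOR form: one iteration, and one returned title, per matching item. :)
      for $ingredient in $recipes//ingredients/ing/item[contains(., 'sugar')]
      return $ingredient/../../../head/title
    }</flwor>
  </results>
```

With this sample data, the xpath element should hold the Fudge title once and the flwor element should hold it twice. If that matters for your data, one option is to iterate over recipes rather than items.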

The last test was to take a query that I had stored in a file and run that against the database. In a way, I had already done this when I loaded the data, because the loadrecipes.xqy file is a query file, but I wanted to run a query from my earlier articles that extracted specific data from the database. So, I tried the Food for a Crowd query:


(: Create an HTML page linking to recipes  
   that serve more than 20 people.         :)

<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Food for a Crowd</title></head>
<body>

  <h1>Food for a Crowd</h1>
<div xmlns="">
  { 
  for $doc in collection('recipes')
    where $doc/recipeml/recipe/head/yield > 20 
      return <p><a href="getRecipe.xqy?recipeName={document-uri($doc)}">
        {$doc/recipeml/recipe/head/title/text()}</a>
      </p>
  }
</div>
</body></html>

Sending my browser to the URL http://localhost:8009/recipes/4acrowd.xqy properly displayed the created HTML page.

There were two important tweaks necessary to get it to work with all three XQuery engines:

  • I added the div wrapper. Without it, MarkLogic assumed that the RecipeML element names in the XPath expressions in the where and return clauses were in the http://www.w3.org/1999/xhtml namespace, since they didn't have namespace prefixes and the XHTML namespace was in scope. The div element with no namespace keeps its contents out of any namespace without causing problems in the resulting HTML.

  • In the Saxon version of this query, the a/@href attribute pointed to the individual XML recipe files sitting on the hard disk. For the MarkLogic and eXist versions, this attribute holds a URL that calls another query file named getRecipe.xqy, passing it the identifier of that recipe's XML within the database. The getRecipe.xqy query file, which is included in this article's accompanying zip file, retrieves that XML and converts it to HTML before sending it to the browser. The MarkLogic and eXist versions of getRecipe.xqy are identical except for the first line of each, which calls a different extension function to get the value of the recipeName parameter passed from 4acrowd.xqy to getRecipe.xqy in the URL. This ability to pass parameters from one XQuery file to another in HTTP server XQuery implementations lets you combine individual query files into larger, more complex applications.
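
The first tweak follows from how XQuery 1.0 scopes namespaces. An xmlns attribute on a direct element constructor sets the default element namespace for everything inside that constructor, including unprefixed names in path expressions within enclosed expressions. The sketch below tries the same path in both scopes. The inline $doc variable and its one-recipe content are hypothetical stand-ins for a document from the recipes collection.

```xquery
(: Hypothetical one-recipe document. It is declared in the prolog, outside
   any XHTML constructor, so its elements are in no namespace. :)
declare variable $doc := document {
  <recipeml><recipe><head><title>Chili</title></head></recipe></recipeml>
};

(: The p element evaluates its path with the XHTML default namespace in
   scope; the div element resets the default namespace with xmlns="". :)
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>{ count($doc/recipeml/recipe/head/title) }</p>
    <div xmlns="">{ count($doc/recipeml/recipe/head/title) }</div>
  </body>
</html>
```

On an engine that applies this scoping rule, the count inside p should be 0, because the path looks for XHTML-namespace elements named recipeml, recipe, and so on. The count inside div should be 1. Declaring a prefix for the XHTML namespace and prefixing the HTML elements would be another way to avoid the clash, at the cost of a noisier query.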

What if you want a script to run a query and save the result in a file, instead of running the query by sending your web browser to a particular URL? A command line utility such as wget or curl can request the result of a query from an HTTP server, including MarkLogic's, as described in this article on retrieving XML from a TiVo. In addition to the URL of the query, you'll want to add the --http-user and --http-passwd parameters to your wget command line to tell the MarkLogic server that you are authorized to retrieve the data from that server. Use the administrator username and password that you created when you set up the MarkLogic server. (The TiVo article describes the use of these parameters in more detail, although I found that the password parameter in the version of wget that I'm currently using is --http-passwd and not --http-password as I described in the article. When in doubt, enter wget -h at your command line to check the correct spelling of the parameter names.)

You don't have to retrieve data via an HTTP request; the various products provide APIs that may be more efficient for your application to use.

More of the Same

Next week, we'll see how to perform the same tasks on the same data with eXist and Berkeley DB XML, and you'll be ready to explore multiple options for the right XQuery platform for your system.


  • nice piece on using eXist
    2006-09-18 20:32:01 Bob DuCharme

    In Kurt Cagle's weblog (http://www.oreillynet.com/xml/blog/2006/09/i_think_therefore_i_exist.html). It looks like eXist has gotten faster since I wrote this article.

  • Compatibility does matter
    2006-06-23 02:38:42 JonathanRobie

    Bob's article suggests that most users care only about one XQuery implementation, and therefore don't care much about compatibility. I don't think that's really true.


    I blogged on this (http://blogs.datadirect.com/jonathan_robie/2006/06/xquery_compatibility_and_the_x.html), discussing where XQuery is in the standardization process, and comparing it to the same time in the evolution of XSLT 1.0.

    • Compatibility does matter
      2006-06-23 05:01:31 Bob DuCharme

      Jonathan,


      In part 2 of the article, I said that "XQuery advocates I've met [not "most users"] tend to be strongly committed to one implementation." I'm only judging based on the ones I've come into contact with. I do know of one shop that avoids the use of any extensions, but only one.


      Bob

  • Clarification on DataDirect's XQuery Implementation
    2006-06-23 02:10:18 JonathanRobie

    I'd like to clarify one thing said in the article:


    "DataDirect XQuery and IBM's support seem more geared toward using XQuery against data stored in relational databases."


    First off, users can get more information on our product here:


    http://www.datadirect.com/products/xquery/index.ssp


    DataDirect XQuery is not just for relational data; it also supports querying XML files, SAX streams, StAX streams, and DOM trees, and it has support for very large XML files. Queries can address XML, relational data, flat file formats such as EDI (using XML adapters that convert them to XML on the fly), or combinations of these sources. I know Bob is aware of this, but the text quoted above could be misread, so I wanted to clarify this.


    Jonathan