XML.com: XML From the Inside Out


Scaling Up with XQuery, Part 2

June 21, 2006

Last week, we learned that although scaling up from Saxon's implementation of an in-memory XQuery database to a disk-based version requires a bit of extra effort, it's worth doing because you can create applications around much larger collections of data. And it can be done for free. We also looked at how to set up and use MarkLogic Server. This week, we'll see how to perform the same setup and usage tasks with two more servers: eXist and Sleepycat's Berkeley DB XML. You can find the sample data and scripts in this zip file.


As with MarkLogic, you usually interact with the open source eXist XQuery engine through an HTTP server built into the program. After you download and install it, the installation routine tells you to start the server by selecting eXist Database Startup from the eXist group of the Windows Start menu. Once you do this, send your browser to http://localhost:8080/exist to view the administration screen.

Here, you will find a lot of good tips for getting started with eXist, including information about starting the Client shell (which is also available from the eXist group of the Windows Start menu). Before loading documents into an eXist collection, you must create the collection either by clicking the client shell's "Create new collection" icon (a little document with a star) or by entering a command such as the following at the exist:/db prompt in the lower pane of the client shell:

mkcol recipes

(Note that you can navigate among collections and create new collections inside existing ones, which caused me some confusion at first. A /db> in the shell prompt shows that you're at the root of the main collection.) After creating the collection with the command above, I loaded data into it by:

  1. Putting the loadrecipes.xqy file below into a cookbook subdirectory that I created in C:\Program Files\eXist\webapp. (The webapp directory is created automatically by the eXist installation.)

  2. Sending a browser to the URL http://localhost:8080/exist/cookbook/loadrecipes.xqy.

(: Load recipe files into eXist database. :)

xquery version "1.0";

declare namespace xmldb="http://exist-db.org/xquery/xmldb";

(: We'll load each file into the recipes collection as the administrator. :)
let $collection := xmldb:collection("xmldb:exist:///db/recipes", "admin", "")
(: Only a few file names are shown here; the actual loadrecipes.xqy script lists 291. :)
let $filenames := ("_Baking_the_Best_Muffins_", "_Butter_")
for $dataFilename in $filenames
    let $name := $dataFilename
    (: Build a file: URI pointing at the RecipeML source file on disk. :)
    let $URI := xs:anyURI(concat("file:///c:/dat/xquery/recipeml/", $name, ".xml"))
    (: Store the document in the collection under the name $name. :)
    let $retCode := xmldb:store($collection, $name, $URI)
    return <p>loaded {$retCode}</p>
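The script above shows only a couple of the 291 file names that the full loadrecipes.xqy lists, and typing that many by hand is error-prone. Here is a minimal Python sketch, assuming all the RecipeML files sit in one directory, that generates the `$filenames` sequence for pasting into the script. The helper names are hypothetical, not part of eXist; the quoting follows XQuery 1.0 string-literal rules, where an embedded double quote is doubled and `&` must be written as `&amp;`.

```python
from pathlib import Path
from typing import Iterable, List


def xquery_string_literal(s: str) -> str:
    """Quote s as a double-quoted XQuery 1.0 string literal."""
    # Escape & first (it starts entity references inside XQuery string
    # literals), then double any embedded double quotes.
    return '"' + s.replace("&", "&amp;").replace('"', '""') + '"'


def filename_sequence(names: Iterable[str]) -> str:
    """Return an XQuery sequence literal such as ("a", "b")."""
    return "(" + ", ".join(xquery_string_literal(n) for n in names) + ")"


def names_in_dir(data_dir: str) -> List[str]:
    """Base names (without .xml) of the .xml files in data_dir, sorted."""
    return sorted(p.stem for p in Path(data_dir).glob("*.xml"))
```

For example, `print("let $filenames := " + filename_sequence(names_in_dir("c:/dat/xquery/recipeml")))` prints a complete `let` clause for the directory the script reads from.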

The eXist loadrecipes.xqy "query" file has a few things in common with the loadrecipes.xqy file that I created to load data into MarkLogic:

  • Because the XQuery specification doesn't describe a way to load data, the actual loading is done by an extension function called, in the eXist case, xmldb:store.

  • Because I will use a browser to execute the query, the process of loading creates an HTML page with messages confirming that each file has successfully loaded. If there are problems, eXist will display error messages as well.
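Because the load script reports progress as one `<p>loaded ...</p>` paragraph per file, a quick way to confirm that every file made it in is to count those paragraphs in the returned page. The Python sketch below assumes the page text has already been saved (from the browser or with wget) and that the paragraphs are serialized exactly in that form; the exact text after "loaded" depends on what `xmldb:store` returns, so the sketch only collects it. The function names are hypothetical.

```python
import re
from typing import List

# One confirmation paragraph, as produced by the load script's
# return <p>loaded {$retCode}</p> clause.
LOADED_RE = re.compile(r"<p>loaded\s*(.*?)</p>", re.DOTALL)


def loaded_messages(page: str) -> List[str]:
    """Text following 'loaded' in each confirmation paragraph."""
    return [m.strip() for m in LOADED_RE.findall(page)]


def all_loaded(page: str, expected: int) -> bool:
    """True if the page confirms exactly `expected` stored files."""
    return len(loaded_messages(page)) == expected
```

For the full data set, `all_loaded(page, 291)` should be true; a lower count means some files were not stored, and any eXist error messages in the same page are the next place to look.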

After you load these files, you can query them. For an interactive query, click the binoculars icon in the eXist client shell to bring up the "Query Dialog" dialog box. Clicking in the top pane, entering an XPath expression, and then clicking the Submit button puts the query results in the bottom pane. Although the pop-up tool tip for the binoculars icon says that it lets you "Query the database with XPath," full XQuery expressions seem to work as well. Both of the queries to list the titles of recipes with "sugar" as an ingredient, shown in Part 1 of this article, worked fine here.
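To cross-check what the Query Dialog returns, the same question (which recipes list an ingredient containing "sugar"?) can be asked of a RecipeML file directly. This Python sketch is not the Part 1 query. It assumes RecipeML's layout of `recipe/head/title` for the title and `ing/item` for each ingredient name, uses a plain case-sensitive substring match, and `sugar_titles` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List


def sugar_titles(recipeml: str, word: str = "sugar") -> List[str]:
    """Titles of recipes with an ingredient item containing `word`.

    Assumes RecipeML-style markup: each <recipe> has <head><title> and
    <ing> elements whose <item> child names the ingredient.
    """
    root = ET.fromstring(recipeml)
    # The document element may be a <recipe> or a wrapper such as <recipeml>;
    # ".//recipe" matches descendants only, so add the root separately.
    recipes = ([root] if root.tag == "recipe" else []) + root.findall(".//recipe")
    titles = []
    for recipe in recipes:
        items = (ing.find("item") for ing in recipe.iter("ing"))
        if any(item is not None and word in "".join(item.itertext())
               for item in items):
            title = recipe.find("head/title")
            if title is not None:
                titles.append("".join(title.itertext()).strip())
    return titles
```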

To run a query stored in a file against the loaded database, I copied the 4acrowd.xqy query file from Part 1 into C:\Program Files\eXist\webapp\cookbook and sent a browser to http://localhost:8080/exist/cookbook/4acrowd.xqy. This displayed the results properly, and a wget command line also retrieved the query results.
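The wget retrieval can also be scripted. This Python sketch assumes eXist's bundled server on localhost:8080, as above, and a query file saved under webapp\cookbook; it builds the same URL and returns the serialized result as text instead of writing it to a file. `query_url` and `run_stored_query` are hypothetical helper names.

```python
import urllib.request
from urllib.parse import urljoin

# Base URL used in this article; assumes eXist's bundled HTTP server
# on port 8080. The trailing slash matters for urljoin.
EXIST_BASE = "http://localhost:8080/exist/"


def query_url(script: str, base: str = EXIST_BASE) -> str:
    """URL for a query file saved under webapp/cookbook (e.g. 4acrowd.xqy)."""
    return urljoin(base, "cookbook/" + script)


def run_stored_query(script: str, base: str = EXIST_BASE,
                     timeout: float = 30.0) -> str:
    """Fetch a stored query's serialized result, roughly as wget does,
    but return it as a string instead of saving it to disk."""
    with urllib.request.urlopen(query_url(script, base), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

With the server running, `run_stored_query("4acrowd.xqy")` requests http://localhost:8080/exist/cookbook/4acrowd.xqy, the same URL used with the browser.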

To be honest, this took many intermediate steps. The first version of 4acrowd.xqy that worked in MarkLogic didn't work in eXist, and once I fixed that, it no longer worked in MarkLogic. I had to tweak it further to arrive at the version shown in Part 1, which works with MarkLogic, eXist, and Berkeley DB XML. (The most significant tweak was adding the div element to change the namespace scope of the XPath expressions in the FLWOR section.) Remember, the XQuery standard is not yet set in stone as a W3C Recommendation. Many XQuery advocates complain that the standardization process moves too slowly, but in the meantime it is not nearly as easy to write non-trivial XQuery code for one implementation and then use it with another as it is to move an XSLT 1.0 stylesheet unchanged from Xalan to Saxon to libxslt. That kind of portability is, after all, the point of using standards-based software. (The XQuery advocates I've met tend to be strongly committed to one implementation, which makes porting a minor issue for them.)
