XML.com: XML From the Inside Out

XQuery, libferris, and Virtual Filesystems
by Ben Martin

Direct interaction with this database is possible from the command line. The -0 command-line option to ferrisls is similar to the -l option to ls(1), except that the filesystem itself is asked which attributes it recommends showing to the user. For a table from a relational database, the filesystem recommends all the columns of the table. To call a PostgreSQL function, the URL must be quoted so that bash does not try to interpret the parentheses. Notice that the (extended) attributes are named f1, fname, and lname for the files returned by the PostgreSQL function. This is because the type returned by customerlookup() includes these names.

$ ferrisls    pg://localhost/xmldotcom2007
$ ferrisls -0 pg://localhost/xmldotcom2007/customers
131     Ziggy   Stardust        131     id
15      Bobby   McGee   15      id
3       Foo     Bar     3       id
$ ferrisls --xml  pg://localhost/xmldotcom2007/customers
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<ferrisls>

  <ferrisls familyname="" givenname="" id="" 
     name="customers" primary-key="id" 
     url="pg:///localhost/xmldotcom2007/customers">
    <context familyname="Stardust" givenname="Ziggy" 
       id="131" name="131" primary-key="id"/>
...

$ ferrisls --xml  'pg://localhost/xmldotcom2007/customerlookup(3,3)'
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<ferrisls>
  <ferrisls ...>
    <context f1="3" fname="Foo" lname="Bar" name="3-Foo-Bar"
        primary-key="f1-fname-lname"/>
  </ferrisls>
</ferrisls>
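
The quoting requirement can be checked with standard shell tools alone, so the following sketch runs without libferris installed. The helper name show_args is hypothetical and exists only for this illustration.

```shell
# show_args is a hypothetical helper that prints each argument it
# receives on its own line, so we can see what a command would be given.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Single quotes stop the shell from treating ( and ) as syntax,
# so the whole URL arrives as one argument.
show_args 'pg://localhost/xmldotcom2007/customerlookup(3,3)'

# Without quotes the same text is a shell syntax error. Run it in a
# child shell so the failure does not stop this script.
if ! sh -c 'echo pg://localhost/xmldotcom2007/customerlookup(3,3)' 2>/dev/null; then
  echo "unquoted parentheses were rejected by the shell"
fi
```

Double quotes work as well, which is convenient when the function arguments come from shell variables.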

The following XQuery calls the PostgreSQL function and outputs the results. Note the use of @f1, @fname, and @lname as attribute names: the PostgreSQL function returns its results with the PostgreSQL type customerlookup_result, which gives the columns these names. The query simply loops over all the returned tuples, on the assumption that the core logic is implemented in the PostgreSQL function itself.

declare variable $docurl := "pg://localhost/xmldotcom2007/customerlookup(";
declare variable $mincustomerid := "3";
declare variable $maxcustomerid := "15";
<resultdata>
 {
  for $c in ferris-doc( 
     concat( $docurl, $mincustomerid, ",", $maxcustomerid, ")" ))/*
  return
    <person cid="{ $c/@f1 }" 
       surname="{ $c/@lname }" fn="{ $c/@fname }" />
 }
</resultdata>
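
If the same lookup is needed for different id ranges, one option is to generate the query text from the shell. The sketch below is only an illustration: the function name make_lookup_query and the file name lookup.xq are assumptions, and the ids are inserted verbatim with no validation.

```shell
# make_lookup_query is a hypothetical helper that prints the XQuery
# above with the two customer ids filled in from its arguments.
# In an unquoted here-document, \$ yields a literal $ for the XQuery
# variables, while $1 and $2 are expanded by the shell.
make_lookup_query() {
  cat <<EOF
declare variable \$docurl := "pg://localhost/xmldotcom2007/customerlookup(";
declare variable \$mincustomerid := "$1";
declare variable \$maxcustomerid := "$2";
<resultdata>
 {
  for \$c in ferris-doc(
     concat( \$docurl, \$mincustomerid, ",", \$maxcustomerid, ")" ))/*
  return
    <person cid="{ \$c/@f1 }"
       surname="{ \$c/@lname }" fn="{ \$c/@fname }" />
 }
</resultdata>
EOF
}

# Write the query for ids 3 through 15 to a file (name is an assumption).
make_lookup_query 3 15 > lookup.xq

# With libferris and XQilla installed, the file could then be run with:
#   ferris-xqilla lookup.xq
```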

XQuery as Desktop and Network Search

The filesystem index and search, as described in this Linux Journal article, can also be used from XQuery. This makes it easy to build a custom intranet search solution in XQuery that combines information from the file server index, RDF, and other locations. Another application is finding the documents you want to query: use a filesystem search through ferris-doc("fulltextquery://...") as the outer loop, and query each matching document in the inner loop of your XQuery.

The XQuery below searches for "alice" and "wonderland" as a Boolean full-text search against your filesystem index and returns the results as a very simple XML file. Since you can combine many calls to ferris-doc() in a single XQuery, you could quickly build a nice user interface for file server search using just libferris and XQuery.

declare variable $qtype    := "boolean";
declare variable $person   := "alice";
declare variable $location := "wonderland";
<data>
 {
  for $idx in ferris-doc( concat("fulltextquery://", $qtype, "/", 
           $person, " ", $location))
    for $res in $idx/*
       return
    <match 
            name="{ $res/@name }" url="{ $res/@url }" 
            modification-time="{ $res/@mtime-display }"
           >
    </match>
 }
</data> 

$ ferris-xqilla xquery-index.xq
<?xml version="1.0"?>
<data>
  <match modification-time="99 Jul 27 12:53" 
     name="file:///.../doc/CommandLine/command.txt ...>
  <match modification-time="00 Mar 11 06:58" 
     name="file:///.../doc/Gimp/Grokking-the-GIMP-v1.0/node8.html
     ...>
...</data>

The filesystem indexes can be combined with querying by location. For example, the XQuery below finds all files that are geotagged with a given place name, here the emblem eiffel-tower. For details on setting up geotagging and place name disambiguation, see this Linux.com article.

declare variable $placename := "eiffel-tower";
<data>
 {
  for $idx in ferris-doc( concat("eaq://(emblem:has-", $placename, "==1)"))
    for $res in $idx/*
       return
    <match 
            name="{ $res/@name }" url="{ $res/@url }" 
            modification-time="{ $res/@mtime-display }"
           >
    </match>
 }
</data>

Keeping a db4 Cache Hot with rsync

Both XML and db4 can be seen as filesystems with libferris, so you can keep a db4 file up to date with an XML file using the standard rsync(1) tool. In order to do this you need to expose libferris as a Filesystem in Userspace (FUSE) filesystem both as the source and destination for rsync. The tool to expose libferris through FUSE is called ferrisfs.

As rsync supports extended attributes with the -X command-line option, XML attributes can be synced to those in the db4 file cache. Filesystems expect the (extended) attributes that users can store information in to be prefixed with "user.". This creates a very simple namespacing of (extended) attributes, with "system." attributes carrying restrictions on who can set them.

This filesystem implementation detail is taken care of with the --prepend-user-dot-prefix-to-ea-regex option to ferrisfs. Because the attributes in the XML file do not conform to the "user." namespace restriction, this option has ferrisfs do some namespace marshaling. For example, an XML attribute such as id will be reported by ferrisfs as user.id at the input filesystem. At the destination end the "user." prefix is automatically stripped again; rsync itself only sees the "user.x" extended attributes in both filesystems and everybody is happy.

The --show-ea-regex option is used to tell ferrisfs which (extended) attributes are reported to rsync as existing. This means that any attributes in the XML file not matching this regular expression are not synced to the db4 file.

fcreate --create-type=db4 --rdn=customers.db .
mkdir -p customers
mkdir -p input 
ferrisfs --prepend-user-dot-prefix-to-ea-regex='.*'  -u `pwd`/customers.db customers
ferrisfs --prepend-user-dot-prefix-to-ea-regex='.*' \
 --show-ea-regex='(id|givenname|familyname)'  -u `pwd`/customers.xml/customers input
rsync -avzX --delete-after input/ customers/
db_dump -p customers.db 
...
fusermount -u input
fusermount -u customers
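
The attribute handling described above can be pictured with plain shell tools. The sketch below only imitates the idea outside of ferrisfs: the helper names export_names and import_names and the extra attribute notes are hypothetical, and treating the --show-ea-regex pattern as a whole-name match is an assumption rather than documented ferrisfs behavior.

```shell
# Hypothetical helpers that imitate the two name-handling steps
# described above. They are not part of libferris.

# Source side: keep only the names matching the show regex (matched
# against the whole name here, which is an assumption) and prepend
# the "user." prefix to each one that is kept.
export_names() {
  show_regex=$1
  shift
  for name in "$@"; do
    if printf '%s\n' "$name" | grep -Eqx "$show_regex"; then
      printf 'user.%s\n' "$name"
    fi
  done
}

# Destination side: strip the "user." prefix again.
import_names() {
  sed 's/^user\.//'
}

# What rsync would see: the three selected attributes with the
# prefix added; "notes" is filtered out.
export_names '(id|givenname|familyname)' id givenname familyname notes

# What ends up stored at the destination: the original names.
export_names '(id|givenname|familyname)' id givenname familyname notes | import_names
```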

Summing Up

The combination of XQilla and libferris allows you to combine access to the many filesystems that are supported by libferris within a single XQuery. In the case of db4 and XML, you can select between the two formats to gain the performance you desire with very little change to the XQuery itself.

Other interesting data sources that libferris makes available include RDF databases (as created with the Redland library) and direct queries of what is shown in Firefox. Unfortunately, examples of these will have to wait for another article.
