XML.com 
 Published on XML.com http://www.xml.com/pub/a/2004/03/31/ferris.html

 

Using libferris with XML
By Ben Martin
March 31, 2004

This article presents the benefits of using libferris with your XML applications. libferris presents a uniform interface to hierarchical data. This data can be persisted using many providers including the filesystem, an RDBMS, or even XML. All the data providers in libferris are made available using a filesystem metaphor: MySQL tables can be seen using ferrisls on a "mysql://host/database/table" URL.

The two core abstractions in libferris are the Context and the Extended Attribute (EA). You can think of the Context as a directory or file in a filesystem or as the combination of an element and its child text node(s) in XML. You can think of an EA as an XML attribute. The largest difference between an EA and an XML attribute is that the value of an EA can be either stored or generated at runtime.
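The distinction between stored and generated EA values can be pictured with a small toy model. This is only a sketch for illustration: the class, its method names, and the choice of generated attributes are assumptions, not the libferris API.

```python
class Context:
    """Toy model of a Context with Extended Attributes (hypothetical, not libferris)."""

    def __init__(self, name, content=b""):
        self.name = name
        self.content = content
        self.children = {}   # child Contexts keyed by name
        self._stored = {}    # EA values that are persisted as-is
        # EA values computed on demand from the Context itself
        self._generated = {
            "name": lambda ctx: ctx.name,
            "size": lambda ctx: len(ctx.content),
        }

    def set_ea(self, key, value):
        """Store an EA value explicitly."""
        self._stored[key] = value

    def get_ea(self, key):
        """Return a stored value if present, otherwise generate it at runtime."""
        if key in self._stored:
            return self._stored[key]
        return self._generated[key](self)


if __name__ == "__main__":
    image = Context("project.png", content=b"\x89PNG")
    image.set_ea("width", "640")   # stored value
    print(image.get_ea("width"))   # read back from storage
    print(image.get_ea("size"))    # generated from the content length
```

A caller reads both kinds of attribute through the same get_ea call, which is the point of the abstraction: whether the value was stored or computed is hidden behind one interface.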

There are several benefits of using libferris with your XML applications.

POSIX Command Line Replacements

The replacement directory listing command ferrisls supports an --xml output mode, which is similar to ls -l except that the output is a well-formed XML document. This can be combined with the extended attribute handling in libferris to export interesting metadata. For example, a stylesheet might be interested in the width and height of an image. The following command retrieves an image file and presents the selected attributes as an XML document. The --show-ea parameter tells ferrisls which EAs it should list in the output.

 
$ ferrisls -ld --xml \
--show-ea="name,size-human-readable,width,height" \
http://witme.sf.net/libferris.web/images/project.png

The output of the above command when run from a machine with Internet access follows (formatted to fit XML.com).

 
<ferrisls>
<ferrisls 
url="http:///witme.sf.net/libferris.web/images/project.png"  
name="project.png"  >
  <context  name="project.png"  
       size-human-readable="20.0k"  width="640"  height="60"  
  />
</ferrisls>
</ferrisls>

There is a nested ferrisls element because ferrisls can list many locations during a single invocation and so the top level ferrisls is always added to ensure a unique root node.
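A consumer of this output has to step through both ferrisls levels. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function name read_contexts and the extra listing-url key are assumptions made for the example, and the XML is the listing shown above, embedded so the sketch is self-contained.

```python
import xml.etree.ElementTree as ET

# The ferrisls --xml output shown above.
FERRISLS_XML = """<ferrisls>
<ferrisls
url="http:///witme.sf.net/libferris.web/images/project.png"
name="project.png"  >
  <context  name="project.png"
       size-human-readable="20.0k"  width="640"  height="60"
  />
</ferrisls>
</ferrisls>"""


def read_contexts(xml_text):
    """Return one dict of EAs per <context>, tagged with the URL of its listing."""
    root = ET.fromstring(xml_text)
    rows = []
    # Each inner <ferrisls> element corresponds to one listed location.
    for listing in root.findall("ferrisls"):
        for ctx in listing.findall("context"):
            row = dict(ctx.attrib)
            row["listing-url"] = listing.get("url")  # hypothetical key name
            rows.append(row)
    return rows


if __name__ == "__main__":
    for row in read_contexts(FERRISLS_XML):
        print(row["name"], row["width"], row["height"])
```

Because every listed location gets its own inner ferrisls element, the same loop works unchanged when several locations are listed in one invocation.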

We will now create a Sleepycat native XML database and populate it from the command line. New filesystem objects are created using either the console fcreate or the GTK+2 graphical gfcreate tools. These are distributed in the ferriscreate package. We will use fcreate to avoid the GUI in the creation process. We pass the minimum useful information to fcreate telling it the type of object to make, its filename (the Relative Domain Name or rdn), and the path at which to create the new object.


$ rm -rf /tmp/xmlcom_ferris
$ mkdir  /tmp/xmlcom_ferris
$ fcreate --create-type dbxml \
    --rdn mycollection.dbxml /tmp/xmlcom_ferris
$ ferriscp --dst-is-dir -v \
    /tmp/input.xml /tmp/xmlcom_ferris/mycollection.dbxml

The /tmp/input.xml file holds the XML produced by the earlier ferrisls --xml command, and the ferriscp command imports it into the dbXML database. The subtle trick is the --dst-is-dir option, which tells libferris to treat the dbXML file itself as a directory for this operation. Without --dst-is-dir the normal copy semantics would apply and mycollection.dbxml would contain a byte copy of input.xml. With --dst-is-dir, mycollection.dbxml remains a dbXML file and contains a copy of input.xml as an object in its database.

Now we can access the input.xml directly from the dbXML database using the fcat command, list the entire database using ferrisls, and generate the MD5 checksum for each XML file as we go.


$ fcat /tmp/xmlcom_ferris/mycollection.dbxml/input.xml
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<ferrisls dbxml:id="1" dbxml:name="input.xml" 
  name="project.png" 
  url="http:///witme.sf.net/libferris.web/images/project.png" 
  xmlns:dbxml="http://www.sleepycat.com/2002/dbxml">
     <context height="60" name="project.png" 
          size-human-readable="20.0k" width="640"
     />
</ferrisls>

$ ferrisls -lh --show-ea="name,md5" /tmp/xmlcom_ferris/mycollection.dbxml
input.xml       6976f06b, 77827e2e, 74a8ca80, 9420052d
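For comparison, an MD5 digest can be computed independently and printed in the same four-group layout. This sketch uses Python's hashlib; it assumes that the md5 EA is the digest of the document's bytes, which the listing above does not confirm, and the function name md5_grouped is made up for the example.

```python
import hashlib


def md5_grouped(data: bytes) -> str:
    """Return the MD5 digest of data as four comma-separated groups of 8 hex digits."""
    digest = hashlib.md5(data).hexdigest()  # 32 hexadecimal characters
    return ", ".join(digest[i:i + 8] for i in range(0, 32, 8))


if __name__ == "__main__":
    # The MD5 digest of empty input is a well-known constant.
    print(md5_grouped(b""))
```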

Support for resolving XPath 1.0 expressions has recently been added to libferris using the "pathan" library. A small directory tree is set up to illustrate:


$ cd /tmp
$ mkdir xmlcomxp
$ for i in `seq 1 3 10`; do 
   touch xmlcomxp/foo$i.xml; 
done
$ touch xmlcomxp/plain.txt

The URI style of scheme:// is bent slightly for the xpath URI scheme in libferris: everything after the colon forms part of the XPath expression. This allows the leading // in XPath to still be used to explore the entire tree. The top-level items in the xpath:/ filesystem are all the other filesystem types; for example, the file:// URI scheme is represented by the top-level directory named file.
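The mapping from an ordinary URL to its place in the xpath:/ tree can be sketched as a small helper. The function to_xpath_url is hypothetical and only encodes the rule described here (scheme first, then host if any, then the path segments); it is not part of libferris.

```python
from urllib.parse import urlsplit


def to_xpath_url(url, predicate=""):
    """Rewrite scheme://host/path as xpath:/scheme/host/path plus an optional XPath tail."""
    parts = urlsplit(url)
    segments = [parts.scheme]          # the scheme becomes the top-level directory
    if parts.netloc:
        segments.append(parts.netloc)  # e.g. the MySQL host name
    segments.extend(s for s in parts.path.split("/") if s)
    return "xpath:/" + "/".join(segments) + predicate


if __name__ == "__main__":
    print(to_xpath_url("file:///tmp/xmlcomxp", "/*[@size<200]"))
    print(to_xpath_url("mysql://localhost/exphpresso/coffees"))
```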


$ ferrisls -l  \
'xpath:/file/tmp/xmlcomxp/*[@name-extension=".xml" 
and @size<200]'
-rw-rw---- ben        ben        0       04 Jan 20 01:10 foo1.xml
-rw-rw---- ben        ben        0       04 Jan 20 01:10 foo10.xml
-rw-rw---- ben        ben        0       04 Jan 20 01:10 foo4.xml
-rw-rw---- ben        ben        0       04 Jan 20 01:10 foo7.xml
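To make the predicate concrete, here is a rough Python equivalent that applies the same two tests (extension and size) to a local directory. It is a sketch of what the expression selects, not of how libferris or pathan evaluate it; select_small_xml is a name invented for the example.

```python
import tempfile
from pathlib import Path


def select_small_xml(directory, max_size=200):
    """Names of files ending in .xml whose size is below max_size bytes, sorted by name."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix == ".xml" and p.stat().st_size < max_size
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Same file names as the shell loop: seq 1 3 10 yields 1, 4, 7, 10.
        for i in range(1, 11, 3):
            (Path(tmp) / f"foo{i}.xml").touch()
        (Path(tmp) / "plain.txt").touch()
        print(select_small_xml(tmp))
```

As in the listing above, plain.txt is filtered out by the extension test, and all four empty XML files pass the size test.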

The only relational database currently accessible with the open source version of libferris is MySQL. The user name and password to use for each server are set up using the ferris-capplet-auth graphical tool rather than by embedding authentication information directly into URLs. The capplet allows you to test each authentication setting to make sure it's acceptable.

Once the appropriate authentication is given, libferris can be used to explore and export relational data. Listing the top level mysql URL scheme will show you hosts which are currently known. Listing a host shows you the databases on that host.


$ ferrisls mysql:// 
localhost
$ ferrisls mysql://localhost
... exphpresso ...
$ ferrisls mysql://localhost/exphpresso
coffees  comments  definition  types

If you've entered authentication information for remote databases then you can list them with ferrisls as though they existed in mysql://; libferris will connect to them and create the appropriate file. For example, with this command a connection to the server foo is created and the databases on it are listed:


$ ferrisls mysql://foo
... stocks ...

The EA interface in libferris also presents interesting metadata about the filesystem itself. One such attribute is recommended-ea. This is a comma-separated list of attributes which a Context considers interesting for viewing its child Contexts. For relational databases, recommended-ea contains an entry for each column name in the table or query result set. One can tell ferrisls to present the recommended-ea by adding -0 to the command line. Using --xml implies that the recommended-ea will be shown.


$ ferrisls -0v mysql://localhost/exphpresso/coffees
1       Classics        Americano        One shot of ...
2       Classics        Classic Espresso One shot of ...
...
$ ferrisls --xml mysql://localhost/exphpresso/coffees
<ferrisls>
<ferrisls url="mysql:///localhost/exphpresso/coffees"  
          name="coffees"  >
 <context  id="1"  
           coffee_type="Classics" coffee_name="Americano"  
           coffee_details=" One shot of expresso brewed..."
           name="1"  primary-key="id"  />
 ...
</ferrisls></ferrisls>

The next command uses XPath to select the rows of the coffees table whose coffee_name starts with a given string and then sorts the results by coffee_name. The sorting specification given to --ferris-sort allows arbitrary nesting of sorts, as well as reverse, floating, case-insensitive, and version sorting. The URL for the displayed context is selectionfactory://, which is a filesystem designed to hand around a collection of links to other filesystems. In this case it is a selection of rows from a table, but it can pass arbitrary data around.


$ ferrisls --xml --ferris-sort="coffee_name" \
  'xpath:/mysql/localhost/exphpresso/
     coffees/*[starts-with(@coffee_name,"Cafe")]'
<ferrisls>
<ferrisls url="selectionfactory://"  name="1"  >
 <context  id="28"  coffee_name="Cafe Brulot"  ... />
 <context  id="31"  coffee_name="Cafe Diablo"  ... />
</ferrisls></ferrisls>

Note that the XPath query is not converted to SQL for execution, so it is more expensive to execute than embedding SQL with libferris.
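The same selection and sort can also be applied after the fact to an exported --xml document. The sketch below does this in Python with the standard library; the sample document is abbreviated from the listings above to the id and coffee_name attributes, and select_by_prefix is a hypothetical helper, not a libferris facility.

```python
import xml.etree.ElementTree as ET

# Abbreviated export of the coffees table, rows deliberately out of order.
SAMPLE = """<ferrisls>
<ferrisls url="mysql:///localhost/exphpresso/coffees" name="coffees">
 <context id="31" coffee_name="Cafe Diablo" />
 <context id="1" coffee_name="Americano" />
 <context id="28" coffee_name="Cafe Brulot" />
</ferrisls></ferrisls>"""


def select_by_prefix(xml_text, attribute, prefix):
    """Return <context> elements whose attribute starts with prefix, sorted by that attribute."""
    root = ET.fromstring(xml_text)
    matches = [
        ctx for ctx in root.iter("context")
        if ctx.get(attribute, "").startswith(prefix)
    ]
    return sorted(matches, key=lambda ctx: ctx.get(attribute))


if __name__ == "__main__":
    for ctx in select_by_prefix(SAMPLE, "coffee_name", "Cafe"):
        print(ctx.get("id"), ctx.get("coffee_name"))
```

The filtering is done in Python rather than with an XPath starts-with() call because ElementTree supports only a limited XPath subset.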

Summary

I've tried to give an overview of what is possible with libferris and XML and highlight some of the areas where libferris can remove boundaries. If you've enjoyed reading about libferris please consider making a contribution to the project.

XML.com Copyright © 1998-2006 O'Reilly Media, Inc.