XML.com: XML From the Inside Out

XQuery and Data Abstraction
by Kurt Cagle

XPath Extensions

The final major change is the creation of an explicitly defined extension mechanism for XPath. In practice, this means that with any XPath 2.0 engine, so long as you conform to the method declaration mechanism, you can create XPath extensions using Java, JavaScript, PHP, .NET, or other traditional languages. You can also create XPath extension functions using XSLT and XQuery, which opens up a fairly radical means for creating complex APIs that are still XML-oriented.

By creating a common interface mechanism, the XPath 2.0 group both sanctioned the idea of building extensions in the first place and did so in such a way that, even if you moved from one platform or programming language to another, the only changes you would need to make would be to reimplement the extensions in the new language; the XQuery or XSLT code remains the same.

Extensions in XPath are handled by the hosting environment within separate namespaces and are later referenced via namespace prefixes. For instance, suppose that you wanted to define a function, str:title-case(), that takes an expression such as "This is a test" and turns it into a single variable-like expression in title case notation, such as "ThisIsATest" (a useful prelude for turning labels into element names). In XSLT 2.0, such a function would be defined by:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://www.metaphoricalweb.org/xmlns/string-utilities"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="2.0">
    <xsl:function name="str:title-case" as="xs:string">
        <xsl:param name="expr" as="xs:string"/>
        <xsl:variable name="tokens" select="tokenize($expr,' ')"/>
        <xsl:variable name="titledTokens" select="for $token in $tokens return
            concat(upper-case(substring($token,1,1)),
                      lower-case(substring($token,2)))"/>
        <xsl:value-of select="string-join($titledTokens,'')"/>
    </xsl:function>
    <xsl:template match="/">
        <data><xsl:value-of select="str:title-case('This is a test')"/></data>
    </xsl:template>
</xsl:stylesheet>

In XQuery, the definition is a little more compact, but follows a similar structure:

declare namespace str="http://www.metaphoricalweb.org/xmlns/string-utilities";
declare namespace xs="http://www.w3.org/2001/XMLSchema";

declare function str:title-case($expr as xs:string) as xs:string {
    let $tokens := tokenize($expr,' ')
    let $titledTokens := for $token in $tokens return
          concat(upper-case(substring($token,1,1)),lower-case(substring($token,2)))
    return string-join($titledTokens,'')
};
    
<data>{str:title-case("This is a test")}</data>

In both cases, a namespace for the function is declared first. Here it is bound to the str prefix:

xmlns:str="http://www.metaphoricalweb.org/xmlns/string-utilities"

Data types are assigned to the input parameters and to the resulting output value, using the as attribute in XSLT and the as operator in XQuery. The function declaration itself can of course differ considerably between languages, but in general the signature should remain consistent across platforms. Finally, in both cases the invocation of the function and the passing of parameters are the same, because at the point where the functions are invoked, they are essentially treated as XPath 2.0 functions:

XSLT: <data><xsl:value-of select="str:title-case('This is a test')"/></data>
XQuery: <data>{str:title-case("This is a test")}</data>

Note that this principle holds true regardless of the host language. A Java class called from Saxon (or Xalan) would still be bound to a namespace, except that in this case the namespace corresponds to a Java class on the classpath, and the namespace URI uses the java: protocol:

xmlns:str = "java:org.metaphoricalweb.stringUtilities"

Similar mechanisms exist for .NET and PHP 5, making it possible to associate functions (or classes of functions) in those languages with namespaces.

XQuery Abstraction and Object-Oriented Programming

In many ways, this extension mechanism is one of the most important aspects of XPath 2.0. Typically, you do not associate only one function with a given namespace; rather, you associate a number of related functions. For instance, a function such as str:from-title-case(), which performs the inverse operation to str:title-case() by converting a title case string into a delimiter-separated one, may very well be part of the same string utilities class as str:title-case(). This idea is used fairly extensively by XML databases such as eXist to provide access mechanisms into the database proper from within the XQuery code. From a design standpoint, it shifts the deployment of XQuery statements toward an object-oriented approach rather than simple procedural scripts.
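
As a minimal sketch of what such a companion function might look like in XQuery, the inverse can be written by inserting a space before each uppercase letter and then rejoining the words. The two-argument signature, with the delimiter passed explicitly, is an assumption made for illustration and is not taken from an existing library:

```xquery
declare namespace str="http://www.metaphoricalweb.org/xmlns/string-utilities";

(: Hypothetical inverse of str:title-case(); the $delimiter parameter is an assumption. :)
declare function str:from-title-case($expr as xs:string, $delimiter as xs:string) as xs:string {
    (: Put a space in front of every uppercase letter, then trim the leading space. :)
    let $spaced := normalize-space(replace($expr, '\p{Lu}', ' $0'))
    (: Split on the spaces and rejoin the words with the requested delimiter. :)
    return string-join(tokenize($spaced, ' '), $delimiter)
};

<data>{str:from-title-case("ThisIsATest", "-")}</data>
```

This yields <data>This-Is-A-Test</data>. Note that the round trip is lossy: the original capitalization of "This is a test" cannot be recovered from the title case form.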

This ability both to extend XPath and to create an object-like mechanism for manipulating that XPath has a mirror in the SQL concept of stored procedures, with the primary difference being that most stored procedure implementations do not have a notion of namespaces or classes. The implications that arise from that namespace differentiation, however, are not insignificant.

XML has long resisted fitting neatly into an object-oriented paradigm, both because an XML structure is hard to encapsulate and because it generally doesn't have functional methods associated with it (polymorphism, on the other hand, it's got down cold). However, if you have a functional namespace within a language such as XQuery, then the route to object-oriented programming comes down to the ability to create a unique key for extracting some form of virtual record, then passing that key as the first argument to any XQuery namespaced function.

For instance, suppose that you have a set of invoice records rendered in XML and an invoice namespace called (appropriately enough) invoice:. In that case, you can define a method called invoice:new() that will create a new invoice record and, most significantly, return a GUID that uniquely identifies that record:

let $invoice := invoice:new()
return $invoice
(: returns a key such as "A1DFCDEA22318165226AB10DAA411442D439" :)

Once you've done this, this key becomes a parameter to any other method within the namespace:

invoice:addItemX($invoice,
     <item>
          <name>Pencil</name>
          <quantity>12</quantity>
          <sku>C11D-A</sku>
          <cost>0.89</cost>
     </item>
     )
(: or :)
invoice:addItem($invoice,"Pencil",12,"C11D-A",0.89)
invoice:getObject($invoice)

In other words, XPath 2.0 functionality (through XQuery or XSLT) opens up the possibility of working with objects where the data is not encapsulated within the object, but is instead stored within an external data store or even exists only in virtual form.
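
To make the key-as-first-argument idea concrete, here is a small, read-only sketch in XQuery. The namespace URI, the in-memory $invoice:store variable (standing in for whatever data store actually holds the records), and the invoice:total() function are all hypothetical and exist only for illustration; creating and updating records would depend on the update facilities of the hosting engine and are not shown:

```xquery
declare namespace invoice="http://www.example.org/xmlns/invoice";

(: Hypothetical in-memory store standing in for an external data store. :)
declare variable $invoice:store :=
    <invoices>
        <invoice id="A1DFCDEA22318165226AB10DAA411442D439">
            <item>
                <name>Pencil</name>
                <quantity>12</quantity>
                <sku>C11D-A</sku>
                <cost>0.89</cost>
            </item>
        </invoice>
    </invoices>;

(: Resolve a key to its record; returns the empty sequence for an unknown key. :)
declare function invoice:getObject($key as xs:string) as element(invoice)? {
    $invoice:store/invoice[@id = $key]
};

(: A "method" on the virtual object: the key is the first (here, only) argument. :)
declare function invoice:total($key as xs:string) as xs:decimal {
    sum((for $item in invoice:getObject($key)/item
         return xs:decimal($item/quantity) * xs:decimal($item/cost)), 0.0)
};

invoice:total("A1DFCDEA22318165226AB10DAA411442D439")
```

The query returns 10.68. The caller holds only the key; where the record lives and how it is stored stay hidden behind the functions in the invoice namespace.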

Data abstraction is the process whereby you make the access and update mechanisms for a given database (regardless of its implementation) transparent to the host system. SQL takes this process to a certain point, but many of the precepts that are central to object-oriented programming didn't really gain a foothold until after SQL had been standardized. As a result, most application development can be seen as the systematic pairing of a SQL view of the universe with an object-oriented view, usually at the cost of extreme coupling. Most frameworks have evolved some hook to synchronize the two world views, but they typically necessitate fairly extensive bindings of SQL content into code.

XPath 2.0 and XQuery, on the other hand, have the potential to abstract not only beyond a single SQL database implementation but beyond SQL itself. If you can create an XML bridge to both query and update a given type of data store (one where XML is either produced or consumed, as appropriate, even if the core operations deal with non-XML data) then you can use XQuery both to manipulate the XML side of the bridge and to build an object abstraction layer. Thus, LDAP interactions, text files, OLAP cubes, JSON, and other data protocols can be abstracted with an XQuery object layer so that their differences become invisible to the processes that utilize these services.
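
As a rough sketch of such a bridge for a plain text source, the following XQuery turns comma-separated text into XML and then queries it like any other XML. The bridge namespace and the bridge:csv-to-xml() function are hypothetical names, and the text is supplied inline as a string because reading a raw text file is assumed here to be left to the host environment or to engine-specific extensions:

```xquery
declare namespace bridge="http://www.example.org/xmlns/bridge";

(: Hypothetical bridge function: one <record> per non-empty line of "name,quantity" text. :)
declare function bridge:csv-to-xml($text as xs:string) as element(records) {
    <records>{
        for $line in tokenize($text, '\n')[normalize-space(.) != '']
        let $fields := tokenize($line, ',')
        return
            <record>
                <name>{normalize-space($fields[1])}</name>
                <quantity>{normalize-space($fields[2])}</quantity>
            </record>
    }</records>
};

(: &#10; is a newline character reference inside the string literal. :)
let $csv := "Pencil,12&#10;Eraser,3"
return bridge:csv-to-xml($csv)/record[xs:integer(quantity) gt 5]/name/string()
```

The query returns the string "Pencil". Code that consumes the result never sees the comma-separated format, only the XML side of the bridge.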

Note that this doesn't mean that LDAP is going to be implemented in XML, that JSON is going to start being mapped to XML (though see E4X), or that SQL is going away any time soon. There are obviously situations where the specific features of these systems require that work be done in the form that is best optimized for that environment. However, a surprisingly large number of cases exist where what matters is that the data be accessible in a way that is most transparent to the overall development process, regardless of the container of that data. In those cases, I see XQuery gaining a huge mindshare.

Summary

Just as it is not easy to write about all aspects of SQL in a single article, it is hard to do more than hint at the implications that XQuery holds for both data abstraction and distributed object-oriented programming. In the next part of this series, I'm going to get into the basics of how XQuery itself works; in the article after that, I will look at current XQuery implementations in contemporary XML databases and related engines.

When I started working with XQuery, I was skeptical about it. Now that I have seen the implications opened up by the modularization of the language, the emergence of XQuery-aware data repositories, and the shift towards rich client applications that are increasingly XML-centered, I'll admit that I was wrong in my initial assessment. XQuery is an important technology, and it will very likely have a huge impact upon the way that applications are built, especially in distributed environments.

Kurt Cagle is a systems developer and information architect living in Victoria, British Columbia. He is also the webmaster for XForms.org, a news and code portal for the XForms and XQuery community.


