Implementing XPath for Wireless Devices, Part II
In the first part of this article, we introduced XPath and discussed various XPath queries ranging from simple to complex. By applying XPath queries to sample XML files, we elaborated upon various important definitions of XPath such as location step, context node, location path, axes, and node-test. We then discussed complex XPath queries that combine more than one simple query. We also discussed the abstract structure of Wireless Binary XML (WBXML), which is the wireless counterpart of XML. Finally we presented the design of a simple XPath processing engine.
In this part, we will discuss the features of XPath which allow for complex search operations on an XML file. We will discuss predicates or filtered queries and the use of functions in XPath. We will present various XPath queries for the processing of WSDL and WML. We will also enhance the simple design of our XPath engine to include support for predicates, functions, and different data types.
Let's start with a simple query which will return the root node of any XML file:
./node()
We can take this further with another simple query, which selects all the immediate children of the root node:
./node()/*
What if you want to find all the nodes that are the immediate children of the
root node and have a type attribute? The following query will help:
./node()/*[attribute::type]
This query will return the binding element from Listing 1. This shows that the code
attribute::type written within square brackets acts as a
filter. Filters in XPath are called predicates and are written inside square
brackets. A predicate acts on a node-set -- in this example, the node-set
consists of all immediate children of the root node -- and applies the filtering
condition -- here: the node must have a type attribute -- to the node-set. The
result is a reduced, that is, filtered node-set.
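The filtering step can be tried out with Python's standard xml.etree.ElementTree module. This is only a sketch under stated assumptions: SAMPLE_WSDL is a hypothetical stand-in for Listing 1, its element and attribute values are invented for illustration, and because ElementTree hands us the root element directly, ./* plays the role of ./node()/* here. ElementTree accepts only the abbreviated form @type, which in XPath is shorthand for attribute::type.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Listing 1; names and values are assumptions.
SAMPLE_WSDL = """
<definitions name="BillingService">
  <message name="MonthlyBill"/>
  <message name="TotalBill"/>
  <portType name="BillingPortType"/>
  <binding name="BillingBinding" type="BillingPortType"/>
  <service name="BillingService"/>
</definitions>
"""

root = ET.fromstring(SAMPLE_WSDL)

# Node-set before filtering: every immediate child element of the root.
children = root.findall("./*")

# The same node-set with the predicate applied: keep only the
# children that carry a type attribute.
filtered = root.findall("./*[@type]")

print([child.tag for child in children])
print([child.tag for child in filtered])
```

The second list is a subset of the first, which is exactly what a predicate does: it never adds nodes, it only removes those that fail the condition.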
Predicates can range from simple to very complex. Perhaps the simplest form of
XPath predicate is just a number as shown in the following query which returns
the second child (message element) of the root element:
./node()/*[2]
The query ./node()/message[attribute::name="TotalBill"]/text()
looks for the message child of the root element whose
name attribute has the value TotalBill, and
returns all text nodes of that message element. In Listing 1, the
matching element is the second of the two message elements.
Suppose you want to answer the following questions about the WSDL file in Listing 1:
1. What is the value of the name attribute of the last operation element?
2. How many message child elements does the definitions element have?
3. What is the name of the first child element of the root element?
last() Function
The last() function will always point to the last node in the
node set. The following query, when applied to the WSDL file in Listing 1, will return the second
message element (i.e. the message element whose name
is TotalBill):
./node()/message[last()]
Note that the following query also returns the same message element:
./node()/message[2]
The only difference between the two queries is that we have replaced the
last() function with the number 2. It is correct to conclude that
the last() function in this case is actually returning the number 2
(the number of nodes in the node set of the particular location step). Apply the
same two queries to the WSDL file of Listing 2 (you may use the XPath
Tester application mentioned in the resources) and you will see that this time
the two queries do not return the same result. There are three message elements
in Listing 2, so the
last() function is now returning the number 3.
Notice from this discussion that the last() function always returns
a number.
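The comparison between [2] and [last()] can be reproduced with a short Python sketch. The two documents below are hypothetical stand-ins for Listing 1 and Listing 2 (the message names, in particular YearlyBill, are assumptions made for illustration), and selected_name is a helper introduced only for this example. ElementTree's limited XPath support does include both numeric and last() predicates.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Listing 1: two message elements.
two_messages = ET.fromstring(
    "<definitions>"
    "<message name='MonthlyBill'/>"
    "<message name='TotalBill'/>"
    "</definitions>"
)

# Hypothetical stand-in for Listing 2: three message elements.
three_messages = ET.fromstring(
    "<definitions>"
    "<message name='MonthlyBill'/>"
    "<message name='TotalBill'/>"
    "<message name='YearlyBill'/>"
    "</definitions>"
)


def selected_name(root, path):
    """Return the name attribute of the single message selected by path."""
    return root.find(path).get("name")


for label, root in (("two messages", two_messages),
                    ("three messages", three_messages)):
    # With two messages both paths select the same element;
    # with three messages they diverge.
    print(label,
          selected_name(root, "./message[2]"),
          selected_name(root, "./message[last()]"))
```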
position() Function
If you apply the following queries to the WSDL file in Listing 2,
./node()/message[1]/part
./node()/message[2]/part
./node()/message[3]/part
they will return the part children of the first, second, and third
message elements respectively. This shows that each node in the node set
has a proximity position. The proximity position of the first node
is one, that of the second node is two, and so on.
What if you want to find all the message elements except the
second? You can use the position() function which works on the
proximity position of a context node. The following query will return the first
and third message elements of Listing 2:
./node()/message[position()!=2]
The position() function simply returns the proximity position of
the context node being evaluated. The predicate [position()!=2]
will compare the proximity position with the number 2 and include the context
node in the node-set only if proximity position is not equal to two.
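ElementTree does not evaluate general position() comparisons, so the following Python sketch emulates the predicate by hand. It is an illustration of the idea rather than a real XPath evaluator: filter_by_position is a hypothetical helper, and the three-message document is an assumed stand-in for Listing 2.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Listing 2; message names are assumptions.
root = ET.fromstring(
    "<definitions>"
    "<message name='First'/>"
    "<message name='Second'/>"
    "<message name='Third'/>"
    "</definitions>"
)


def filter_by_position(nodes, keep):
    """Apply a positional predicate to a node-set.

    keep(position, size) is called once per node and decides
    whether that node stays in the filtered node-set.
    """
    size = len(nodes)
    # Proximity positions start at 1, not 0.
    return [node for position, node in enumerate(nodes, start=1)
            if keep(position, size)]


messages = root.findall("./message")

# Equivalent in spirit to ./node()/message[position()!=2]
not_second = filter_by_position(messages, lambda position, size: position != 2)
print([m.get("name") for m in not_second])
```

Passing the node-set size alongside the position means the same helper can also express last(), for example with lambda position, size: position == size.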
count() Function
How many message children does the definitions (root) element in
Listing 1 have? Count them and
you will find two message elements. Specifying a "how many"
question in XPath is a two-step procedure. First write an XPath query that will
find all those elements that you wish to count. Then pass the XPath query to
the count() function as shown below:
Step 1: ./node()/message
Step 2: count(./node()/message)
The count() function calculates and returns the number of nodes in
the resulting node-set of the XPath query.
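The same two steps map naturally onto Python. ElementTree has no count() function, so this sketch simply takes the length of the selected list; the sample document is a hypothetical stand-in for Listing 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Listing 1; names are assumptions.
root = ET.fromstring(
    "<definitions>"
    "<message name='MonthlyBill'/>"
    "<message name='TotalBill'/>"
    "<portType name='BillingPortType'/>"
    "</definitions>"
)

# Step 1: select the nodes to be counted.
messages = root.findall("./message")

# Step 2: count them. The result is a number, not a node-set.
message_count = len(messages)
print(message_count)
```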
name(), local-name() and namespace-uri() Functions
What does the following query return when applied to the WSDL file of Listing 1?
./node()/*[5]
It returns the fifth child (the service element) of the root
element. The service element itself is a complete structure and
contains child elements. Therefore, the returned value of this XPath query is
actually an XML node and not just the name of an element.
The name() function returns the name of the XML node in
question. For example, the following query will return the string "service" when
applied to Listing 1:
name(./node()/*[5])
Similarly, the following query will return the string "wsd:definitions" (fully qualified name of the root element with the namespace prefix):
name(./node())
The local-name() and namespace-uri() functions are
similar to the name() function, except that the local-name() function
returns only the local name of the element without the namespace prefix, and the
namespace-uri() function returns only the namespace URI. For example, try the
following queries on Listing 1:
local-name(./node())
namespace-uri(./node())
The first query returns the string "definitions", while the second returns "http://schemas.xmlsoap.org/wsdl/".
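The three pieces of an element name are also exposed by DOM parsers, which makes the distinction easy to see. The sketch below uses Python's standard xml.dom.minidom; the one-element document and its wsdl prefix are assumptions chosen for illustration, not a copy of Listing 1.

```python
from xml.dom import minidom

# Hypothetical one-element document; the "wsdl" prefix is an assumption.
doc = minidom.parseString(
    '<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" '
    'name="BillingService"/>'
)
root = doc.documentElement

qualified_name = root.nodeName      # counterpart of name()
local_name = root.localName         # counterpart of local-name()
namespace_uri = root.namespaceURI   # counterpart of namespace-uri()

print(qualified_name)
print(local_name)
print(namespace_uri)
```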
String Functions
We have seen that the name(), local-name(), and
namespace-uri() functions return strings. XPath offers several
functions for the processing of strings, such as string(),
substring(), substring-before(),
substring-after(), concat(),
starts-with(), etc. For example, the following query demonstrates how
to use the string() function:
string(./node()/*[2]/part/attribute::name)
The above query looks for the second child of the root element, then
finds all the part child elements of that child. It then selects
the name attribute of those part elements and, as a last step,
converts the resulting node-set to a string. In XPath, converting a node-set
to a string means taking the string value of the node that comes first in
document order. When applied to Listing 1, the query yields
bill.
XPath also provides several functions that return true or false (Boolean data type). Consider the following query:
boolean(./node()/message)
It returns true when applied to Listing 1. That's because the
boolean() function checks whether a node-set resulting from an
XPath query is empty or not (in our case, it contains two message
children of the root element). If it is empty, the boolean()
function returns false, otherwise true.
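The conversion rules behind string() and boolean() are simple enough to sketch in Python. The helpers xpath_boolean and xpath_string_of_attribute are hypothetical names introduced for this example, and the sample document is an assumed stand-in for Listing 1. The path ./message[2]/part is used instead of ./*[2]/part because ElementTree's positional predicates are best paired with an explicit tag name.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Listing 1; names and values are assumptions.
root = ET.fromstring(
    "<definitions>"
    "<message name='MonthlyBill'><part name='amount'/></message>"
    "<message name='TotalBill'><part name='bill'/></message>"
    "</definitions>"
)


def xpath_boolean(node_set):
    """Mimic boolean() on a node-set: true only if the set is non-empty."""
    return len(node_set) > 0


def xpath_string_of_attribute(elements, attribute):
    """Mimic string() on a node-set of attributes.

    The first attribute node in document order supplies the value;
    an empty node-set converts to the empty string.
    """
    for element in elements:
        value = element.get(attribute)
        if value is not None:
            return value
    return ""


print(xpath_boolean(root.findall("./message")))
print(xpath_boolean(root.findall("./fault")))
print(xpath_string_of_attribute(root.findall("./message[2]/part"), "name"))
```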
The following WSDL processing scenario uses all the XPath concepts which we've discussed so far. The search requirement for the scenario is as follows:
Find a service element which is a direct child of the definitions (root) element and whose name attribute matches the name attribute of the definitions element. Then look into that service element and find a port element whose binding attribute matches the name attribute of a binding element, which is a direct child of the definitions (root) element.
This WSDL processing can be fulfilled in four steps:
1. Find the value of the name attribute of the
definitions (root) element. The following XPath query (which
returns the string BillingService from Listing 1) performs this job:
string(//node()[1]/@name)
2. Then find the service element whose name attribute
matches the name of the definitions element. The
following query contains the query of point 1 in a predicate and will return
the required service element:
./node()[1]/service[@name=string(//node()[1]/@name)]
3. Then find the value of the name attribute of the
binding element:
string(//node()[1]/binding/@name)
4. Finally look for the required port element (whose
binding attribute matches the name of the
binding element of point 3) inside the service
element of point 2:
./node()[1]/service[@name=string(//node()[1]/@name)]
/port[@binding=string(//node()[1]/binding/@name)]
This example demonstrates that XPath predicates can contain simple logical conditions, function calls or even complete XPath queries.
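To make the four steps concrete, here is the same search written out in Python with ElementTree. It is a sketch under stated assumptions: the document is a hypothetical WSDL fragment (OtherService, UnusedPort and the other names are invented for illustration), and the nested XPath predicates are unrolled into list comprehensions because ElementTree cannot evaluate function calls inside predicates.

```python
import xml.etree.ElementTree as ET

# Hypothetical WSDL fragment; every name below is an assumption.
root = ET.fromstring(
    "<definitions name='BillingService'>"
    "<binding name='BillingBinding'/>"
    "<service name='OtherService'>"
    "<port name='OtherPort' binding='OtherBinding'/>"
    "</service>"
    "<service name='BillingService'>"
    "<port name='UnusedPort' binding='OtherBinding'/>"
    "<port name='BillingPort' binding='BillingBinding'/>"
    "</service>"
    "</definitions>"
)

# Step 1: value of the name attribute of the definitions (root) element.
definitions_name = root.get("name", "")

# Step 2: service children whose name matches the root's name.
services = [s for s in root.findall("./service")
            if s.get("name") == definitions_name]

# Step 3: value of the name attribute of the first binding child.
binding = root.find("./binding")
binding_name = binding.get("name", "") if binding is not None else ""

# Step 4: ports inside those services whose binding attribute matches.
ports = [p for s in services for p in s.findall("./port")
         if p.get("binding") == binding_name]

print([p.get("name") for p in ports])
```

Each comprehension corresponds to one predicate in the combined query, which shows how a predicate containing a complete XPath query is evaluated first and its result is then used as a plain value in the outer filter.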
WML is an XML language defined by the WAP Forum. WML provides a presentation format for small-device displays. WML is to a small-device display what HTML is to a personal computer.
Imagine a WML file consisting of a deck of cards, where each card is wrapped
by a card element. Listing 3 is a simple WML file that
contains two card elements.
The following XPath query will return all p (paragraph)
elements contained within the first card (the card element whose id
is "first") of Listing 3:
./node()/card[string(@id)="first"]/p
The next query returns the textual contents of the first paragraph of the second card:
string(./node()/card[string(@id)="second"]/p[1]/text())
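Both WML queries fall within the XPath subset that ElementTree understands, so they can be tried directly in Python. The deck below is a hypothetical stand-in for Listing 3; only the card ids "first" and "second" come from the text above, and the paragraph contents are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Listing 3; paragraph texts are assumptions.
wml = ET.fromstring(
    "<wml>"
    "<card id='first'><p>Welcome</p><p>Choose an option</p></card>"
    "<card id='second'><p>Your bill</p></card>"
    "</wml>"
)

# All p elements of the card whose id is "first".
first_card_paragraphs = wml.findall("./card[@id='first']/p")

# Text of the first paragraph of the card whose id is "second".
second_card_first_text = wml.findtext("./card[@id='second']/p[1]", default="")

print([p.text for p in first_card_paragraphs])
print(second_card_first_text)
```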
We will now see how to add support for predicates and functions to the simple design of our XPath engine.
The four pseudo-code classes XPathExpression (Listing 4),
XPathLocationStep (Listing 5),
XPathResult (Listing
6), and Predicate (Listing 7) form the updated design
that includes support of predicates and functions. We have introduced the
following enhancements to the classes presented in part 1:
1. XPath can return various types of data. Examples of data types XPath may
return include nodes, strings, numbers,
and Booleans. Our XPath engine design supported only XML nodes as
return data types. We have now provided a generic class named
XPathResult (Listing
6) to support the different data types. Implementations based on our design
will need to extend XPathResult for each data type separately.
2. The updated design now includes an architecture to support functions. A function call may occur at the beginning of an XPath query or inside any XPath location step. Therefore, both the XPathExpression (Listing 4) and XPathLocationStep (Listing 5) classes now have added support for function calls.
3. We have provided a separate class for predicates (Listing 7). A predicate may consist of only a logical condition or an entire XPath query. Therefore, the Predicate class constructor will check whether the predicate is a complete query or just a condition. If it is a complete XPath query, the Predicate class will instantiate a new XPathExpression object; otherwise it will just evaluate the logical condition to produce the filtered results.
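The enhancements above can be illustrated with a small Python sketch. It is not a transcription of Listings 4 to 7: the subclass names (NodeSetResult, StringResult, NumberResult, BooleanResult), the apply method, and the is_position flag are all assumptions made for this example, and the Predicate shown here handles only a numeric position and an attribute-existence test. The branch that would hand a complete XPath query to a new XPathExpression object is deliberately left out.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


class XPathResult:
    """Generic result; one subclass per XPath data type (assumed layout)."""


@dataclass
class NodeSetResult(XPathResult):
    nodes: list


@dataclass
class StringResult(XPathResult):
    value: str


@dataclass
class NumberResult(XPathResult):
    value: float


@dataclass
class BooleanResult(XPathResult):
    value: bool


class Predicate:
    """Filters a node-set; supports only [N] and [@attr] in this sketch."""

    def __init__(self, text):
        self.text = text.strip()
        # Decide once, at construction time, which kind of predicate this is.
        self.is_position = self.text.isdigit()

    def apply(self, node_set):
        if self.is_position:
            index = int(self.text) - 1  # proximity positions are 1-based
            nodes = node_set.nodes[index:index + 1]
        elif self.text.startswith("@"):
            name = self.text[1:]
            nodes = [n for n in node_set.nodes if n.get(name) is not None]
        else:
            raise NotImplementedError(f"unsupported predicate: {self.text}")
        return NodeSetResult(nodes)


# Usage on a hypothetical document.
root = ET.fromstring(
    "<definitions>"
    "<message name='a'/><message name='b'/><binding type='t'/>"
    "</definitions>"
)
children = NodeSetResult(list(root))
second = Predicate("2").apply(children)
typed = Predicate("@type").apply(children)
print(second.nodes[0].get("name"), [n.tag for n in typed.nodes])
```

Keeping every return type behind a common XPathResult base lets a location step pass its output to a function or a predicate without knowing in advance whether it holds nodes, a string, a number, or a Boolean.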
In this article, we discussed the syntax and use of predicates and functions in XPath. We presented various WSDL and WML processing examples and demonstrated how to form complex XPath queries. Finally, we enhanced the design of the XPath engine introduced in the first article.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.