Introducing SPARQL: Querying the Semantic Web
by Leigh Dodds
Graph Pattern Shortcuts
SPARQL includes a number of syntax shortcuts that simplify the writing of patterns. Let's rewrite our query more succinctly:
PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT *
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE
{
  ?element table:name ?name;
           table:symbol ?symbol;
           table:atomicNumber ?number.
}
We've used two shortcuts here. The first should be familiar to SQL users: *. This shortcut means "return all variables listed in the graph pattern." It saves having to itemize every variable, at the cost of relying on the processor to order the columns in the result set.
The second shortcut is, formally, the use of a predicate-object list. It lets a query author state the subject of a series of triple patterns only once. When we're using this form, each triple pattern except the last is terminated with a semicolon rather than a full stop; the final pattern still ends with a full stop. This shortcut can be used whenever several patterns share the same subject.
SPARQL offers a similar shortcut, the object list, which simplifies patterns that share both subject and predicate and differ only in their object. The objects are listed once each, separated by commas.
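As a minimal sketch of the object list, the following query asks for people who know both of two named individuals. The ex: prefix and the ex:knows, ex:alice, and ex:bob terms are hypothetical, invented purely for illustration; they are not part of the periodic table data.

```sparql
# Hypothetical vocabulary, for illustration only.
PREFIX ex: <http://example.org/vocab#>
SELECT ?person
WHERE
{
  # Object list: one subject, one predicate, two objects separated by a comma.
  ?person ex:knows ex:alice, ex:bob.
}
```

The comma-separated pattern is equivalent to writing two separate triple patterns, one with ex:alice as the object and one with ex:bob, both sharing the subject ?person and the predicate ex:knows.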
OPTIONAL Patterns
RDF graphs are often semi-structured; some data may be unavailable or unknown. How do we allow for this when querying for data? Let's work through an example to illustrate the problem. Imagine that we wanted to adapt the previous query to also return the color of the element. Our first attempt may look like this:
PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT ?name ?symbol ?number ?color
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE
{
?element table:name ?name.
?element table:symbol ?symbol.
?element table:atomicNumber ?number.
?element table:color ?color.
}
We've extended our SELECT statement to include the new variable, color, and have also added a match for the relevant property (table:color) to the graph pattern. So far, so good.
If you run this query, though, you'll notice that some elements are missing: ununtrium, for example. (No, I'd never heard of it either.) If we look closely at the RDF data, we find that ununtrium, and several of the other heavier elements, do not have the relevant table:color property. Because every triple pattern in the graph pattern must match, these elements are not returned in the results.
We need to alter the query to allow for the fact that we have some missing or incomplete data. We achieve this by indicating that the relevant triple pattern is optional:
PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT ?name ?symbol ?number ?color
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE
{
?element table:name ?name.
?element table:symbol ?symbol.
?element table:atomicNumber ?number.
OPTIONAL { ?element table:color ?color. }
}
If you run this version of the query you'll find that all of the elements are now correctly included. The OPTIONAL keyword must be followed by a sub-pattern containing the optional aspects of the query. Within the result set, if an element doesn't have a color property, then the color variable is said to be unbound for that particular solution (row).
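One way to see unbound variables at work is to ask only for the elements that lack a color. The sketch below relies on SPARQL's FILTER keyword and bound() test, which this section does not otherwise cover, so treat it as an illustration rather than part of the worked example. It keeps only the solutions in which the optional pattern failed to bind ?color.

```sparql
PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT ?name ?symbol ?number
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE
{
  ?element table:name ?name.
  ?element table:symbol ?symbol.
  ?element table:atomicNumber ?number.
  OPTIONAL { ?element table:color ?color. }
  # Keep only the solutions where the OPTIONAL pattern did not match.
  FILTER (!bound(?color))
}
```

Given the data described above, we would expect ununtrium to appear among the results of this query.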
Matching Alternatives with UNION
Now that we've seen how to explore optional data, let's see how we can select from alternatives. If we were interested in the chemistry of the halogens and the noble gases, we might simply construct and run separate queries, one per group, in order to find out their symbols and atomic numbers.
But using the SPARQL UNION keyword we can write a single query that matches the elements of both groups. That query looks like this:
PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT ?symbol ?number
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE
{
  {
    ?element table:symbol ?symbol;
             table:atomicNumber ?number;
             table:group table:group_17.
  }
  UNION
  {
    ?element table:symbol ?symbol;
             table:atomicNumber ?number;
             table:group table:group_18.
  }
}
There are a few things to notice. First, the query pattern consists of two nested patterns joined by the UNION keyword. If an element resource matches either of these patterns, then it will be included in the query results. For brevity, both patterns use the predicate-object list shortcut.
The query also includes another demonstration of URI shortening, this time within the object of a triple pattern. The value (range) of the table:group property is a resource. Each of the groups in the table is modeled as a resource with its own URI. The full URI for group 17 is http://www.daml.org/2003/01/periodictable/PeriodicTable#group_17. As we've already declared a URI PREFIX for http://www.daml.org/2003/01/periodictable/PeriodicTable# we can truncate this to table:group_17.
Any number of UNIONs can be included in a query, providing a great deal of flexibility in assembling data from alternatives.
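To illustrate chaining more than two alternatives, here is a sketch that adds a third group to the query. It assumes that the data names its other groups in the same style, so that a table:group_1 resource exists; that identifier is a guess based on the naming of group_17 and group_18 and is not confirmed by anything shown here. The sketch also moves the two patterns common to every alternative outside the UNION, so that each branch contains only the part that differs.

```sparql
PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT ?symbol ?number
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE
{
  # Patterns shared by every alternative are stated once.
  ?element table:symbol ?symbol;
           table:atomicNumber ?number.
  # Each branch supplies one acceptable group.
  { ?element table:group table:group_17. }
  UNION
  { ?element table:group table:group_18. }
  UNION
  # Assumed identifier: table:group_1 follows the naming pattern above.
  { ?element table:group table:group_1. }
}
```

Repeating the shared patterns inside every branch, as in the two-group query, selects the same elements. Factoring them out simply keeps the query shorter as the number of alternatives grows.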