XML.com: XML From the Inside Out


Introducing SPARQL: Querying the Semantic Web
by Leigh Dodds

Structure of a Query

This SPARQL query selects the names of all the elements in the periodic table:

PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>

SELECT ?name
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>

WHERE { ?element table:name ?name. }

Let's break down the query into its parts to better understand the syntax.

Starting from the top, we encounter the PREFIX keyword. PREFIX is essentially the SPARQL equivalent of declaring an XML namespace: it associates a short label with a specific URI. And, just like a namespace declaration, the label carries no particular meaning. It's just a label. A query can include any number of PREFIX statements. Once declared, the label can be used anywhere in the query in place of the URI itself; for example, within a triple pattern. In the single triple pattern in this query, table:name is shorthand for http://www.daml.org/2003/01/periodictable/PeriodicTable#name, the full URI of the name property.
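To make the substitution concrete, here is a minimal Python sketch of how a query processor might expand a prefixed name such as table:name into a full URI. The function expand and the dictionary used as a prefix table are illustrative assumptions, not part of any SPARQL library; a real parser also has to enforce the grammar's rules on what may appear after the colon.

```python
# Prefix table built from the query's PREFIX declarations (illustrative).
PREFIXES: dict[str, str] = {
    "table": "http://www.daml.org/2003/01/periodictable/PeriodicTable#",
}


def expand(prefixed_name: str, prefixes: dict[str, str]) -> str:
    """Expand a prefixed name such as 'table:name' into a full URI.

    The label before the colon only has to match a declared PREFIX;
    its spelling carries no meaning of its own.
    """
    label, sep, local = prefixed_name.partition(":")
    if not sep or label not in prefixes:
        raise ValueError(f"undeclared prefix in {prefixed_name!r}")
    return prefixes[label] + local


print(expand("table:name", PREFIXES))
# http://www.daml.org/2003/01/periodictable/PeriodicTable#name

# Renaming the label changes nothing: 'pt:name' expands to the same URI.
print(expand("pt:name", {"pt": PREFIXES["table"]}) == expand("table:name", PREFIXES))
# True
```

The point of the second call is the one made above: only the URI bound to a label matters, never the label itself.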

The start of the query proper is the SELECT keyword. Like its twin in a SQL query, the SELECT clause is used to define the data items that will be returned by a query. In Example 6 we're returning a single item, the name of the element.

As you might expect, the FROM keyword identifies the data against which the query will be run. In this instance, the query references the URI of the periodic table in RDF. A query may actually include multiple FROM keywords, as a means to assemble larger RDF graphs for querying. We'll have more to say about that (and SPARQL datasets in general) in the next tutorial. For now, think of all the lovely mashups . . .
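As a rough sketch of what assembling a larger graph means, assume each source graph is held in memory as a Python set of (subject, predicate, object) string tuples and contains no blank nodes. Combining the graphs named by several FROM clauses then reduces to a set union. The function merge_graphs and the sample triples (including identifiers such as table:Na) are hypothetical, and a full RDF merge would also have to keep blank nodes from different sources apart.

```python
# Each graph is modelled as a set of (subject, predicate, object) strings.
# Blank nodes are ignored; a full RDF merge would have to keep blank nodes
# from different sources apart, which this sketch does not do.
Triple = tuple[str, str, str]


def merge_graphs(*graphs: set[Triple]) -> set[Triple]:
    """Combine the graphs named by several FROM clauses into one graph."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= graph  # duplicate triples collapse into one
    return merged


# Two hypothetical sources describing the same (hypothetical) resource.
names = {("table:Na", "table:name", "sodium")}
symbols = {
    ("table:Na", "table:symbol", "Na"),
    ("table:Na", "table:name", "sodium"),
}

default_graph = merge_graphs(names, symbols)
print(len(default_graph))
# 2
```

A pattern run against default_graph can now match the name from one source and the symbol from the other.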

Finally, we have the WHERE clause, which holds the query's graph pattern. A graph pattern is a collection of triple patterns that identify the shape of the graph we want to match against. In this instance you'll recognize the query's pattern as the single triple pattern we used earlier.
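The processor's job is to find every way the pattern can be matched against the data. Here is a minimal Python sketch of matching a single triple pattern, assuming triples are plain string tuples, that any term beginning with "?" is a variable, and that URIs and literals are both represented as ordinary strings. The function names and the sample data are hypothetical.

```python
Triple = tuple[str, str, str]


def is_var(term: str) -> bool:
    """In this sketch, any term starting with '?' is a query variable."""
    return term.startswith("?")


def match_pattern(pattern: Triple, graph: set[Triple]) -> list[dict[str, str]]:
    """Return one variable binding for every triple that matches the pattern."""
    solutions = []
    for triple in graph:
        binding: dict[str, str] = {}
        for p_term, g_term in zip(pattern, triple):
            if is_var(p_term):
                # A variable used twice in one pattern must bind to the same term.
                if binding.setdefault(p_term, g_term) != g_term:
                    break
            elif p_term != g_term:
                break  # a fixed URI or literal that doesn't match
        else:
            solutions.append(binding)
    return solutions


# Hypothetical triples; URIs are abbreviated and literals are plain strings.
graph = {
    ("table:Na", "table:name", "sodium"),
    ("table:Na", "table:symbol", "Na"),
    ("table:Ne", "table:name", "neon"),
}

name_solutions = match_pattern(("?element", "table:name", "?name"), graph)
print(sorted(s["?name"] for s in name_solutions))
# ['neon', 'sodium']
```

The symbol triple is skipped because its predicate doesn't match the fixed table:name term in the pattern.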

The WHERE keyword is actually optional and can legally be omitted to make queries slightly terser:

BASE <http://www.daml.org/2003/01/periodictable/>
PREFIX table: <PeriodicTable#>

SELECT ?name
FROM <PeriodicTable.owl>
{ ?element table:name ?name. }

URIs are often long and unwieldy, and you can never have too much syntactic sugar to help avoid typing them out repeatedly. BASE is another form of URI abbreviation: it defines the base URI against which all relative URIs in the query are resolved, including those declared with PREFIX. As you can see, the prefix shared by the two URIs in the previous example has been factored out into a BASE declaration.
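A rough Python sketch of that resolution step follows, using the standard library's urllib.parse.urljoin for the relative-reference rules. The helper name resolve is hypothetical, and the special handling of a trailing "#" works around a urljoin quirk rather than anything SPARQL-specific.

```python
from urllib.parse import urljoin


def resolve(base: str, ref: str) -> str:
    """Resolve a (possibly relative) URI reference against a BASE URI."""
    resolved = urljoin(base, ref)
    # urljoin can drop an empty trailing fragment ("PeriodicTable#" becomes
    # "PeriodicTable"). Namespace URIs often end in '#', so restore it.
    if ref.endswith("#") and not resolved.endswith("#"):
        resolved += "#"
    return resolved


BASE = "http://www.daml.org/2003/01/periodictable/"

print(resolve(BASE, "PeriodicTable#"))     # the PREFIX table: URI
# http://www.daml.org/2003/01/periodictable/PeriodicTable#
print(resolve(BASE, "PeriodicTable.owl"))  # the FROM URI
# http://www.daml.org/2003/01/periodictable/PeriodicTable.owl
```

Both results match the full URIs written out in the original, longer version of the query.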

Now that we've written a complete query, let's run it and get some results.

Our First Results

Here's a table that lists the first few results (you can view the complete results using the online query tool):

row  name
1    sodium
2    neon
3    iron

The result of a SPARQL SELECT query is a sequence of results that, conceptually, form a table or result set. Each row in the table corresponds to one query solution. And each column corresponds to a variable declared in the SELECT clause. If you've done any kind of database development, this kind of table-oriented result set should be immediately familiar.
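To see how solutions turn into rows and columns, here is a small Python sketch. It assumes each solution is a dictionary from variable name to value (a representation chosen for illustration, not a standard API) and that every selected variable has a value in every solution. The sample solutions are hypothetical.

```python
def to_table(select_vars: list[str], solutions: list[dict[str, str]]) -> list[list[str]]:
    """Lay out solutions as rows, with one column per SELECT variable.

    Assumes every selected variable is bound in every solution.
    Variables that were matched but not selected (here ?element) are dropped.
    """
    header = ["row"] + [v.lstrip("?") for v in select_vars]
    rows = [header]
    for number, solution in enumerate(solutions, start=1):
        rows.append([str(number)] + [solution[v] for v in select_vars])
    return rows


# Hypothetical solutions for: SELECT ?name WHERE { ?element table:name ?name. }
solutions = [
    {"?element": "table:Na", "?name": "sodium"},
    {"?element": "table:Ne", "?name": "neon"},
]
for row in to_table(["?name"], solutions):
    print(" ".join(row))
# row name
# 1 sodium
# 2 neon
```

Note that ?element takes part in matching but never reaches the output, because only the SELECT list determines the columns.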

In later sections we'll look at how that sequence can be modified, e.g. to apply a sort order or to limit the number of returned results. We'll also take a quick look at the XML results format. But for now, let's make the query do something more interesting.

Graph Patterns

Taking what we've learned about the simplest kind of triple pattern and the structure of a SPARQL query, we can now explore how to write more complex and useful queries.

The next example shows a query that selects the name, symbol, and atomic number of all elements in the periodic table:

PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT ?name ?symbol ?number
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE
{
  ?element table:name ?name.
  ?element table:symbol ?symbol.
  ?element table:atomicNumber ?number.
}

What's new here is that the query pattern consists of multiple triple patterns. A collection of triple patterns is a graph pattern. In this instance the graph pattern consists of three triple patterns, one to match each of the desired properties: name, symbol, and atomicNumber. Understanding how this query operates involves a bit more background on the pattern matching process.

The most important point is that within a graph pattern a variable must have the same value no matter where it is used. So in the previous example the variable ?element will always be bound to the same resource within a given solution. In other words, this query will match any resource that has all three of the desired properties. A resource that lacks any of these properties will not be included in the results, because it won't satisfy all of the triple patterns. We'll cover optional matching in a later section.
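The matching process described above can be sketched in Python. Starting from a single empty solution, each triple pattern in turn extends every solution found so far, and any extension that would give a variable two different values is discarded. This naive nested-loop approach is chosen for clarity and is not meant to describe how any particular SPARQL engine works; the data and the table:Na-style identifiers are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def extend(binding: Binding, pattern: Triple, triple: Triple) -> Optional[Binding]:
    """Try to match one triple against one pattern without contradicting
    the bindings made so far. Returns the extended binding, or None."""
    new = dict(binding)
    for p_term, g_term in zip(pattern, triple):
        if is_var(p_term):
            # setdefault keeps an existing value, so a clash means no match.
            if new.setdefault(p_term, g_term) != g_term:
                return None
        elif p_term != g_term:
            return None
    return new


def match_graph_pattern(patterns: list[Triple], graph: set[Triple]) -> list[Binding]:
    """Match every triple pattern, keeping only consistent combinations."""
    solutions: list[Binding] = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for solution in solutions
            for triple in graph
            if (extended := extend(solution, pattern, triple)) is not None
        ]
    return solutions


# Hypothetical data: neon is deliberately missing its atomic number here.
graph = {
    ("table:Na", "table:name", "sodium"),
    ("table:Na", "table:symbol", "Na"),
    ("table:Na", "table:atomicNumber", "11"),
    ("table:Ne", "table:name", "neon"),
    ("table:Ne", "table:symbol", "Ne"),
}
patterns = [
    ("?element", "table:name", "?name"),
    ("?element", "table:symbol", "?symbol"),
    ("?element", "table:atomicNumber", "?number"),
]
print(match_graph_pattern(patterns, graph))
# [{'?element': 'table:Na', '?name': 'sodium', '?symbol': 'Na', '?number': '11'}]
```

Neon survives the first two patterns but drops out at the third, which is exactly the "all three properties or nothing" behaviour described above.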

The other notable item here is that there is one triple pattern for each of the variables required to be present in the result set. In SPARQL a variable only receives a value if it appears in the graph pattern; SELECTing a variable that the pattern never mentions gets you nothing, because it will never be bound. This may seem slightly odd if you're used to SQL, where it is quite common to return columns that are not mentioned in the WHERE clause. But remember that a SPARQL query processor has no data dictionary listing all the columns (i.e. properties) of a resource. A variable must be bound to an RDF term via a triple pattern for the processor to be able to extract that term from the graph.
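Because there is no data dictionary to consult, the graph pattern is the only source of values. The Python sketch below illustrates the consequence, assuming each solution is a dictionary filled in purely by triple pattern matches. Asking for a variable that no pattern mentions (the hypothetical ?colour) cannot produce a value; in this sketch it simply comes back as None.

```python
from typing import Optional


def project(select_vars: list[str],
            solutions: list[dict[str, str]]) -> list[tuple[Optional[str], ...]]:
    """Pick out the SELECT variables from each solution, in SELECT order.

    Values only enter a solution through triple pattern matches, so a
    variable that no pattern mentions has nothing to report: it comes back
    as None (unbound) in every row.
    """
    return [tuple(solution.get(v) for v in select_vars) for solution in solutions]


# A solution produced by the pattern { ?element table:name ?name. }
solutions = [{"?element": "table:Na", "?name": "sodium"}]

print(project(["?name"], solutions))             # [('sodium',)]
print(project(["?name", "?colour"], solutions))  # [('sodium', None)]
```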

