XML.com: XML From the Inside Out

Introducing SPARQL: Querying the Semantic Web
by Leigh Dodds

Sorting

In all of the examples we've seen so far, we've been content to let the results be returned in whatever order the query engine chooses. This is rarely desirable in practice, as we commonly need to impose some sensible and relevant ordering on the data.

SPARQL offers the ORDER BY clause to let us do precisely that. The next example demonstrates the new syntax:

PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT ?name ?number
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE
{
  ?element table:name ?name;
           table:atomicNumber ?number;
           table:group table:group_18.
}
ORDER BY ?number

This example selects the name and atomicNumber of all of the elements in group 18 of the periodic table, the noble gases. The ORDER BY clause indicates that the elements should be ordered by their atomic number property, in ascending order.

Formally, ORDER BY is a solution sequence modifier: it manipulates the result set before the query processor returns it. As such, it is not part of the graph pattern and so is listed after the WHERE clause in the query syntax.

An ORDER BY clause can list one or more variable names, indicating the variables that should be used to order the result set. The query processor sorts by each variable in turn, in the order in which they are listed, so a later variable only decides the order of rows that are equal on all the earlier ones. By default all sorting is done in ascending order, but this can be changed explicitly by wrapping a variable in the DESC (descending) or ASC (ascending) order modifier. The next example sorts all of the elements in the periodic table in descending order of atomic weight:

PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT ?name
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE
{
  ?element table:name ?name;
           table:atomicWeight ?weight.
}
ORDER BY DESC(?weight)
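
To make the "each variable in turn" rule concrete, here is a minimal Python sketch that mimics a multi-variable ORDER BY over an in-memory list of solutions. It is only an illustration of the ordering semantics, not how any particular query processor is implemented. The order_by helper, the ?group variable, and the hand-written rows are assumptions made for this example; the call shown corresponds roughly to a clause such as ORDER BY ?group DESC(?number).

```python
# Hypothetical solutions, written by hand for illustration (not query output).
rows = [
    {"name": "neon", "group": 18, "number": 10},
    {"name": "lithium", "group": 1, "number": 3},
    {"name": "argon", "group": 18, "number": 18},
    {"name": "sodium", "group": 1, "number": 11},
    {"name": "helium", "group": 18, "number": 2},
]


def order_by(solutions, *conditions):
    """Sort solutions by (variable, direction) pairs, most significant first.

    direction is "ASC" or "DESC", mirroring the SPARQL order modifiers.
    """
    ordered = list(solutions)
    # Python's sort is stable, so sorting by the least significant
    # condition first and the most significant last gives the same
    # result as comparing on each condition in turn.
    for variable, direction in reversed(conditions):
        ordered.sort(key=lambda row: row[variable],
                     reverse=(direction == "DESC"))
    return ordered


if __name__ == "__main__":
    # Roughly: ORDER BY ?group DESC(?number)
    for row in order_by(rows, ("group", "ASC"), ("number", "DESC")):
        print(row["group"], row["number"], row["name"])
```

Within each group the rows come out from highest to lowest atomic number, because the second condition only breaks ties left by the first.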

SPARQL also allows us to limit the total number of results in a result set using the LIMIT keyword, which indicates the maximum number of rows that should be returned. A value of zero will return no results; if the value is greater than the size of the result set, then all rows will be returned. Combining LIMIT with ORDER BY, we can modify our query so that it returns only the ten heaviest elements in the periodic table:

PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT ?name
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE
{
  ?element table:name ?name;
           table:atomicWeight ?weight.
}
ORDER BY DESC(?weight)
LIMIT 10

When building user interfaces to navigate through a database or set of results, it's common to break the results into pages, e.g. displaying 10 search results at a time. SPARQL supports such paging by allowing a query to specify an OFFSET into the result set. This indicates that the processor should skip a fixed number of rows before constructing the result set. OFFSET is naturally combined with ORDER BY: without an explicit ordering the processor is free to return rows in a different order on each request, so successive pages could overlap or miss rows. By way of example, let's assume that we've already listed the ten heaviest elements in the periodic table and now want to fetch the next ten heaviest. In this query we use OFFSET to skip the data we've already seen:

PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT ?name
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE
{
  ?element table:name ?name;
           table:atomicWeight ?weight.
}
ORDER BY DESC(?weight)
LIMIT 10
OFFSET 10
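
A paging user interface usually generates these queries rather than writing them by hand. The following Python sketch shows one way this might be done, under the assumption that pages are numbered from zero and that the base query already ends with its ORDER BY clause. The names page_query and window are hypothetical helpers introduced for this illustration; window simply mimics the LIMIT and OFFSET rules described above on an in-memory list.

```python
# The ordered query from the article, without LIMIT or OFFSET.
BASE_QUERY = """\
PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT ?name
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE
{
  ?element table:name ?name;
           table:atomicWeight ?weight.
}
ORDER BY DESC(?weight)
"""


def page_query(base_query, page, page_size=10):
    """Return base_query with LIMIT/OFFSET appended for a zero-based page."""
    if page < 0 or page_size < 0:
        raise ValueError("page and page_size must be non-negative")
    offset = page * page_size
    return f"{base_query.rstrip()}\nLIMIT {page_size}\nOFFSET {offset}"


def window(solutions, limit, offset=0):
    """Mimic OFFSET then LIMIT on an already ordered list of solutions."""
    # Slicing never raises for out-of-range bounds: a limit of zero gives
    # an empty list, and a limit larger than the list returns what is left.
    return solutions[offset:offset + limit]


if __name__ == "__main__":
    # Page 1 is the second page: the "next ten heaviest" elements.
    print(page_query(BASE_QUERY, page=1))
```

Keeping the ORDER BY clause inside the base query means every generated page is cut from the same ordering.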

SPARQL Query Results XML Format

For readability, the results of the examples we've viewed so far have been rendered as HTML tables. Most SPARQL processors include their own API for working with a result set directly, so a programmer can manipulate the results in whatever way suits an application. But if we want to serialize a SPARQL result set in a standard way, perhaps to return data via a web service, we can use the SPARQL Query Results XML Format.

By way of an example, here's an extract of the results from the first query in this article, which returns a single ?name variable with no ordering applied. To view the complete set of results, refer to the online service:

<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="name"/>
  </head>
  <results ordered="false" distinct="false">
    <result>
      <binding name="name"><literal datatype="http://www.w3.org/2001/XMLSchema#string">sodium</literal></binding>
    </result>
    <result>
      <binding name="name"><literal datatype="http://www.w3.org/2001/XMLSchema#string">neon</literal></binding>
    </result>
    <result>
      <binding name="name"><literal datatype="http://www.w3.org/2001/XMLSchema#string">iron</literal></binding>
    </result>
    <!-- more results -->
  </results>
</sparql>

As you can see, the format is fairly simple and regular:

  • All of the key elements belong to a single namespace, http://www.w3.org/2005/sparql-results#
  • The root element is sparql, which contains a head and a results element that together describe the result set
  • The head section declares all variables that will be returned in the result set. It's equivalent to the column headings in an HTML table
  • The results section lists each query result, i.e. one result element for each row in the result set
  • A result element contains one binding for each variable. A binding wraps one of literal, uri, or bnode (for a blank node); these elements contain the actual values returned. If a variable is not bound in a query (see the above section on OPTIONAL Patterns), drafts of the format mark it as unbound, while the final W3C Recommendation simply omits the binding for that variable.

Given its obvious simplicity and regular structure, manipulating this format with XSLT or XQuery is fairly trivial. The SPARQL Query Results XML Format specification includes several relevant examples.
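
The same regularity makes the format easy to consume from a general-purpose language. As a rough sketch, the Python below reads the extract shown earlier using only the standard library's xml.etree.ElementTree module. The parse_results function is a hypothetical helper written for this illustration, not part of any SPARQL toolkit, and it treats a variable with no value as None.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map used in the ElementTree path expressions below.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# The extract from the article, minus the "more results" comment.
SAMPLE = """\
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="name"/>
  </head>
  <results ordered="false" distinct="false">
    <result>
      <binding name="name"><literal datatype="http://www.w3.org/2001/XMLSchema#string">sodium</literal></binding>
    </result>
    <result>
      <binding name="name"><literal datatype="http://www.w3.org/2001/XMLSchema#string">neon</literal></binding>
    </result>
    <result>
      <binding name="name"><literal datatype="http://www.w3.org/2001/XMLSchema#string">iron</literal></binding>
    </result>
  </results>
</sparql>
"""


def parse_results(xml_text):
    """Return (variables, rows); each row maps a variable name to its text."""
    root = ET.fromstring(xml_text)
    variables = [v.get("name")
                 for v in root.findall("sr:head/sr:variable", NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        # Start every variable at None so unbound ones are still present.
        row = dict.fromkeys(variables)
        for binding in result.findall("sr:binding", NS):
            children = list(binding)  # the wrapped literal, uri, ... element
            if children:
                row[binding.get("name")] = children[0].text
        rows.append(row)
    return variables, rows


if __name__ == "__main__":
    variables, rows = parse_results(SAMPLE)
    print(variables)
    for row in rows:
        print(row["name"])
```

This sketch keeps only the text of each value; an application that needs to distinguish URIs from literals, or to read the datatype attribute, would inspect the child element's tag and attributes as well.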

Summary

This brings us to the end of our first look at SPARQL.

We've seen how SPARQL allows us to match patterns in an RDF graph using triple patterns, which are like triples except they may contain variables in place of concrete values. The variables are used as "wildcards" to match RDF terms in the dataset.

We introduced the SELECT query, which can be used to extract data from an RDF graph, returning it as a tabular result set. We built up more complex graph patterns from simple triple patterns and illustrated how to deal with both required and OPTIONAL data. UNION queries were also introduced as a way of selecting alternatives from our dataset. Finally, we demonstrated how to apply ordering to our results, LIMIT the amount of data returned, and jump forward through results using OFFSET.

Along the way we took a brief look at the SPARQL Query Results XML Format, and a number of the syntax shortcuts that make writing queries much simpler. These are especially useful with repetitive graph patterns and long URIs.

Armed with this information, and the growing range of SPARQL implementations, you can start to investigate the language yourself and put it to good use. As you begin working with the language you'll no doubt find Dave Beckett's query language reference a handy resource.

In our next tutorial in this series we'll look more closely at how SPARQL deals with data typing, applying constraints to our data, and the facilities for querying data from multiple sources.

Finally, I'd like to thank Katie Portwin and Priya Parvatikar for early feedback on this article.


