XML.com: XML From the Inside Out

Introducing SPARQL: Querying the Semantic Web
by Leigh Dodds

SPARQL Query Tools

Happily, the SPARQL specifications don't exist in isolation. Several tools and APIs already provide SPARQL functionality, and most of them are up to date with the latest specifications. A brief list includes:

  • ARQ, a SPARQL processor for Jena
  • Rasqal, the RDF query library included in Dave Beckett's comprehensive Redland framework
  • RDF::Query, a Perl module for querying RDF
  • twinql, a SPARQL processor for Lisp written by Richard Newman
  • Pellet, an open source OWL DL reasoner in Java that has partial SPARQL query support
  • KAON2, another OWL DL reasoner that has partial SPARQL support

My SPARQL query tool Twinkle offers a simple GUI for the ARQ library, and supports multiple output formats and simple facilities for loading, editing, and saving queries. It's handy if you want to play with SPARQL on the desktop. But for a minimum of installation fuss you can't beat an online SPARQL query tool, which we'll use throughout the rest of the tutorials. As it happens, the service is also a self-contained example of the SPARQL protocol in action.

The Periodic Table in RDF

Tutorial writers can burn a lot of time crafting a good set of examples. A balance needs to be struck between keeping the data clear and making it too trivial. What you really want is for the examples to reflect the power of the technology being introduced. For this series, I'm going to dispense with the art of data design and instead pick up some data already published in the wild on the Web. That is, we're doing real RDF processing of real-world data. Not only will this help illustrate SPARQL's utility, we may even learn a few interesting facts along the way.

Bob DuCharme has done an excellent job of curating public collections of RDF on his site rdfdata.org. I've picked out this RDF representation of the periodic table for our purposes. It's data that most people will have at least a passing familiarity with, so it won't take a great deal of review for you to get started. Here's a handy periodic table to use as a reference if your chemistry is a little rusty.

The RDF data provides some essential facts about each element including its name, symbol, atomic weight and number, plus a good deal more. We'll focus on these simple properties for now. A slightly edited extract of the data, showing a description of sodium, is included here:

<Element rdf:ID="Na" xmlns="http://www.daml.org/2003/01/periodictable/PeriodicTable#">
    <name>sodium</name>
    <symbol>Na</symbol>
    <atomicNumber>11</atomicNumber>
    <atomicWeight>22.989770</atomicWeight>
    <group rdf:resource="#group_1"/>
    <period rdf:resource="#period_3"/>
    <block rdf:resource="#s-block"/>
    <standardState rdf:resource="#solid"/>
    <color>silvery white</color>
    <classification rdf:resource="#Metallic"/>
    <casRegistryID>7440-23-5</casRegistryID>
</Element>

Note that the namespace for this data is http://www.daml.org/2003/01/periodictable/PeriodicTable# -- that'll be important when we start formulating our SPARQL queries. The RDF includes a mixture of properties; some are simple literals such as name and atomicWeight, while others such as group and standardState have resources as values.
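To make the split between literal-valued and resource-valued properties concrete, here is a minimal Python sketch that reads a shortened copy of the sodium description using only the standard library. It is an illustration rather than a proper RDF parser: the xmlns:rdf declaration has been added (an assumption, using the standard RDF syntax namespace) so the fragment parses on its own, and the describe function is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The periodic table namespace comes from the data itself; the RDF namespace
# is the standard RDF syntax namespace.
TABLE = "http://www.daml.org/2003/01/periodictable/PeriodicTable#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened copy of the sodium extract. The xmlns:rdf declaration is an
# addition (assumption) so that the fragment is parseable by itself.
SODIUM = f"""
<Element rdf:ID="Na" xmlns="{TABLE}" xmlns:rdf="{RDF}">
    <name>sodium</name>
    <symbol>Na</symbol>
    <atomicNumber>11</atomicNumber>
    <group rdf:resource="#group_1"/>
    <standardState rdf:resource="#solid"/>
</Element>
"""

def describe(xml_text):
    """Split an element's properties into literal-valued and resource-valued."""
    root = ET.fromstring(xml_text)
    literals, resources = {}, {}
    for child in root:
        # ElementTree spells qualified names as "{namespace}local".
        local = child.tag.replace("{" + TABLE + "}", "")
        target = child.get("{" + RDF + "}resource")
        if target is None:
            literals[local] = child.text      # e.g. name, symbol
        else:
            resources[local] = target         # e.g. group, standardState
    return literals, resources

literals, resources = describe(SODIUM)
print("literals: ", literals)
print("resources:", resources)
```

Notice that every property name has to be stripped of the namespace before it becomes readable, which is the same reason the namespace matters when writing queries.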

Introducing the Triple Pattern

RDF is built on the triple, a 3-tuple consisting of subject, predicate, and object. Likewise, SPARQL is built on the triple pattern, which also consists of a subject, predicate, and object. In fact, an RDF triple is also a SPARQL triple pattern. A triple from our data expressed using the SPARQL triple pattern syntax looks like this:

<http://www.daml.org/2003/01/periodictable/PeriodicTable#Na> table:name "sodium".

A triple pattern is written as subject, predicate, and object and is terminated with a full stop. URIs, e.g. for identifying resources, are written inside angle brackets. Literal strings are denoted with either double or single quotes. While properties, like name, can be identified by their URI, it's more usual to use a qname-style syntax to improve readability. Later in the tutorial I'll show you how to associate a prefix with a URI using a mechanism very similar to XML namespaces.
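The qname-style shorthand is only an abbreviation: a prefix stands in for the leading part of a URI. The following Python sketch shows that expansion under the assumption that the prefix table is bound to the periodic table namespace. The PREFIXES mapping and the expand function are hypothetical names for illustration, not part of any SPARQL library.

```python
# Hypothetical prefix table: binding "table" to the periodic table namespace
# is an assumption based on the example triple above.
PREFIXES = {
    "table": "http://www.daml.org/2003/01/periodictable/PeriodicTable#",
}

def expand(term, prefixes=PREFIXES):
    """Return the full URI for a term written as <uri> or prefix:local."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]                     # already a full URI
    prefix, sep, local = term.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {term!r}")
    return prefixes[prefix] + local

print(expand("table:name"))
print(expand("<http://www.daml.org/2003/01/periodictable/PeriodicTable#Na>"))
```

Both forms identify the same kind of thing, a URI. The prefixed form simply saves typing and keeps patterns readable.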

SPARQL specifies a number of handy abbreviations for writing complex triple patterns. Both the basic syntax and abbreviations borrow heavily from Turtle, a very terse RDF serialization alternative to RDF/XML. As a text rather than XML format, Turtle can be used to express RDF very succinctly. Rather than exhaustively list all of the SPARQL syntax shortcuts here, we'll introduce them throughout the examples contained in this and later tutorials.

The triple pattern above is fine for demonstrating syntax but isn't very useful as a query. If we know all the data, there's no need to run a query. However, unlike a triple, a triple pattern can include variables. Any or all of the subject, predicate, and object values in a triple pattern may be replaced by a variable. Variables are used to indicate data items of interest that will be returned by a query. The next example shows a pattern that uses variables in place of both the subject and the object:

?element table:name ?name.

Since a variable matches any value, this pattern will match any RDF resource that has a name property. (SPARQL also accepts an alternative spelling for variables using the $ character, as in $element.) Each triple that matches the pattern will bind an actual value from the RDF dataset to each of the variables. For example, there is a binding of this pattern to our dataset where the element variable is bound to <http://www.daml.org/2003/01/periodictable/PeriodicTable#Cl> and the name variable is bound to "chlorine".

In SPARQL all possible bindings are considered, so if a resource has multiple instances of a given property, then multiple bindings will be found. That's a good thing to remember if you end up with more data than expected in your query results.
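If it helps to see the binding process spelled out, here is a small Python sketch of matching a single triple pattern against an in-memory list of triples. It is a toy model of the idea and not how any real SPARQL processor is implemented. The match and is_var functions are hypothetical, and the second name for chlorine is invented purely to show one resource producing more than one binding.

```python
NS = "http://www.daml.org/2003/01/periodictable/PeriodicTable#"

# A tiny in-memory graph of (subject, predicate, object) tuples.
TRIPLES = [
    (NS + "Na", NS + "name", "sodium"),
    (NS + "Cl", NS + "name", "chlorine"),
    (NS + "Cl", NS + "name", "chlorum"),   # hypothetical extra value
    (NS + "Na", NS + "symbol", "Na"),
]

def is_var(term):
    """Variables start with ? or $ in this sketch."""
    return term.startswith("?") or term.startswith("$")

def match(pattern, triples):
    """Yield one dict of variable bindings per triple that matches the pattern."""
    for triple in triples:
        bindings = {}
        for want, have in zip(pattern, triple):
            if is_var(want):
                # A variable used twice must bind to the same value each time.
                if bindings.setdefault(want[1:], have) != have:
                    break
            elif want != have:
                break                       # a fixed term did not match
        else:
            yield bindings                  # all three positions matched

results = list(match(("?element", NS + "name", "?name"), TRIPLES))
for row in results:
    print(row["element"], row["name"])
```

The pattern yields three sets of bindings, two of them for chlorine, because every matching triple contributes its own row. The symbol triple is skipped because its predicate differs from the fixed term in the pattern.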

At this point you may be wondering if it's legal for a triple pattern to include only variables. Well, it is:

?subject ?predicate ?object.

This pattern matches all triples in an RDF graph.

Triple patterns can also be combined to describe more complex patterns, known as graph patterns. These will be clearer when seen within the context of some sample queries. So let's look at the basic structure of our first SPARQL query.
