XML.com: XML From the Inside Out


Introducing SPARQL: Querying the Semantic Web

November 16, 2005

An Introduction to SPARQL

This tutorial, the first of a three-part series, introduces SPARQL -- a query language and data access protocol for the Semantic Web. SPARQL is defined in terms of the W3C's RDF data model and will work for any data source that can be mapped into RDF. The specification is under development by the RDF Data Access Working Group (DAWG) and has recently reached Last Call Working Draft.

At this point in its life cycle the specification is stable enough that developers can begin seriously exploring its capabilities. And the availability of several SPARQL query engines means that this exploration can be practical rather than theoretical.

But what if you're a lot more interested in Web 2.0, which is practical and real, than in the Semantic Web, about which opinions vary widely? Why might you want to go to the trouble of learning SPARQL? For dyed-in-the-wool Semantic Web fans, this question may well be a no-brainer: RDF has needed a standard query language for some time and having one will make many development tasks much easier.

However, SPARQL has a much wider potential audience. A key aspect of the Web 2.0 idea is the ability to extract and query information held across many different ad hoc, third-party apps, services, or repositories. That ability to move among data sources is what underpins the mashup -- take a little Google Maps, salt with some eBay, and sprinkle with a heaping hunk of Flickr, right?

SPARQL, which is both a query language and a data access protocol, has the potential to become a key component in Web 2.0 applications: as a standard backed by a flexible data model, it could provide a common query mechanism across Web 2.0 applications. XML.com managing editor Kendall Clark has published an excellent essay (Web 2.0 Meet The Semantic Web) that expands more fully on this idea. SPARQL should be of interest to developers exploring the available options for publishing open data on the Web.

The goal of these tutorials is to enable developers to quickly become productive with SPARQL. All of the key language features will be introduced with abundant examples. No previous experience with RDF query languages is required, but a basic familiarity with RDF and RDF/XML is essential. There are many good primers on RDF available for readers interested in a quick refresher course or a bottom-up introduction.

This first tutorial introduces the key concepts in SPARQL and its relationships to the other specifications under development by the DAWG. By the end of the tutorial you'll be able to write some simple SPARQL queries to extract data from RDF.

In the second tutorial we'll cover some of the more advanced query options, including working with multiple data sources. That tutorial will also demonstrate the ease with which data can be merged and queried using SPARQL.

The third and final tutorial will introduce the other SPARQL query forms: CONSTRUCT, DESCRIBE, and ASK. Far from being limited to querying data, SPARQL also offers the ability to extract information from a data repository according to rules of the client's devising. Powerful stuff.

Before jumping into the syntax, let's put SPARQL into some context, and take a brief look at the data we'll be using throughout the series.

SPARQL in Context

Work on RDF query languages has been progressing for a number of years. Several different approaches have been tried, ranging from familiar-looking SQL-style syntaxes, such as RDQL and Squish, to path-based languages like Versa.

Of these approaches, those that emulate SQL syntactically have probably been the most popular and widely implemented. This is perhaps surprising given the very different models that lurk behind relational databases and RDF -- familiarity with syntax has no doubt contributed to this success. SPARQL follows this well-trodden path, offering a simple, reasonably familiar (to SQL users) SELECT query form which will be the main focus of this first article.
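To make the difference in models a little more concrete, here is a toy sketch in Python of what a SELECT with a single triple pattern amounts to: each variable in the pattern is bound against the triples in the data, and every match yields one row of results. This is not how a real SPARQL engine is implemented; the sample data, the example.org URIs, and the match_pattern helper are all invented for illustration.

```python
# Toy illustration only: a real SPARQL engine does far more than this.
# The SPARQL query being imitated would look roughly like:
#
#   PREFIX foaf: <http://xmlns.com/foaf/0.1/>
#   SELECT ?person ?name
#   WHERE { ?person foaf:name ?name }

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical RDF data as (subject, predicate, object) triples.
TRIPLES = [
    ("http://example.org/alice", FOAF + "name", "Alice"),
    ("http://example.org/alice", FOAF + "mbox", "mailto:alice@example.org"),
    ("http://example.org/bob", FOAF + "name", "Bob"),
]


def match_pattern(pattern, triples):
    """Return one dict of variable bindings per triple that matches the pattern.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    solutions = []
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated in the pattern must bind to the same value.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No term failed to match, so this triple is a solution.
            solutions.append(bindings)
    return solutions


rows = match_pattern(("?person", FOAF + "name", "?name"), TRIPLES)
for row in rows:
    print(row["?person"], row["?name"])
```

Unlike a SQL SELECT, nothing here refers to tables or columns: the pattern is matched against a graph of statements, and the "columns" of the result are simply the variables named in the query.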

SPARQL actually consists of three separate specifications. The query language specification makes up the core. But alongside it sits the query results XML format which, as you might guess, describes an XML format for serializing the results of a SPARQL SELECT (and ASK) query. This simple format is easily processable with common XML tools such as XSLT; we'll look at an example of that later.
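As a rough idea of how little work such processing involves, the sketch below reads a small results document using Python's standard library instead of XSLT. The sample document is hand-written with made-up data, and its element names and namespace follow the results format as we understand it; since the specification is still a draft, treat those details as assumptions. The parse_results helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for the SPARQL Query Results XML Format.
SR = "{http://www.w3.org/2005/sparql-results#}"

# Hand-written sample result document (hypothetical data).
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="person"/>
    <variable name="name"/>
  </head>
  <results>
    <result>
      <binding name="person"><uri>http://example.org/alice</uri></binding>
      <binding name="name"><literal>Alice</literal></binding>
    </result>
    <result>
      <binding name="person"><uri>http://example.org/bob</uri></binding>
      <binding name="name"><literal>Bob</literal></binding>
    </result>
  </results>
</sparql>
"""


def parse_results(xml_text):
    """Return (variable names, list of {variable: value} rows)."""
    root = ET.fromstring(xml_text)
    variables = [v.get("name") for v in root.findall(f"{SR}head/{SR}variable")]
    rows = []
    for result in root.findall(f"{SR}results/{SR}result"):
        row = {}
        for binding in result.findall(f"{SR}binding"):
            # Each binding wraps a single child element holding the value
            # (a uri or a literal in this sample).
            row[binding.get("name")] = binding[0].text
        rows.append(row)
    return variables, rows


variables, rows = parse_results(SAMPLE)
print(variables)
for row in rows:
    print(row)
```

The helper flattens every value to plain text, which discards the distinction between URIs and literals; a fuller reader would keep the child element's name alongside its value.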

The third specification is the data access protocol, which uses WSDL 2.0 to define simple HTTP and SOAP protocols for remotely querying RDF databases (or, cunningly, for querying any data repository that can be mapped to the RDF model). The XML results format is used to generate responses from services that implement this protocol.
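For the HTTP flavor of the protocol, a request can be as simple as a GET with the query text passed as a URL-encoded parameter. The Python sketch below only builds such a URL and does not send it. The endpoint address is hypothetical, the build_query_url helper is our own, and the parameter name "query" reflects our reading of the draft protocol, so check it against the service you are actually using.

```python
from urllib.parse import urlencode

# Hypothetical SPARQL service address; substitute a real endpoint.
ENDPOINT = "http://example.org/sparql"

QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE { ?person foaf:name ?name }"""


def build_query_url(endpoint, query):
    """Return a GET URL carrying the query in a URL-encoded 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

Fetching that URL from a conforming service would return the results document in the XML format described above, so the two specifications fit together without any SPARQL-specific client library.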

In total, then, SPARQL consists of a query language, a means of conveying a query to a query processor service, and the XML format in which query results will be returned.

There are a number of issues that SPARQL does not address yet; most notably, SPARQL is read-only and cannot modify an RDF dataset. Work on this area is currently out of scope for the DAWG, as noted in Section 2 of their charter. It seems likely that this will become a later task for the Working Group once the initial specifications have reached Recommendation status. A similar strategy of "query first, update later" was also adopted by the XQuery Working Group.
