Versa: Path-Based RDF Query Language
As interest in a standardized RDF query language reaches fever pitch, some fundamental approaches and patterns are worth noting. Versa sits at the opposite end of the spectrum from the mainstream RDF query languages, many of which are influenced, at least syntactically, by SQL. Versa's design was motivated primarily by XPath. This article discusses the fundamentals of Versa, emphasizing its similarities with XPath and the aspects of SPARQL that are directly relevant.
Versa is a query language designed specifically for extracting information from an RDF graph in a modular way. A Versa query isolates resources and their associated property values through the patterns and constraints given by a Versa expression (somewhat reminiscent of XPath expressions and regular expressions). A Versa query is performed by submitting a Versa expression to a Versa query processor associated with the RDF graph from which the user wishes to extract information. A Versa expression is one of the following:
- Nested (parenthesized) expressions
- Literals (instances of Versa data types)
- Variable References
- Particles (syntactic sugar): "*" or "."
- Traversal Expressions
- Function Calls
Versa models RDF literal data with the following set of data types (any of which can be returned as the result of evaluating a Versa expression).
A Versa resource is a string that represents the URI (Uniform Resource Identifier) of an RDF resource in the underlying RDF graph. A Versa resource is expressed in one of two forms: as a QName (defined by the XML Namespaces specification) or as a fully expressed URI written as a quoted string preceded by the "u" character. This is somewhat analogous to Notation 3's use of QNames and of "<" and ">" around full URIs. Versa does not distinguish between regular resources and blank nodes in query expressions: blank nodes can be addressed in the same way as any other resource. Though this may seem counterintuitive given their impermanent nature, there are often real-world scenarios where addressability is more important than anonymity, especially where response time is a factor.
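To make the two resource forms concrete, here is a minimal Python sketch that normalizes either form to a full URI. The helper name expand_resource and the caller-supplied prefix mapping are assumptions made for illustration; they are not part of Versa or of the 4Suite API.

```python
# Hypothetical helper for illustration; not part of any Versa implementation.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def expand_resource(token, namespaces):
    """Return the full URI for a Versa resource token.

    Accepts either a u-prefixed quoted URI (u"http://...") or a QName
    (prefix:local) resolved against the `namespaces` mapping.
    """
    # Form 1: a fully expressed URI, quoted and preceded by "u".
    if (len(token) >= 3 and token[0] == "u"
            and token[1] in "\"'" and token[-1] == token[1]):
        return token[2:-1]
    # Form 2: a QName; split on the first colon only.
    prefix, sep, local = token.partition(":")
    if not sep:
        raise ValueError(f"not a resource token: {token!r}")
    if prefix not in namespaces:
        raise KeyError(f"undeclared namespace prefix: {prefix!r}")
    return namespaces[prefix] + local
```

With a mapping such as {"rdf": RDF_NS}, both expand_resource("rdf:type", ...) and the u-prefixed spelling of the same URI resolve to one string, which is all a query processor needs in order to compare resources.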
A Versa string is simply a sequence of zero or more characters. Strings are expressed within Versa by enclosing them with single or double quotes. In order to allow the quote characters themselves to be included within a string, Versa provides a means to escape quotes by using the "\" character. The following is an example of using the "\" character to allow the inclusion of a quote character within a string: "This document\'s subject is Versa"
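The escaping rule can be sketched in a few lines of Python. The function name parse_string_literal is hypothetical, and the sketch assumes that only escaped quote characters are special; how a real Versa processor treats other backslash sequences is not covered here.

```python
def parse_string_literal(token):
    """Decode a quoted Versa string token into its character sequence.

    Assumption: only backslash-quote pairs are treated as escapes;
    any other backslash is kept as an ordinary character.
    """
    if len(token) < 2 or token[0] not in "\"'" or token[-1] != token[0]:
        raise ValueError(f"not a quoted string: {token!r}")
    body = token[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        # A backslash followed by a quote character stands for that quote.
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in "\"'":
            out.append(body[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Applied to the example above, the function drops the backslash and keeps the apostrophe, so the decoded value reads: This document's subject is Versa.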
A Versa number is a literal that represents a numerical value.
A boolean represents a literal value of "true" or "false". The character "*" is provided as a short form for "true".
A Versa set follows from the definition of a set in the mathematical sense: a collection of distinct elements having specific common properties. The members of a set can be literals, resources, (other) sets, and lists.
A Versa list is a collection of elements (not necessarily distinct), each of which can be a literal, a resource, an(other) list, or a set.
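The difference between the two collection types is easy to see by analogy with Python's built-in types. The mapping below (sets as frozenset, lists as tuple) is only an illustrative assumption, not a description of how any Versa processor stores its values.

```python
# Illustrative mapping only: Versa sets as frozenset, Versa lists as tuple.
# Both are hashable, so either can be nested inside a set.
values = ["a", "b", "a"]

as_list = tuple(values)     # keeps duplicates and order
as_set = frozenset(values)  # collapses duplicates

# A set may itself contain sets and lists alongside plain values.
nested = frozenset({as_set, as_list, "c"})
```

The list keeps all three elements while the set keeps two, and both can appear as members of another set.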
Often, Versa expressions are evaluated with respect to a context. A Versa context is related to an XPath context and is composed of similar parts. As a refresher, an XPath context consists of:
- a node (the context node)
- a pair of nonzero positive integers (the context position and the context size)
- a set of variable bindings
- a function library
- the set of namespace declarations in scope for the expression
The formal definition of a Versa Context is below:
Many Versa constructs are evaluated with regard to a context. The context is a value of any data type, and it can always be referred to explicitly in an expression using the token ".".
You can think of a Versa context as an XPath context with a literal value instead of a context node, without a position and size, and associated with a named graph or the entire underlying RDF model. This concept of querying within a named graph is well expressed in the SPARQL specification [8.2]. In the 4Suite implementation, the name of the current graph is the context's scope. The term scope is used consistently throughout 4Suite to refer to a named graph context associated with an RDF statement, in a manner similar to most RDF data stores (rdflib and Redland, for instance).
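Read this way, a Versa context can be modeled as a small record. The Python sketch below is one possible shape under that reading; the class name VersaContext, its field names, and the use of None to mean "the entire model" are assumptions for illustration and do not reflect 4Suite's actual classes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class VersaContext:
    """Hypothetical model of a Versa context; field names are illustrative."""
    current: Any = None  # the value that "." refers to
    variables: Dict[str, Any] = field(default_factory=dict)
    functions: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    namespaces: Dict[str, str] = field(default_factory=dict)
    # Named graph the query runs against; None is assumed here to mean
    # the entire underlying RDF model.
    scope: Optional[str] = None

    def with_current(self, value):
        """Return a new context whose "." is `value`, sharing everything else."""
        return VersaContext(value, self.variables, self.functions,
                            self.namespaces, self.scope)
```

Unlike an XPath context there is no position or size: evaluating a sub-expression only needs a new current value, which is why with_current replaces that one field and shares the bindings, function library, namespaces, and scope.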