A Relational View of the Semantic Web
by Andrew Newman
A formal definition of LEFT OUTER JOIN:
The left outer join of relations r and s is the outer union of the join of r and s and the antijoin of r and s. Or formally:
R1 ⟕ R2 := (R1 ⋈ R2) ⊎ (R1 ▷ R2).
Antijoin is composed of difference and semijoin. Semijoin is composed of join and project. The fully expanded version can therefore be expressed as:
R1 ⟕ R2 := (R1 ⋈ R2) ⊎ (R1 − (π(R1) (R1 ⋈ R2)))
Where: "⟕" denotes left outer join, "⋈" denotes join, "⊎" denotes outer union, "−" denotes difference and "π" denotes project.
The use of antijoin is significant from the point of view of distributing the queries efficiently across multiple sites, something that is important in SPARQL implementations. The difference and project operations are the standard relational versions. Table 12 displays the result of performing a left outer join on relations r and s from Tables 8 and 9. Left outer join is order dependent: if the left outer join of s and r is performed instead, the last tuple of the result will have the name "George", not "Blake".
| SNO (sno) | SNAME (name) | STATUS (integer) | CITY (char) |
|---|---|---|---|
| S1 | "Smith" | 20 | "London" |
| S2 | "Jones" | 10 | |
| S3 | "Blake" | | |
Table 12. Result of Left Outer Join of r and s
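The definition above can be sketched in Python as a join followed by an outer union with the antijoin. This is a minimal sketch under stated assumptions, not the article's implementation: Tables 8 and 9 are not reproduced on this page, so the contents of r and s below are hypothetical values chosen to be consistent with Table 12, and the join condition (tuples must agree on every attribute bound in both) is likewise an assumption. The helper names are illustrative.

```python
# Tuples are dicts holding only their bound attributes; a relation is a list of them.
# The sample data is hypothetical, chosen to be consistent with Table 12.
r = [
    {"SNO": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"SNO": "S2", "SNAME": "Jones", "STATUS": 10},
    {"SNO": "S3", "SNAME": "Blake"},
]
s = [
    {"SNO": "S1", "SNAME": "Smith"},
    {"SNO": "S2", "SNAME": "Jones"},
    {"SNO": "S3", "SNAME": "George"},
]


def compatible(t1, t2):
    """Assumed join condition: equal on every attribute bound in both tuples."""
    return all(t1[a] == t2[a] for a in t1.keys() & t2.keys())


def dedupe(relation):
    """Enforce set semantics: drop duplicate tuples, keeping first-seen order."""
    seen, out = set(), []
    for t in relation:
        key = frozenset(t.items())
        if key not in seen:
            seen.add(key)
            out.append(t)
    return out


def join(r1, r2):
    """Merge every compatible pair of tuples."""
    return dedupe([{**t1, **t2} for t1 in r1 for t2 in r2 if compatible(t1, t2)])


def antijoin(r1, r2):
    """Tuples of r1 with no join partner in r2 (r1 minus its semijoin with r2)."""
    return [t1 for t1 in r1 if not any(compatible(t1, t2) for t2 in r2)]


def outer_union(r1, r2):
    """Union of relations whose tuples may bind different attributes."""
    return dedupe(r1 + r2)


def left_outer_join(r1, r2):
    return outer_union(join(r1, r2), antijoin(r1, r2))


result = left_outer_join(r, s)
```

With this sample data, `result` holds the three tuples of Table 12, and `left_outer_join(s, r)` ends with the "George" tuple instead of the "Blake" one, which illustrates the order dependence.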
Another operation defined by Galindo-Legaria is the minimum union operator (⊕), which has the same effect as performing outer union with the results of the antijoin of r and s.
A formal definition of MINIMUM UNION:
The minimum union of relations r and s is the outer union of r and s followed by the removal of subsumed tuples. A tuple t1 subsumes a tuple t2 if t1 has more bound values than t2 and every bound value in t2 is equal to the corresponding value in t1. The removal of subsumed tuples from R is denoted as R ↓.
Table 13 shows the result of minimum union performed on relations r and s from Tables 8 and 9.
| SNO (sno) | SNAME (name) | STATUS (integer) | CITY (char) |
|---|---|---|---|
| S1 | "Smith" | 20 | "London" |
| S2 | "Jones" | 10 | |
| S3 | "Blake" | | |
| S3 | "George" | | |
Table 13. Result of Minimum Union of r and s
Another definition of LEFT OUTER JOIN can then be given using minimum union:
R1 ⟕ R2 := (R1 ⋈ R2) ⊕ R1
This definition returns the same result as given in Table 12 and has the advantage over the previous definition that it requires fewer operations.
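Minimum union and the left outer join built from it can be sketched in Python as follows. As before, this is an illustration under assumptions rather than a definitive implementation: the contents of r and s are hypothetical stand-ins for Tables 8 and 9 (which are not shown on this page), the join condition is assumed to be agreement on every attribute bound in both tuples, and the function names are illustrative.

```python
# Tuples are dicts holding only their bound attributes; a relation is a list of them.
# Hypothetical data, chosen to be consistent with Tables 12 and 13.
r = [
    {"SNO": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"SNO": "S2", "SNAME": "Jones", "STATUS": 10},
    {"SNO": "S3", "SNAME": "Blake"},
]
s = [
    {"SNO": "S1", "SNAME": "Smith"},
    {"SNO": "S2", "SNAME": "Jones"},
    {"SNO": "S3", "SNAME": "George"},
]


def dedupe(relation):
    """Enforce set semantics: drop duplicate tuples, keeping first-seen order."""
    seen, out = set(), []
    for t in relation:
        key = frozenset(t.items())
        if key not in seen:
            seen.add(key)
            out.append(t)
    return out


def compatible(t1, t2):
    """Assumed join condition: equal on every attribute bound in both tuples."""
    return all(t1[a] == t2[a] for a in t1.keys() & t2.keys())


def join(r1, r2):
    """Merge every compatible pair of tuples."""
    return dedupe([{**t1, **t2} for t1 in r1 for t2 in r2 if compatible(t1, t2)])


def subsumes(t1, t2):
    """t1 subsumes t2: t1 binds more attributes and matches every value bound in t2."""
    return len(t1) > len(t2) and all(a in t1 and t1[a] == v for a, v in t2.items())


def minimum_union(r1, r2):
    """Outer union of r1 and r2 followed by removal of subsumed tuples."""
    union = dedupe(r1 + r2)
    return [t for t in union if not any(subsumes(u, t) for u in union)]


def left_outer_join(r1, r2):
    """Left outer join expressed as (r1 join r2) minimum-union r1."""
    return minimum_union(join(r1, r2), r1)


table_13 = minimum_union(r, s)
table_12 = left_outer_join(r, s)
```

With this sample data, `table_13` keeps both S3 tuples because neither subsumes the other, while `table_12` contains only the "Blake" tuple for S3, since the "George" tuple never enters the union.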
Bagging SPARQL
The use of the relational model to query RDF provides lessons that have yet to be applied to the design of SPARQL. One of the main criticisms that can be leveled at SPARQL is its use of multisets (bags); SPARQL has a DISTINCT operator that removes duplicates. RDF, by contrast, is set based. It is often seen as a good property of a query language to retain the same data model as the data it queries: this consistency increases both ease of use and ease of implementation.
In SQL, one of the uses of duplicates is to provide a way to perform aggregate functions, that is, to ask questions such as: "What is the sum of all salaries?" (using "SELECT SUM(salaries)…"). This query is typically performed on a table representing employees and their salaries within an organization's database. Under set-based semantics, the same query totals only the distinct salary values, not all of them. To get this query to work in a set-based query language, a distinct entity, such as an employee, must be combined with each salary in order to get the desired result.
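The effect can be sketched with Python sets. The employee names and salary figures below are hypothetical, and the sketch only illustrates the set-versus-bag distinction described above rather than any particular query engine.

```python
# Hypothetical employee data as a set of (employee, salary) tuples.
# Two employees earn the same salary.
employees = {("alice", 50000), ("bob", 50000), ("carol", 70000)}

# Projecting onto salary alone under set semantics collapses the duplicate value,
# so one of the two 50000 salaries is lost before the sum is taken.
distinct_salaries = {salary for _, salary in employees}
set_total = sum(distinct_salaries)

# Keeping the employee paired with the salary makes every tuple distinct,
# so the total covers each employee exactly once.
paired_total = sum(salary for _, salary in employees)
```

Here `set_total` is 120000 while `paired_total` is 170000; only the second answers the question that was actually asked.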
The use of a set-based language requires that the results be paired with their relevant contextual information, such as the combination of employee, salary and organization. This contextual information becomes vital when the query is performed on the larger web of data. Asking the entire web for the sum of salaries is unlikely to return the results required. The query has to include this contextual information so that the link between salaries, employees, and the specific organization or other group that employs them is retained. These parts of a query are usually implicit locally and need to be made explicit globally. Using consistent set-based semantics retains this context and allows a query to return correct results irrespective of what it is being queried against.
Another issue is one of answer closure. Closure allows the outputs of one function to be used as the inputs to the next. Currently, the results of a SPARQL SELECT query cannot be used as input for further querying. While SPARQL provides a CONSTRUCT query to return an RDF graph, the result is a new graph (new blank nodes are generated, for example) and is not restricted to returning only statements from the original. When querying a web of data it is useful to be able to feed the result of one query into another, with each query being re-executed as needed. Ideally, the result of a SPARQL SELECT query could be assigned to a variable and used within the SPARQL query language, much like Date's relvar [4]. This provides a way to build up more powerful queries based on others and is another way to dynamically provide context that subsequent queries can be performed against.
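Closure can be illustrated outside SPARQL with a small Python sketch in which every operation takes a relation and returns a relation, so a result can be assigned to a name and queried again. The relation representation, the sample data, and the helper names `relation`, `restrict` and `project` are assumptions made for illustration; they are not SPARQL features.

```python
def relation(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def restrict(rel, predicate):
    """Keep the tuples for which predicate(tuple_as_dict) holds; returns a relation."""
    return {t for t in rel if predicate(dict(t))}


def project(rel, attributes):
    """Keep only the named attributes of each tuple; returns a relation."""
    return {frozenset((a, v) for a, v in t if a in attributes) for t in rel}


# Hypothetical sample data.
suppliers = relation(
    {"SNO": "S1", "SNAME": "Smith", "CITY": "London"},
    {"SNO": "S2", "SNAME": "Jones", "CITY": "Paris"},
)

# The result of the first query is itself a relation, so it can be named
# and used as the input to a second query.
in_london = restrict(suppliers, lambda t: t.get("CITY") == "London")
names = project(in_london, {"SNAME"})
```

Because `restrict` and `project` both return the same kind of value they accept, queries compose freely, which is the property the paragraph above finds missing from SPARQL SELECT.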
Conclusion
One of the goals of the Semantic Web is to be able to achieve querying of disparate data sources across the web. The proposed standard for querying the Semantic Web, SPARQL, can be seen as an extension of an existing formalization, the relational model. The use of the relational model provides a way to use previous work in query distribution, optimization, and formulation. The standard relational model is not sufficient, however, and must be extended to support untyped relations and operations in order to integrate these data sources.
Bibliography
[1] T. Berners-Lee, Weaving the Web, Orion Publishing Group, Ltd, London, United Kingdom, 1999, pp 201.
[2] T. Berners-Lee, Relational Databases on the Semantic Web, 1998; http://www.w3.org/DesignIssues/RDB-RDF.html
[3] R. Cyganiak, A Relational Algebra for SPARQL, Digital Media Systems Laboratory, HP Laboratories Bristol, Tech. Rep., 2005; http://www.hpl.hp.com/techreports/2005/HPL-2005-170.html
[4] C. J. Date, Database in Depth, Relational Theory for Practitioners, O'Reilly Media, Inc, Sebastopol, California, 2005, pp. 11, 17-20, 86-93.
[5] C. J. Date, Relational Database Writings 1991-1994, Addison Wesley Publishing Company, Inc, Reading, MA, 1995, pp. 341-362.
[6] C. Galindo-Legaria, "Outerjoins as Disjunctions," Proceedings of the 1994 ACM-SIGMOD Int. Conference on Management of Data, 1994, pp. 348-358.
[7] P. Hayes, RDF Semantics, World Wide Web Consortium (W3C) Recommendation, 2004; http://www.w3.org/TR/rdf-mt/
[8] E. Prud'hommeaux and A. Seaborne, SPARQL Query Language for RDF, World Wide Web Consortium (W3C) Candidate Recommendation, 2006; http://www.w3.org/TR/2006/CR-rdf-sparql-query-20060406/