XML.com: XML From the Inside Out


A Relational View of the Semantic Web
by Andrew Newman

An alternative mapping could use a primary key (SNO is a likely candidate) as the subject that ties together all of a supplier's values. However, this makes it impossible to represent suppliers without a known supplier number, or duplicate rows (which typically occur in SQL tables that have no uniqueness constraints). Representing a supplier without a required attribute may not initially seem sensible to those used to creating data models in closed environments. However, on the Web or in any large distributed system, it may not be possible to agree ahead of time on which attributes are required, or the authority that issues unique identifiers may be unreachable when the data is stored. Similarly, duplicates may have to be detected after the data is recorded.
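To make the trade-off concrete, here is a minimal Python sketch comparing the two mappings. The row data, the lower-cased `#` predicate names, and the functions `keyed_triples` and `blank_node_triples` are hypothetical, and a simple counter stands in for real blank-node allocation. Keying on SNO collapses the duplicate S2 rows and cannot represent the row that has no SNO; minting a blank node per row keeps all four.

```python
import itertools

# Hypothetical supplier rows: S2 is duplicated and the last row has no SNO yet.
rows = [
    {"SNO": "S1", "SNAME": "Smith", "CITY": "London"},
    {"SNO": "S2", "SNAME": "Jones", "CITY": "Paris"},
    {"SNO": "S2", "SNAME": "Jones", "CITY": "Paris"},  # duplicate, no uniqueness constraint
    {"SNAME": "Blake", "CITY": "Paris"},               # supplier number not yet known
]

def keyed_triples(rows):
    """Use the SNO value as the subject of every statement about a row."""
    triples = set()
    for row in rows:
        if "SNO" not in row:
            continue  # no key, so this row cannot be represented at all
        for attr, value in row.items():
            triples.add((row["SNO"], "#" + attr.lower(), value))
    return triples

def blank_node_triples(rows):
    """Mint a fresh blank node per row (here from a counter) and use it as the subject."""
    counter = itertools.count(1)
    triples = set()
    for row in rows:
        subject = f"_{next(counter)}"
        for attr, value in row.items():
            triples.add((subject, "#" + attr.lower(), value))
    return triples

keyed = keyed_triples(rows)
blank = blank_node_triples(rows)
print(len({s for s, _, _ in keyed}))  # 2 suppliers survive
print(len({s for s, _, _ in blank}))  # 4 suppliers survive
```

Detecting that `_2` and `_3` describe the same supplier is then left to a later step, as the paragraph above suggests.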

The choice between a blank node and a unique identifier as the subject is similar to the choice between a surrogate and a natural key in relational databases. The difference is that a blank node cannot be searched for by value the way a numeric surrogate key can. The advantage is that blank nodes can be created locally and distributed globally without requiring an authority to generate them.
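Creating identifiers locally only works if independently created labels do not clash once data is combined. The Python sketch below assumes a naive store that merges blank-node labels as plain strings (RDF's own graph-merge rules rename blank nodes apart, so a conforming merge avoids this problem). The minters `counter_minter` and `uuid_minter` and the helper `record` are hypothetical. Per-source counters collide, while random UUID labels stay distinct without any coordinating authority.

```python
import itertools
import uuid

def counter_minter():
    """Hypothetical minter: labels _1, _2, ... that are unique only within one source."""
    counter = itertools.count(1)
    return lambda: f"_{next(counter)}"

def uuid_minter():
    """Hypothetical minter: random UUID labels, unique across sources in practice."""
    return lambda: "_" + uuid.uuid4().hex

def record(mint, name):
    """One source records a supplier name against a freshly minted blank node."""
    return {(mint(), "#sname", name)}

def merged_subject_count(make_minter):
    """Merge two sources that each minted labels independently; count distinct subjects."""
    merged = record(make_minter(), "Smith") | record(make_minter(), "Jones")
    return len({s for s, _, _ in merged})

print(merged_subject_count(counter_minter))  # 1: Smith and Jones collapse onto _1
print(merged_subject_count(uuid_minter))     # 2: the labels stay distinct
```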

An RDF graph is much less structured than a typical relational table, but every RDF statement still has a fixed structure: subject, predicate, and object. Because this structure is fixed, RDF can be represented relationally as a relation with three attributes. Table 2 shows this using the data from Table 1 and Figure 1.

| s1 : subject | p1 : predicate | o1 : object |
|---|---|---|
| _1 | #sno | "S1"^^#sno |
| _1 | #sname | "Smith"^^#name |
| _1 | #status | "20"^^#integer |
| _1 | #city | "London"^^#char |
| _2 | #sno | "S2"^^#sno |
| _2 | #sname | "Jones"^^#name |
| _2 | #status | "10"^^#integer |
| _2 | #city | "Paris"^^#char |
| _3 | #sno | "S3"^^#sno |
| _3 | #sname | "Blake"^^#name |
| _3 | #status | "30"^^#integer |
| _3 | #city | "Paris"^^#char |

Table 2. The Supplier Data as RDF Triples in a Relation
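Because every row of Table 2 has the same three attributes, ordinary relational operations apply directly to it. The Python sketch below builds such a relation for the first two suppliers of Table 2 and answers "names of suppliers in Paris" with a selection and a self-join on s1. The `Triple` type and `select` helper are hypothetical, and literal datatypes are dropped for brevity.

```python
from collections import namedtuple

# One tuple of the triple relation; attribute names follow Table 2.
Triple = namedtuple("Triple", ["s1", "p1", "o1"])

# The first two suppliers of Table 2; literal datatypes are omitted here.
suppliers = {
    "_1": {"#sno": "S1", "#sname": "Smith", "#status": 20, "#city": "London"},
    "_2": {"#sno": "S2", "#sname": "Jones", "#status": 10, "#city": "Paris"},
}

# Flatten every (predicate, object) pair into a three-attribute tuple.
relation = {Triple(s, p, o) for s, attrs in suppliers.items() for p, o in attrs.items()}

def select(rel, **bindings):
    """Relational selection: keep tuples whose named attributes equal the given values."""
    return {t for t in rel if all(getattr(t, k) == v for k, v in bindings.items())}

# Names of suppliers in Paris: a self-join of the relation on s1.
in_paris = {t.s1 for t in select(relation, p1="#city", o1="Paris")}
names = {t.o1 for t in select(relation, p1="#sname") if t.s1 in in_paris}

print(len(relation))  # 8
print(names)          # {'Jones'}
```

In SQL the same query would be a self-join of one three-column table, which is why this layout is a common starting point for storing triples relationally.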

The types of the columns in Table 2 are RDF's node types (subject, predicate, and object), and the columns are named "s1", "p1", and "o1" respectively. An RDF subject can be a blank node or a URI, a predicate must be a URI, and an object can be a URI, a blank node, or a literal. Blank nodes are written here as _1, _2, and _3. The leading hash ("#") is merely a convention for URIs whose namespace is unimportant here. Literal values are composed of a value and a datatype (itself a URI), written after two carets ("^^"). So the literals "Smith" and "20" are of type "name" and "integer" respectively.
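The position rules and the literal notation can be captured in a few small types. In the Python sketch below, the class names `URI`, `BlankNode`, and `Literal` and the helpers `check_triple` and `show` are hypothetical, not taken from any RDF library. `check_triple` rejects nodes in positions RDF does not allow, and `show` renders a node in the notation of Table 2.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class BlankNode:
    label: str

@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: URI  # the datatype is itself a URI

Node = Union[URI, BlankNode, Literal]

def check_triple(s1: Node, p1: Node, o1: Node) -> None:
    """Raise TypeError if a node appears in a position RDF does not allow."""
    if not isinstance(s1, (URI, BlankNode)):
        raise TypeError("subject must be a URI or blank node")
    if not isinstance(p1, URI):
        raise TypeError("predicate must be a URI")
    if not isinstance(o1, (URI, BlankNode, Literal)):
        raise TypeError("object must be a URI, blank node, or literal")

def show(node: Node) -> str:
    """Render a node in the notation used by Table 2."""
    if isinstance(node, Literal):
        return f'"{node.lexical}"^^{node.datatype.value}'
    if isinstance(node, BlankNode):
        return node.label
    return node.value

triple = (BlankNode("_1"), URI("#sname"), Literal("Smith", URI("#name")))
check_triple(*triple)
print(" ".join(show(n) for n in triple))  # _1 #sname "Smith"^^#name
```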

This view of RDF as a relational structure is not new; Tim Berners-Lee described it in the early stages of RDF's development [2].

RDF without NULL

RDF has no concept of a NULL value. Likewise, the relational model as defined by Date rejects the need for NULLs. RDF can therefore be stored using this version of the relational model, and NULL values can be avoided: a missing value is simply a statement that was never made.
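In triple form, an unknown value is just an absent statement, so a lookup returns an empty set rather than a NULL. The Python sketch below uses supplier S2 as it appears in Table 6, with no city recorded. The `values` helper is hypothetical.

```python
# Statements about supplier S2; no city is known, so no #city triple is recorded.
triples = {
    ("_2", "#sno", "S2"),
    ("_2", "#sname", "Jones"),
    ("_2", "#status", 10),
}

def values(triples, subject, predicate):
    """All objects for (subject, predicate); an empty set means 'no value', not NULL."""
    return {o for s, p, o in triples if s == subject and p == predicate}

print(values(triples, "_2", "#status"))  # {10}
print(values(triples, "_2", "#city"))    # set()
```

Because the empty set is an ordinary value, no three-valued logic is needed to compare or combine these results.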

Avoiding NULLs is best demonstrated by taking the data from Table 2 and considering what would happen if suppliers S2 and S3 had no city and S3 also had no status. What would a flexible view of the data look like if you didn't need to agree on one table structure and didn't use NULLs? Tables 3, 4, and 5 show three relations, each with a different number of columns (and therefore a different type), and Table 6 shows these relations merged into one untyped relation. An untyped relation is a relation containing a set of tuples, each of which may bind values to only a subset of the heading's attributes. To return the untyped relation to a typed relation, a simple projection on the required columns can be performed. There are no NULLs; there are only tuples in which some attributes (columns) are unbound and so return no value.
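The Python sketch below models each tuple as a set of (attribute, value) bindings, so an unbound attribute is simply absent. How Tables 3 to 5 are merged into Table 6 is an assumption here: the sketch takes their union and drops any tuple subsumed by a more fully bound one. It also assumes that projecting onto the required columns keeps only the tuples that bind all of them. The helpers `as_tuple`, `merge`, and `project` are hypothetical.

```python
# Tables 3-5: each tuple binds only the attributes it actually has.
table3 = [{"SNO": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"}]
table4 = [{"SNO": "S1", "SNAME": "Smith", "STATUS": 20},
          {"SNO": "S2", "SNAME": "Jones", "STATUS": 10}]
table5 = [{"SNO": "S1", "SNAME": "Smith"},
          {"SNO": "S2", "SNAME": "Jones"},
          {"SNO": "S3", "SNAME": "Blake"}]

def as_tuple(row):
    """A tuple is a set of (attribute, value) bindings; unbound attributes are absent."""
    return frozenset(row.items())

def merge(*relations):
    """Union the relations, then drop tuples subsumed by a more fully bound tuple."""
    union = {as_tuple(row) for rel in relations for row in rel}
    return {t for t in union if not any(t < u for u in union)}

def project(untyped, attrs):
    """Keep tuples that bind every requested attribute, restricted to those attributes."""
    attrs = set(attrs)
    return {frozenset((k, v) for k, v in t if k in attrs)
            for t in untyped if attrs <= {k for k, _ in t}}

table6 = merge(table3, table4, table5)
print(sorted(dict(t)["SNO"] for t in table6))  # ['S1', 'S2', 'S3']
print(project(table6, ["SNO", "SNAME", "STATUS"]) == {as_tuple(r) for r in table4})  # True
```

Projecting onto all four columns gives back Table 3 and projecting onto SNO and SNAME gives back Table 5, so the one untyped relation stands in for all three typed ones.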

| SNO : sno | SNAME : name | STATUS : integer | CITY : char |
|---|---|---|---|
| S1 | "Smith" | 20 | "London" |

Table 3. Suppliers with a name, status and city.

| SNO : sno | SNAME : name | STATUS : integer |
|---|---|---|
| S1 | "Smith" | 20 |
| S2 | "Jones" | 10 |

Table 4. Suppliers with a name and status.

| SNO : sno | SNAME : name |
|---|---|
| S1 | "Smith" |
| S2 | "Jones" |
| S3 | "Blake" |

Table 5. Suppliers with a name.

| SNO : sno | SNAME : name | STATUS : integer | CITY : char |
|---|---|---|---|
| S1 | "Smith" | 20 | "London" |
| S2 | "Jones" | 10 | |
| S3 | "Blake" | | |

Table 6. Example of a Supplier Table

As Tables 3-6 show, relations of different types can be represented by a single untyped relation. While this may seem like a departure from the traditional relational approach, it is really just a convenient way of holding relations of different types in one data structure. This is especially useful when relations of different types are expected to occur frequently, as when integrating data from different sources such as those found on the Semantic Web. The traditional approach to relations and relational algebra can still be used, but it requires a separate relation for each type, both as input to operations and as their output. Untyped relations reduce the total number of relations to be handled and, with suitably modified relational operations, allow processing to be performed once over the untyped relation. For example, joining the supplier table in Table 6 with a parts table would traditionally require three operations, one for each of the three relation types it contains, and would produce three results. In an untyped system, only a single operation is performed, producing a single untyped relation.
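A single join over untyped relations needs a rule for tuples with different headings. The article does not define its modified operations, so the Python sketch below swaps in SPARQL's compatibility rule: two tuples join if they agree on every attribute they both bind. The parts relation (PNO, PNAME, and an SNO linking each part to a supplier) and the helpers `compatible` and `join` are hypothetical, and relations are kept as lists of dicts for simplicity.

```python
# Table 6 as an untyped relation: tuples bind different subsets of attributes.
table6 = [
    {"SNO": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"SNO": "S2", "SNAME": "Jones", "STATUS": 10},
    {"SNO": "S3", "SNAME": "Blake"},
]

# Hypothetical parts relation; every tuple names the supplier of the part.
parts = [
    {"PNO": "P1", "PNAME": "Nut", "SNO": "S1"},
    {"PNO": "P2", "PNAME": "Bolt", "SNO": "S2"},
    {"PNO": "P3", "PNAME": "Screw", "SNO": "S3"},
    {"PNO": "P4", "PNAME": "Screw", "SNO": "S1"},
]

def compatible(a, b):
    """Tuples join if they agree on every attribute that both of them bind."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def join(r1, r2):
    """Join two untyped relations in one operation, yielding one untyped relation."""
    return [{**a, **b} for a in r1 for b in r2 if compatible(a, b)]

result = join(table6, parts)
headings = {frozenset(t) for t in result}
print(len(result), len(headings))  # 4 3
```

One call produces four tuples with three different headings, where the typed approach would run a separate join per heading. One consequence of this rule is worth noting: a tuple that shares no bound attribute with another is compatible with it, so a part without an SNO would pair with every supplier.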

