XML.com: XML From the Inside Out

What Is RDF
by Joshua Tauberer

Triples for Knowledge

RDF provides a general, flexible method to decompose any knowledge into small pieces, called triples, with some rules about the semantics (meaning) of those pieces.

The foundation is breaking knowledge down into a labeled, directed graph. Each edge in the graph represents a fact, or a relation between two things. The edge in the example from the node vincent_donofrio labeled starred_in to the node the_thirteenth_floor represents the fact that actor Vincent D'Onofrio starred in the movie "The Thirteenth Floor." A fact represented this way has three parts: a subject, a predicate (i.e., verb), and an object. The subject is what's at the start of the edge, the predicate is the type of edge (its label), and the object is what's at the end of the edge. (Technically RDF can express some things that a graph can't, but I won't get into that here.)

The six documents composing the RDF specification tell us two things. First, they outline the abstract model, i.e., how to use triples to represent knowledge about the world. Second, they describe how to encode those triples in XML.

Most of the abstract model of RDF comes down to four simple rules:

  1. A fact is expressed as a Subject-Predicate-Object triple, also known as a statement. It's like a little English sentence.
  2. Subjects, predicates, and objects are given as names for entities, also called resources (dating back to RDF's application to metadata for web resources) or nodes (from graph terminology). An entity represents something: a person, a website, or something more abstract, like a state or a relation.
  3. Names are URIs, which are global in scope, always referring to the same entity in any RDF document in which they appear.
  4. Objects can also be given as text values, called literal values, which may or may not be typed using XML Schema datatypes.
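
The four rules above can be sketched as a handful of Python types. This is only an illustration of the abstract model, not any particular RDF library's API: the class names URI, Literal, and Triple, the helper objects(), the example namespace http://www.example.org/, and the title predicate are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class URI:
    """Rule 3: a global name for an entity."""
    value: str


@dataclass(frozen=True)
class Literal:
    """Rule 4: a text value, optionally typed with a datatype URI."""
    text: str
    datatype: Optional[URI] = None


@dataclass(frozen=True)
class Triple:
    """Rule 1: one statement. Rule 2: its parts are names for entities."""
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]


EX = "http://www.example.org/"  # hypothetical namespace for this sketch

# A graph is just a set of triples: order and duplicates carry no meaning.
graph = {
    Triple(URI(EX + "vincent_donofrio"), URI(EX + "starred_in"),
           URI(EX + "the_thirteenth_floor")),
    Triple(URI(EX + "the_thirteenth_floor"), URI(EX + "title"),
           Literal("The Thirteenth Floor")),
}


def objects(graph, subject, predicate):
    """Follow every edge labeled `predicate` that starts at `subject`."""
    return {t.obj for t in graph
            if t.subject == subject and t.predicate == predicate}
```

Calling objects(graph, URI(EX + "vincent_donofrio"), URI(EX + "starred_in")) follows the starred_in edge and returns the set containing the movie's URI.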

You've seen statements already. Each row in the triples table above, or in the example N3 file, was a fact. This satisfies our need for being able to represent knowledge as a graph.

Entities are named by Uniform Resource Identifiers (URIs), and this provides the globally unique, distributed naming system we need for distributed knowledge. URIs can have the same syntax or format as website addresses (URLs), so you will see RDF files that contain URIs, such as http://www.w3.org/1999/02/22-rdf-syntax-ns#type. The fact that it looks like a web address is totally incidental. There may or may not be an actual website at that address, and it doesn't matter for RDF--it is just a very verbose identifier. (Although sometimes there is something useful at the address.) There are also other types of URIs besides http: URIs, such as URNs and TAGs, which you'll see below. URIs are used as global names because they provide a way to break down the space of all possible names into units that have obvious owners. URIs that start with http://www.rdfabout.com/ are implicitly controlled by me because I own and control the domain, "rdfabout.com."

Since URIs can be quite long, in RDF notations they're usually abbreviated using the concept of namespaces from XML.
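
To make the abbreviation concrete, here is a minimal Python sketch of how a prefixed name maps to a full URI and back. The prefix table holds namespaces used in this article's examples; the function names expand() and abbreviate() are made up for the illustration and do not come from any RDF toolkit.

```python
# Prefix -> namespace URI, as declared with xmlns: in XML or @prefix in N3.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}


def expand(name, prefixes=PREFIXES):
    """Turn an abbreviated name like 'rdf:type' into the full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {name!r}")
    return prefixes[prefix] + local


def abbreviate(uri, prefixes=PREFIXES):
    """Shorten a URI with the longest matching namespace, if there is one."""
    best = None
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            if best is None or len(namespace) > len(prefixes[best]):
                best = prefix
    if best is None:
        return f"<{uri}>"  # no known namespace: write the URI out in full
    return f"{best}:{uri[len(prefixes[best]):]}"
```

The abbreviation is purely a notational convenience: expand("rdf:type") and the long form http://www.w3.org/1999/02/22-rdf-syntax-ns#type name exactly the same entity.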

Literal values, like "computer science," allow text to be included in RDF. This is used heavily when RDF is used for metadata--its original purpose. In fact, literal values are primarily what tie RDF to the real world, since URIs are just arbitrary strings.

These concepts form most of the abstract RDF model for encoding knowledge. It's analogous to the common API that most XML libraries provide. If it weren't for us curious humans always peeking into files, the actual format of XML wouldn't matter so much as long as we had our appendChild, setAttribute, etc. Of course, we do need a common file format for exchanging data, and in fact there are two for RDF, which we look at in the next section.

Serialization Syntaxes: XML and Notation 3

In the previous section we covered the abstract RDF model. Now we turn to how to actually write RDF in two formats. The W3C specifications define an XML format to encode RDF. Here's an example:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
    xmlns:edu="http://www.example.org/">
    <rdf:Description rdf:about="http://www.princeton.edu">
        <geo:lat>40.35</geo:lat>
        <geo:long>-74.66</geo:long>
        <edu:hasDept rdf:resource="http://www.cs.princeton.edu"
            dc:title="Department of Computer Science"/>
    </rdf:Description>
</rdf:RDF>

In an RDF/XML document there are two types of nodes: resource nodes and property nodes. Resource nodes are the subjects and objects of statements, and they usually have an rdf:about attribute on them giving the URI of the resource they represent. In this example, the rdf:Description node is the only resource node.

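Resource nodes contain (only) property nodes, which represent statements. There are three property nodes in this example, all with the subject <http://www.princeton.edu>, and with the predicates geo:lat, geo:long, and edu:hasDept. The dc:title attribute on the edu:hasDept node contributes a fourth statement, whose subject is the object resource <http://www.cs.princeton.edu>.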

Property nodes, in turn, contain literal values, like "40.35" and "-74.66," or a reference to an object resource using the rdf:resource attribute, or they may contain a full resource node as their object.

From the specification we are told how to take the XML document above and get out of it this table of statements:

Subject                       Predicate   Object
----------------------------- ----------- --------------------------------
<http://www.princeton.edu>    edu:hasDept <http://www.cs.princeton.edu>
<http://www.princeton.edu>    geo:lat     "40.35"
<http://www.princeton.edu>    geo:long    "-74.66"
<http://www.cs.princeton.edu> dc:title    "Department of Computer Science"

These triples are the bread and butter of RDF. When applications use RDF in XML format, they see the triples. Note that the hierarchical structure of the XML and the order of the nodes are lost in the table of triples, which means that, like whitespace, they were not part of the information meant to be encoded in the RDF.
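
To show roughly how the table falls out of the XML, here is a small Python sketch using only the standard library's xml.etree.ElementTree. It is not a real RDF/XML parser: it handles only the constructs that appear in the example above (rdf:Description with rdf:about, literal property nodes, rdf:resource, and extra attributes on a property node), and the names DOC, to_uri(), and triples() are invented for the illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
    xmlns:edu="http://www.example.org/">
    <rdf:Description rdf:about="http://www.princeton.edu">
        <geo:lat>40.35</geo:lat>
        <geo:long>-74.66</geo:long>
        <edu:hasDept rdf:resource="http://www.cs.princeton.edu"
            dc:title="Department of Computer Science"/>
    </rdf:Description>
</rdf:RDF>"""


def to_uri(qualified):
    """ElementTree writes names as '{namespace}local'; join the two parts."""
    namespace, local = qualified[1:].split("}", 1)
    return namespace + local


def triples(xml_text):
    """Return a set of (subject, predicate, (kind, value)) tuples."""
    found = set()
    root = ET.fromstring(xml_text)
    for node in root.findall(f"{{{RDF_NS}}}Description"):  # resource nodes
        subject = node.get(f"{{{RDF_NS}}}about")
        for prop in node:                                   # property nodes
            predicate = to_uri(prop.tag)
            target = prop.get(f"{{{RDF_NS}}}resource")
            if target is None:
                # No rdf:resource: the element text is a literal object.
                found.add((subject, predicate, ("literal", prop.text or "")))
                continue
            found.add((subject, predicate, ("uri", target)))
            # Other namespaced attributes describe the object resource.
            for attr, value in prop.attrib.items():
                if attr.startswith("{") and not attr.startswith(f"{{{RDF_NS}}}"):
                    found.add((target, to_uri(attr), ("literal", value)))
    return found
```

Because triples() returns a set, the nesting and ordering of the XML are gone by the time an application sees the data, which is the point made above.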

Notation 3 (N3), along with its simpler subset Turtle, is another system for writing out RDF. Since it works under the same abstract model, the difference between it and RDF/XML is superficial, a matter of readability.

The same information in the RDF/XML file written in N3 looks like this:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix edu: <http://www.example.org/> .

<http://www.princeton.edu> geo:lat "40.35" ; geo:long "-74.66" .
<http://www.cs.princeton.edu> dc:title "Department of Computer Science" .
<http://www.princeton.edu> edu:hasDept <http://www.cs.princeton.edu> .

In N3 and Turtle, statements are just written out as the subject URI (in brackets or abbreviated with namespaces), followed by the predicate URI, followed by the object URI or literal value, followed by a period. But, to save on typing, multiple statements with the same subject can be grouped together by separating them with semicolons and writing the subject only once. The semicolon on the first line indicates that <http://www.princeton.edu> is the subject of both the geo:lat and geo:long statements.
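
The grouping rule is easy to see from the writing side. The Python sketch below takes a list of triples and writes one line per subject, joining that subject's predicate-object pairs with semicolons. It is a simplification under stated assumptions: URIs are always written in full inside angle brackets rather than abbreviated, only backslashes and double quotes in literals are escaped, and the names term(), to_turtle(), and example are made up for the illustration.

```python
from collections import defaultdict

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
DC = "http://purl.org/dc/elements/1.1/"

# Objects are tagged ("uri", ...) or ("literal", ...) to tell them apart.
example = [
    ("http://www.princeton.edu", GEO + "lat", ("literal", "40.35")),
    ("http://www.princeton.edu", GEO + "long", ("literal", "-74.66")),
    ("http://www.cs.princeton.edu", DC + "title",
     ("literal", "Department of Computer Science")),
]


def term(obj):
    """Write an object as <uri> or as a quoted literal."""
    kind, value = obj
    if kind == "uri":
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_turtle(triples):
    """Group statements by subject and join each group with ' ; '."""
    by_subject = defaultdict(list)
    for subject, predicate, obj in triples:
        by_subject[subject].append((predicate, obj))
    lines = []
    for subject, pairs in by_subject.items():
        body = " ; ".join(f"<{p}> {term(o)}" for p, o in pairs)
        lines.append(f"<{subject}> {body} .")
    return "\n".join(lines)
```

Running to_turtle(example) produces two lines: the first carries both the latitude and longitude statements for <http://www.princeton.edu>, separated by a semicolon, and the second carries the single dc:title statement.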
