What Is RDF
Editor's Note: "What Is RDF" was originally written by Tim Bray in 1998 and updated by Dan Brickley in 2001. Recently it seemed like time for another update, particularly to relate RDF and the Semantic Web to the cutting edge of web development. We've republished the original in a new location and offer the following update. I'll leave to you, dear reader, the task of deciding how well Joshua Tauberer has accomplished the task of updating a classic. -- Kendall Grant Clark
Building the Semantic Web
On the Semantic Web (SemWeb), computers do the browsing (and searching, and querying, and...) for us. The SemWeb enables computers to seek out knowledge distributed throughout the Web, mesh it, and then take action based on it. Take an analogy: the current web is a decentralized platform for distributed presentations, while the SemWeb is a decentralized platform for distributed knowledge. Resource Description Framework (RDF) is the W3C standard for encoding knowledge.
There is, of course, knowledge on the current web, but it's off limits to computers. Consider a Wikipedia page: it might convey a lot of information to the human reader, but all the computer displaying the page sees is presentation markup. To the extent that computers make sense of HTML, images, Flash, etc., it's almost always for the purpose of creating a presentation for the end user. The real content, the knowledge the files are conveying to the human, is opaque to the computer.
What is meant by "semantic" in Semantic Web is not that computers are going to understand the meaning of anything, but that the logical pieces of meaning can be mechanically manipulated by a machine to useful human ends.
So, now imagine a new web where the real content can be manipulated by computers. For now, picture it as a web of databases. One "semantic" website publishes a database about a product line, with products and descriptions, while another publishes a database of product reviews. A third site for a retailer publishes a database of products in stock. What standards would make it easier to write an application to mesh distributed databases together, so that a computer could use the three data sources together to help an end user make better purchasing decisions?
There's nothing stopping anyone from writing a program now to do those sorts of things, in just the same way that nothing stopped anyone from exchanging data before we had XML. But standards facilitate building applications, especially in a decentralized system. Here are some of the things we would want a standard for distributed knowledge to consider:
1. Files on the Semantic Web need to be able to express information flexibly. Life can't be neatly packed into tables, as in relational databases, or into hierarchies, as in XML. The information about movies and TV shows shown below is really best expressed as a graph (see Figure 1):
Figure 1. Knowledge as a graph
Of course, we can't be drawing our way through the Semantic Web, so instead we will need a tabular notation for these graphs that looks a bit like this:
| Start Node | Edge Label | End Node |
Each row of the table specifies an edge from one node in the graph to another. More on this later.
2. Files on the Semantic Web need to be able to relate to each other. A file about product prices posted by a vendor and a file with product reviews posted independently by a consumer need to have a way of indicating that they are talking about the same products. Just using product names isn't enough. Two products might exist in the world both called "The Super Duper 3000," and we want to eliminate ambiguity from the SemWeb so that computers can process the information with certainty. The SemWeb needs globally unique identifiers that can be assigned in a decentralized way.
3. We will use vocabularies for making assertions about things, but these vocabularies must be able to be mixed together. A vocabulary about TV shows developed by TV aficionados and a vocabulary about movies independently developed by movie connoisseurs must be able to be used together in the same file, to talk about the same things (e.g., to assert that an actor has appeared in both TV shows and movies).
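Requirements 2 and 3 can be sketched together in a few lines of Python. This is only an illustration under stated assumptions, not part of RDF itself: every URI, vocabulary term, and value below is hypothetical, and a graph is modeled simply as a set of (start node, edge label, end node) tuples.

```python
# Hypothetical URIs: two unrelated manufacturers each sell a product
# called "The Super Duper 3000". The name collides; the URIs do not.
ACME_SD3000 = "http://acme.example.com/products/super-duper-3000"
ZENITH_SD3000 = "http://zenith.example.net/products/super-duper-3000"

# Hypothetical vocabulary terms, developed independently of each other.
PRICE = "http://vendor.example.org/terms/price"
RATING = "http://reviews.example.org/terms/rating"

# File posted by a vendor: product prices.
vendor_file = {
    (ACME_SD3000, PRICE, "19.99"),
    (ZENITH_SD3000, PRICE, "24.50"),
}

# File posted independently by a consumer: a product review.
review_file = {
    (ACME_SD3000, RATING, "4"),
}

# Both files identify the product by the same URI, so meshing them
# is nothing more than a set union of their edges.
merged = vendor_file | review_file


def describe(graph, subject):
    """Gather everything the graph says about one subject.

    Sketch-level simplification: assumes at most one value per edge label.
    """
    return {label: value for start, label, value in graph if start == subject}
```

Here `describe(merged, ACME_SD3000)` returns both the vendor's price and the consumer's rating, while `describe(merged, ZENITH_SD3000)` returns only a price: the review is never attached to the wrong product, even though both products share a name.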
These are the requirements that RDF provides a standard for, as we'll see in the next section. Before getting too abstract, here are actual RDF examples of the information from the graph above, first in the Notation 3 format, which closely follows the tabular encoding of the underlying graph:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://www.example.org/> .

ex:vincent_donofrio ex:starred_in ex:law_and_order_ci .
ex:law_and_order_ci rdf:type ex:tv_show .
ex:the_thirteenth_floor ex:similar_plot_as ex:the_matrix .
And in the standard RDF/XML format, which may have a more intuitive feel but tends to obscure the underlying graph:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://www.example.org/">
  <rdf:Description rdf:about="http://www.example.org/vincent_donofrio">
    <ex:starred_in>
      <ex:tv_show rdf:about="http://www.example.org/law_and_order_ci" />
    </ex:starred_in>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.example.org/the_thirteenth_floor">
    <ex:similar_plot_as rdf:resource="http://www.example.org/the_matrix" />
  </rdf:Description>
</rdf:RDF>
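To see how closely the Notation 3 listing follows the tabular encoding, here is a minimal Python sketch that turns it into rows of (start node, edge label, end node). The function names `parse_simple_n3` and `expand` are made up for this illustration, and the code deliberately handles only the tiny subset of Notation 3 used in the example above. It is not a real parser; an actual application would use an existing RDF library.

```python
N3_TEXT = """
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://www.example.org/> .
ex:vincent_donofrio ex:starred_in ex:law_and_order_ci .
ex:law_and_order_ci rdf:type ex:tv_show .
ex:the_thirteenth_floor ex:similar_plot_as ex:the_matrix .
"""


def expand(term, prefixes):
    """Expand 'ex:foo' to a full URI; pass '<...>' URIs through unwrapped."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    prefix, local = term.split(":", 1)
    return prefixes[prefix] + local


def parse_simple_n3(text):
    """Parse one-line '@prefix' declarations and 'subject predicate object .'
    statements into a list of (start, label, end) tuples of full URIs.

    Not supported (by design, to keep the sketch short): literals, blank
    nodes, ';' and ',' shorthand, comments, multi-line statements.
    """
    prefixes = {}
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[-1] != "." or len(tokens) != 4:
            raise ValueError(f"unsupported statement: {line!r}")
        first, second, third = tokens[:-1]
        if first == "@prefix":
            # '@prefix ex: <http://www.example.org/> .'
            prefixes[second.rstrip(":")] = third.strip("<>")
        else:
            rows.append(tuple(expand(t, prefixes) for t in (first, second, third)))
    return rows


rows = parse_simple_n3(N3_TEXT)
for start, label, end in rows:
    print(start, label, end, sep=" | ")
```

Each non-prefix line of the listing becomes exactly one row, with every prefixed name expanded to the full URI it abbreviates.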
RDF was originally created in 1999 as a standard on top of XML for encoding metadata--literally, data about data. Metadata covers things like who authored a web page or what date a blog entry was published: information that is in some sense secondary to other content already on the regular web. Since then, and perhaps especially after the updated RDF spec in 2004, the scope of RDF has grown well beyond metadata. The most exciting uses of RDF aren't in encoding information about web resources, but in encoding information about, and relations between, things in the real world: people, places, concepts, etc.