What Is RDF
January 24, 2001
Spanish Translation available here.
This article was first published as "RDF and Metadata" on XML.com in June 1998. It has been updated by ILRT's Dan Brickley, chair of the W3C's RDF Interest Group, to reflect the growing use of RDF and updates to the specification since 1998.
RDF stands for Resource Description Framework. RDF is built for the Web, but let's leave the Web behind for now and think about how we find things in the real world.
Scenario 1: The Library
You're in a library to find books on raising donkeys as pets. In most libraries these days you'd use the computer lookup system, basically an electronic version of the old card file. This system allows you to list books by author, title, subject, and so on. The list includes the date, author, title, and lots of other useful information, including (most important of all) where each book is.
Scenario 2: The Video Store
You're in a video store and you want a movie by John Huston. A large modern video store offers a lookup facility that's similar to the library's. Of course, the search properties are different (director, actors, and so on) but the results are more or less the same.
Scenario 3: The Phone Book
You're working late at a customer's office in South Denver, and it seems that a pizza is essential if work is to continue. Fortunately, every office comes equipped with a set of Yellow Pages that, when properly used, can lead to quick pizza delivery.
The Common Thread
What do all these situations have in common, and what differences lie behind the scenes? First of all, each of these systems is based on metadata, that is, information about information. In each case, you need a piece of information (the book's location, the video's name, the pizza joint's phone number) you don't have. In each case, you use metadata (information about information) to get it.
We're all used to this stuff; metadata ordinarily comes in named chunks (subject, director, business category) that associate lookup information ("donkeys", "John Huston", "Pizza, South Side") with the information you're really after.
Here's a subtle but important point: in theory, metadata is not really necessary. You could go through the library one book at a time looking for donkey books, or through the video store shelves until you found your movie, or call all the numbers in your area code until you find pizza delivery. But that would be very wasteful; in fact, it would be stupid. Metadata is the way to go.
In each of our scenarios, we used metadata, and we used it in remarkably similar ways. Does this mean that the library, the video store, and the phone company all use the same metadata setup? Of course not. Every library has a choice among at least two systems for organizing its books, and among many vendors who will sell it software to do the looking-up. The same is obviously true for video stores and phone companies.
In fact most such products define their own system of metadata and their own facilities for storing and managing it. They typically do not offer facilities for sharing or interchanging it. This doesn't cause too much of a problem, assuming they do a decent job with the user interface. We are comfortable enough with the general process we call "looking things up" (really, searching via metadata) that we are able to adapt and use all these different systems.
The most common daily use of metadata is to aid our discovery of things. But there are lots of other uses going on behind the scenes. The library and video store are storing other metadata that you don't see: how often the books and videos are being used, how much it cost to buy them, where to go for a replacement, etc. Running a library or a video store would be unthinkable without metadata. Similarly, the phone company uses its metadata, most obviously to print the Yellow Pages, but also for many other internal management and administration tasks.
The Web is a lot like a really really big library. There are millions of things out there, and if you know the URL (in effect a kind of call number) you can get them. Since the Web has books, movies, and pizza joints, the number of ways you might want to look things up includes all the things a library uses, plus all the things the video store uses, plus all the things the Yellow Pages use, and lots more.
The problem at the moment is that there is hardly any metadata on the Web. So how do we find things? Mostly by using dumb, brute force techniques. The dumb, brute force is supplied by the wandering web robots of search engine sites like AltaVista, Infoseek, and Excite. These sites do the equivalent of going through the library, reading every book, and allowing us to look things up based on the words in the text. It's not surprising that people complain about search results, or that the robots are always way behind the growth and change of the Web.
In fact there is one metadata-based general purpose lookup facility: Yahoo! Yahoo doesn't use a robot. When you search through Yahoo, you're searching through human-generated subject categories and site labels. Compared to the amount of metadata that a library maintains for its books, Yahoo! is pitiful; but its popularity is clear evidence of the power of (even limited) metadata.
People who have thought about these problems, including many librarians and webmasters, generally agree that the Web urgently needs metadata. What would it look like? If the Web had an all-powerful Grand Organizing Directorate (at www.GOD.org), it would think up a set of lookup fields such as Author, Title, Date, Subject, and so on. The Directorate, being, after all, GOD, would simply decree that all Web pages start using this divine Metadata, and that would be that. Of course there would be some details such as how the Web sites ought to package up and interchange the metadata, and we all know that the Devil is in the details, but GOD can lick the Devil any day.
In fact, there is no www.GOD.org. For this reason, there is no chance that everyone will agree to start using the same metadata facilities. If libraries, which have existed for hundreds of years, can't agree on a single standard, there's not much chance that the Web will.
Does this mean that there is no chance for metadata? That everyone is going to have to build their own lookup keys and values and software, and that we're going to be stuck using dumb, brute force robots forever?
No. As we observed with our three search scenarios, metadata operations have an awful lot in common, even when the metadata is different. RDF is an effort to identify these common threads and provide a way for Web architects to use them to provide useful Web metadata without divine intervention.
Resource Description Framework, as its name implies, is a framework for describing and interchanging metadata. It is built on the following rules.
- A Resource is anything that can have a URI; this includes all the Web's pages, as well as individual elements of an XML document. An example of a resource is a draft of the document you are now reading; its URL is http://www.textuality.com/RDF/Why.html
- A Property is a Resource that has a name and can be used as a property, for example Author or Title. In many cases, all we really care about is the name; but a Property needs to be a resource so that it can have its own properties.
- A Statement consists of the combination of a Resource, a Property, and a value. These parts are known as the 'subject', 'predicate' and 'object' of a Statement. An example Statement is "The Author of http://www.textuality.com/RDF/Why.html is Tim Bray." The value can just be a string, for example "Tim Bray" in the previous example, or it can be another resource, for example "The Home-Page of http://www.textuality.com/RDF/Why.html is http://www.textuality.com."
- There is a straightforward method for expressing these abstract Properties in XML, for example:
  <rdf:Description about='http://www.textuality.com/RDF/Why-RDF.html'>
    <Author>Tim Bray</Author>
    <Home-Page rdf:resource='http://www.textuality.com' />
  </rdf:Description>
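The Resource, Property, and value rules above can be sketched in a few lines of Python. This is only an illustration of the three-part model, not part of the RDF specification or of any RDF library; the `Resource` and `Statement` classes and the `http://example.org/vocab#` property URIs are assumptions made up for this example.

```python
from typing import NamedTuple, Union


class Resource(NamedTuple):
    """Anything that can be identified by a URI."""
    uri: str


class Statement(NamedTuple):
    """One statement: subject, predicate, and object (here called value)."""
    subject: Resource
    predicate: Resource            # a Property is itself a Resource
    value: Union[Resource, str]    # either a plain string or another Resource


# Hypothetical property URIs; the article does not say where Author
# and Home-Page are defined.
AUTHOR = Resource("http://example.org/vocab#Author")
HOME_PAGE = Resource("http://example.org/vocab#Home-Page")

page = Resource("http://www.textuality.com/RDF/Why.html")

statements = [
    Statement(page, AUTHOR, "Tim Bray"),                                # string value
    Statement(page, HOME_PAGE, Resource("http://www.textuality.com")),  # resource value
]

for s in statements:
    print(s.subject.uri, s.predicate.uri, s.value)
```

Making the predicate a `Resource` rather than a bare name is what lets a Property carry properties of its own.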
RDF is carefully designed to have the following characteristics.
- Since a Property is a resource, any independent organization (or even person) can invent them. I can invent one called Author, and you can invent one called Director (which would only apply to resources that are associated with movies), and someone else can invent one called Restaurant-Category. This is necessary since we don't have a GOD to take care of it for us.
- Since RDF Statements can be converted into XML, they are easy for us to interchange. This would probably be necessary even if we did have a GOD.
- RDF statements are simple, three-part records (Resource, Property, value), so they are easy to handle and look things up by, even in large numbers. The Web is already big and getting bigger, and we are probably going to have (literally) billions of these floating around (millions even for a big Intranet). Scalability is important.
- Properties are Resources: Properties can have their own properties and can be found and manipulated like any other Resource. This is important because there are going to be lots of them; too many to look at one by one. For example, I might want to know if anyone out there has defined a Property that describes the genre of a movie, with values like Comedy, Horror, Romance, and Thriller. I'll need metadata to help with that.
- Values Can Be Resources: For example, most web pages will have a property named Home-Page which points at the home page of their site. So the values of properties, which obviously have to include things like title and author's name, also have to include Resources.
- Statements Can Be Resources: Statements can also have properties. Since there's no GOD to provide useful assertions for all the resources, and since the Web is way too big for us to provide our own, we're going to need to do lookups based on other people's metadata (as we do today with Yahoo!). This means that we'll want, given any Statement such as "The Subject of this Page is Donkeys", to be able to ask "Who said so? And When?" One useful way to do this would be with metadata; so Statements will need to have Properties.
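Two of these characteristics, easy lookup over simple three-part records and statements that can themselves be described, can be sketched as follows. This is a simplified illustration: it uses a Python tuple directly as the subject of another statement, which is not how the RDF specification encodes statements about statements. Every URI under `example.org`, the `SaidBy` and `SaidOn` property names, and the date are hypothetical.

```python
from collections import defaultdict

# Hypothetical URIs, invented for this sketch.
PAGE = "http://example.org/donkeys.html"
SUBJECT = "http://example.org/vocab#Subject"
SAID_BY = "http://example.org/vocab#SaidBy"
SAID_ON = "http://example.org/vocab#SaidOn"

# "The Subject of this Page is Donkeys"
claim = (PAGE, SUBJECT, "Donkeys")

# Tuples are hashable, so a statement can be the subject of further statements.
statements = [
    claim,
    (claim, SAID_BY, "http://example.org/people/librarian"),
    (claim, SAID_ON, "2001-01-24"),  # illustrative date only
]

# Index by property so a lookup does not have to scan every statement.
by_property = defaultdict(list)
for statement in statements:
    by_property[statement[1]].append(statement)


def who_said(statement):
    """Answer "Who said so?" for a given statement."""
    return [value for (subject, _, value) in by_property[SAID_BY]
            if subject == statement]


print(who_said(claim))
```

A real store would also index by subject and by value, but the point is the same: fixed three-part records are easy to index in large numbers.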
XML allows you to invent tags, which may contain both text data and other tags. XML has a built-in distinction between element types, for example the IMG element type in HTML, and elements, for example an individual <img src='Madonna.jpg'>; this corresponds naturally to the distinction between Properties and Statements. So it seems as though XML documents should be a natural vehicle for exchanging general purpose metadata.
XML, however, falls apart on the Scalability design goal. There are two problems:
- The order in which elements appear in an XML document is significant and often very meaningful. This seems highly unnatural in the metadata world. Who cares whether a movie's Director or Title is listed first, as long as both are available for lookups? Furthermore, maintaining the correct order of millions of data items is expensive and difficult, in practice.
- XML allows constructions like
<Description>The value of this property contains some text, mixed up with child properties such as its temperature (<Temp>48</Temp>) and longitude (<Longt>101</Longt>). [&Disclaimer;]</Description>
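Both problems can be seen with Python's standard `xml.etree.ElementTree` parser. The sketch below is illustrative only: the `Movie` documents, the film title, and the `example.org` URI are made up, and the mixed-content sample is a shortened version of the one above without the entity reference. It shows that two XML documents differing only in element order are different trees yet yield the same set of three-part records, and that mixed content comes back as an interleaving of strings and child elements.

```python
import xml.etree.ElementTree as ET

# Two documents with the same children in a different order (hypothetical data).
doc_a = ET.fromstring(
    "<Movie><Director>John Huston</Director><Title>Example Film</Title></Movie>"
)
doc_b = ET.fromstring(
    "<Movie><Title>Example Film</Title><Director>John Huston</Director></Movie>"
)


def child_tags(element):
    """Child element names, in document order."""
    return [child.tag for child in element]


def as_statements(element, subject):
    """Flatten simple children into an unordered set of three-part records."""
    return {(subject, child.tag, child.text) for child in element}


MOVIE = "http://example.org/film/7"  # hypothetical URI

# The trees differ in order, but the sets of statements are equal.
print(child_tags(doc_a) == child_tags(doc_b))
print(as_statements(doc_a, MOVIE) == as_statements(doc_b, MOVIE))

# Mixed content: text fragments are scattered around the child elements.
mixed = ET.fromstring(
    "<Description>Some text with a temperature (<Temp>48</Temp>) "
    "and longitude (<Longt>101</Longt>).</Description>"
)
pieces = [mixed.text]
for child in mixed:
    pieces.extend([child.tag, child.tail])
print(pieces)
```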
On the other hand, something like XML is an absolutely necessary part of the solution to RDF's Interchange design goal. XML is unequalled as an exchange format on the Web. But by itself, it doesn't provide what you need in a metadata framework.
The four general rules given above define the central ideas of RDF. It turns out that it takes quite a lot of abstract terminology and XML syntax to define them precisely enough that people can write computer programs to process them. In particular, turning Statements into Resources is quite tricky. It also turns out that in a (very) few cases, you do need to order your properties, and this requires quite a bit of syntax.
This article doesn't explain all these details; there are a variety of excellent resources to be found at http://www.w3.org/RDF that are designed to do just that.
RDF, as we've seen, provides a model for metadata, and a syntax so that independent parties can exchange it and use it. What it doesn't provide though is any Properties of its own. RDF doesn't define Author or Title or Director or Business-Category. That would be a job for GOD, if there were one. Since there isn't, it's a job for everyone.
It seems unlikely that one Property standing by itself is apt to be very useful. It is expected that these will come in packages; for example, a set of basic bibliographic Properties like Author, Title, Date, and so on. Then a more elaborate set from OCLC and a competing one from the Library of Congress. These packages are called Vocabularies; it's easy to imagine Property vocabularies describing books, videos, pizza joints, fine wines, mutual funds, and many other species of Web wildlife.
The Web is too big for any one person to stay on top of. In fact, it contains information about a huge number of subjects, and for most of those subjects (such as fine wines, home improvement, and cancer therapy), the Web has too much information for any one person to stay on top of and do much of anything else.
This means that opinions, pointers, indexes, and anything that helps people discover things are going to be commodities of very high value. Nobody thinks that everyone will use the same vocabulary (nor should they), but with RDF we can have a marketplace in vocabularies. Anyone can invent them, advertise them, and sell them. The good (or best-marketed) ones will survive and prosper. Probably most niches of information will come to be dominated by a small number of vocabularies, the way that library catalogs are today.
And even among people who are sharing the use of metadata vocabularies, there's no need to share the same software. RDF makes it possible to use multiple pieces of software to process the same metadata, and to use a single piece of software to process (at least in part) many different metadata vocabularies.
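As a rough sketch of how one piece of software might process several vocabularies at once, the example below treats each Property as a full URI and keeps one generic lookup function for all of them. The vocabulary namespaces, property names, and data are hypothetical, and the sketch assumes property URIs that end in `#` followed by a name.

```python
# Hypothetical vocabulary namespaces, invented for this sketch.
BIBLIO = "http://example.org/biblio#"
FILM = "http://example.org/film#"

statements = [
    ("http://example.org/book/1", BIBLIO + "Subject", "Donkeys"),
    ("http://example.org/film/7", FILM + "Director", "John Huston"),
    ("http://example.org/film/7", "http://example.org/other#Rating", "5"),
]


def vocabulary_of(property_uri):
    """Namespace part of a property URI, up to and including '#'."""
    return property_uri.rsplit("#", 1)[0] + "#"


def find(records, property_uri, value):
    """Generic lookup that works the same way for any vocabulary."""
    return [subject for (subject, prop, val) in records
            if prop == property_uri and val == value]


# The software understands two vocabularies and skips statements from others.
known = {BIBLIO, FILM}
understood = [s for s in statements if vocabulary_of(s[1]) in known]

print(find(understood, FILM + "Director", "John Huston"))
print(find(understood, BIBLIO + "Subject", "Donkeys"))
```

Because unknown properties are skipped rather than rejected, the same program can process a mixed collection of metadata at least in part.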
With any luck, this should make the Web more like a library, or a video store, or a phone book, than it is today.
Since RDF became a W3C Recommendation in February 1999, a number of tools have been created by developers working with RDF. For an in-depth treatment of these, consult the W3C RDF home page. A number of other listings are available, including XMLhack and Dave Beckett's RDF Resource Guide.
The main email list for RDF developer discussion is W3C's RDF Interest Group. A number of other RDF-related discussion lists exist, including the Mozilla-RDF forum (the Mozilla and Netscape 6 browsers make heavy use of RDF). More recently, the RDF-Logic list has been announced, providing a forum for the discussion of formal, logic-based approaches to knowledge representation for the Web. DARPA's DAML (DARPA Agent Markup Language) initiative uses the RDF-Logic list for discussions and announcements.
The RDF developer community is rather diverse, which is reflected in the nature of online discussions on the RDF lists. While one strand of RDF development is concerned with highly formal topics (RDF-Logic, DAML and so on), others are busy deploying simpler, more pragmatic applications for Web-based content and metadata syndication. All these themes meet (sometimes productively, sometimes confusingly) on the RDF Interest Group list, but they also typically each have a dedicated email list. For example, the RSS-DEV group has produced the RDF Site Summary (RSS) 1.0 Specification, which provides an RDF-based channel format, designed for interoperability with high level vocabularies such as Dublin Core as well as a variety of more application-specific RDF vocabularies.
Notes on Update (Dan Brickley)
This update to the 1998 article serves only to synchronize it with recent RDF terminology. Since this document was first published, the W3C has published the Model and Syntax specification as a Recommendation.
I have updated the markup example to use current RDF 1.0 syntax. There have also been some terminology changes: 'PropertyType' became 'Property', and 'Property' became 'Statement'. I have also added a brief mention of subject/predicate/object terminology, and lowercased a few mentions of 'Value' (since rdf:object replaced rdf:value for talking about the object of a statement).