XML.com: XML From the Inside Out


What Is RDF
by Tim Bray

Divine Metadata for the Web

Table of Contents

The Right Way to Find Things
It's All Different Behind the Scenes
Not Just For Searching
What About the Web?
Divine Metadata for the Web
Introducing RDF
Why Not Just Use XML?
The Devil is in the Details
Vocabularies
What RDF Might Mean
Getting started with RDF
Developer Community

People who have thought about these problems, including many librarians and webmasters, generally agree that the Web urgently needs metadata. What would it look like? If the Web had an all-powerful Grand Organizing Directorate (at www.GOD.org), it would think up a set of lookup fields such as Author, Title, Date, Subject, and so on. The Directorate, being, after all, GOD, would simply decree that all Web pages start using this divine Metadata, and that would be that. Of course there would be some details such as how the Web sites ought to package up and interchange the metadata, and we all know that the Devil is in the details, but GOD can lick the Devil any day.

In fact, there is no www.GOD.org. For this reason, there is no chance that everyone will agree to start using the same metadata facilities. If libraries, which have existed for hundreds of years, can't agree on a single standard, there's not much chance that the Web will.

Does this mean that there is no chance for metadata? That everyone is going to have to build their own lookup keys and values and software, and that we're going to be stuck using dumb, brute force robots forever?

No. As we observed with our three search scenarios, metadata operations have an awful lot in common, even when the metadata is different. RDF is an effort to identify these common threads and provide a way for Web architects to use them to provide useful Web metadata without divine intervention.

Introducing RDF

The Resource Description Framework (RDF), as its name implies, is a framework for describing and interchanging metadata. It is built on the following rules.

  1. A Resource is anything that can have a URI; this includes all the Web's pages, as well as individual elements of an XML document. An example of a resource is a draft of the document you are now reading, whose URL is http://www.textuality.com/RDF/Why.html
  2. A Property is a Resource that has a name and can be used to describe other Resources, for example Author or Title. In many cases, all we really care about is the name; but a Property needs to be a Resource so that it can have its own properties.
  3. A Statement consists of the combination of a Resource, a Property, and a value. These parts are known as the 'subject', 'predicate' and 'object' of a Statement. An example Statement is "The Author of http://www.textuality.com/RDF/Why.html is Tim Bray." The value can be a simple string, such as "Tim Bray" in the previous example, or it can be another Resource, as in "The Home-Page of http://www.textuality.com/RDF/Why.html is http://www.textuality.com."
  4. There is a straightforward method for expressing these abstract Statements in XML, for example:
<rdf:Description about='http://www.textuality.com/RDF/Why.html'>
<Author>Tim Bray</Author>
<Home-Page rdf:resource='http://www.textuality.com' />
</rdf:Description>
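To make rules 1 through 3 concrete, here is a minimal Python sketch that models a Statement as a (subject, predicate, object) triple whose value is either a plain string or another Resource. It is an illustration under stated assumptions, not an RDF library: the `Resource`, `Literal`, and `Statement` classes are hypothetical, and so is the `http://example.org/props#` namespace used to give the Author and Home-Page properties URIs.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical namespace: RDF identifies Properties by URI, but this one is made up.
PROPS = "http://example.org/props#"


@dataclass(frozen=True)
class Resource:
    """Anything that can have a URI."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A plain string value, such as an author's name."""
    value: str


@dataclass(frozen=True)
class Statement:
    subject: Resource              # the Resource being described
    predicate: Resource            # the Property, itself a Resource (rule 2)
    obj: Union[Resource, Literal]  # the value: a string or another Resource


doc = Resource("http://www.textuality.com/RDF/Why.html")
author = Resource(PROPS + "Author")
home_page = Resource(PROPS + "Home-Page")

statements = [
    # "The Author of http://www.textuality.com/RDF/Why.html is Tim Bray."
    Statement(doc, author, Literal("Tim Bray")),
    # "The Home-Page of http://www.textuality.com/RDF/Why.html is http://www.textuality.com."
    Statement(doc, home_page, Resource("http://www.textuality.com")),
]

for s in statements:
    kind = "Resource" if isinstance(s.obj, Resource) else "string"
    print(s.predicate.uri, "->", s.obj, f"({kind})")
```

Using frozen dataclasses rather than bare tuples means a Resource and a string value with the same text never compare equal, which keeps the two kinds of value distinct.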

RDF is carefully designed to have the following characteristics.

Independence
Since Properties are Resources, any independent organization (or even person) can invent them. I can invent one called Author, you can invent one called Director (which would apply only to resources associated with movies), and someone else can invent one called Restaurant-Category. This is necessary since we don't have a GOD to take care of it for us.
Interchange
Since RDF Statements can be expressed in XML, they are easy to interchange. This would probably be necessary even if we did have a GOD.
Scalability
RDF Statements are simple three-part records (Resource, Property, value), so they are easy to handle and to use for lookups, even in large numbers. The Web is already big and getting bigger, and we are probably going to have (literally) billions of these floating around (millions even on a big intranet). Scalability is important.
Properties are Resources
Properties can have their own properties and can be found and manipulated like any other Resource. This is important because there are going to be lots of them; too many to look at one by one. For example, I might want to know if anyone out there has defined a Property that describes the genre of a movie, with values like Comedy, Horror, Romance, and Thriller. I'll need metadata to help with that.
Values Can Be Resources
For example, most Web pages will have a property named Home-Page which points at the home page of their site. So the values of properties, which obviously have to include strings such as a title or an author's name, also have to include Resources.
Statements Can Be Resources
Statements can also have properties. Since there's no GOD to provide useful assertions for all the resources, and since the Web is way too big for us to provide our own, we're going to need to do lookups based on other people's metadata (as we do today with Yahoo!). This means that we'll want, given any Statement such as "The Subject of this Page is Donkeys", to be able to ask "Who said so? And When?" One useful way to do this would be with metadata; so Statements will need to have Properties.
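The uniform three-part record is what makes the Scalability, Properties Are Resources, and Statements Can Be Resources points workable together. The Python sketch below is a toy, not how a production RDF store is built: it keeps triples in a set with two dictionary indexes, describes a Property with an ordinary Statement, and records "who said so" about a Statement. The `EX` namespace, the page URI, the statement identifier, and the Description and Said-By properties are hypothetical; `rdf:type`, `rdf:Statement`, `rdf:subject`, `rdf:predicate`, and `rdf:object` come from RDF's own reification vocabulary. For brevity, string values and Resources are both plain Python strings here.

```python
from collections import defaultdict

# Real RDF namespace, used here only for the reification vocabulary.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace for the example properties.
EX = "http://example.org/props#"


class TripleStore:
    """Toy in-memory store of (subject, property, value) triples, indexed
    by subject and by property so lookups need not scan every triple."""

    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(set)
        self.by_property = defaultdict(set)

    def add(self, s, p, o):
        t = (s, p, o)
        self.triples.add(t)
        self.by_subject[s].add(t)
        self.by_property[p].add(t)

    def find(self, s=None, p=None, o=None):
        """Return the triples matching every argument that is not None."""
        if s is not None:
            candidates = self.by_subject.get(s, set())
        elif p is not None:
            candidates = self.by_property.get(p, set())
        else:
            candidates = self.triples
        return {t for t in candidates
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)}

    def reify(self, stmt_id, s, p, o):
        """Describe the Statement (s, p, o) as a Resource named stmt_id, using
        RDF's reification vocabulary, so it can carry properties of its own."""
        self.add(stmt_id, RDF + "type", RDF + "Statement")
        self.add(stmt_id, RDF + "subject", s)
        self.add(stmt_id, RDF + "predicate", p)
        self.add(stmt_id, RDF + "object", o)


store = TripleStore()
page = "http://example.org/donkeys.html"  # hypothetical page URI
subject_prop = EX + "Subject"

# An ordinary Statement: "The Subject of this page is Donkeys."
store.add(page, subject_prop, "Donkeys")

# Properties are Resources: the Subject property can be described too.
store.add(subject_prop, EX + "Description", "What a resource is about")

# Statements can be Resources: record who made the Statement above.
stmt = "urn:example:stmt1"  # hypothetical identifier for the Statement
store.reify(stmt, page, subject_prop, "Donkeys")
store.add(stmt, EX + "Said-By", "Alice")

# "Who said so?": find Statements about the page, then who asserted them.
for (stmt_id, _, _) in store.find(p=RDF + "subject", o=page):
    for (_, _, who) in store.find(s=stmt_id, p=EX + "Said-By"):
        print(stmt_id, "was said by", who)
```

Indexing by subject and by property is just one simple way to keep lookups cheap as the number of triples grows; real stores typically use more elaborate indexes, but a uniform three-part record is what makes that kind of indexing straightforward.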

Why Not Just Use XML?

XML allows you to invent tags, which may contain both text data and other tags. XML has a built-in distinction between element types, for example the IMG element type in HTML, and elements, for example an individual <img src='Madonna.jpg'>; this corresponds naturally to the distinction between Properties and Statements. So it seems as though XML documents should be a natural vehicle for exchanging general purpose metadata.

XML, however, falls apart on the Scalability design goal. There are two problems:

  1. The order in which elements appear in an XML document is significant and often very meaningful. This seems highly unnatural in the metadata world. Who cares whether a movie's Director or Title is listed first, as long as both are available for lookups? Furthermore, maintaining the correct order of millions of data items is, in practice, expensive and difficult.
  2. XML allows constructions like
<Description>The value of this property contains some
text, mixed up with child properties such as its temperature
(<Temp>48</Temp>) and longitude 
(<Longt>101</Longt>). [&Disclaimer;]</Description>
When you represent general XML documents in computer memory, you get weird data structures that mix trees, graphs, and character strings. In general, these are hard to handle in even moderate amounts, let alone by the billion.
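Both problems can be seen with Python's standard `xml.etree.ElementTree` module. In this sketch, two hypothetical movie records that differ only in element order compare unequal as XML child sequences but equal once reduced to a set of (subject, property, value) triples; the second part shows how a mixed-content element like the Description above scatters its text across the `.text` and `.tail` attributes. The record contents, the movie URI, and the helper names are illustrative assumptions, and the `&Disclaimer;` entity is left out because ElementTree rejects an undeclared entity.

```python
import xml.etree.ElementTree as ET

# Two hypothetical movie records that differ only in the order of their children.
doc_a = "<Movie><Director>Some Director</Director><Title>Some Title</Title></Movie>"
doc_b = "<Movie><Title>Some Title</Title><Director>Some Director</Director></Movie>"


def children_in_order(xml_text):
    """XML view: the child elements as an ordered list of (tag, text) pairs."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]


def as_statements(xml_text, subject):
    """Metadata view: the same data as an unordered set of triples."""
    return {(subject, tag, text) for tag, text in children_in_order(xml_text)}


movie = "http://example.org/movies/1"  # hypothetical movie URI
print(children_in_order(doc_a) == children_in_order(doc_b))        # False: order differs
print(as_statements(doc_a, movie) == as_statements(doc_b, movie))  # True: same Statements

# Mixed content: ElementTree splits the text between .text and each child's .tail.
mixed = ET.fromstring(
    "<Description>some text (<Temp>48</Temp>) and longitude "
    "(<Longt>101</Longt>).</Description>"
)
print(repr(mixed.text), [(c.tag, c.text, c.tail) for c in mixed])
```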

On the other hand, something like XML is an absolutely necessary part of the solution to RDF's Interchange design goal. XML is unequalled as an exchange format on the Web. But by itself, it doesn't provide what you need in a metadata framework.

