RDF and Metadata
This article has now been updated to incorporate changes in the RDF spec and the growth of the RDF community. You can find a newer version here: What is RDF?.
RDF stands for Resource Description Framework. RDF is built for the Web, but let's leave Web-land behind for a few minutes and think about how we find things in the real world.
You're in a library to find books on raising donkeys as pets. In most libraries these days you'd use the computer lookup system, basically an electronic version of the old card file. This system allows you to list books by author, title, call-number, and subject. The list includes the date, author, title, and lots of other useful information, including (most important of all) where each book is.
You're in a video store and you want a movie by John Huston. A large modern video store offers a lookup facility that's similar to the library's. Of course, the things you can search on are different (director, actors, and so on) but the results are more or less the same.
You're working late at a customer's office in South Denver, and it seems that a pizza is essential if work is to continue. Fortunately, every office comes equipped with a set of Yellow Pages that, properly used, can lead to quick pizza delivery.
What do all these situations have in common, and what differences lie behind the scenes? First of all, each of these systems is based on metadata, or information about information. In each case, you need a piece of information (the book's location, the video's name, the pizza joint's phone number). In each case, you use metadata to get it.
We're all used to this stuff; the usual setup is that metadata comes in named chunks (subject, director, business category) that associate lookup information ("donkeys", "John Huston", "Pizza, South Side") with the real info that you're after.
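As a rough illustration of "named chunks" that map lookup information to the real info you're after, here is a minimal Python sketch. The catalogue records, titles, and shelf locations are invented for the example, and build_index is a hypothetical helper, not part of any real library system.

```python
from collections import defaultdict

# Hypothetical catalogue records; every title and location here is invented.
BOOKS = [
    {"title": "Donkeys at Home", "author": "A. Writer",
     "subject": "donkeys", "location": "Shelf 12"},
    {"title": "Pet Donkey Care", "author": "B. Author",
     "subject": "donkeys", "location": "Shelf 14"},
    {"title": "Pizza Ovens", "author": "C. Baker",
     "subject": "cooking", "location": "Shelf 3"},
]


def build_index(records, field):
    """Map each value of a named metadata field to the locations of matching items."""
    index = defaultdict(list)
    for record in records:
        index[record[field]].append(record["location"])
    return index


# Look things up by subject instead of walking every shelf.
by_subject = build_index(BOOKS, "subject")
donkey_shelves = by_subject["donkeys"]
print(donkey_shelves)
```

The same function works for any other named field (author, title), which is the sense in which the lookup process is the same even when the fields differ.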
Here's a subtle but important point: in theory, metadata is not really necessary. In principle, you could go through the library one book at a time looking for donkey books; or through the video store shelves until you found your movie; or call all the numbers in your area code until you find pizza delivery. But that would be very wasteful -- in fact, downright stupid. Metadata is the way to go.
In each of our scenarios, we used metadata, and used it in a remarkably similar way. Does this mean that the library, the video store, and the phone company all use the same metadata setup? Of course not -- to start with, every library has a choice among at least two systems for organizing their books, and among many vendors who will sell them software to do the looking-up. The same is true, obviously, for video stores and phone companies.
In fact, most such products define their own system of metadata and their own facilities for storing and managing it; they typically do not offer any facilities for sharing or interchanging it. This doesn't cause too much of a problem, assuming they do a decent job with the user interface. We are comfortable enough with the general process we call "looking things up" (really, searching via metadata) that we are able to adapt and use all these different systems.
The most common day-to-day use of metadata is to help us find things. But there are lots of other uses going on behind the scenes: the library and video store both keep other metadata that you don't see, concerning how often the books and videos are used, how much they cost to buy, and where to go for a replacement; running a library or a video store would be unthinkable without metadata. Similarly, the phone company uses its metadata most obviously to print the Yellow Pages, but also for many internal management and administration tasks.
The Web is a lot like a really REALLY big library, in that there are millions of things out there, and if you know the URL (in effect an electronic "call number") you can get them. Since the Web has books, movies, and pizza joints, the set of fields you might need to look things up by includes everything a library uses, plus everything the video store uses, plus everything the Yellow Pages use, and lots more.
The problem at the moment is that there is hardly any metadata on the Web. So how do we find things? Mostly, using dumb brute-force techniques. The dumb brute force is supplied by the Web robots of search engine sites like Altavista, Infoseek, and Excite. These sites do the equivalent of going through the library, reading every book, and allowing us to look things up based on the words in the text. It's not surprising that people complain about search results, or that the robots are always way behind the growth and change of the Web.
In fact, there is one metadata-based general purpose lookup facility: Yahoo!, which is the most visited Web site of all. Yahoo doesn't use a robot. When you search through Yahoo, you're searching through human-generated subject categories and site labels. Compared to the amount of metadata that a library maintains for its books, Yahoo! is pitiful; but its popularity is clear evidence of the power of (even limited) metadata.
People who have thought about these problems, including many of the world's librarians and webmasters, generally agree that the Web urgently needs metadata. What would it look like? If the Web had an all-powerful Grand Organizing Directorate (at www.GOD.org), they would think up a set of lookup fields such as Author, Title, Date, Subject, and so on. The Directorate, being, after all, GOD, would simply decree that all Web pages start using this divine Metadata, and that would be that. Of course there would be some details such as how the Web sites ought to package up and interchange the metadata, and we all know that the Devil is in the details, but GOD can lick the Devil any day.
In fact, there is no www.GOD.org. For this reason, there is no chance that everyone will agree to start using the same metadata facilities. If libraries, which have been in existence for thousands of years, can't agree on a single standard, there's not much chance that the Web will.
Does this mean that there is no chance for metadata? That everyone is going to have to build their own lookup keys and values and software, and that we're going to be stuck using dumb brute-force robots forever?
No -- because as we observed with our three search scenarios, metadata operations have an awful lot in common, even when the metadata is different. RDF is an effort to identify these common threads and provide a way for Web architects to use them to provide useful Web metadata without divine intervention.
Resource Description Framework, as its name implies, is a framework for describing and interchanging metadata. It is built on the following rules:
Author or Title. In many cases, all we really care about is the name; but a PropertyType needs to be a resource so that it can have its own properties.
<RDF:Description href='http://www.textuality.com/RDF/Why-RDF.html'>
  <Author>Tim Bray</Author>
  <Home-Page RDF:href='http://www.textuality.com' />
</RDF:Description>
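One way to read the description above is as a list of (resource, PropertyType, value) statements. The following Python sketch extracts them with the standard library's ElementTree. It rests on assumptions: the snippet as printed does not declare the RDF prefix, so the sketch binds it to a made-up namespace URI (http://example.org/rdf-draft#) just to make the XML parseable, and description_to_triples is a hypothetical helper, not an official RDF API.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only so the RDF: prefix is declared.
RDF_NS = "http://example.org/rdf-draft#"

DOC = f"""<RDF:Description xmlns:RDF="{RDF_NS}"
    href='http://www.textuality.com/RDF/Why-RDF.html'>
  <Author>Tim Bray</Author>
  <Home-Page RDF:href='http://www.textuality.com' />
</RDF:Description>"""


def description_to_triples(xml_text):
    """Return (resource, property type, value) tuples for one Description element."""
    root = ET.fromstring(xml_text)
    resource = root.get("href")  # the thing being described
    triples = []
    for child in root:
        # A property value is either a pointer to another resource or plain text.
        target = child.get(f"{{{RDF_NS}}}href")
        value = target if target is not None else (child.text or "").strip()
        triples.append((resource, child.tag, value))
    return triples


triples = description_to_triples(DOC)
for triple in triples:
    print(triple)
```

Each child element becomes one statement: the element type plays the role of the PropertyType, and the element itself is the Property.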
RDF is carefully designed to have the following characteristics:
XML allows you to invent tags, and for the tags to contain both text data and other tags. Also, XML has a built-in distinction between element types, for example the IMG element type in HTML, and elements, for example an individual <IMG SRC='Madonna.jpg'>; this corresponds naturally to the distinction between PropertyTypes and Properties. So it seems as though XML documents should be a natural vehicle for exchanging general purpose metadata.
XML, however, falls apart on the Scalability design goal. There are two problems:
<Description>The value of this property contains some text, mixed up with child properties such as its temperature (<Temp>48</Temp>) and longitude (<Longt>101</Longt>). [&Disclaimer;]</Description>
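To see why a value like this is awkward to treat as metadata, here is a small Python sketch of what a generic XML parser hands back for it. It uses the standard library's ElementTree; the entity reference [&Disclaimer;] is left out because it cannot be resolved without a DTD, and split_mixed_content is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

# The example above, minus the unresolvable &Disclaimer; entity reference.
MIXED = (
    "<Description>The value of this property contains some text, "
    "mixed up with child properties such as its temperature "
    "(<Temp>48</Temp>) and longitude (<Longt>101</Longt>).</Description>"
)


def split_mixed_content(xml_text):
    """Separate the loose text fragments from the embedded child properties."""
    root = ET.fromstring(xml_text)
    fragments = [root.text or ""]  # text before the first child element
    properties = {}
    for child in root:
        properties[child.tag] = child.text
        fragments.append(child.tail or "")  # text following this child
    return fragments, properties


fragments, properties = split_mixed_content(MIXED)
print(properties)
print(fragments)
```

The child properties come out cleanly, but the "value" of Description itself is scattered across several text fragments interleaved with them, so any software consuming it has to decide how to reassemble or ignore the pieces.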
On the other hand, something like XML is an absolutely necessary part of the solution to RDF's Interchange design goal. XML is unequalled as an exchange format on the Web; but by itself, it doesn't provide what you need in a metadata framework.
The four general rules given above define the central ideas of RDF. It turns out that it takes quite a lot of abstract terminology and XML syntax to define them precisely enough that people can write computer programs to process them. In particular, turning Properties into Resources is quite tricky. Also, it turns out that in a (very) few cases, you do need to order your properties, and this requires quite a bit of syntax.
This article is not going to try to explain all these details; there are a variety of excellent resources to be found at http://www.w3.org/RDF that are designed to do just that.
RDF, as we've seen, provides a model for metadata, and a syntax so that independent parties can exchange it and use it. What it doesn't provide though, is any PropertyTypes of its own. That is to say, RDF doesn't define Author or Title or Director or Business-Category. That would be a job for www.GOD.org, if there were one. Since there isn't, it's a job for everyone.
It seems unlikely that one PropertyType standing by itself is apt to be very useful. It is expected that these will come in packages; for example, a set of basic bibliographic PropertyTypes like Author, Title, Date, and so on. Then a more elaborate set from OCLC, and a competing one from the Library of Congress. These packages are called Vocabularies; it's easy to imagine PropertyType vocabularies describing books, videos, pizza joints, fine wines, mutual funds, and many other species of Web wildlife.
The Web is too big for any one person to stay on top of. In fact, it contains information about a huge number of subjects, and for most of those subjects (such as fine wines, home improvement, and cancer therapy), the Web has too much information for any one person to stay on top of and also have a real job.
This means that opinions, pointers, indexes, and anything that helps people "look things up" are going to be commodities of very high value. That is to say, vocabularies. Nobody thinks that everyone will use the same vocabulary (nor should they), but with RDF we can have a marketplace in vocabularies. Anyone can invent them, advertise them, and sell them. The good (or best-marketed) ones will survive and prosper. Probably, most niches of information will come to be dominated by a small number of vocabularies, the way that library catalogues are today.
And even among people who are sharing the use of metadata vocabularies, there's no need to share the same software. RDF makes it possible to use multiple different pieces of software to process the same metadata, and to use a single piece of software to process (at least in part) many different metadata vocabularies.
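The idea of one piece of software handling several vocabularies can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the "bib:" and "film:" prefixes, the example.org URLs, and the helper names are all invented, and the statements are modelled as plain (resource, PropertyType, value) tuples rather than any real RDF API.

```python
# Hypothetical statements drawn from two invented vocabularies.
STATEMENTS = [
    ("http://example.org/books/1", "bib:Author", "A. Writer"),
    ("http://example.org/books/1", "bib:Subject", "donkeys"),
    ("http://example.org/films/7", "film:Director", "John Huston"),
    ("http://example.org/films/7", "film:Format", "VHS"),
]


def find_resources(statements, property_type, value):
    """Return every resource that has the given property type with the given value."""
    return [res for (res, ptype, val) in statements
            if ptype == property_type and val == value]


def describe(statements, resource):
    """Collect all properties of one resource, whatever vocabulary they come from."""
    return {ptype: val for (res, ptype, val) in statements if res == resource}


huston_films = find_resources(STATEMENTS, "film:Director", "John Huston")
book_info = describe(STATEMENTS, "http://example.org/books/1")
print(huston_films)
print(book_info)
```

Neither function knows anything about books or films; they rely only on the shared shape of the statements, which is the part a common framework supplies.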
With any luck, this should make the Web more like a library, or a video store, or a phone book, than it is today.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.