An Introduction to Prolog and RDF

April 25, 2001

Introduction: SW is AI

Many Semantic Web advocates have gone out of their way to dissociate their visions and projects from the Artificial Intelligence moniker. No surprise: ever since Lisp machines were frozen out of the marketplace during the great "AI winter" of the mid-to-late 1980s, the AI label has been the kiss of, if not death, at least scorn. Lisp still suffers from its association with the AI label, though it does well by being connected with the actual technologies.

However, it is a curious phenomenon that the AI label tends to get dropped once the problem AI researchers were studying becomes tractable to some degree and yields practical systems. Voice recognition and text-to-speech, expert systems, machine vision, text summarizers, and theorem provers are just a few examples of classic AI tech that has become part of the standard bag of tricks. The AI label tends to mark things which aren't yet implemented in a generally useful manner, often because hardware or general practices haven't yet caught up.

That seems to describe the Semantic Web pretty well.

An aside -- one interesting phenomenon is that a lot of AI ends up, after fleeing the CS department, in Information and Library Sciences. And, of course, librarians, even the non-techie ones, are really into cataloging, searching, sharing, correlating, using metadata, intelligent agents... to wit, all the elements of the Semantic Web. AI folks don't end up in library departments because librarians are pushovers (as my overdue fines attest), but because there's a pretty good fit between what (some) AI-ers like to do and what the library folks want, and between what the librarians want and what the Semantic Web requires.

So the Semantic Web is an AI project, and we should be proud of that fact. Not only is it more honest, but it means that we can be clearer about what constitutes prior art, relevant research and literature, similar projects, and available technology. As I've written before, narrowness of understanding is a pernicious barrier to sensible progress. Reinventing the wheel isn't nearly as bad as having to continually reconceptualize it: "not thought here" generally causes more systematic problems than "not invented here".

In these articles, I'm going to do a little down-to-earth exploration of RDF, a core Semantic Web technology, using a classic AI programming language, Prolog, plus some standard AI techniques and technologies.

A Gentle Prolog Primer

Prolog was the first logic programming language, and it's still popular in industry and in the classroom. There are many implementations, most of rather good quality. Interestingly, Prolog implementations are often used as logic servers or drop-in inference engines for larger programs, so the implementations have gotten fairly good at integrating with other programs (for example, there are several Prolog-style inference engines for the JVM, and some truly fine ones built on Common Lisp).

Prolog is an excellent prototyping language. It's quite easy to pull together programs with interesting and sometimes surprising properties. There is a large, high-quality corpus of Prolog literature and code, much of which is easily adaptable to one's ad hoc needs. For example, a simple backward-chaining expert system is perhaps a page or two of sample code in just about any Prolog textbook. While not production quality, such toys are ideal for getting a concrete sense of the problems and possibilities of an idea.

Syntax and Simple Semantics

There's not room in this article to give a reasonable Prolog tutorial, but a few preliminaries will be useful for getting a grip on RDF and how Prolog can deal with it.

It's helpful to contrast Prolog programs with invocations of them. A typical Prolog program will form a knowledge base -- a database of facts and rules which is used as a basis for inferences. To initiate computations, you query the knowledge base. Here's a very simple Prolog program which forms a small knowledge base about the readers of some popular web sites.

     reads(john, 'XML.com').
     reads(mary, 'XML.com').
     reads(mary, xmlhack).
     reads(cristina, xmlhack).

Each line in this program asserts a fact. The first line claims that john reads 'XML.com'; the second that mary reads 'XML.com', and so on. reads, john, mary, 'XML.com', cristina, and xmlhack are all Prolog atoms (a.k.a. "symbols"). The atom is the most basic and prevalent datatype in Prolog. If an atom begins with an uppercase letter, or contains certain special characters (like the full stop, which is also the statement terminator), then one encloses it in single quotes (hence 'XML.com'; while this quoting convention is standard, you may find Prolog systems with alternative syntax for atom literals).

Given the types of characters that tend to show up in URIs, they almost always need to be enclosed in single quotes to produce their eponymous atoms. RDF makes heavy use of URIs, which basically means that, worst case, when processing RDF with Prolog you'll be writing 'http://purl.org/yadda/yadda/yadda/' a lot (for some reasonable value of "yadda").

Now that we have our knowledge base, we can interrogate our Prolog system. After loading the program into my Prolog ("consulting" it, in Prolog lingo), I can enter questions and receive answers at the "query" prompt.

   ?- reads(john, 'XML.com').
   yes

"John reads 'XML.com'?"

Prolog says, "Sure does."

   ?- reads(mary, X).
   X = 'XML.com'
   yes

"mary reads what?" X is a variable. Prolog searched the knowledge base and found that if X was bound to 'XML.com' we get a "true" statement (i.e., one in the knowledge base).

   ?- reads(Person, 'XML.com').
   Person = john;
   Person = mary;
   no

"What Person reads 'XML.com'?"

"john does!" (read "Who else?" for ";")

"And mary!"

"Anyone else?"

"Nope."

(Thus we see one standard Prolog development cycle: edit the knowledge base in a text editor, load it into the system (i.e., "consult" it), then interact with it from the read-query-print loop.)
(Note: an unquoted name that begins with an uppercase letter or an underscore is a variable, not an atom. Hence X is a variable, as is Person.)

Notice that in the second and third examples, there's more than one answer that will satisfy the query: mary reads both 'XML.com' and xmlhack, and both john and mary read 'XML.com'. In the last session, after Prolog told me that john read 'XML.com', instead of hitting "enter", I hit the semicolon, which told Prolog to look again for other ways my query can be satisfied. I kept doing this until there were no solutions that hadn't already been given. (While these particular commands are quite common in Prolog read-query-print loops, they are not universal.)
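Hitting the semicolon is fine at the prompt, but a program usually wants all the answers at once. A minimal sketch using the standard findall/3 built-in, which collects every solution of a goal into a list in the order Prolog finds them; all_readers/2 is a hypothetical helper name, not part of the article's program:

```prolog
% The example knowledge base from above.
reads(john, 'XML.com').
reads(mary, 'XML.com').
reads(mary, xmlhack).
reads(cristina, xmlhack).

% all_readers(+Site, -People): People lists everyone who reads Site,
% in the order the reads/2 facts appear (hypothetical helper).
all_readers(Site, People) :-
    findall(Person, reads(Person, Site), People).
```

With this loaded, ?- all_readers('XML.com', L). answers L = [john, mary] in one step, which is the same set of answers produced above by pressing the semicolon until Prolog said no.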

Suppose we want to know if any one person reads both 'XML.com' and xmlhack?

   ?- reads(Person, 'XML.com'), reads(Person, xmlhack).
   Person = mary;
   no

(The comma between the goals is pronounced "and".)

Suppose we want to derive some targeted email marketing lists. We will probably find, in those circumstances, that this last query is quite a common one. It would be quite a drag to have to type that query out every time we wanted to send some spam. More importantly, the concept "a reader of both 'XML.com' and xmlhack" has a special status for us: it defines the term spam_target. We could add the statement spam_target(mary) to our knowledge base, but that's both redundant (as we can figure out that mary's a spam_target from what we already know) and a pain to maintain (e.g., if mary stops reading xmlhack due to having to spend all her time deleting our spam, we have to change two lines in the program which aren't obviously connected). Fortunately, we can add a rule to our knowledge base to define our new concept.


   spam_target(Sucker) :- reads(Sucker, 'XML.com'), reads(Sucker, xmlhack).

"A Sucker is a spam_target if that Sucker reads 'XML.com' and that Sucker reads xmlhack."

Assuming that we don't alter our knowledge base any other way, the query spam_target(Person) will return mary.
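Putting the facts and the rule together gives a complete program that can be consulted as-is. A minimal sketch; the layout is just one choice, and any Prolog that accepts standard syntax should load it:

```prolog
% Facts: who reads which site.
reads(john, 'XML.com').
reads(mary, 'XML.com').
reads(mary, xmlhack).
reads(cristina, xmlhack).

% Rule: a Sucker is a spam_target if that Sucker reads both sites.
spam_target(Sucker) :-
    reads(Sucker, 'XML.com'),
    reads(Sucker, xmlhack).
```

Querying ?- spam_target(Person). binds Person to mary and then answers no on backtracking, since nobody else satisfies both goals. Because spam_target is computed from the reads/2 facts, deleting the reads(mary, xmlhack) fact is enough to take mary off the list; nothing else needs editing.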

Moving to RDF

In the pre-rule knowledge base, each fact had three parts:

the predicate, reads;
the subject of the predicate, i.e., the reader (john, mary, and so forth);
and the thing they read, i.e., the object of the predicate ('XML.com' and xmlhack).

By a striking and carefully planned coincidence, these are exactly the components of an RDF triple (hereafter, I'll use "RDF triple" and "triple" interchangeably). The RDF triple is one of several formal models offered by the core RDF spec, and it consists of an ordered 3-tuple of URIs (with the exception that the object position may take a string literal), with the first URI naming a predicate, the second naming a subject, and the last item being either a URI naming an object or a string literal. While the example Prolog facts have the same slots as a triple, the symbols which fill those slots aren't URIs. Happily, it's not that hard to convert our simple knowledge base:

The Objects: since all the objects currently in our knowledge base are web sites, it seems natural to use their base URL as their name, thus, 'http://www.xml.com/' and 'http://www.xmlhack.com/' (remember, to make URIs into standard Prolog atoms, you typically need to single quote them).
The Predicate: the predicate atom (reads) has no intrinsic, natural URI, but we can simply use the URL of this article (which is unique and not particularly useful for anything else) prepended to the atom, which yields: 'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'.
The Subjects: again, there are no natural, intrinsic URIs, but it seems a little nasty to reuse the long URL prefix that we used for the predicate. To add a little visual difference, we'll invent mailto:-based URIs for each person: 'mailto:mary@prologarticle.xml.com', 'mailto:john@prologarticle.xml.com', etc.
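The three conversion steps are mechanical enough to let Prolog do them. A minimal sketch, assuming only the ISO atom_concat/3 built-in for gluing atoms together; site_uri/2, predicate_uri/2, person_uri/2 and converted_triple/3 are hypothetical names chosen for this illustration:

```prolog
% The original, concise knowledge base.
reads(john, 'XML.com').
reads(mary, 'XML.com').
reads(mary, xmlhack).
reads(cristina, xmlhack).

% Hypothetical lookup table: short site name -> base URL (the Objects).
site_uri('XML.com', 'http://www.xml.com/').
site_uri(xmlhack,   'http://www.xmlhack.com/').

% The Predicate: prepend the article URL to the predicate atom.
predicate_uri(Name, URI) :-
    atom_concat('http://www.xml.com/pub/a/2001/04/25/prologrdf/', Name, URI).

% The Subjects: wrap the person atom in an invented mailto: URI.
person_uri(Person, URI) :-
    atom_concat('mailto:', Person, Tmp),
    atom_concat(Tmp, '@prologarticle.xml.com', URI).

% converted_triple(?Pred, ?Subj, ?Obj): one triple per reads/2 fact.
converted_triple(Pred, Subj, Obj) :-
    reads(Person, Site),
    predicate_uri(reads, Pred),
    person_uri(Person, Subj),
    site_uri(Site, Obj).
```

Enumerating converted_triple/3 on backtracking yields the same four rows as the table that follows.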

We can now convert the example knowledge base to a collection of RDF triples:

Predicate	Subject	Object
'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'	'mailto:john@prologarticle.xml.com'	'http://www.xml.com/'
'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'	'mailto:mary@prologarticle.xml.com'	'http://www.xml.com/'
'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'	'mailto:mary@prologarticle.xml.com'	'http://www.xmlhack.com/'
'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'	'mailto:cristina@prologarticle.xml.com'	'http://www.xmlhack.com/'

Of course, this tabular presentation of the triples is a bit hard to query. It would be nice if we could encode these triples in a form that Prolog understood. Fortunately, those URI atoms are just atoms, and we can use them just as we did our original (more concise) ones:

   'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'(
             'mailto:john@prologarticle.xml.com',
             'http://www.xml.com/').

   'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'(
             'mailto:mary@prologarticle.xml.com',
             'http://www.xml.com/').

   'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'(
             'mailto:mary@prologarticle.xml.com',
             'http://www.xmlhack.com/').

   'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'(
             'mailto:cristina@prologarticle.xml.com',
             'http://www.xmlhack.com/').



   ?- 'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'(
             Person, 'http://www.xml.com/').
   Person = 'mailto:john@prologarticle.xml.com'
   yes

This is rather ugly as it stands (adding XML-style namespaces would help), but it gives us a nice, constructive demonstration of how RDF triples are, or can be seen as, Prolog facts; and, hence, how a collection of RDF triples (say, as serialized in an RSS document) can be a Prolog program.
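One way to get a namespace-like effect in Prolog is to write a qualified name such as art:reads and expand it to the full URI atom before calling the fact. A minimal sketch, assuming a Prolog (such as SWI-Prolog) in which : is an infix operator; ns/2, expand/2, qname_triple/3 and the art prefix are all hypothetical names for this illustration:

```prolog
% Hypothetical namespace table: short prefix -> base URI.
ns(art, 'http://www.xml.com/pub/a/2001/04/25/prologrdf/').

% expand(+Term, -URI): Prefix:Local becomes Base followed by Local;
% any other term is assumed to be a full URI already.
expand(Prefix:Local, URI) :-
    !,
    ns(Prefix, Base),
    atom_concat(Base, Local, URI).
expand(URI, URI).

% Two of the full-URI facts from above.
'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'(
    'mailto:john@prologarticle.xml.com', 'http://www.xml.com/').
'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'(
    'mailto:mary@prologarticle.xml.com', 'http://www.xml.com/').

% qname_triple(+PredTerm, ?Subj, ?Obj): build a goal whose functor is
% the expanded predicate URI, then call it.
qname_triple(PredTerm, Subj, Obj) :-
    expand(PredTerm, Pred),
    Goal =.. [Pred, Subj, Obj],
    call(Goal).
```

?- qname_triple(art:reads, Person, 'http://www.xml.com/'). then enumerates the same readers as the full-URI query, without anyone having to type the long prefix.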

However, since Prolog knowledge bases can have facts with many arguments, and can have rules, we might want to keep our RDF-based facts somewhat distinct from the rest of the program. One way to do this is to state explicitly that a triple of URIs is in the RDF subject-predicate-object relation. We could call that predicate rdf_triple, as in

   rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
              'mailto:john@prologarticle.xml.com',
              'http://www.xml.com/').

   rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
              'mailto:mary@prologarticle.xml.com',
              'http://www.xml.com/').

   rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
              'mailto:mary@prologarticle.xml.com',
              'http://www.xmlhack.com/').

   rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
              'mailto:cristina@prologarticle.xml.com',
              'http://www.xmlhack.com/').

We can recover our old, easier-to-type formulation by defining reads in terms of rdf_triple:

   reads(Person, Website) :-
          rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
                     Person,
                     Website).

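Our spam_target rule carries over to this new knowledge base essentially as it worked with the old one. One wrinkle: with the reads rule above, the site argument is now a URI, so the rule's 'XML.com' and xmlhack must either be replaced by 'http://www.xml.com/' and 'http://www.xmlhack.com/' or be mapped to those URIs by a further rule; and the answers now come back as mailto: URIs rather than bare names.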
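One way to keep spam_target literally unchanged is to let reads accept the old short site names and translate them to URIs before consulting the triples. A minimal sketch; site/2 is a hypothetical helper that does not appear in the original program:

```prolog
% The RDF knowledge base: rdf_triple(Predicate, Subject, Object).
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
           'mailto:john@prologarticle.xml.com', 'http://www.xml.com/').
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
           'mailto:mary@prologarticle.xml.com', 'http://www.xml.com/').
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
           'mailto:mary@prologarticle.xml.com', 'http://www.xmlhack.com/').
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
           'mailto:cristina@prologarticle.xml.com', 'http://www.xmlhack.com/').

% Hypothetical mapping from the old short site names to their URIs.
site('XML.com', 'http://www.xml.com/').
site(xmlhack,   'http://www.xmlhack.com/').

% reads/2 over the triples, taking a short site name.
reads(Person, Site) :-
    site(Site, URI),
    rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
               Person, URI).

% The spam_target rule, exactly as before.
spam_target(Sucker) :-
    reads(Sucker, 'XML.com'),
    reads(Sucker, xmlhack).
```

Here ?- spam_target(P). answers P = 'mailto:mary@prologarticle.xml.com'. The design choice is to keep the translation in one small table, so the application rules never need to mention URIs at all.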

The definition of the rdf_triple predicate establishes an RDF knowledge base. Our reads rule can be thought of as an RDF application. In other words, our rules process the RDF, and the kind of processing they do is a form of inference. We can use inferences to produce results similar to other forms of processing (such as transformations or SQL queries), though often with less work and more clarity.
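As a small illustration of inference standing in for a query, here is a sketch of a rule that pairs up people who read the same site, roughly what a SQL self-join over a table of triples would compute. co_readers/3 is a hypothetical name, and the @< test (standard order of terms) is just one way to avoid pairing a person with themselves or reporting each pair twice:

```prolog
% The RDF knowledge base: rdf_triple(Predicate, Subject, Object).
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
           'mailto:john@prologarticle.xml.com', 'http://www.xml.com/').
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
           'mailto:mary@prologarticle.xml.com', 'http://www.xml.com/').
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
           'mailto:mary@prologarticle.xml.com', 'http://www.xmlhack.com/').
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
           'mailto:cristina@prologarticle.xml.com', 'http://www.xmlhack.com/').

% co_readers(?A, ?B, ?Site): A and B are distinct readers of Site.
% The shared variable Site plays the role of the join condition.
co_readers(A, B, Site) :-
    Reads = 'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
    rdf_triple(Reads, A, Site),
    rdf_triple(Reads, B, Site),
    A @< B.
```

Backtracking through ?- co_readers(A, B, Site). gives john and mary for 'http://www.xml.com/', then cristina and mary for 'http://www.xmlhack.com/'. The "join" is nothing more than reusing the variable Site in two goals.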

Taking Stock

The root RDF data model is deliberately very minimal and, as with XML, that minimalism is intended to make things easier for programs. One consequence of that minimalism, when coupled with other machine-friendly design tropes, is that though "human readable", RDF is not generally very human-writable (although the Notation3 syntax tries to improve things). Furthermore, while RDF's data model is specified, the processing model (deliberately) isn't, so one should expect a wide variety of processors, each working in its own way, depending on a variety of constraints and desiderata.

Standard Prolog provides a rich processing model which naturally subsumes RDF data. As we saw above, deriving RDF triples from Prolog predicates, and then the reverse, can deepen our understanding of both. Furthermore, there is a lot of experience implementing a variety of alternative processing models (both forward and backward chaining systems, for example) in Prolog -- from the experimental toy, through the serious research project, to the industrially deployed, large-scale production system. Finally, Prolog's roots in symbolic processing and language manipulation support a wide array of mechanisms for building expressive notations and languages for knowledge management, which serve well for hiding the less friendly aspects of RDF.

Some Useful Links

Here are a few more online Prolog tutorials:

Adventure In Prolog
Building Expert Systems in Prolog (read Adventure In Prolog first)
Prolog Programming A First Course (an excellent starter)
Quick Prolog (even better for a fast overview)
Logic, Programming and Prolog (2ed) (the whole text in PDF)

And a few links to information about RDF and the Semantic Web:

The W3C's Semantic Web Activity
The RDF Model and Syntax Specification
The RDF Interest Group and RDFIG IRC Scratchpad