XML.com: XML From the Inside Out

Using the Jena API to Process RDF

May 23, 2001

There has been growing interest in the Resource Description Framework (RDF), and a number of tools and libraries have been developed for processing it. This article describes one such library, Jena, a Java API for processing RDF. Jena is also the name of an open source implementation of that API.

What is RDF?

XML is very flexible and allows information to be encoded in many different ways. If meaningful tag names are used, it is relatively easy for a person to determine the intended interpretation of an XML string. It is much harder for a program to do so, since programs don't understand English tag names. DTDs and XML Schemas don't really help in this regard: they only allow a program to verify that an XML document conforms to some set of rules, not to determine what it means.

RDF (RDFMS, Bray, Ogbuji, SWARDF) is a model and XML syntax for representing information in a way that allows programs to understand the intended meaning. It is built on the concept of a statement, a triple of the form {predicate, subject, object}. The interpretation of a triple is that <subject> has a property <predicate> whose value is <object>. Examples of statements are {numberOfHits, http://www.foo.com/index.html, 3000} and {title, http://bookstore.com/book12, "The Connoisseur's Guide to the Mind"}. In RDF, a <subject> is always a resource named by a URI with an optional anchor id. The <predicate> is a property of the resource, and the <object> is the value of the property for the resource.
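To make the {predicate, subject, object} ordering and its reading concrete, here is a minimal plain-Java sketch. It is not part of Jena: the `RdfTriple` record and its `describe` method are hypothetical names chosen for illustration, and the object is kept as a plain string for simplicity.

```java
/**
 * A hypothetical, minimal model of one RDF statement (not Jena's API).
 * Components are stored in the article's {predicate, subject, object} order;
 * the object is kept as a plain string for simplicity.
 */
public record RdfTriple(String predicate, String subject, String object) {

    /** Spells out the reading "subject has a property predicate whose value is object". */
    public String describe() {
        return subject + " has a property " + predicate + " whose value is " + object;
    }
}
```

For example, `new RdfTriple("numberOfHits", "http://www.foo.com/index.html", "3000").describe()` returns "http://www.foo.com/index.html has a property numberOfHits whose value is 3000".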

Consider the following triples (where the dc prefix is for the Dublin Core).

      {dc:Publisher, http://www.w3.org, "World Wide Web Consortium"}
      {dc:Title,     http://www.w3.org, "W3C Home Page"}

These triples can be represented graphically as follows.

[Figure: RDF model graph]

In this graph the arcs are labeled with predicates. Each arc originates at a node representing a subject and terminates at a node representing an object. The triples and the graph are two different representations of the same RDF data model.
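Because the triples and the graph carry the same information, either can be derived from the other. The following hypothetical sketch (not Jena code) builds the graph view from a list of triples. Each subject node maps to its outgoing arcs, and each arc is labeled with a predicate and points at an object node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: turns triples into the labeled-arc graph view. Not Jena's API. */
public class TripleGraphSketch {
    public record Triple(String predicate, String subject, String object) {}

    /** An arc labeled with a predicate, pointing at the object node. */
    public record Arc(String label, String target) {}

    /** Maps each subject node to its outgoing arcs, keeping insertion order. */
    public static Map<String, List<Arc>> toGraph(List<Triple> triples) {
        Map<String, List<Arc>> graph = new LinkedHashMap<>();
        for (Triple t : triples) {
            graph.computeIfAbsent(t.subject(), k -> new ArrayList<>())
                 .add(new Arc(t.predicate(), t.object()));
        }
        return graph;
    }
}
```

With the two W3C triples as input, the result has one subject node, http://www.w3.org, with two outgoing arcs labeled dc:Publisher and dc:Title.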

There is also an XML representation of the model. RDF requires that different kinds of semantic information (e.g., subjects, properties, and values) be placed in prescribed locations in XML. Programs that read an XML encoding of RDF can then tell whether a particular element or attribute refers to a subject, a property, or the value of a property.
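The prescribed locations can be illustrated by a small sketch that writes literal-valued properties of one subject as RDF/XML. The subject goes in an rdf:about attribute, each property becomes an element name, and each literal value becomes element content. This is a hypothetical helper, not part of Jena. It handles only literal values for a single subject, and the Dublin Core namespace URI is passed in as a parameter rather than assumed.

```java
import java.util.Map;

/**
 * Hypothetical sketch: writes all literal properties of one subject as RDF/XML.
 * Subject -> rdf:about attribute, property -> element name, value -> element text.
 */
public class RdfXmlSketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Escapes characters that may not appear verbatim in XML text or attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** properties maps local names (e.g. "Title") to literal values. */
    public static String toRdfXml(String subject, String prefix, String namespace,
                                  Map<String, String> properties) {
        StringBuilder sb = new StringBuilder();
        sb.append("<rdf:RDF xmlns:rdf=\"").append(RDF_NS).append("\"\n")
          .append("         xmlns:").append(prefix).append("=\"")
          .append(escape(namespace)).append("\">\n")
          .append("  <rdf:Description rdf:about=\"").append(escape(subject)).append("\">\n");
        for (Map.Entry<String, String> e : properties.entrySet()) {
            String tag = prefix + ":" + e.getKey();
            sb.append("    <").append(tag).append(">")
              .append(escape(e.getValue()))
              .append("</").append(tag).append(">\n");
        }
        sb.append("  </rdf:Description>\n</rdf:RDF>\n");
        return sb.toString();
    }
}
```

A program reading this XML knows that the rdf:about value is the subject and that each child element of rdf:Description is a property of it. It does not need to guess from tag names.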

The Jena API

Jena was developed by Brian McBride of Hewlett-Packard and is derived from earlier work on the SiRPAC API. Jena allows one to parse, create, and search RDF models.

Jena defines a number of interfaces for accessing and manipulating RDF statements as shown in the figure below.

[Figure: Jena interfaces UML diagram]

The RDFNode interface provides a common base for all elements that can be parts of RDF triples. The Literal interface represents literals such as "red fish" or 225 that can be used as the <object> in {predicate, subject, object} triples. The Literal interface provides accessor methods to convert literals to various Java types such as String, int, and double.

Objects implementing the Property interface can be the <predicate> in {predicate, subject, object} triples.

The Statement interface represents a {predicate, subject, object} triple. It can also be used as the <object> in a triple since RDF allows statements to be nested.

Objects implementing the Container, Alt, Bag, or Seq interface can be the <object> in {predicate, subject, object} triples.
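The roles described above can be mirrored in a heavily simplified, hypothetical type hierarchy. These are not Jena's interfaces, and every name and method here is illustrative only. The sketch shows three things: a common node type, a literal with accessors that convert to Java types, and a statement that is itself a node, so that one statement can be the object of another. In RDF a property is itself a resource. The sketch keeps `Property` and `Resource` as separate records only because Java records cannot extend one another.

```java
/**
 * Hypothetical, simplified mirror of the roles described above.
 * These are NOT Jena's interfaces; names and methods are illustrative only.
 */
public class NodeSketch {
    /** Common base for anything that can appear in a triple. */
    public interface Node {}

    /** A resource named by a URI; can be a subject or an object. */
    public record Resource(String uri) implements Node {}

    /** A resource used as a predicate. */
    public record Property(String uri) implements Node {}

    /** A literal value with accessors that convert it to Java types. */
    public record Literal(String lexicalForm) implements Node {
        public String getString() { return lexicalForm; }
        public int getInt() { return Integer.parseInt(lexicalForm.trim()); }
        public double getDouble() { return Double.parseDouble(lexicalForm.trim()); }
    }

    /** A triple; because it is also a Node, it can be the object of another triple. */
    public record Statement(Property predicate, Resource subject, Node object) implements Node {}
}
```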

Parsing RDF With Jena


One area where RDF can be useful is for embedding metadata in web pages. Such metadata might contain information about the author and subject of the page. This RDF can be encoded as XML embedded within the XHTML page.

An RDF-aware search engine can use this metadata to give more relevant results than a search engine that relies on keyword matching. The engine's crawler can use Jena to parse the RDF: Jena can take an XHTML page that contains embedded RDF, extract the RDF, and parse it. This is done with the read() method in the Model interface, as shown in the following code snippet (exception handling has been omitted for clarity).

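import java.io.File;
import java.io.FileReader;
import com.hp.hpl.mesa.rdf.jena.mem.ModelMem;  // Jena 1.x package names
import com.hp.hpl.mesa.rdf.jena.model.Model;   // (check against your Jena version)

File f;
FileReader fr;
Model model;

f = new File("C:\\test1.html");
fr = new FileReader(f);
model = new ModelMem();
// The second argument is the base URI used to resolve relative URIs in the
// embedded RDF; the page's own file URI is the natural choice.
model.read(fr, f.toURI().toString());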

In this example, C:\test1.html is an XHTML file with RDF in its <head>. Jena automatically extracts the RDF and ignores the rest of the XHTML. The result of parsing is an RDF model containing the triples from the file. This model can then be queried.

The first two statements after the declarations in the code fragment above set fr to a FileReader associated with C:\test1.html. Then model is set to an instance of the ModelMem class. ModelMem is a class provided with Jena that implements the Model interface using main memory as the storage for the model. Other implementations are possible; for example, one could create an implementation based on a transactional database.
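Jena performs the extraction described above internally. As a rough illustration of the idea only, the sketch below uses the JDK's namespace-aware DOM parser to locate rdf:RDF elements in an XHTML page while ignoring the surrounding markup. This is a swapped-in technique, not how Jena is implemented. The class and method names are hypothetical, and converting the located elements into triples is left to an RDF parser such as Jena's.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical sketch: locate rdf:RDF blocks in an XHTML page with the JDK's DOM parser. */
public class EmbeddedRdfSketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Returns every rdf:RDF element in the page; all other XHTML markup is ignored. */
    public static List<Element> findRdfBlocks(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xhtml)));
            NodeList nodes = doc.getElementsByTagNameNS(RDF_NS, "RDF");
            List<Element> blocks = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                blocks.add((Element) nodes.item(i));
            }
            return blocks;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Page is not well-formed XML", e);
        }
    }
}
```

This only works for well-formed XHTML. A page that declares a DOCTYPE may also cause the parser to try to fetch the external DTD.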

Getting All Statements from a Model

Once a search engine crawler has created an RDF model containing the metadata for a web page, it needs to add each triple in the model to its index so that later searches can find the page. This can be done with the listStatements() method in the Model interface, which returns a StmtIterator that iterates over each statement in the model. It can be used as follows.

Model model;
StmtIterator iter;
Statement stmt;
 .
 .
 .
iter = model.listStatements();
while (iter.hasNext())
    {
    stmt = iter.next();
    // Now use <stmt>
    }
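What the crawler does with each statement depends on its index. One hypothetical design, sketched below without Jena, keys the index on a (predicate, value) pair and records the pages (subjects) that carry it. A search for pages whose dc:Title is "Jena" then becomes a single map lookup. The class, record, and method names are illustrative assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical crawler-side index built from a model's statements; not part of Jena. */
public class MetadataIndexSketch {
    public record Triple(String predicate, String subject, String object) {}

    /** Index key: a predicate together with one of its values. */
    private record Key(String predicate, String object) {}

    // Maps each (predicate, value) pair to the pages (subjects) that carry it.
    private final Map<Key, Set<String>> index = new HashMap<>();

    /** Adds every statement, as the listStatements() loop above would. */
    public void addAll(Iterable<Triple> statements) {
        for (Triple t : statements) {
            index.computeIfAbsent(new Key(t.predicate(), t.object()), k -> new TreeSet<>())
                 .add(t.subject());
        }
    }

    /** Pages whose metadata has this predicate/value pair, in sorted order. */
    public Set<String> search(String predicate, String object) {
        return Collections.unmodifiableSet(
                index.getOrDefault(new Key(predicate, object), Collections.emptySet()));
    }
}
```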

The Statement interface provides methods to access the predicate, subject, and object of the statement as shown below.

Property predicate;
Resource subject;
RDFNode obj;
Statement stmt;
 .
 .
 .
subject = stmt.getSubject();
System.out.println("Subject = " + subject.getURI());
predicate = stmt.getPredicate();
System.out.println("Predicate = " + predicate.getLocalName());
obj = stmt.getObject();
System.out.println("Object = " + obj.toString());
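The code above prints predicate.getLocalName(), which is only the last part of the property's URI. A simplified way to split a URI into a namespace and a local name is to cut after the last '#' or '/', as in the hypothetical helper below. Jena's own splitting rule may differ, particularly for URIs that do not end in a valid XML name.

```java
/** Hypothetical sketch of splitting a property URI into namespace and local name. */
public class LocalNameSketch {
    /** Position just after the last '#' or '/', or 0 if neither occurs. */
    private static int splitPoint(String uri) {
        return Math.max(uri.lastIndexOf('#'), uri.lastIndexOf('/')) + 1;
    }

    /** Simplified rule: the local name is everything after the last '#' or '/'. */
    public static String localName(String uri) {
        return uri.substring(splitPoint(uri));
    }

    /** The namespace is everything up to and including that separator. */
    public static String namespace(String uri) {
        return uri.substring(0, splitPoint(uri));
    }
}
```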

Adding Statements to a Model

Not all applications will read RDF from XML or XHTML files. Many will need to create RDF statements based on user input or other data. Consider an RDF personal information manager that maintains a searchable archive of email messages, browser bookmarks, and calendar entries. When the program receives a new email message, it can extract the sender and title and create RDF triples for them. It can also let the user enter information about the topics discussed in the message and create RDF triples containing the topic information.

The following code illustrates how Jena can be used to create triples in a model.

Model model;
// The namespace should end in '/' or '#': createProperty(namespace, name)
// appends the local name directly, so "http://www.test.com" would give
// the property URI "http://www.test.comlength".
String namespace = "http://www.test.com/";
 .
 .
 .
model.createResource("http://www.foo.com/boats#sailboat")
          .addProperty(model.createProperty(namespace, "length"), 25)
          .addProperty(model.createProperty(namespace, "color"), "teal");

This adds the following statements to model:

{x:length, http://www.foo.com/boats#sailboat, 25}
{x:color, http://www.foo.com/boats#sailboat, "teal"}

where x is a namespace prefix corresponding to the namespace URI http://www.test.com/.
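The chained createResource(...).addProperty(...) style works because each addProperty call returns the resource it was called on. The hypothetical stand-in below (not Jena's Model or Resource) shows that design. It also assumes that a property URI is formed by plain concatenation of namespace and local name, which is why the namespace string should end in '/' or '#'.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the chained calls above; not Jena's Model or Resource. */
public class FluentModelSketch {
    /** One statement in {predicate, subject, object} order; the object may be any value. */
    public record Triple(String predicate, String subject, Object object) {}

    private final List<Triple> triples = new ArrayList<>();

    /** Assumed rule: the property URI is the namespace followed directly by the local name. */
    public String createProperty(String namespace, String localName) {
        return namespace + localName;
    }

    public ResourceHandle createResource(String uri) {
        return new ResourceHandle(uri);
    }

    public List<Triple> triples() {
        return List.copyOf(triples);
    }

    /** Handle for one subject; addProperty returns the handle itself so calls chain. */
    public final class ResourceHandle {
        private final String uri;

        private ResourceHandle(String uri) {
            this.uri = uri;
        }

        public ResourceHandle addProperty(String property, Object value) {
            triples.add(new Triple(property, uri, value));
            return this;
        }
    }
}
```

Running the sailboat example against this sketch with the namespace "http://www.test.com/" yields two triples whose predicates are http://www.test.com/length and http://www.test.com/color.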

Querying Models

Once an RDF model has been created, we need a way to query it. For example, consider a travel FAQ that contains RDF metadata. Suppose we want to find all questions that relate to traveling to Africa. In other words, we wish to find all values of res for which there is a triple of the form {destination, res, Africa} in the model. The following code shows how this can be done. It arbitrarily assumes that the property destination is in the namespace http://foo.org/.

Model model;
Resource r;
ResIterator resourceIter;
 .
 .
 .
resourceIter = model.listSubjectsWithProperty(
                     model.createProperty("http://foo.org/destination"),
                                          "Africa");
while (resourceIter.hasNext())
    {
    r = resourceIter.next();
    System.out.println("Resource " + r.toString() +
                       " is about travel to Africa");
    }

listSubjectsWithProperty(p, v) finds all triples of the form {p, <subject>, v} for any subject <subject>. It returns a ResIterator that iterates over the subjects of the matching triples, not over the triples themselves.
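Conceptually, this query is a filter over the model's triples that keeps the subjects of the matches. The hypothetical in-memory version below is not Jena's implementation. It does a linear scan, whereas a real store may use indexes to answer the same question faster.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory version of the query above; not Jena's implementation. */
public class SubjectQuerySketch {
    public record Triple(String predicate, String subject, String object) {}

    /**
     * Returns every subject s for which {p, s, v} is in the list,
     * in first-seen order and without duplicates.
     */
    public static List<String> listSubjectsWithProperty(List<Triple> triples,
                                                        String p, String v) {
        Set<String> subjects = new LinkedHashSet<>();
        for (Triple t : triples) {
            if (t.predicate().equals(p) && t.object().equals(v)) {
                subjects.add(t.subject());
            }
        }
        return new ArrayList<>(subjects);
    }
}
```

Suppose the input holds destination triples for three made-up FAQ questions: q1 (Africa), q2 (Asia), and q3 (Africa). Querying for "Africa" then returns q1 and q3, in that order.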

Where to Get Jena

Jena can be downloaded from http://www.hpl.hp.co.uk/people/bwm/rdf/jena/download.htm. The download includes several examples, Javadoc, source code, and jar files.

Acknowledgement

Thanks to Brian McBride for his helpful comments on a draft of this article and for creating Jena.