XML.com: XML From the Inside Out

BrownSauce: An RDF Browser

February 05, 2003

Introduction

BrownSauce is an RDF browser. It attempts, armed with no more than a knowledge of RDF and RDF Schema, to present all RDF data as intelligibly as possible.

RDF is biased in favor of the data producer. Consumers may receive all, some, or none of the properties or classes they expect, and they must also accept that entirely unknown properties and classes are possible and legitimate. BrownSauce is an attempt to cope with whatever is thrown at it.

RDF Data

Here is an RDF document:

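<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns="http://example.com/vocab#">
    <!-- The default namespace above is a placeholder for the example vocabulary. -->
    <House rdf:about="http://example.com/damian_house">
        <address rdf:parseType="Resource">
            <number>137</number>
            <street>Cranbrook Road</street>
            <city>Bristol</city>
        </address>
        <resident>
            <Person rdf:about="http://example.com/damian">
                <name>Damian Steer</name>
                <mailbox rdf:resource="mailto:damian@example.com"/>
                <rdfs:seeAlso rdf:resource="http://example.com/document_b.rdf"/>
            </Person>
        </resident>
    </House>
</rdf:RDF>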

The graph represented by this document looks a little like this:

A simple RDF graph produced by coarse graining

Imagine you had to present that information. What would it look like ideally? My suggestion is that you would see a house at 137 Cranbrook Road, Bristol, whose resident is a person named "Damian Steer" with the email address damian@example.com. There are two things, a house and a person, and they are related; the goal is to make this clear to the user.

To achieve this presentation, an obvious route is to use existing XML styling mechanisms (HyperDAML takes such an approach, for example). However, the XML approach is at the mercy of the form of the RDF serialization. Although my example is quite readable, the same information could be written in a form three times longer, with data about the house and the person intermingled.

An approach based on the RDF graph itself, rather than on its XML serialization, should do better. Here are two such approaches:

Showing the graph.
This is a popular approach; indeed, I've written such a tool myself. One simply displays an RDF document as a graph; examples include RDFViz, IsaViz, and RDFAuthor. This works for small documents but quickly becomes confusing for large ones.
Stepping through triples.
Alternatively, one might show a node in the graph plus its neighboring nodes. For example, we might show the house, which has a resident, an address, and a type (House). Moving to the resident, we see that it is a Person with the name "Damian Steer". However, this can be a slow process, and at some points it presents too little information.
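One step of this triple-by-triple browsing can be sketched in Python as follows. This is an illustration only, assuming a toy representation in which the graph is a list of (subject, predicate, object) string tuples; the short predicate names and the blank node ID `_:addr` are placeholders, not BrownSauce's internals.

```python
# Toy graph: (subject, predicate, object) tuples. Blank nodes are written
# "_:..." and predicates use short placeholder names (assumptions).
HOUSE = "http://example.com/damian_house"
PERSON = "http://example.com/damian"

TRIPLES = [
    (HOUSE, "rdf:type", "House"),
    (HOUSE, "address", "_:addr"),
    ("_:addr", "street", "Cranbrook Road"),
    (HOUSE, "resident", PERSON),
    (PERSON, "rdf:type", "Person"),
    (PERSON, "name", "Damian Steer"),
]


def neighbours(triples, node):
    """Return the (predicate, node) pairs leaving and entering `node`."""
    outgoing = [(p, o) for s, p, o in triples if s == node]
    incoming = [(p, s) for s, p, o in triples if o == node]
    return outgoing, incoming


# One browsing step: show the house and its immediate neighbours.
out, inc = neighbours(TRIPLES, HOUSE)
for predicate, target in out:
    print(predicate, "->", target)
```

Seeing the street name takes a further step into the blank `_:addr` node, which illustrates why browsing this way can feel slow.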

Coarse Grained Display of an RDF Graph

BrownSauce attempts to improve on the triples approach. The problem with stepping through triples is that the display is too fine grained, but the approach has advantages: it works with large documents, and even with sources where no single document is available (e.g., databases).

So how can an application find the obvious patterns in RDF data? Unlike XML, RDF has no mechanism for expressing data structure; indeed, RDF is a semi-structured data format, so such information would only be a partial help anyway. Having said that, the reader may be aware of RDF Schema. Don't be fooled by the name: RDF Schemas describe properties and classes, but they cannot state that "Houses have addresses". (An rdfs:domain declaration can say that anything with an address is a House, which is not the same thing. Closer to this ideal is schemarama.)

When one looks at RDF data, regular patterns become apparent. BrownSauce captures them using a simple rule: start at a node and work outward, passing through blank nodes and stopping at anything labeled (a URI or a literal).
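The rule can be sketched in Python as a breadth-first walk. This is my own illustration under stated assumptions, not BrownSauce's code: triples are tuples, blank node IDs start with `_:`, literals are wrapped in a small hypothetical `Literal` class, the `ex:` vocabulary is a placeholder, and only outgoing arcs are followed (a simplification; BrownSauce's views also mention incoming arcs).

```python
from collections import deque


class Literal(str):
    """A literal value; subclassing str lets isinstance() separate it from URIs."""


def is_blank(node):
    # Assumed convention: blank node IDs start with "_:".
    return not isinstance(node, Literal) and node.startswith("_:")


def coarse_grain(triples, start):
    """Collect triples reachable from `start`, continuing only through blank nodes."""
    subgraph, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s != node:
                continue
            subgraph.append((s, p, o))
            # Pass through blank nodes; stop at URIs and literals.
            if is_blank(o) and o not in seen:
                seen.add(o)
                queue.append(o)
    return subgraph


HOUSE = "http://example.com/damian_house"
PERSON = "http://example.com/damian"
TRIPLES = [
    (HOUSE, "rdf:type", "ex:House"),
    (HOUSE, "ex:address", "_:addr"),
    ("_:addr", "ex:number", Literal("137")),
    ("_:addr", "ex:street", Literal("Cranbrook Road")),
    ("_:addr", "ex:city", Literal("Bristol")),
    (HOUSE, "ex:resident", PERSON),
    (PERSON, "rdf:type", "ex:Person"),
    (PERSON, "ex:name", Literal("Damian Steer")),
    (PERSON, "ex:mailbox", "mailto:damian@example.com"),
    (PERSON, "rdfs:seeAlso", "http://example.com/document_b.rdf"),
]

for s, p, o in coarse_grain(TRIPLES, HOUSE):
    print(s, p, o)
```

The `seen` set keeps the walk finite even if blank nodes form a cycle, and because expansion only continues through blank nodes, every leaf of the resulting subgraph is a URI or a literal.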

In our example this yields a house with an address and a resident, but it stops at the resident, since that node has a URI. If we want information about the resident, we find a person with a name, who is the resident of a house.

House (http://example.com/damian_house)
    address:
        number: 137
        street: Cranbrook Road
        city: Bristol
    resident: http://example.com/damian

Person (http://example.com/damian)
    name: Damian Steer
    mailbox: damian@example.com
    seeAlso: http://example.com/document_b.rdf

Or, graphically, our original graph is divided into two regions, relating to the house and to the person:
Graph divided into house and person regions

I confess my example was rigged, yet applying the rule to genuine data gathered from the Web has met with some success. I suspect the reason is that slapping global identifiers on nodes which exist only for structuring purposes is a little pointless, so such nodes tend to be left blank. For example, collections are often blank, such as the rdf:Seq nodes in RSS 1.0 feeds; it seems unlikely that anyone would want to refer to the collection in a feed rather than to the feed itself.

BrownSauce essentially produces a subgraph of the original data, one which ideally contains all the information pertinent to the subject. This subgraph has another useful property: its leaf nodes are all identifiable (i.e., not blank), so links out of the subgraph are robust.

Having said this, the original rule often failed on one particular type of data: FOAF data. Here is an example of just such a failure:

<foaf:Person>
    <foaf:mbox rdf:resource="mailto:a@example.com"/>
    <foaf:knows>
        <foaf:Person>
            <foaf:mbox rdf:resource="mailto:b@example.com"/>
        </foaf:Person>
    </foaf:knows>
</foaf:Person>

The problem is that coarse graining yields only one person, not two: both people are blank nodes, so the traversal passes straight through the second one. What it misses is that FOAF people are, in effect, labeled. foaf:mbox is a daml:UnambiguousProperty, that is, a property whose object uniquely identifies its subject, and this information is contained in the FOAF schema. (Edd Dumbill has written a good introduction to FOAF. I should add that such semantics are not part of RDF proper, although FOAF is not unique in using them.)

As a consequence, BrownSauce traverses until it reaches an identifiable node: a node labeled with a URI, the subject of an unambiguous property, or a literal. To do this, BrownSauce loads every schema it encounters (which I believe is fairly unusual among RDF applications).
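The revised stopping test can be sketched in Python. Again this is an illustration under my own assumptions rather than BrownSauce's code: triples are tuples, blank node IDs start with `_:`, literals use a hypothetical `Literal` class, and the set of unambiguous properties is passed in by hand, standing in for what BrownSauce learns by loading schemas such as FOAF's.

```python
from collections import deque


class Literal(str):
    """A literal value; subclassing str lets isinstance() separate it from URIs."""


def is_blank(node):
    # Assumed convention: blank node IDs start with "_:".
    return not isinstance(node, Literal) and node.startswith("_:")


def is_identifiable(triples, node, unambiguous):
    """URIs and literals are identifiable; so is a blank node with an unambiguous property."""
    if not is_blank(node):
        return True
    return any(s == node and p in unambiguous for s, p, _ in triples)


def coarse_grain(triples, start, unambiguous):
    """Walk outward from `start`, stopping at identifiable nodes."""
    subgraph, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s != node:
                continue
            subgraph.append((s, p, o))
            if o not in seen and not is_identifiable(triples, o, unambiguous):
                seen.add(o)
                queue.append(o)
    return subgraph


# The FOAF example: two blank-node people, each with an mbox.
FOAF = [
    ("_:a", "rdf:type", "foaf:Person"),
    ("_:a", "foaf:mbox", "mailto:a@example.com"),
    ("_:a", "foaf:knows", "_:b"),
    ("_:b", "rdf:type", "foaf:Person"),
    ("_:b", "foaf:mbox", "mailto:b@example.com"),
]

# Assumed to come from the FOAF schema, which declares foaf:mbox
# a daml:UnambiguousProperty.
UNAMBIGUOUS = {"foaf:mbox"}

print(len(coarse_grain(FOAF, "_:a", set())))  # 5: the second person is swallowed
print(len(coarse_grain(FOAF, "_:a", UNAMBIGUOUS)))  # 3: the walk stops at the second person
```

Note that `is_identifiable` inspects the blank node's own outgoing triples, so the walk has to look at least one step past a node before deciding to stop there.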

This also means that the subgraph BrownSauce produces is not quite what might be expected. The house subgraph is essentially the one shown above, except that the person node is marked as a boundary and the subgraph also includes nodes beyond it. BrownSauce has to do this to check the boundary node for other identifiers, such as unambiguous properties. It is also useful for rendering, since the subgraph then contains more information about the boundary nodes.

It is the nature of such algorithms to be imperfect. There will always be some niggling cases where the result simply looks wrong. For these occasions BrownSauce provides a little customization: the class brownsauce:Traversable, whose instances are unconditionally traversed. For example, I happen to like my RSS channels complete, but rss:item nodes are normally labeled with URIs, so a channel is displayed largely as a list of references to items. However, by adding the statement rss:item rdfs:subClassOf brownsauce:Traversable to the customization file custom.rdf, all items are traversed and the full feed is displayed.
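The customization check might look like the following Python sketch. The assumptions are mine: the data and the custom.rdf statements are merged into one list of tuples, classes are compared as prefixed-name strings, `rss:item` stands for the RSS 1.0 item class, and the feed URIs are placeholders. A traversal would then expand a node whenever `is_traversable` returns true, whether or not the node is identifiable.

```python
TRAVERSABLE = "brownsauce:Traversable"


def superclasses(triples, cls):
    """cls plus every class reachable from it via rdfs:subClassOf."""
    found, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                stack.append(o)
    return found


def is_traversable(triples, node):
    """True if any rdf:type of node is (transitively) a subclass of Traversable."""
    types = [o for s, p, o in triples if s == node and p == "rdf:type"]
    return any(TRAVERSABLE in superclasses(triples, t) for t in types)


TRIPLES = [
    # Feed data (placeholder URIs).
    ("http://example.com/feed", "rdf:type", "rss:channel"),
    ("http://example.com/item1", "rdf:type", "rss:item"),
    # The customization statement from custom.rdf.
    ("rss:item", "rdfs:subClassOf", TRAVERSABLE),
]

print(is_traversable(TRIPLES, "http://example.com/item1"))  # True
print(is_traversable(TRIPLES, "http://example.com/feed"))  # False
```

Following rdfs:subClassOf transitively means a subclass of rss:item would also be traversed without a further customization statement.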

The Final Product

Although I've concentrated on how BrownSauce groups data into coarse-grained views, it has some other features worth mentioning.

A screenshot of BrownSauce

In an ideal world we would never see a URI in a browser, and that includes BrownSauce. To this end, BrownSauce (unsurprisingly) treats rdfs:label as a label for nodes: if a model contains the statement http://example.com/blah rdfs:label "Blah", the front end shows "Blah" rather than the URI. But since labels are uncommon outside schema documents, BrownSauce also treats properties as labels if they are subproperties of rdfs:label. The subproperty declaration might come from the schema itself or from the custom.rdf file. For example, custom.rdf contains the statement foaf:name rdfs:subPropertyOf rdfs:label; as a result, some of the people in the screenshot above are displayed by name rather than by some obscure alphanumeric sequence. Property and class labels are also used (if the schema is available) in preference to local names.
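A Python sketch of this label lookup follows, assuming the model and custom.rdf have already been merged into one list of tuples and that the first label found wins (the preference order BrownSauce uses is not described here, so that choice is an assumption).

```python
LABEL = "rdfs:label"


def label_properties(triples):
    """rdfs:label plus every property declared (transitively) a subproperty of it."""
    props, stack = {LABEL}, [LABEL]
    while stack:
        parent = stack.pop()
        for s, p, o in triples:
            if p == "rdfs:subPropertyOf" and o == parent and s not in props:
                props.add(s)
                stack.append(s)
    return props


def display_name(triples, node):
    """Return the first label-like value for node, falling back to its URI."""
    props = label_properties(triples)
    for s, p, o in triples:
        if s == node and p in props:
            return o
    return node


TRIPLES = [
    ("http://example.com/blah", LABEL, "Blah"),
    ("foaf:name", "rdfs:subPropertyOf", LABEL),  # the custom.rdf statement
    ("http://example.com/damian", "foaf:name", "Damian Steer"),
    ("http://example.com/x7f3", "foaf:mbox", "mailto:x@example.com"),
]

for uri in ("http://example.com/blah", "http://example.com/damian", "http://example.com/x7f3"):
    print(display_name(TRIPLES, uri))
```

The last URI has no label-like property, so it is shown as-is.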

BrownSauce also keeps track of rdfs:seeAlso links. In the original release these were simply used as links to other RDF sources, but now data from multiple sources can be merged. So if a document contains little information about a thing but points to a source with more, that information can be added.
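Merging the data behind a seeAlso link can be sketched in Python. Fetching is simulated with a dictionary of already-parsed documents (a hypothetical stand-in for retrieving and parsing RDF/XML), literals are plain strings assumed never to start with `_:`, and links are followed only one level deep. The one subtlety shown is standard RDF merge behavior: blank node IDs are local to a document, so each source's blank nodes are renamed apart before taking the union.

```python
import itertools

_fresh_ids = itertools.count()


def rename_blanks(triples):
    """Give this source's blank nodes (IDs starting "_:") fresh, unshared IDs."""
    mapping = {}

    def rename(node):
        if node.startswith("_:"):
            if node not in mapping:
                mapping[node] = f"_:m{next(_fresh_ids)}"
            return mapping[node]
        return node

    return [(rename(s), p, rename(o)) for s, p, o in triples]


def browse_with_see_also(node, documents, start):
    """Load `start`, then merge in every document `node` points to via rdfs:seeAlso."""
    graph = set(rename_blanks(documents[start]))
    for s, p, o in list(graph):
        if s == node and p == "rdfs:seeAlso" and o in documents:
            graph |= set(rename_blanks(documents[o]))
    return graph


DAMIAN = "http://example.com/damian"
DOC_B = "http://example.com/document_b.rdf"
DOCUMENTS = {  # hypothetical pre-parsed sources, keyed by URI
    "http://example.com/document_a.rdf": [
        (DAMIAN, "rdfs:seeAlso", DOC_B),
        (DAMIAN, "foaf:knows", "_:x"),
        ("_:x", "foaf:mbox", "mailto:a@example.com"),
    ],
    DOC_B: [
        (DAMIAN, "foaf:name", "Damian Steer"),
        (DAMIAN, "foaf:knows", "_:x"),  # a different person, despite the same ID
        ("_:x", "foaf:mbox", "mailto:b@example.com"),
    ],
}

graph = browse_with_see_also(DAMIAN, DOCUMENTS, "http://example.com/document_a.rdf")
for triple in sorted(graph):
    print(triple)
```

Without the renaming step, the two `_:x` nodes would collapse into a single person with two mailboxes.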

Future Work

Currently BrownSauce can only browse documents. From the outset, however, the plan was to extend browsing to other sources, such as databases with web interfaces. The supporting code is in place, and the feature may well be added in the near future.

A related, desirable feature is typed seeAlsos. Suppose that you know a book has an entry in a database with a Squish/SOAP endpoint. In your bibliography you might add two statements: <some_book_id> rdfs:seeAlso <database_endpoint> and <database_endpoint> rdf:type <squish_soap_class>. BrownSauce could then use the endpoint's type to choose the appropriate backend and add data from that source.
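Since typed seeAlsos are only proposed, the following Python sketch is entirely hypothetical: the class name `ex:SquishSoapEndpoint`, the two backend functions, and the URIs are placeholders rather than BrownSauce code. It shows the dispatch idea: look up the rdf:type of each seeAlso target and pick a backend, defaulting to fetching an ordinary RDF document.

```python
def fetch_rdf_document(uri):
    # Placeholder backend: would retrieve and parse RDF/XML from uri.
    return f"fetch document {uri}"


def query_squish_soap(uri):
    # Placeholder backend: would send a Squish query to a SOAP endpoint.
    return f"query endpoint {uri}"


BACKENDS = {"ex:SquishSoapEndpoint": query_squish_soap}


def backend_for(triples, target):
    """Choose a backend from target's rdf:type, defaulting to a document fetch."""
    for s, p, o in triples:
        if s == target and p == "rdf:type" and o in BACKENDS:
            return BACKENDS[o]
    return fetch_rdf_document


def see_also_sources(triples, node):
    """Pair each rdfs:seeAlso target of node with the backend that would load it."""
    return [(o, backend_for(triples, o))
            for s, p, o in triples if s == node and p == "rdfs:seeAlso"]


BOOK = "http://example.com/some_book"
ENDPOINT = "http://example.com/database_endpoint"
TRIPLES = [
    (BOOK, "rdfs:seeAlso", "http://example.com/reviews.rdf"),
    (BOOK, "rdfs:seeAlso", ENDPOINT),
    (ENDPOINT, "rdf:type", "ex:SquishSoapEndpoint"),
]

for target, backend in see_also_sources(TRIPLES, BOOK):
    print(backend(target))
```

Keeping the backends in a table keyed by class means supporting a new kind of source would only require registering one more entry.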

Final Thoughts

This article presents only half the story, but I think it has given you some idea of how BrownSauce works. BrownSauce is free software, available under the same license as Jena. If you want to change it to fit your needs, you can.

Acknowledgments

I'd like to thank Hewlett-Packard Labs, Bristol, which employed me while I wrote BrownSauce, and particularly its Semantic Web group, both for its help and for creating Jena, without which my life would have been a great deal harder.

Related Links

RDF Primer

RDF Schema

Jena Semantic Web Toolkit

FOAF
