BrownSauce is an RDF browser. It attempts, armed with no more than a knowledge of RDF and RDF Schema, to present all RDF data as intelligibly as possible.
RDF is biased in favor of the data producer. Consumers may have to deal with all, some, or none of the expected properties or classes, and they may have to be aware that entirely unknown properties and classes are possible and legitimate. BrownSauce is an attempt to deal with all that is thrown at it.
Here is an RDF document:
<House rdf:about="http://example.com/damian_house">
  <address rdf:parseType="Resource">
    <number>137</number>
    <street>Cranbrook Road</street>
    <city>Bristol</city>
  </address>
  <resident>
    <Person rdf:about="http://example.com/damian">
      <name>Damian Steer</name>
      <mailbox rdf:resource="mailto:email@example.com"/>
      <rdfs:seeAlso rdf:resource="http://example.com/document_b.rdf"/>
    </Person>
  </resident>
</House>
The graph represented by this document looks a little like this:
|A simple RDF graph produced by coarse graining|
Imagine you had to present that information. What would it look like
ideally? My suggestion is that you'd see that there was a house, with
address 137 Cranbrook Road, etc. and that the resident at this house was
person "Damian Steer" who had an email address
firstname.lastname@example.org. There are two things: a house and a
person, and they are related. The goal is to make this clear to the user.
To achieve this presentation, an obvious route to follow is using existing XML styling mechanisms (HyperDAML uses such an approach, for example). However, the XML approach is at the mercy of the form of the RDF serialization. Although my example is quite readable, the same information could be given in a form three times longer with data about the house and the person intermingled.
An RDF-based approach must be better. Here are two approaches:
- Showing the graph.
- This is a popular approach; indeed, I've written such a tool myself. One simply displays an RDF document as a graph. Examples include RDFViz, IsaViz, and RDFAuthor. This works for small documents, but can quickly become confusing for large ones.
- Stepping through triples.
- Alternatively, one might show a node in a graph, plus neighboring nodes. For example, we might show the house, which has a resident, an address, and a type (House). Moving to the resident, we see it is a Person, with name "Damian Steer". However, this can be a slow process, and it presents too little information at some points.
Coarse Grained Display of an RDF Graph
BrownSauce attempts to improve on the triples approach. The problem is that such a display is too fine grained, but it has advantages: it will work with large documents or even sources where no single document is available (e.g., databases).
So how can an application find the obvious patterns in RDF data? RDF, unlike XML, has no mechanisms for expressing data structure; indeed, it is a semi-structured data format, so such information would only be a partial help. Having said that, the reader may be aware of RDF Schema. Don't be fooled by the name: RDF Schemas describe properties and classes, but cannot state that "Houses have addresses". (Closer to this ideal is schemarama.)
When one looks at RDF data it is apparent that there are regular patterns. These are captured in BrownSauce using a simple rule: start at a node and work outward, passing over blank nodes.
In our example this results in a house, with an address, and a resident. But it stops at the resident. If we want information about the resident, we find a person, with a name, who is a resident of a house.
Or, graphically, our original graph is divided into two regions relating to the house and person:
House (http://example.com/damian_house)
  address:
    number: 137
    street: Cranbrook Road
    city: Bristol
  resident: http://example.com/damian

Person (http://example.com/damian)
  name: Damian Steer
  mailbox: email@example.com
  seeAlso: http://example.com/document_b.rdf
|Graph divided into house and person regions|
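As a rough illustration of the rule, here is a minimal Python sketch. It is not BrownSauce's own code. It assumes triples are plain (subject, predicate, object) tuples of strings, that blank nodes carry a "_:" prefix, and that only outgoing properties are followed; the names coarse_grain and is_blank are invented for this example.

```python
# Sketch only: triples are (subject, predicate, object) string tuples and
# blank nodes are marked with a "_:" prefix (assumptions for this example).

def is_blank(node):
    return node.startswith("_:")

def coarse_grain(triples, start):
    """Return the triples reached from `start`, passing over blank nodes only."""
    region, seen, stack = [], {start}, [start]
    while stack:
        node = stack.pop()
        for s, p, o in triples:
            if s != node:
                continue
            region.append((s, p, o))
            # Keep walking only through blank nodes; labeled nodes and
            # literals end the region.
            if is_blank(o) and o not in seen:
                seen.add(o)
                stack.append(o)
    return region

HOUSE = "http://example.com/damian_house"
DAMIAN = "http://example.com/damian"

graph = [
    (HOUSE, "rdf:type", "House"),
    (HOUSE, "address", "_:address"),
    ("_:address", "number", "137"),
    ("_:address", "street", "Cranbrook Road"),
    ("_:address", "city", "Bristol"),
    (HOUSE, "resident", DAMIAN),
    (DAMIAN, "rdf:type", "Person"),
    (DAMIAN, "name", "Damian Steer"),
]

house_region = coarse_grain(graph, HOUSE)
person_region = coarse_grain(graph, DAMIAN)
```

Starting from the house, the walk passes through the blank address node but stops at the resident, so the person's name only appears when the walk starts from the person.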
I confess my example was rigged, yet using genuine data gathered from
the Web shows some success. The reason for this, I suspect, is that
slapping global identifiers on nodes which only appear for structuring
purposes is a little pointless. For example, collections are often blank,
such as the rdf:Seq nodes in RSS 1.0 feeds. I think it's
unlikely that people would want to refer to the collection in a feed
rather than the feed itself.
BrownSauce essentially produces a subgraph of the original data, one which contains, ideally, all the information pertinent to the subject. This subgraph has another useful property: the leaf nodes are all identifiable (i.e. not blank) -- linking is robust.
Having said this, the original rule often failed on one particular type of data: FOAF data. Here is an example of just such a failure:
<foaf:Person>
  <foaf:mbox rdf:resource="mailto:firstname.lastname@example.org"/>
  <foaf:knows>
    <foaf:Person>
      <foaf:mbox rdf:resource="mailto:email@example.com"/>
    </foaf:Person>
  </foaf:knows>
</foaf:Person>
The problem is that the coarse graining results in only one person, not
two. This coarse graining misses that the FOAF people are, in effect,
identifiable: foaf:mbox is a daml:UnambiguousProperty, that is, a property
whose object uniquely identifies the subject. This information is contained
in the FOAF schema. (Edd Dumbill has provided a good introduction to FOAF,
and I should add that such semantics are not part of RDF proper, but FOAF's
use is not unique.)
As a consequence BrownSauce traverses until it reaches an identifiable node: that is, either a node labeled with a resource, the subject of an unambiguous property, or a literal. And to do this BrownSauce loads all schemas it encounters (which I believe is fairly unusual in RDF applications).
This also means that the subgraph BrownSauce produces is not quite what might be expected. The house subgraph is actually identical to the graphs above, but the person node is marked as a boundary: i.e. it contains nodes beyond the person node. BrownSauce has to do this to check for other identifiers for the node. It is also useful since the graph contains more information about the boundary nodes, which can help when rendering.
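The refined stopping rule might be sketched as follows. This is an illustration under stated assumptions, not BrownSauce's implementation: triples are string tuples, blank nodes start with "_:", the set of unambiguous properties is passed in by hand rather than loaded from a schema, and the mailbox addresses and function names are made up for the example.

```python
# Sketch only: a node is "identifiable" if it is not blank, or if it is the
# subject of a property known to be unambiguous (e.g. foaf:mbox).

def is_identifiable(node, triples, unambiguous):
    if not node.startswith("_:"):
        return True  # labeled resource or literal
    return any(s == node and p in unambiguous for s, p, _ in triples)

def region_for(triples, start, unambiguous):
    """Walk outward from `start` until an identifiable node is reached."""
    region, seen, stack = [], {start}, [start]
    while stack:
        node = stack.pop()
        for s, p, o in triples:
            if s != node:
                continue
            region.append((s, p, o))
            if o not in seen and not is_identifiable(o, triples, unambiguous):
                seen.add(o)
                stack.append(o)
    return region

# Two blank-node people, each with a (hypothetical) mailbox.
people = [
    ("_:a", "rdf:type", "foaf:Person"),
    ("_:a", "foaf:mbox", "mailto:a@example.org"),
    ("_:a", "foaf:knows", "_:b"),
    ("_:b", "rdf:type", "foaf:Person"),
    ("_:b", "foaf:mbox", "mailto:b@example.org"),
]

naive = region_for(people, "_:a", unambiguous=set())
refined = region_for(people, "_:a", unambiguous={"foaf:mbox"})
```

With no unambiguous properties the two people collapse into one region; once foaf:mbox is treated as unambiguous, the walk stops at the second person.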
It is the nature of such algorithms to be imperfect. There will always
be some niggling cases, cases where the result simply looks wrong. For
these occasions BrownSauce provides a little customization:
brownsauce:Traversable. Instances of this class are unconditionally
traversed. For example, I happen to like my RSS channels displayed in
full, but rss:Item nodes are normally labeled. RSS channels then become
largely lists of references to items. However, by adding
rss:Item rdfs:subClassOf brownsauce:Traversable to the file custom.rdf,
all items are traversed and the full feed gets displayed.
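One way to picture the customization is the check below. It is a sketch under assumptions, not the actual BrownSauce logic: triples are string tuples, the item and channel URIs are invented, and is_traversable is a hypothetical helper that follows rdfs:subClassOf upward from a node's types.

```python
# Sketch only: decide whether a node should be traversed unconditionally
# because one of its types is a subclass of brownsauce:Traversable.

TRAVERSABLE = "brownsauce:Traversable"

def superclasses(cls, triples):
    """All classes reachable from `cls` via rdfs:subClassOf, including `cls`."""
    found, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                stack.append(o)
    return found

def is_traversable(node, triples):
    for s, p, o in triples:
        if s == node and p == "rdf:type" and TRAVERSABLE in superclasses(o, triples):
            return True
    return False

ITEM = "http://example.com/item1"        # hypothetical item URI
CHANNEL = "http://example.com/channel"   # hypothetical channel URI

data = [
    (ITEM, "rdf:type", "rss:Item"),
    (CHANNEL, "rdf:type", "rss:Channel"),
]
# The statement added to custom.rdf in the text above.
custom = [("rss:Item", "rdfs:subClassOf", TRAVERSABLE)]

item_before = is_traversable(ITEM, data)
item_after = is_traversable(ITEM, data + custom)
channel_after = is_traversable(CHANNEL, data + custom)
```

Only once the custom statement is merged in does the item count as traversable; the channel is unaffected.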
The Final Product
Although I've concentrated on BrownSauce's ability to handle coarsely grained data, it has some other features worth mentioning.
|A screenshot of BrownSauce|
In an ideal world we would never see a URI in browsers, and that
includes BrownSauce. To this end BrownSauce treats rdfs:label
(unsurprisingly) as a label for nodes. For example, if a model contains
the statement http://example.com/blah rdfs:label "Blah", the
front end shows "Blah" rather than the URI. But since labels aren't that
common outside schema documents, BrownSauce also treats properties as
labels if they are subproperties of rdfs:label. This information might
come from the schema itself or from the custom.rdf file. For example,
custom.rdf declares a name property to be an rdfs:subPropertyOf
rdfs:label; as a result, some of the people in the above screenshot have
been displayed by name rather than by some obscure alphanumeric sequence.
Property and class labels are also used (if the schema is available) in
preference to the local name.
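The label lookup could work roughly like this. The sketch assumes string-tuple triples; ex:name is a hypothetical property standing in for whatever custom.rdf actually declares, and display_name is an invented helper, not part of BrownSauce.

```python
# Sketch only: pick a display label for a node, treating rdfs:label and any
# of its (transitive) subproperties as label properties.

RDFS_LABEL = "rdfs:label"

def label_properties(triples):
    """rdfs:label plus every property declared a subproperty of it."""
    props, stack = {RDFS_LABEL}, [RDFS_LABEL]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if p == "rdfs:subPropertyOf" and o == current and s not in props:
                props.add(s)
                stack.append(s)
    return props

def display_name(node, triples):
    props = label_properties(triples)
    for s, p, o in triples:
        if s == node and p in props:
            return o
    return node  # fall back to the raw identifier

model = [
    ("http://example.com/blah", "rdfs:label", "Blah"),
    ("ex:name", "rdfs:subPropertyOf", "rdfs:label"),  # hypothetical custom.rdf entry
    ("_:person", "ex:name", "Damian Steer"),
]

blah_label = display_name("http://example.com/blah", model)
person_label = display_name("_:person", model)
unknown_label = display_name("http://example.com/other", model)
```

A node with no label property at all simply falls back to its identifier, which is the case the text hopes to make rare.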
BrownSauce also keeps track of rdfs:seeAlso links. In the
original release these were simply used as links to other RDF sources, but
now data can be merged from multiple sources. So if a document contains
little information about a thing, but points to more information, this can
be merged in.
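Merging along rdfs:seeAlso links might look something like the sketch below. It makes several assumptions: documents are held in an in-memory dictionary instead of being fetched from the Web, document_a.rdf and the homepage property are invented, and every seeAlso link is followed up to a fixed limit, which may not match how BrownSauce decides what to load.

```python
# Sketch only: merge triples from a starting document and the documents it
# points to via rdfs:seeAlso. `fetch` is a hypothetical loader that maps a
# URL to a list of (subject, predicate, object) tuples.

def merge_see_also(start_url, fetch, limit=10):
    merged, seen, queue = [], set(), [start_url]
    while queue and len(seen) < limit:
        url = queue.pop(0)
        if url in seen:
            continue  # avoid loops between documents that cite each other
        seen.add(url)
        for triple in fetch(url):
            if triple not in merged:
                merged.append(triple)
            if triple[1] == "rdfs:seeAlso":
                queue.append(triple[2])
    return merged

DAMIAN = "http://example.com/damian"
DOC_A = "http://example.com/document_a.rdf"  # hypothetical starting document
DOC_B = "http://example.com/document_b.rdf"

documents = {
    DOC_A: [
        (DAMIAN, "name", "Damian Steer"),
        (DAMIAN, "rdfs:seeAlso", DOC_B),
    ],
    DOC_B: [
        (DAMIAN, "homepage", "http://example.com/~damian"),
        (DAMIAN, "rdfs:seeAlso", DOC_A),
    ],
}

def fetch(url):
    return documents.get(url, [])

merged = merge_see_also(DOC_A, fetch)
```

The `seen` set keeps the walk from looping when two documents point at each other, as they do here.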
Currently BrownSauce can only browse documents. From the outset, however, the plan was to extend browsing to other sources such as databases with web interfaces. The code is in place and may well be added in the near future.
A related, desirable feature is typed rdfs:seeAlso. Suppose
that you know a book has an entry in a database with a Squish/SOAP
endpoint. In your bibliography you might add:
rdfs:seeAlso <database_endpoint>, <database_endpoint> rdf:type
<squish_soap_class>. BrownSauce could then add data from that
source using the appropriate backend.
This article presents only half the story, but I think it has given you some idea of how BrownSauce works. BrownSauce is free software, available under the same license as Jena. If you want to change it to fit your needs, you can.
Acknowledgments
I'd like to thank Hewlett Packard Labs, Bristol, which employed me while I wrote BrownSauce and, particularly, the Semantic Web group for its help, as well as for creating Jena, without which my life would have been a great deal harder.