Introducing RDFa, Part Two
April 4, 2007
In part 1 of this article, we saw that RDFa, a new syntax for representing RDF triples, can be embedded into arbitrary XML documents more easily than RDF/XML. RDFa is particularly good for embedding these triples into XHTML 2, which has a few new attributes that make such embedding easier. Part 1 of this article showed several roles that RDFa metadata can play, including metadata about the containing document and metadata about individual elements within the document. We also saw how RDFa can represent triples that use existing web page content as their object and triples that specify new objects, which are useful for adding workflow metadata to a document or for specifying normalized values such as "2007-04-23" as metadata associated with a date displayed on a web page as "April 23, 2007". This article shows how to use RDFa to express additional, richer metadata, and we'll explore some ideas to automate the generation of RDFa markup.
One classic bit of metadata to add to a piece of data is an indication of that data's type. In RDFa, you use datatypes from XML Schema Part 2, a spec that offers choices for most of the typical types you'll find in a programming language or database package. To add a datatype to the kind of markup that we saw in part 1 of this article, you simply add a datatype attribute.
For example, let's say you want to identify the types of the values in the following HTML table:
Because each row of the table is about a particular shipment of widgets, the first step when adding RDFa triples that describe the shipments is the addition of an about attribute to each row to name the subject of the triples for that row. A span element around each value in the row can include a property attribute to show what property that value indicates for that row's shipment, as shown in the source below.
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:fb="http://www.foobarco.com/ns/vocab#"
      xmlns:fbi="http://www.foobarco.com/ns/ID#"
      xmlns:xs="http://www.w3.org/2001/XMLSchema#">
  <!-- head element, start of body and table... -->
  <tr about="[fbi:x432]">
    <td><span property="fb:shipmentID">x432</span></td>
    <td><span property="fb:date" datatype="xs:date">2007-04-23</span></td>
    <td><span property="fb:amount" datatype="xs:integer">34</span></td>
    <td><span property="fb:anodized" datatype="xs:boolean" content="true">yes</span></td>
  </tr>
  <!-- remaining rows of table... -->
Nearly all of this should be familiar from part 1 of this article. The only new bit of syntax in the RDFa-enhanced HTML of the table is the datatype attribute, which identifies the type of the value inside the span element. Now, an RDFa extraction routine can get these values and pass them along to an application that can do more with typed data than it can with a collection of strings. Also note that the span in each row's last td element uses the content attribute described in part 1, with a value of "true" or "false" instead of "yes" or "no", which are not valid Boolean values. This way, the extractor will see proper Boolean values for the triples describing whether each widget shipment is anodized.
With all the RDFa markup added, it may look verbose, but this kind of tabular representation of data is usually automatically generated from backend relational databases anyway. It wouldn't be much trouble to have the HTML generation routines add these extra attributes, thereby making the data more valuable to other applications. Below, we'll find out about some other applications that, because they generate HTML from templates, are excellent platforms for generating lots of useful RDFa markup with minimal trouble.
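To make the benefit of typed values concrete, here is a minimal Python sketch of what a consuming application might do with the (value, datatype) pairs that an extractor hands over. The to_native function and its small converter table are my own illustration, not part of any RDFa extractor, and the sketch covers only the three datatypes used in the table above.

```python
from datetime import date

XS = "http://www.w3.org/2001/XMLSchema#"


def _parse_boolean(lexical):
    # XML Schema accepts exactly four lexical forms for xs:boolean.
    if lexical in ("true", "1"):
        return True
    if lexical in ("false", "0"):
        return False
    raise ValueError(f"not a valid xs:boolean: {lexical!r}")


# Hypothetical converter table: datatype URI -> function from string to value.
CONVERTERS = {
    XS + "date": date.fromisoformat,   # handles plain YYYY-MM-DD values
    XS + "integer": int,
    XS + "boolean": _parse_boolean,
}


def to_native(lexical, datatype=None):
    """Convert a literal to a Python value; untyped literals stay strings."""
    converter = CONVERTERS.get(datatype)
    return converter(lexical) if converter else lexical


amount = to_native("34", XS + "integer")
shipped = to_native("2007-04-23", XS + "date")
anodized = to_native("true", XS + "boolean")
```

Note that a value of "yes" would be rejected by the Boolean converter, which is exactly why the markup supplies "true" in the content attribute.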
The rev Attribute
In part 1 of this article, we saw that RDFa can use the a element's venerable rel attribute to indicate a resource's relationship to another resource or, in RDF terms, to serve as the predicate of a triple, with the about attribute naming the subject and the href attribute naming the object. The a element's even less-used rev attribute expresses the opposite: a triple in which the href attribute names the subject and the about attribute of the element (or of the nearest ancestor with one) names the object. Both rel and rev can go in the same element to describe two different relationships, such as the following one showing that the Supreme Court case Brown versus Board of Education overturned Plessy versus Ferguson:
<span about="http://caselaw.lp.findlaw.com/scripts/getcase.pl?court=US&amp;vol=347&amp;invol=483"
      rel="fb:overturns"
      rev="fb:overturnedBy"
      href="http://caselaw.lp.findlaw.com/cgi-bin/getcase.pl?court=us&amp;vol=163&amp;invol=537"/>
(In a system using OWL, you wouldn't really have both the rel and rev attributes in the same element. You'd just have one, and a separate rule stating that fb:overturns and fb:overturnedBy are inverse properties. This way, either could be inferred from the other.) If an ontology doesn't include the relationship that you want to specify but does offer its inverse, the rev attribute gives you flexibility that can be especially valuable if you're representing a relationship between a resource you can edit and one you can't. For example, if your ontology has fb:overturnedBy but not fb:overturns, you could add the following metadata to the document for 347 US 483:
<span rev="fb:overturnedBy"
      href="http://caselaw.lp.findlaw.com/cgi-bin/getcase.pl?court=us&amp;vol=163&amp;invol=537"/>
The lack of an
about attribute (assuming that no ancestor element has one
either) indicates that the document itself is the object of the triple: Supreme Court
347 US 483 overturned 163 US 537.
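The direction rules for rel and rev can be sketched in a few lines of Python. This is only an illustration of the logic described in this section: the link_triples function and its document_uri parameter are hypothetical names, and predicates are left as unexpanded prefixed names.

```python
def link_triples(href, rel=None, rev=None, about=None, document_uri=""):
    """Return (subject, predicate, object) triples for one linking element.

    If no about value applies, the containing document stands in for it.
    """
    resource = about if about is not None else document_uri
    triples = []
    if rel is not None:
        # rel: about is the subject, href is the object.
        triples.append((resource, rel, href))
    if rev is not None:
        # rev: the reverse, with href as the subject and about as the object.
        triples.append((href, rev, resource))
    return triples


CASE_347_US_483 = "http://caselaw.lp.findlaw.com/scripts/getcase.pl?court=US&vol=347&invol=483"
CASE_163_US_537 = "http://caselaw.lp.findlaw.com/cgi-bin/getcase.pl?court=us&vol=163&invol=537"

both = link_triples(CASE_163_US_537, rel="fb:overturns",
                    rev="fb:overturnedBy", about=CASE_347_US_483)
rev_only = link_triples(CASE_163_US_537, rev="fb:overturnedBy",
                        document_uri=CASE_347_US_483)
```

The first call mirrors the element that carries both attributes and yields two triples; the second mirrors the rev-only element placed in the 347 US 483 document and yields one.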
We use URIs to qualify names in order to make their context absolutely clear: for example, to show that one use of the word "title" comes from the Dublin Core namespace, and therefore refers to a published work, while another might come from a real estate namespace and therefore refer to a deed to property. Writing out full URIs with each name (for example, http://purl.org/dc/elements/1.1/title) can make things pretty verbose, so a namespace declaration lets us use a prefix that will stand in for the URL that identifies a name's namespace. This lets us use shorter versions of our names while still being clear where they came from. We call a name such as dc:title a qualified name, or qname.
Qnames used in attribute values can lead to problems, because not all processing programs know that they should compare the prefix with the namespace declarations to see which namespace the name really comes from. It has worked out fine for XSLT use, because XSLT processors all know that qnames represent elements in source documents, but this has led to a problem in RDF use, because RDF uses URIs to identify the namespace of values as well as namespaces of elements and attributes.
For example, if I can set the standards for URL patterns at FooBar Company, and I represent employee number 4942 as a full URL, there's no problem so far. If I say xmlns:fb="http://www.foobarco.com/ns/empID#", there's still no problem, but there is a problem if I represent the employee as "fb:4942", because it doesn't conform to the qname spec. Qnames were designed around XML names, or the names that are allowed for elements and attributes, and those names must begin with a letter or underscore. So, to keep the use of namespace prefixes instead of full URIs legal with the existing specs, we can't use them with values that begin with a numeric digit.
Lots of important values begin with numeric digits. Besides employee IDs and other numbers, the CURIE Working Draft points out that International Press Telecommunications Council metadata often begins with a digit. To address this, we now have a new URL abbreviation syntax known as the Compact URI, or CURIE, syntax. CURIEs are pretty much like qnames with looser rules about what comes after the colon: you can use any character that can be in a URI. (One handy corollary of this is that qnames are valid CURIEs.) Just about the only bit of new syntax to learn for using CURIEs is the square brackets that go around a CURIE value when used where URIs are also allowed, such as in an about attribute:
<tr about="[fbi:x432]">
  <td><span property="fb:shipmentID">x432</span></td>
  <td><span property="fb:date" datatype="xs:date">2007-04-23</span></td>
  <td><span property="fb:amount" datatype="xs:integer">34</span></td>
  <td><span property="fb:anodized" datatype="xs:boolean" content="true">yes</span></td>
</tr>
The square brackets seem to be a nod to the syntax used to represent links in wikis. The example above would work exactly the same without the square brackets. But now, if you ever see square brackets, you'll know why they're there.
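To see the difference between qnames and CURIEs in action, here is a small Python sketch. The function names are mine, the prefix table simply repeats the namespace declarations used in this article, and the name check is an ASCII-only approximation of the XML name rules rather than the full Unicode definition.

```python
import re

PREFIXES = {
    "fb": "http://www.foobarco.com/ns/vocab#",
    "fbi": "http://www.foobarco.com/ns/ID#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# ASCII-only approximation: a letter or underscore, then letters, digits, ".", "-", "_".
_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def is_qname(value):
    """True if both the prefix and the local part look like XML names."""
    prefix, sep, local = value.partition(":")
    return bool(sep and _NCNAME.match(prefix) and _NCNAME.match(local))


def expand_curie(value, prefixes):
    """Expand prefix:reference to a full URI, with or without square brackets."""
    if value.startswith("[") and value.endswith("]"):
        value = value[1:-1]
    prefix, sep, reference = value.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {value!r}")
    # Unlike a qname, the reference may begin with a digit.
    return prefixes[prefix] + reference


shipment_uri = expand_curie("[fbi:x432]", PREFIXES)
employee_uri = expand_curie("fb:4942", {"fb": "http://www.foobarco.com/ns/empID#"})
```

Here "fb:4942" fails the qname check because its local part starts with a digit, yet it expands without complaint as a CURIE.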
Reification (Sort of)
Reification is the assignment of metadata to metadata. This sounds pretty abstract, but if you consider that metadata is data to track, just like any other, it's easier to see the value of reification. For example, if a document has an RDF triple saying, "this document was created by Richard Mutt," another triple saying that the triple about the document's creator was created on 2007-04-19 would be metadata about that metadata.
RDFa's designers had reification on the original list of RDF features that RDFa would eventually be able to represent, but they're having second thoughts, and the latest version of the RDFa Primer no longer mentions it. The plan for RDFa was always to make it a subset of RDF, and reification may not make the cut. (XML came to exist via a similar cutting out of potentially complex and confusing features, as its designers were creating a subset of SGML.) Still, I couldn't resist demonstrating a reification-like technique with RDFa that can be useful in web or other hypertext applications.
An a linking element describes a relationship between the document containing the a element and the resource that it points to. If you're really interested in tracking metadata about your hypertext links, you can add an about attribute to the a element and add empty span element children, as shown here, to store metadata about the linking element.
<p>Mr. Breakfast has a nice
  <a about="link23" href="http://www.mrbreakfast.com/article.asp?articleid=17">
    <span property="fb:addedBy" content="BD"/>
    <span property="fb:lastChecked" content="2007-03-15"/>
    scrambled eggs recipe</a>.</p>
This is not really reification because it's not metadata about metadata. In this case, it's metadata about a specific HTML element: the a element with an about value of "link23", which happens to link to another resource. It's still useful, and may whet your appetite for proper reification as a feature of more full-featured RDF syntaxes.
Showing Some Class
In addition to specifying properties and values of a resource, RDFa can identify the resource as an individual of a particular class. When you have an ontology of information about a set of classes, you have additional information about individuals of those classes, so knowing an individual's class membership lets you do more with it. For example, if you know that a resource is a widgetShipment, ontology information about this class may have relevant storage and safety information.
This is a nice example of RDFa building on an obvious bit of HTML syntax to add some power: you simply use the class attribute, which has been around since HTML 2.0. For example, the class attribute in the following example tells us that the fbi:x432 resource is an individual of the fb:widgetShipment class:
<tr about="[fbi:x432]" class="fb:widgetShipment">
  <td><span property="fb:shipmentID">x432</span></td>
  <td><span property="fb:date" datatype="xs:date">2007-04-23</span></td>
  <td><span property="fb:amount" datatype="xs:integer">34</span></td>
  <td><span property="fb:anodized" datatype="xs:boolean" content="true">yes</span></td>
</tr>
Extracting the triples and converting them to RDF/XML would result in something like this:
<fb:widgetShipment rdf:about="http://www.foobarco.com/ns/ID#x432">
  <fb:anodized rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</fb:anodized>
  <fb:amount rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">34</fb:amount>
  <fb:date rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2007-04-23</fb:date>
</fb:widgetShipment>
(Because this is a newer aspect of RDFa, no RDFa extractors support it as of this writing, but I'm looking forward to it being supported in the future.)
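Until extractors catch up, the mapping itself is simple enough to sketch. In RDF/XML, an element such as fb:widgetShipment wrapping a resource's properties is shorthand for an rdf:type triple, so a class value can be turned into that triple directly. In the Python below, the class_triples function is hypothetical, and the decision to skip class values without a declared prefix (such as ordinary CSS class names) is my assumption.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

PREFIXES = {"fb": "http://www.foobarco.com/ns/vocab#"}


def class_triples(subject, class_attr, prefixes):
    """Turn each prefixed token in a class attribute into an rdf:type triple."""
    triples = []
    for token in class_attr.split():
        prefix, sep, name = token.partition(":")
        # Assumption: tokens without a declared prefix are plain styling classes.
        if sep and prefix in prefixes:
            triples.append((subject, RDF_TYPE, prefixes[prefix] + name))
    return triples


typed = class_triples("http://www.foobarco.com/ns/ID#x432",
                      "fb:widgetShipment", PREFIXES)
```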
Auto-Generation of RDFa Metadata
All of my examples so far have been hand-coded, but when you consider the huge infrastructure of HTML-generating systems, it's not difficult to find opportunities for automatically generating large amounts of useful, machine-readable RDF triples inside of web pages. Templating languages typically give you a way to add HTML (or, if you prefer, XHTML) markup around the templating language's codes that indicate which values to plug in from another data source.
For example, the rhtml template files of a Ruby on Rails application let you specify the markup for one row of an HTML table, and then tell the Ruby interpreter to generate a row with that markup for each row of a table retrieved as part of a database query. You can add about attributes and span wrapper elements to the table markup as easily as you can add td elements and align attributes, and pretty soon your Ruby on Rails application is automatically generating triples of machine-readable typed values similar to those in the widget shipment table shown above. The same principle works with PHP scripts, Active Server Pages, and other HTML-generating tools.
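As a rough illustration of the principle in Python rather than in a Rails template, the sketch below renders one RDFa-annotated table row per database record. The render_row function and the record's field names (id, date, amount, anodized) are hypothetical; a real application would take them from its own schema.

```python
from html import escape

# One row of the widget shipment table, with placeholders for the record's values.
ROW_TEMPLATE = (
    '<tr about="[fbi:{id}]">'
    '<td><span property="fb:shipmentID">{id}</span></td>'
    '<td><span property="fb:date" datatype="xs:date">{date}</span></td>'
    '<td><span property="fb:amount" datatype="xs:integer">{amount}</span></td>'
    '<td><span property="fb:anodized" datatype="xs:boolean" '
    'content="{flag}">{label}</span></td>'
    '</tr>'
)


def render_row(shipment):
    """Render one shipment record (a dict) as an RDFa-annotated table row."""
    anodized = bool(shipment["anodized"])
    return ROW_TEMPLATE.format(
        id=escape(str(shipment["id"])),
        date=escape(shipment["date"]),          # expected as YYYY-MM-DD
        amount=int(shipment["amount"]),
        flag="true" if anodized else "false",   # machine-readable Boolean
        label="yes" if anodized else "no",      # what the reader sees
    )


row = render_row({"id": "x432", "date": "2007-04-23", "amount": 34, "anodized": True})
```

Calling render_row in a loop over a query result gives you the whole table body, with the human-readable "yes" or "no" and the machine-readable Boolean generated from the same field.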
Weblogging platforms also provide customizable templates to control the HTML that they generate. My host provider offers Movable Type as a weblogging platform, so I've been using it for a few years. When I insert RDFa markup into a template with Movable Type tags such as <$MTSubCategoryPath$> inside that markup, the Movable Type engine replaces its tags with the appropriate values for each weblog entry page being generated. For example, I added some RDFa markup with Movable Type tags in the head section of the template, like this:
<meta about="<$MTEntryPermalink$>">
  <link rel="trackback:ping" href="http://madskills.com/public/xml/rss/module/trackback/"/>
  <link rel="dc:identifier" href="<$MTEntryPermalink$>"/>
  <link rel="dc:subject" href='http://www.snee.com/bobdc.blog/<$MTSubCategoryPath$>'/>
</meta>
and I wrapped some span elements around body content, like this:
<h3 class="entry-header"><span property="dc:title"><$MTEntryTitle$></span></h3>
For one recent weblog entry, Movable Type generated this for the head section:
<meta about="http://www.snee.com/bobdc.blog/2007/03/new_eric_van_der_vlist_book_on.html">
  <link rel="trackback:ping" href="http://madskills.com/public/xml/rss/module/trackback/"/>
  <link rel="dc:identifier" href="http://www.snee.com/bobdc.blog/2007/03/new_eric_van_der_vlist_book_on.html"/>
  <link rel="dc:subject" href='http://www.snee.com/bobdc.blog/xml'/>
</meta>
and it generated this for the h3 part shown above:
<h3 class="entry-header"><span property="dc:title">New Eric van der Vlist book on Schematron out</span></h3>
An RDFa extractor gets (among other triples) the following RDF out of the document, shown here in RDF/XML:
<rdf:Description rdf:about="http://www.snee.com/bobdc.blog/2007/03/new_eric_van_der_vlist_book_on.html">
  <trackback:ping rdf:resource="http://madskills.com/public/xml/rss/module/trackback/"/>
  <dc:subject rdf:resource="http://www.snee.com/bobdc.blog/xml"/>
  <dc:identifier rdf:resource="http://www.snee.com/bobdc.blog/2007/03/new_eric_van_der_vlist_book_on.html"/>
  <dc:title rdf:datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">New Eric van der Vlist book on Schematron out</dc:title>
</rdf:Description>
Movable Type creates the RDFa I've shown here for each new file that it creates. And, for that matter, for each old file that it creates as well, because it's easy enough to tell Movable Type to regenerate all of them. So shortly after I made this change to the template, I had nice RDFa metadata in all the weblog entries I'd ever written on this system. To harvest that metadata, I could use a script with a single wget or curl call for each weblog entry to combine that metadata into a single file, and then I could create specialized tables of contents, reports, Topic Maps, and other applications around this content collection.
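For the harvesting step, once each page has been fetched, pulling out the literal properties takes little code. The Python sketch below is not a full RDFa extractor: the PropertyCollector class is my own illustration, it ignores about, rel, and rev, it leaves prefixed names unexpanded, and it assumes that elements carrying a property attribute are not nested inside one another. Fetching the pages is left out.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from elements with a property attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = None  # (property, tag, text parts) while inside an element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # An explicit content value overrides the displayed text.
            self.pairs.append((prop, attrs["content"]))
        else:
            self._open = (prop, tag, [])

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[1]:
            prop, _, parts = self._open
            self.pairs.append((prop, "".join(parts)))
            self._open = None


page = ('<h3 class="entry-header"><span property="dc:title">'
        'New Eric van der Vlist book on Schematron out</span></h3>')
collector = PropertyCollector()
collector.feed(page)
titles = collector.pairs
```

Feeding every entry page through a collector like this and writing the pairs to one file would give you the raw material for the tables of contents and reports mentioned above.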
Whenever you see HTML being generated automatically, you have an opportunity to create RDFa. Movie timetables, price lists, and so many other web pages where we look up information are generated from a backend database. This is fertile ground for easy RDFa generation, which could make RDFa's ease of incorporating proper RDF triples into straightforward HTML one of the great milestones in the building of the semantic web.