Introducing RDFa
by Bob DuCharme
Inline Metadata About Components
This category of RDFa use fulfills the original dream that led to RDFa's creation: taking human-readable web page content and making it machine-readable. For example, the following sentence from the RDFa Syntax document states that Mark Birbeck took a particular picture.
<p>This photo was taken by <span class="author" about="photo1.jpg" property="dc:creator">Mark Birbeck</span>.</p>
The span element and its attribute values let an RDFa-aware tool get the following triple out of this document, shown here in RDF/XML:
<rdf:Description rdf:about="file://C|/dat/xml/rdf/rdfa/photo1.jpg">
<dc:creator>Mark Birbeck</dc:creator>
</rdf:Description>
Note the full path added to the photo1.jpg resource name by the RDFa extraction tool that I used. If I had an xml:base value declared, it would have used that to create the full URL.
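To make the base-URL behavior concrete, here is a minimal Python sketch, not the extraction tool used above. It pulls the triple out of the sample sentence and resolves the relative about value against a base URL. The function name extract_triple and the example.com address are assumptions chosen for illustration only.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# The sample sentence from the article, as a well-formed XML fragment.
FRAGMENT = (
    '<p>This photo was taken by <span class="author" about="photo1.jpg" '
    'property="dc:creator">Mark Birbeck</span>.</p>'
)


def extract_triple(fragment, base_url):
    """Return (subject, predicate, object) for the span in the fragment.

    base_url stands in for either the document's own location or a declared
    xml:base value; the relative `about` value is resolved against it.
    """
    span = ET.fromstring(fragment).find("span")
    subject = urljoin(base_url, span.get("about"))
    return (subject, span.get("property"), span.text)


# Hypothetical base URL, used only to show the resolution step.
triple = extract_triple(FRAGMENT, "http://www.example.com/photos/index.html")
print(triple)
```

With a different base URL, only the subject of the triple changes; the predicate and object come straight from the span.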
In that example, the PCDATA string "Mark Birbeck" provided the object of the triple. Sometimes you might want to provide an alternative version of the displayed data, such as a normalized version of a date. In this case, a value in a content attribute overrides the element's displayed text:
<p>Last revision of document: <span about="http://www.snee.com/docs/mydoc1.html" property="dc:date" content="20070315T15:32:00">March 15, 2007, at 3:32 PM</span></p>
The resulting triple uses the content value of the date:
<rdf:Description rdf:about="http://www.snee.com/docs/mydoc1.html">
<dc:date>20070315T15:32:00</dc:date>
</rdf:Description>
Now, when searching aggregated metadata for a document last updated between 20070312 and 20070318, it will be easy to find the pointer to the document that says it was updated on "March 15, 2007, at 3:32 PM."
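The following Python sketch illustrates why the normalized value helps. It reads the content attribute from the sample span and checks whether it falls inside a date range. The function names normalized_date and updated_between are hypothetical, and the format string is an assumption that matches only the date layout used in this example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# The sample span from the article.
SPAN = (
    '<span about="http://www.snee.com/docs/mydoc1.html" property="dc:date" '
    'content="20070315T15:32:00">March 15, 2007, at 3:32 PM</span>'
)


def normalized_date(span_xml):
    """Parse the machine-readable content value, ignoring the displayed text."""
    span = ET.fromstring(span_xml)
    return datetime.strptime(span.get("content"), "%Y%m%dT%H:%M:%S")


def updated_between(span_xml, start, end):
    """True if the date falls on or between two YYYYMMDD dates."""
    first = datetime.strptime(start, "%Y%m%d").date()
    last = datetime.strptime(end, "%Y%m%d").date()
    return first <= normalized_date(span_xml).date() <= last


print(updated_between(SPAN, "20070312", "20070318"))
```

The displayed string "March 15, 2007, at 3:32 PM" would need locale-aware parsing to compare; the content value needs a single format string.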
Metadata About the Containing Document
Inline metadata about document components was the original use case for RDFa, but its elegant design makes it simple to use for other kinds of metadata, such as metadata about the containing document. Some metadata, such as a document's title and author, is often redundant with existing data in the document and can be marked up inline as in the examples above. Other document metadata, such as production workflow information, can be easily stored in the document header. When no subject is specified, an RDFa processor assumes an empty string as the subject, which represents the document itself:
<html xmlns:fm="http://www.foomagazine.com/ns/prod/">
<head>
<title>Is Black the New Black?</title>
<meta property="fm:newsstandDate" content="2006-04-03"/>
<meta property="fm:copyEditor" content="RSelavy"/>
<meta property="fm:copyEdited" content="2006-03-28T10:33:00"/>
</head>
<body>
<!-- body of page... -->
An RDFa extractor gets the following RDF/XML out of this:
<rdf:Description rdf:about="">
<fm:newsstandDate>2006-04-03</fm:newsstandDate>
<fm:copyEditor>RSelavy</fm:copyEditor>
<fm:copyEdited>2006-03-28T10:33:00</fm:copyEdited>
</rdf:Description>
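As a rough illustration of the empty-subject rule, this Python sketch reads the meta elements from the sample header and defaults the subject to an empty string when no about attribute is present. The function name head_triples is hypothetical, and the sample page is closed with an empty body (an assumption) so that it parses as well-formed XML.

```python
import xml.etree.ElementTree as ET

# The article's sample header; the empty body element is added here
# only so the document is well-formed.
PAGE = """<html xmlns:fm="http://www.foomagazine.com/ns/prod/">
<head>
<title>Is Black the New Black?</title>
<meta property="fm:newsstandDate" content="2006-04-03"/>
<meta property="fm:copyEditor" content="RSelavy"/>
<meta property="fm:copyEdited" content="2006-03-28T10:33:00"/>
</head>
<body/>
</html>"""


def head_triples(page):
    """Collect (subject, predicate, object) triples from meta elements in head."""
    head = ET.fromstring(page).find("head")
    triples = []
    for meta in head.findall("meta"):
        if meta.get("property") is None:
            continue  # not an RDFa statement
        # No about attribute: the subject is "", meaning the document itself.
        subject = meta.get("about", "")
        triples.append((subject, meta.get("property"), meta.get("content")))
    return triples


for triple in head_triples(PAGE):
    print(triple)
```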
Out-of-Line Metadata About Components
Nesting of meta elements, which is a new feature of XHTML 2, lets you specify a single subject for multiple triples. When you do this in a web page's head element, you can name particular components of the document as the subject, making it possible to create metadata in your header for individual portions of your web page. For example, a document with multiple recipes in it can include production metadata in the head element about a specific recipe (note that the following sample also uses XHTML 2's new section and h elements):
<html xmlns:fm="http://www.foomagazine.com/ns/prod/">
<head>
<meta about="#recipe13941">
<meta property="fm:ComponentID">XZ3214</meta>
<meta property="fm:ComponentType">Recipe</meta>
<meta property="fm:RecipeID">r003423</meta>
</meta>
</head>
<body>
<h>Add Some Tex Mex Sizzle to Your Kid's Lunch</h>
<section id="recipe22143">
<h>Amigo Corn Dogs</h>
<!-- li, p, etc. -->
</section>
<section id="recipe13941">
<h>EZ Bean Tacos</h>
<!-- li, p, etc. -->
</section>
<!-- more content -->
</body>
</html>
The extracted triples show that this metadata refers only to the element in the document with an ID value of recipe13941:
<rdf:Description rdf:about="file://C|/dat/xml/rdf/rdfa/test5.html#recipe13941">
<fm:ComponentType>Recipe</fm:ComponentType>
<fm:RecipeID>r003423</fm:RecipeID>
<fm:ComponentID>XZ3214</fm:ComponentID>
</rdf:Description>
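One way to picture the nesting rule is the Python sketch below, in which each inner meta element inherits the about value of its enclosing meta element. The function name nested_meta_triples and the example.com document URL are assumptions for illustration; a real extractor would use the document's actual location.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# The head element from the article's recipe sample.
HEAD = """<head>
<meta about="#recipe13941">
<meta property="fm:ComponentID">XZ3214</meta>
<meta property="fm:ComponentType">Recipe</meta>
<meta property="fm:RecipeID">r003423</meta>
</meta>
</head>"""


def nested_meta_triples(head_xml, doc_url):
    """Give every nested meta the subject named by its parent's about value."""
    triples = []
    for outer in ET.fromstring(head_xml).findall("meta[@about]"):
        # "#recipe13941" resolves to a fragment of the containing document.
        subject = urljoin(doc_url, outer.get("about"))
        for inner in outer.findall("meta[@property]"):
            value = (inner.text or "").strip()
            triples.append((subject, inner.get("property"), value))
    return triples


# Hypothetical document URL.
for triple in nested_meta_triples(HEAD, "http://www.example.com/test5.html"):
    print(triple)
```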
Because RDFa lets you store a complete triple in an HTML document, you can even store metadata in one HTML document about resources (or portions of resources, like the recipe above, as long as they have an identifier) outside of that document.
Getting Those Triples
Several free tools are already available to extract RDF/XML from a document with RDFa so that you can then feed your triples to semantic web tools or to RDF-aware metadata management tools. Fabien Gandon of INRIA has written an XSLT stylesheet to do this, and Elias Torres has written a web service that only needs the URL of the document with the RDFa triples. Elias implemented this by adding RDFa support to RDFLib (of which I'm a long-time fan). RDFLib can extract embedded RDFa triples and load them into a triplestore in memory or on disk, and then you're off and running for developing an application around your extracted data. Among commercial tools, TopQuadrant's TopBraid Composer includes RDFa support.
The fact that reading RDFa is so easy to implement—you only need a program that can scan a document for certain combinations of a few elements and attributes—means that if no existing RDFa readers can do what you want, you can implement it yourself in any language that provides a reasonable XML parser.
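As a sketch of how small such a reader can be, the Python function below walks a document with the standard library's XML parser and emits a triple for every element that carries a property attribute. It is deliberately simplified and is not a conforming RDFa processor: it assumes the subject is the nearest enclosing about value (or an empty string), and it ignores link attributes, base URL resolution, and prefix expansion. The name extract_triples and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_triples(xml_text):
    """Return (subject, predicate, object) triples found in a well-formed document."""
    triples = []

    def walk(element, subject):
        # An about attribute sets the subject for this element and its descendants.
        subject = element.get("about", subject)
        prop = element.get("property")
        if prop is not None:
            # A content attribute overrides the element's displayed text.
            value = element.get("content")
            if value is None:
                value = "".join(element.itertext()).strip()
            triples.append((subject, prop, value))
        for child in element:
            walk(child, subject)

    # Start with an empty subject, which stands for the document itself.
    walk(ET.fromstring(xml_text), "")
    return triples


# Hypothetical document combining two of the article's patterns.
DOC = (
    '<html><head><meta property="fm:copyEditor" content="RSelavy"/></head>'
    '<body><p>This photo was taken by <span about="photo1.jpg" '
    'property="dc:creator">Mark Birbeck</span>.</p></body></html>'
)

for triple in extract_triples(DOC):
    print(triple)
```

Passing the subject down through the recursion avoids needing parent pointers, which the standard library's element objects do not provide.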
Getting More Out of RDFa
We've seen that RDFa lets you add triples of useful metadata to your XHTML with simple, straightforward markup. It also offers features that let you do even more interesting things with it; in Part 2 of this article, we'll look at how to assign data types to your RDFa values, reification (how to add metadata about your metadata), specifying RDFa metadata about elements with an id attribute, compact URIs, and platforms that make it easier to automate the creation of RDFa metadata. Meanwhile, try adding some RDFa to some documents, play with the RDFa processors mentioned here to extract the metadata, and let me know what you think.