XML.com: XML From the Inside Out

Microformats in Context
by Uche Ogbuji

Listing 1: XOXO example of a weblogs list

<ol class="xoxo">
  <li>
    <p>Technology</p>
    <ol>
      <li>
        <ul>
          <li>
            <a href="http://weblog.foo" type="text/html">Weblog home</a>
            <a href="http://weblog.foo/atom" type="application/atom+xml">Web feed</a>
            <dl>
              <dt>description</dt>
              <dd>That good ole Weblog</dd>
            </dl>
          </li>
        </ul>
      </li>
    </ol>
  </li>
</ol>

XHTML is not really designed for expressing lists of feeds, so XOXO ends up having to layer on the XHTML scaffolding rather thickly. The result is verbose and hard to read. I track a chamber of XML horrors I've found in my consulting, and one very common absurdity is what I call "markup indirection." Developers sometimes choose to ignore the basic extensibility of XML and design formats where the structure is completely generic, and all the markup essentially becomes content. The usual suspect is just a bloated translation of a CSV file, such as:

<product>
  <property>
    <name>ID</name>
    <value>xyz123</value>
  </property>
</product>

rather than <product xml:id='xyz123'/>. The ultimate reduction of this absurdity is <element name="description">... rather than <description>.... Amazingly enough, XOXO goes one step worse than this joke in the pattern:

<dl>
  <dt>description</dt>
  <dd>My favorite Weblog</dd>
</dl>

The above cries out to be written instead as <description>My favorite Weblog</description>. Beyond the ugliness, another problem with markup indirection is that you're fighting against the design of XML and against general-purpose tools that are designed to look for the keys to structure in elements, and not squirreled away in content. Markup indirection also makes processing harder, and this is a common problem that I see with microformats. Eve Maler pointed out to me in a private discussion that this has been an endemic problem from the early days of SGML, and that it stems from a false economy: people think fewer tags means less burden.
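To make the processing cost concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It pulls the description out of both forms. The function names and the wrapping <li> and <weblog> elements are assumptions chosen for illustration; nothing here comes from the XOXO specification.

```python
import xml.etree.ElementTree as ET

# The XOXO-style pattern: the key "description" is hidden in element content.
INDIRECT = """<li>
  <dl>
    <dt>description</dt>
    <dd>My favorite Weblog</dd>
  </dl>
</li>"""

# The direct form: the element name itself is the key.
DIRECT = """<weblog>
  <description>My favorite Weblog</description>
</weblog>"""


def description_indirect(root):
    """Return the text of the <dd> that follows a <dt> reading 'description'."""
    for dl in root.iter("dl"):
        children = list(dl)
        # Pair each child with its successor and match on the <dt> content.
        for term, definition in zip(children, children[1:]):
            if (term.tag == "dt" and definition.tag == "dd"
                    and (term.text or "").strip() == "description"):
                return definition.text
    return None


def description_direct(root):
    """Return the text of the <description> child with one path lookup."""
    return root.findtext("description")


indirect_result = description_indirect(ET.fromstring(INDIRECT))
direct_result = description_direct(ET.fromstring(DIRECT))
print(indirect_result)
print(direct_result)
```

Both calls find the same string, but the indirect version has to walk sibling pairs and compare text content, while the direct version is the kind of element lookup that general-purpose XML tools are built around.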

If, instead of XHTML, I start with XBEL, an XML vocabulary designed for expressing lists of links, I end up with a much more attractive result, shown in Listing 2.

Listing 2: Translation of Listing 1 (a weblogs list) to XBEL with extensions

<folder>
  <title>Technology</title>
  <bookmark href="http://weblog.foo">
    <title>Example Weblog</title>
    <info>
      <metadata owner="webfeeds">
        <link href="http://weblog.foo/atom" type="application/atom+xml"/>
        <description>That good ole Weblog</description>
      </metadata>
    </info>
  </bookmark>
</folder>

I can do even better if I create a specific vocabulary for weblogs, as in Listing 3.

Listing 3: Translation of Listing 1 (a weblogs list) to a custom XML format

<folder>
  <title>Technology</title>
  <weblog href="http://weblog.foo">
    <title>Example Weblog</title>
    <webfeed href="http://weblog.foo/atom" type="application/atom+xml"/>
    <description>That good ole Weblog</description>
  </weblog>
</folder>

The kicker, of course, is that since I'm using XML as it was intended to be used, it's an easy transform from Listing 3 or 2 to Listing 1: either to that explicit XHTML using XSLT, or to the equivalent presentation using CSS. Almost all modern browsers support at least XML and CSS, so this would be transparent to end users.
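As a rough illustration of how mechanical that transform is, the sketch below rebuilds the XOXO structure of Listing 1 from the custom format of Listing 3. It is written in Python with xml.etree.ElementTree rather than XSLT, purely so it can be run as is. The function name folder_to_xoxo is hypothetical, the fixed link text "Web feed" is copied from Listing 1, and the home link takes its text from the <title> element, so the output differs from Listing 1 in that one label.

```python
import xml.etree.ElementTree as ET

# Listing 3, reproduced as the transform input.
LISTING_3 = """<folder>
  <title>Technology</title>
  <weblog href="http://weblog.foo">
    <title>Example Weblog</title>
    <webfeed href="http://weblog.foo/atom" type="application/atom+xml"/>
    <description>That good ole Weblog</description>
  </weblog>
</folder>"""


def folder_to_xoxo(folder):
    """Build an XOXO <ol> shaped like Listing 1 from a Listing 3 <folder>."""
    xoxo = ET.Element("ol", {"class": "xoxo"})
    category = ET.SubElement(xoxo, "li")
    ET.SubElement(category, "p").text = folder.findtext("title")
    # Listing 1 nests the entries as ol > li > ul.
    inner_item = ET.SubElement(ET.SubElement(category, "ol"), "li")
    entries = ET.SubElement(inner_item, "ul")
    for weblog in folder.findall("weblog"):
        item = ET.SubElement(entries, "li")
        home = ET.SubElement(
            item, "a", {"href": weblog.get("href", ""), "type": "text/html"})
        home.text = weblog.findtext("title")
        feed = weblog.find("webfeed")
        if feed is not None:
            # Copy href and type straight across from <webfeed>.
            link = ET.SubElement(item, "a", dict(feed.attrib))
            link.text = "Web feed"
        description = weblog.findtext("description")
        if description is not None:
            # Re-create the dl/dt/dd indirection that XOXO expects.
            dl = ET.SubElement(item, "dl")
            ET.SubElement(dl, "dt").text = "description"
            ET.SubElement(dl, "dd").text = description
    return xoxo


xoxo = folder_to_xoxo(ET.fromstring(LISTING_3))
xoxo_markup = ET.tostring(xoxo, encoding="unicode")
print(xoxo_markup)
```

Going the other way, from Listing 1 back to Listing 3, would require the sibling-matching and content-sniffing that markup indirection forces on a processor, which is the asymmetry the specialized format avoids.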

So, does pretty matter? Interestingly enough, the microformats community heavily overlaps with adherents of the view-source philosophy, which holds that the best Web standards are simple and transparent so that a developer or content author can simply find a page to emulate, view the page source, and imitate the constructs. The paradox is that the limitations of microformats mean that beyond the simplest uses they tend to be ungainly, and thus fail the view-source test. Truly specialized formats, or at least proper extensions to existing formats, are generally much easier to comprehend by casually inspecting the markup.

A Search for Meaning

One problem that the microformats technique doesn't address at all is auto-discovery of semantics. You learn the meaning of the conventions in a microformat by reading the format specification. There are no shortcuts. If you come across a pattern in a host format that looks suspiciously like a microformat, you have no way of knowing what the microformat is for or what its rules are unless you do some sleuthing with the help of your favorite search engine and find the spec. Even once you find the spec, you almost always get only an informal description of the convention. You don't often get a schema, and you almost never get a schema structured enough to help automate processing of the format.

This is one limitation that I think is the right choice for microformats. Discovery and semantics are very hard problems, and microformats would never have got off the ground trying to solve them any more than XML would have trying to solve the problem of semantic transparency as well as syntactic transparency. Microformats are rooted strictly to the syntactic realm, and those who do need more formality and structure can build these on the basics. In this article, I am sticking as much as possible to syntactic considerations with respect to microformats, but some of these considerations are related to semantics and are informed by how semantics might be mixed into microformats.

The leading effort along these lines is Gleaning Resource Descriptions from Dialects of Languages (GRDDL). GRDDL is an initiative (undertaken mostly by W3C staffers) to bind microformats to RDF models. It's especially interesting because it hinges on a simple idea that my colleagues at Fourthought came up with four years ago and provided as a feature in the 4Suite server and repository (Eric van der Vlist was independently pursuing similar notions at about the same time). The idea is to use XSLT transforms to transform plain old XML to RDF/XML, thus creating a binding from syntax to formal semantic model. But the most important contribution by GRDDL is that of the profile, a convention for a host language that expresses URIs to assert which microformats are actually used in a document instance. The GRDDL profile for XHTML prescribes usage as in Listing 4.
