XML.com: XML From the Inside Out

Microformats in Context

April 26, 2006

There has been a lot of discussion in XML circles as to how far the extensibility revolution promised by XML can take (or has taken) us. Is XML really a tool for creating specialized languages so that information can be expressed in the most natural formats practical? Or is it just a way to reduce the burden on those who write code to consume web content (be strict in what you accept so that you can be liberal with your time spent fly-fishing)? Are schema technologies a way to manage the flexibility that XML brings to the table, or just another weapon to put down users ("You don't validate. Go away")? Of course, the way I've posed these questions reveals my bias. I think that XML should be a tool for expressiveness and controlled diversity on the Web. I disagree strongly with the notion, recently expressed in a few quarters, that there are only a few viable XML formats, and that people should stop creating more. At the center of this controversy is the new Web 2.0 hotness: microformats. If you're not already familiar with this phenomenon, first read "What Are Microformats".

It's a DIV's World

Microformats enshrine the idea that rather than creating whole new vocabularies, developers should piggy-back off existing, widely supported and deployed formats such as XHTML. (In this article I'll focus mostly on microformats with XHTML as a host language.) The problem is that XHTML, at its best, is good for basic document structure but, at its worst, tends to be used for the presentation of documents. Microformats are a lightweight way to express more specialized information within the structure of XHTML without changing its syntax. The idea is that the success of this approach rests on modest (hence "micro") constructs in modules that are mutually independent and focused on very specific domains. Through such simplicity and modularity, microformats minimize the strain on the host languages, as well as the implementation effort and overall conceptual load.

Unfortunately, the strain is rarely avoided in practice. Many of the XHTML-based microformats I've seen abuse the semantics of XHTML. a/@rel tends to come in for special abuse. The HTML 4.01 recommendation, whose semantics are adopted by XHTML, says:

This attribute describes the relationship from the current document to the anchor specified by the href attribute. The value of this attribute is a space-separated list of link types.

A microformat such as Google's rel='nofollow' stretches this definition to the breaking point. "Don't follow this link" is an instruction to the user agent (more likely an automated agent such as a search index robot). This is related to what was known as "actuation" in the XLink specification, and is a very different matter from the conceptual relationship between the two documents. I'll hasten to add that these problems are to some extent understood in the microformats camp, and that there are some quite reasonable uses of a/@rel in microformats, including rel-license and rel-tag. Then again there is rel-enclosure, which is still designated a draft but does perpetuate a/@rel abuse without any apology in the spec. The abuse of a/@rev in the vote-links microformat is an even more heinous example. Before you write off my complaints about abuse of existing XHTML constructs as too rarefied and academic, consider that it leads to a very real problem when microformats collide.

Will the Real rel Please Stand Up

There are only so many XHTML attributes to hitch a ride on, and if you can stretch the semantics of each attribute pretty much to suit yourself, it's inevitable that you will need to use clashing microformats. Imagine you have a weblog that automatically asserts rel='nofollow' on comment links to discourage comment spam. An example comment looks as follows.


<p>Nice blog.  Buy your medz <a href='http://medz.com' rel='nofollow'>here</a></p>

But you have another tool that looks for personnel links within your organization and marks them using a colleague designation in the XFN microformat.

<p>I just want to be sure your readers know we're aware of the stability
problems with the latest release.  I've posted some workarounds on
<a href='http://mf-wizards.com/~jdoe/' rel='colleague'>my own blog</a>.</p>

You now have some sorting out to do. Of course you cannot have two rel attributes on the same element. You could set a priority so that the XFN annotation overrides rel='nofollow' (this is probably what you'd want in practice), but this means that suddenly your microformats are no longer really independent, and they're certainly not modular. Microformat tools have to be aware of the different specs that might clash, and you introduce a bit of a negative network effect. You could use the NMTOKENS escape hatch, which would mean that after both tools have done their work the comment would look as follows:


<p>I just want to be sure your readers know we're aware of the stability
problems with the latest release.  I've posted some workarounds on
<a href='http://mf-wizards.com/~jdoe/' rel='colleague nofollow'>my own blog</a>.</p>
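
For either tool to cooperate in this way, it has to treat the rel value as a space-separated list and add its own token rather than overwrite the attribute. Here is a minimal sketch of that behavior in Python. The function name add_rel_token is my own invention for illustration, not part of any microformat spec or tool, and the case-insensitive comparison assumes the HTML 4.01 rule that link types are case-insensitive.

```python
def add_rel_token(rel_value, token):
    """Return rel_value with token appended, unless it is already present.

    rel_value is the current content of a rel attribute (possibly empty);
    it is treated as a whitespace-separated list of link types.
    """
    tokens = rel_value.split()
    # Link types are compared case-insensitively (assumption: HTML 4.01 rule).
    if token.lower() not in (t.lower() for t in tokens):
        tokens.append(token)
    return " ".join(tokens)


# Hypothetical pipeline: the XFN tool runs first, then the anti-spam tool.
rel = ""
rel = add_rel_token(rel, "colleague")
rel = add_rel_token(rel, "nofollow")
print(rel)  # colleague nofollow
```

Note that this only avoids clobbering. It does nothing to tell a later reader which tool, or which microformat, contributed which token.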

One problem with this is that when you have a microformat such as XFN, which already allows multiple tokens within a/@rel, you're still inviting clashes because it's not clear which tokens are part of XFN, and which come from other conventions. It also becomes a land grab for terms across microformats. XFN defines rel='date' as a statement that you have a romantic involvement with the person represented by the resource indicated by the href. This could make for some stickiness in a microformat for references to calendar resources, where rel='date' would have a markedly different meaning.
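
To make the land grab concrete, here is a rough Python sketch that asks which vocabularies lay claim to each token in a rel value. The XFN set lists the XFN 1.1 values as I understand them. The calendar vocabulary is purely hypothetical, standing in for the imagined calendar microformat, and the names VOCABULARIES and claimants are likewise made up for this example.

```python
# XFN 1.1 relationship values (as I understand the profile).
XFN_TOKENS = {
    "friend", "acquaintance", "contact", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
    "kin", "muse", "crush", "date", "sweetheart", "me",
}

# Hypothetical vocabulary for links to calendar resources (an assumption).
CALENDAR_TOKENS = {"date", "event"}

VOCABULARIES = {"xfn": XFN_TOKENS, "calendar": CALENDAR_TOKENS}


def claimants(rel_value):
    """Map each rel token to the sorted names of vocabularies defining it."""
    result = {}
    for token in rel_value.lower().split():
        result[token] = sorted(
            name for name, vocab in VOCABULARIES.items() if token in vocab
        )
    return result


print(claimants("colleague nofollow date"))
# {'colleague': ['xfn'], 'nofollow': [], 'date': ['calendar', 'xfn']}
```

A token with two claimants, such as date here, is exactly the case where a consumer cannot tell from the markup alone which meaning was intended. A token with none, such as nofollow in this sketch, belongs to some convention that neither vocabulary knows about.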

U. G. L. Y. You Ain't Got No Alibi...!

Another problem that stems from being restricted to a host language is that you often end up with very contorted and ugly constructs to force the fit. XOXO is an eminent example of this problem. I once did an exploration of XOXO as a language for exchanging weblog lists, rather than the more established, but quite awful, OPML. I ended up with something like Listing 1.
