Microformats and Web 2.0
by Micah Dubinko
Microformats Community and Process
In that earlier XML.com article, I dipped my toe into the waters of microformat design, proposing a format called the Exam Markup Language, or Examl. The first time most folks ever heard of it was when they read the column. Turns out that was a mistake.
A microformats wiki page titled "So you wanna develop a new microformat?" lists the steps one should follow. Note the emphasis on transparency throughout:
- Document current behavior
- Propose
- Iterate
Before starting work on a microformat, a fair amount of research and discussion needs to happen, generally in public. If only one person has ever worked on it, a microformat probably won't succeed. Research done, the next step is to propose the format in an appropriate forum, seeking copious feedback. Repeat as necessary.
The wiki asks two further guiding questions:
- If I looked at this microformat in a browser that didn't support CSS or had CSS turned off, would it still be human readable?
- Are this format's elements stylable with CSS?
To put it another way, those annoying non-semantic elements wither and fade under the scrutiny of the microformats community. This outlook enforces a proper view of markup as an intention-carrying component, not a presentational shortcut.
While I'm confessing past sins, I also wrote that "some gray areas remain. For example, is RSS a microformat? It seems to bear at least some of the characteristics of one..." It is true that RSS bears some characteristics, but analysis since that article has concluded that RSS is definitely not considered a microformat.
Most folks, though, won't need to create a microformat--they can use an existing one, secure in the knowledge of how much consideration has gone into its creation.
Influence on the Web
One possible objection is that microformats encourage "screen scraping"; instead of using a carefully crafted Web Services API, people (and their machines) will instead fetch regular pages and struggle on from there.
I asked Casey West about this. He noted that search engine crawlers will, except in very special cases, always prefer to enter a site through the same interface used by regular web browsers, because a single search engine would never be able (or even want) to keep up with all the possible third party APIs that might exist at any given time. In other words, microformats are a natural companion to REST-philosophy web services where useful data is only a GET request away. Microformats are human readable, but not at the expense of machine readability. Thus, it's not exactly fair to say one would have to "scrape" from a page with microformat data present--the data is structured and accessible by design. In short, microformats tend to work better on the web.
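To make "structured and accessible by design" concrete, here is a minimal sketch of how a consumer might read hCard data out of an XHTML page using nothing but an XML parser. The page content, the names in it, and the helper functions (has_class, first_text, extract_hcards) are all hypothetical and made up for illustration; only the class names vcard, fn, and org come from hCard itself. A real consumer would fetch the page with a GET request first and would need to handle many more properties.

```python
import xml.etree.ElementTree as ET

# Hypothetical page content; the person, organization, and URL are made up.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Staff</title></head>
  <body>
    <div class="vcard">
      <a class="url fn" href="http://example.org/~alice">Alice Example</a>
      <span class="org">Example Corp</span>
    </div>
  </body>
</html>"""


def has_class(element, name):
    """True if `name` is one of the space-separated class tokens."""
    return name in element.get("class", "").split()


def first_text(root, class_name):
    """Text of the first element under `root` carrying `class_name`, or None."""
    for element in root.iter():
        if has_class(element, class_name):
            return "".join(element.itertext()).strip()
    return None


def extract_hcards(xhtml):
    """Return one dict per element whose class list includes 'vcard'."""
    tree = ET.fromstring(xhtml)
    cards = []
    for element in tree.iter():
        if has_class(element, "vcard"):
            cards.append({
                "fn": first_text(element, "fn"),
                "org": first_text(element, "org"),
            })
    return cards


cards = extract_hcards(PAGE)
print(cards)
```

Nothing here depends on the page's layout or on guessing at patterns in the text, which is the practical difference from scraping: the class names are the agreed-upon contract.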
A closely related question is, what kind of effect might microformats have on browsers and Web 2.0 applications that run in them? I liked West's answer, that basically they don't need to change beyond support for XHTML. Çelik added that a key principle involved is users controlling their own data. In several of his presentations, he asks the audience how many different email clients they've had over their lifetimes. How did the data migration go at each step? Not too well, usually, but intelligent use of microformats could perhaps improve the situation. This especially goes for calendar and address book applications, where existing microformat work is well-established.
Microformat Annoyances
Like any new technology, microformats don't solve every problem, and in fact introduce a few problems of their own. One is the general problem of microcontent, that is, useful units of data at a granularity less than that of individual documents. Many existing content management systems aren't equipped to deal with, say, a single XHTML document that contains 27 hCard instances. As microformats gain prominence, though, microcontent management systems should begin to catch up.
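As a rough illustration of what treating microcontent as the unit of management might involve, the sketch below splits a single XHTML document into one fragment per hCard. Everything in it is an assumption for demonstration purposes: the generated directory page, the build_page and split_hcards helpers, and the choice to serialize each card as a standalone XML string. It is not how any particular content management system works.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", XHTML_NS)


def build_page(count):
    """Build a hypothetical directory page holding `count` hCard instances."""
    items = "".join(
        '<li class="vcard"><span class="fn">Person %d</span></li>' % i
        for i in range(1, count + 1)
    )
    return (
        '<html xmlns="%s"><head><title>Directory</title></head>'
        "<body><ul>%s</ul></body></html>" % (XHTML_NS, items)
    )


def split_hcards(xhtml):
    """Return each hCard in the document as its own serialized fragment."""
    root = ET.fromstring(xhtml)
    return [
        ET.tostring(element, encoding="unicode")
        for element in root.iter()
        if "vcard" in element.get("class", "").split()
    ]


fragments = split_hcards(build_page(27))
print(len(fragments))  # one fragment per hCard in the page
```

Each fragment could then be stored, indexed, or versioned on its own, independent of the document it was published in.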
Presently, microformat progress is almost exclusively based on XHTML. Depending on your viewpoint, this may be a strength or a weakness. We'll get to possible alternatives in a bit. In some ways, the microformats movement and community compete with consortia-based standards development, which is slower to adjust to a new, less expansionary era. On the other hand, XHTML 2.0 shows all the signs of being an excellent microformat foundation--if and when it becomes supported by browsers.
As with any highly intentional language, working with microformats can sometimes be painful; the urge to insert presentational tags can be overpowering. For this reason, working with microformats eventually requires in-depth knowledge of XHTML, CSS, and other XML best practices. Any shortfall in these skills can make it hard to understand why certain things are done the way they are, and how to effectively make use of existing tools. Fortunately, the learning curve is not too steep, and the new skills can be added on an incremental, as-needed basis.
Lastly, standards developed as a microformat exist in a more constrained environment--new elements and attributes in general can't be created as needed. This can make versioning, already a hard problem, even worse. Existing microformats are young enough--and focused enough on solving a single small problem--that versioning hasn't become a serious problem. This will be an area to keep close tabs on in the future.
Things to Watch
The new generation of browsers finally supports more than just HTML. Will new microformats arise around SVG, XForms, or other existing markup languages? It's an open question.
Another question is how tightly microformats are (or need to be) bound to browsers. Many instances of full-scale XML vocabulary development fall outside browsers. In any of these cases, would it make sense to apply the microformat treatment to, say, DocBook, OpenDocument, or UBL? Time and community interest will tell.
One more thing to keep an eye on: what is Mark Pilgrim up to? "Or do you just use your browser to browse? That's so 20th century."
The Bottom Line
Vocabulary proliferation is one of the biggest XML annoyances around. If you're like me, your brain can hold three, maybe four markup languages at a time. The microformats way of life prefers reusing existing work wherever possible. Recycled knowledge goes a long way. An active community works to continue progress on specifications, which tend to be easier reading than full-blown committee standards.
RSS is pretty successful today, but it took nearly nine years to get there. In a universe where, instead of RSS, an equivalent microformat started things off, would adoption have happened more quickly?
If you think the answer to that question might be "yes," then microformats are worth a look.
What do you find most annoying about working with XML? Talk back, and your idea may be used in a future column.
Comment on this Article
- icckkyyy
2005-11-01 14:22:17 vdubberly
First, nice article.
Second, this just feels dirty, like a cheap (but ultimately expensive) hack to keep me from having to write a secondary XSLT pipeline.
Perhaps the better way would be to extend the XHTML spec to deal reasonably with inlined XML; most browsers display inline XML perfectly when formatted with CSS. It's just not "legal" to write things that way.
- Me
There is a good reason that M and V are distinct objects in MVC. Let's keep them that way.
- Important techniques for word processors
2005-10-27 13:55:08 PeterSefton
I think microformats also have a future in the world of word processors where there are a limited number of techniques for capturing structured content on top of the generic file formats that lie underneath.
For example, at the University of Southern Queensland we are working on ways to capture quiz content in a word processor (both OpenOffice.org Writer and Microsoft Word), using tables and styles, applied using autotext and then transformed to the relevant XML representation (QTI) in the content management system.
- In response to Norm
2005-10-21 14:47:28 Danny Ayers
Validation may be difficult, but how important is it in typical microformat scenarios on the Web? I appreciate that this is machine-readable data we're talking about, so liberal browser-like display may not be an option. Some client liberality will almost certainly have to be considered, though hopefully not to the extent of RSS (and how many aggregators have validation in their pipeline?).
For my own applications I plan to pre-clean pages with Tidy ahead of XSLT (to RDF/XML, as Daniel suggests). I anticipate a proportion of junk data, but will cross that bridge when I get to it.
- In response to Norm
2005-10-31 07:48:21 BrunoVernay
Validating is important when writing, not when reading.
- why the limit?
2005-10-20 12:43:54 bryan rasmussen
"Vocabulary proliferation is one of the biggest XML annoyances around. If you're like me, your brain can hold three, maybe four markup languages at a time."
Why would this be? Is there a limitation on the number of programming languages? I would suppose there might be a basic limit of the number of tasks of a similar nature that a human can relate to and this format brainshare is a case of that, must be some studies somewhere. Is it hold three, four markup languages without needing to look up the exact meaning of a tag, or not needing such tools as autocompletion in editors to remind one of the tags available here?
I actually think that comparing markup languages at this level is probably not worthwhile, I think that XSL-FO and RSS are of such very different types of languages that the ability to know one does not significantly affect the ability to know another.
(I actually posted this to Danny's blog first)
- What about validation?
2005-10-20 10:21:32 Norman Walsh
I think you missed a really important issue. In my own exploration of microformats (see http://norman.walsh.name/2005/09/05/microformats) I was disappointed to discover just how hard it is to validate microformatted content. It'd all be much nicer if the push was towards supporting small vocabularies that could be easily validated instead of reinventing architectural forms. But there's no question that microformats are useful.
- What about validation?
2005-10-20 12:44:35 bryan rasmussen
I think you miss how easy it is to validate microformats using Schematron.
- Great article
2005-10-20 04:46:35 Daniel Zambonini
As usual, great article.
It's also worth clarifying that although the current trend for microformats is to use CSS to make them 'usable', there's also the possibility (when used in XHTML) to XSLT them into other formats: you could easily create RDF or SVG out of the original XHTML marked-up data.
- Scraping
2005-10-20 04:02:22 Danny Ayers
Great idea for a series, nice piece.
OK, assume someone publishes a Web page containing microformat data. If they've followed the best practices on this, they'll have included one or more profile URIs (in the profile attribute of xhtml:head). With this in place, the microformat can be interpreted deterministically, and unambiguous, explicit data can be extracted.
The difference between this and scraping is that the publisher and the consumer are both following the microformat spec. In other words, a shared standard.
