Great idea for a series, nice piece.
Ok, suppose someone publishes a Web page containing microformat data. If they've followed best practice, they'll have included one or more profile URIs (in the profile attribute of xhtml:head). With this in place, the microformat can be interpreted deterministically, and unambiguous, explicit data can be extracted.
The difference between this and scraping is that the publisher and the consumer both follow the microformat spec. In other words, they share a standard.
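To make the contrast with scraping concrete, here is a minimal Python sketch of a consumer that only interprets hCard class names when the page declares the matching profile. It assumes well-formed XHTML, uses only the standard library's `html.parser`, and treats `http://microformats.org/profile/hcard` as the hCard profile URI. `MicroformatReader` and `extract_hcard_names` are hypothetical names, and only the `fn` (formatted name) property is extracted.

```python
from html.parser import HTMLParser

# Assumed hCard profile URI; use whatever URI the microformat spec you follow defines.
HCARD_PROFILE = "http://microformats.org/profile/hcard"


class MicroformatReader(HTMLParser):
    """Collects profile URIs declared on <head> and the text of elements classed 'fn'."""

    def __init__(self):
        super().__init__()
        self.profiles = set()
        self.formatted_names = []
        self._fn_depth = 0  # >0 while inside an element classed 'fn'
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head" and attrs.get("profile"):
            # The profile attribute holds a whitespace-separated list of URIs.
            self.profiles.update(attrs["profile"].split())
        classes = (attrs.get("class") or "").split()
        if self._fn_depth:
            self._fn_depth += 1  # nested element inside the 'fn' element
        elif "fn" in classes:
            self._fn_depth = 1
            self._text = []

    def handle_endtag(self, tag):
        if self._fn_depth:
            self._fn_depth -= 1
            if self._fn_depth == 0:
                self.formatted_names.append("".join(self._text).strip())

    def handle_data(self, data):
        if self._fn_depth:
            self._text.append(data)


def extract_hcard_names(xhtml):
    """Return hCard formatted names, or None if the page does not declare the profile."""
    reader = MicroformatReader()
    reader.feed(xhtml)
    reader.close()
    if HCARD_PROFILE not in reader.profiles:
        return None  # no declared profile: the class names are not a shared contract
    return reader.formatted_names


page = """<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://microformats.org/profile/hcard"><title>Me</title></head>
<body><div class="vcard"><span class="fn">Jane Doe</span></div></body>
</html>"""

print(extract_hcard_names(page))  # ['Jane Doe']
```

Returning `None` when the profile is missing is the point: without the declaration, a consumer that reads `class="fn"` anyway is back to guessing, which is scraping.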