XML.com: XML From the Inside Out

Microformats in Context
by Uche Ogbuji

Listing 4: An XHTML document that uses the XFN microformat and GRDDL

<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation"
       href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
    <link rel="transformation"
       href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokXFN.xsl" />
  </head>
  <body>
  ...
    <div class='blogroll'>
      <a href="http://chimezie.ogbuji.net/" rel="brother met">Chimezie</a>
    </div>
  ...
  </body>
</html>

The profile prescribes the profile="http://www.w3.org/2003/g/data-view" attribute on the head element so that GRDDL processors know that the document follows the convention. The profile also allows for a number of link elements with rel="transformation", each of which identifies a transform from syntax embedded in the XHTML to RDF/XML, which can then be parsed into a model. Listing 4 uses XFN and thus asserts a link to a relevant XSLT transform at http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokXFN.xsl. There is also a transform link to http://www.w3.org/2000/06/dc-extract/dc-extract.xsl, which is not related to any microformat, but rather to XHTML itself. It extracts readily accessible Dublin Core metadata from the XHTML, such as the document title (from the title element) and the description, creator, or date (from corresponding meta elements). This underscores that GRDDL is more general than microformats. In fact, if you use extensions to XHTML rather than a microformat, you can use GRDDL just as well to make the extension and to extract RDF from it.
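To make the discovery step concrete, here is a minimal sketch in Python of how a GRDDL-aware tool might find the transforms a document advertises. It is an illustration under assumptions, not a conforming GRDDL processor: the function name find_grddl_transforms is hypothetical, the sample is Listing 4 with the elided parts dropped, and the sketch stops at collecting the transform URLs rather than fetching and running the XSLT.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def find_grddl_transforms(xhtml_text):
    """Return the transform URLs advertised by a GRDDL-profiled document.

    Hypothetical helper: returns an empty list when the head element is
    missing or does not carry the GRDDL profile.
    """
    root = ET.fromstring(xhtml_text)
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None:
        return []
    # The profile attribute is a space-separated list of URIs.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    transforms = []
    for link in head.findall(f"{{{XHTML_NS}}}link"):
        # rel is also a space-separated list of tokens.
        if "transformation" in link.get("rel", "").split() and link.get("href"):
            transforms.append(link.get("href"))
    return transforms


# Listing 4, minus the elided body content.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation"
       href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
    <link rel="transformation"
       href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokXFN.xsl" />
  </head>
  <body>
    <div class='blogroll'>
      <a href="http://chimezie.ogbuji.net/" rel="brother met">Chimezie</a>
    </div>
  </body>
</html>"""

if __name__ == "__main__":
    for url in find_grddl_transforms(SAMPLE):
        print(url)
```

A real processor would go on to apply each transform to the document and merge the resulting RDF/XML; the point here is only that the profile acts as a switch and the links as a list of work to do.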

GRDDL imposes an additional burden on a microformat's specification, namely, an XSLT transform to RDF/XML. This is often not a problem since microformat authors are usually sophisticated, and in the worst case they can get a little help from someone else to write the transform. GRDDL also imposes an additional burden on a microformat's user: the profile attribute and transform links in the document heading. This is more problematic since most web authors hate to worry about such details. The idea of GRDDL profiles would help solve the discovery and semantic issues of microformats, although it would be nice to see other sorts of links, such as to the schema or even to the specification of a microformat, which GRDDL doesn't explicitly address at present. It remains to be seen whether web authors can bear the burden of profile information in document headers. Since asserting these links is such a straightforward matter of syntax, it is probably a case of whether GRDDL advocates can convince tool vendors to make the small necessary tweaks.
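As a rough indication of how small the tooling tweak could be, the following Python sketch adds the profile and transform links to a document head on the author's behalf. The function name add_grddl_links and the idea of running it as a publishing step are my assumptions for illustration; no particular authoring tool is implied.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

# Serialize XHTML elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", XHTML_NS)


def add_grddl_links(xhtml_text, transform_urls):
    """Return xhtml_text with the GRDDL profile and transform links added.

    Hypothetical helper: existing profile values and transform links are
    preserved, so running it twice does not duplicate anything.
    """
    root = ET.fromstring(xhtml_text)
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None:
        raise ValueError("document has no XHTML head element")

    # profile is a space-separated list, so append rather than overwrite.
    profiles = head.get("profile", "").split()
    if GRDDL_PROFILE not in profiles:
        profiles.append(GRDDL_PROFILE)
        head.set("profile", " ".join(profiles))

    # Collect transform links that are already present.
    existing = {
        link.get("href")
        for link in head.findall(f"{{{XHTML_NS}}}link")
        if "transformation" in link.get("rel", "").split()
    }
    for url in transform_urls:
        if url not in existing:
            ET.SubElement(
                head,
                f"{{{XHTML_NS}}}link",
                {"rel": "transformation", "href": url},
            )
            existing.add(url)
    return ET.tostring(root, encoding="unicode")
```

A tool could call this with the transform URL published alongside whichever microformat the author selected, which keeps the syntax burden off the author entirely.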

A More Radical Departure

GRDDL is designed to play nicely with host formats, microformats, wholesale extensions, and just about anything one can cook up in the syntax. RDF/A is a related initiative but represents a more radical departure from microformats. It actually predates microformats and GRDDL. It started out as an RDF syntax that would be more friendly to web authors because it is expressed in XHTML. It has recently changed its name and some of its focus, and has gained a good bit of steam indirectly from the microformats buzz. While you can think of GRDDL as a bridge from microformats to RDF, you can think of RDF/A as microformats done the RDF way in the first place. To see the difference, start with the rel-license microformat, which specifies that a link points to the license for the source document.

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Some Document</title>
  </head>
  <body>
  ...
    <p>This document is licensed under a
<a rel="license" href="http://creativecommons.org/licenses/by-nc/2.5/">
  Creative Commons Non-Commercial License
</a>.
    </p>
  ...
  </body>
</html>

It takes a fairly light change to turn this into RDF/A:

<html
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:cc="http://creativecommons.org/licenses/">
  <head>
    <title>Some Document</title>
  </head>
  <body>
  ...
    <p>This document is licensed under a
<a rel="cc:license" href="http://creativecommons.org/licenses/by-nc/2.5/">
  Creative Commons Non-Commercial License
</a>.
    </p>
  ...
  </body>
</html>

Rather than rel="license", it's rel="cc:license", with the prefix bound by the added namespace declaration to http://creativecommons.org/licenses/. This is another example of the problem-filled practice of QNames in content, but it's based on the RDF/XML legacy and is used to construct RDF predicate links much as such QName constructs are used in RDF/XML.
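The QName-in-content mechanics can be sketched in a few lines of Python. This is an illustration of the prefix expansion only, not an RDF/A processor: the function name extract_rel_triples is hypothetical, the subject is assumed to be the document itself (passed in as base_uri by the caller), and prefix bindings are treated as document-wide rather than properly scoped.

```python
import io
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def extract_rel_triples(xhtml_text, base_uri):
    """Return (subject, predicate, object) triples for prefixed rel values.

    Hypothetical helper. Simplifications: only <a> elements are examined,
    the subject is always base_uri, and namespace scoping is ignored.
    """
    prefixes = {}  # prefix -> namespace URI, gathered from xmlns:* declarations
    triples = []
    source = io.BytesIO(xhtml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif payload.tag == f"{{{XHTML_NS}}}a":
            for token in payload.get("rel", "").split():
                prefix, sep, local = token.partition(":")
                # Only prefixed tokens with a declared prefix are expanded;
                # a bare rel="license" is left alone.
                if sep and prefix and prefix in prefixes:
                    predicate = prefixes[prefix] + local
                    triples.append((base_uri, predicate, payload.get("href")))
    return triples
```

For the RDF/A listing above, the single rel="cc:license" token expands to the predicate http://creativecommons.org/licenses/license, with the href as the object. Note that the expansion depends on namespace declarations that an ordinary XML attribute reader never sees, which is exactly why QNames in content cause trouble.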

The qualification of the license relationship in this way provides for discovery and semantic precision. The namespace can be treated as a link and dereferenced to get more information about the usage, and this link relationship would not be confused with any other sort of relationship. The main syntactic problem that afflicts microformats also dogs RDF/A, however. Stretching RDF to fit an XHTML skeleton can produce quite ugly results. If you care at all about XML design, or even about plain transparency, you should be prepared to do a lot of wincing while going through the examples in the RDF/A primer.

As I said, it was right for microformats to start by worrying primarily about syntax, with semantics communicated informally. I do think that as microformats take off, people will start to miss the sorts of help with interchange and transformation that can come with more formalized semantics. Microformats look to codify small islands of relatively informal context, whereas GRDDL and RDF/A look to aggregate these islands into distributed models to form the basis of a Semantic Web. It would be nice to have a schema-driven intermediate between these ideas that would allow annotations of the meaning of microformat constructs (Schematron springs to mind as a very fruitful technology in this context), providing processing support if not aggregation, which could then be delegated to a separate RDF layer (perhaps through GRDDL).

Form Is Function

Just as I was wrapping up the first draft of this article, Norm Walsh wrote a weblog entry in which he provided some thought experiments on a means for validating microformats. He believes that "[the validation] problem has to be solved before microformats can be considered a reliable way to encode data." I agree, and his entry is a very interesting read, especially in the way it echoes some of my own points above. First of all, to make the document structure more accessible for validation, he wrote a transform to turn the tokens hidden in class attributes and such into the generic identifiers of the XML tags themselves. This is related to my point that microformats' reliance on structure hidden in attributes makes processing more difficult than XML should be. At least one commenter noted how much of an improvement the transformed content was, which echoes my points about readability. Norm also had to contend with cases of semantic clash between constructs in different microformats (in this case, even between two formats created by the same author). His entry focuses on validating microformat syntax, whereas this article focuses on its expressiveness.
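The general idea of promoting class tokens to element names can be sketched briefly in Python. To be clear, this is not Norm's transform, which I am not reproducing here; the function name promote_class_to_tag, the rule of using only the first class token, and the sample markup (which borrows the hCard class names vcard and fn with a made-up name) are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def promote_class_to_tag(element):
    """Rename each element that has a class attribute to its first class token.

    Hypothetical helper, modifying the tree in place. Remaining tokens stay
    in the class attribute; the attribute is dropped when none are left.
    Tokens that are not valid XML names are not handled.
    """
    for el in element.iter():
        tokens = el.get("class", "").split()
        if not tokens:
            continue
        el.tag = tokens[0]
        rest = tokens[1:]
        if rest:
            el.set("class", " ".join(rest))
        else:
            del el.attrib["class"]
    return element


if __name__ == "__main__":
    # Illustrative hCard-style fragment, not taken from any real page.
    root = ET.fromstring(
        '<div class="vcard"><span class="fn">Example Person</span></div>'
    )
    print(ET.tostring(promote_class_to_tag(root), encoding="unicode"))
    # prints <vcard><fn>Example Person</fn></vcard>
```

Once the structure is carried by element names, ordinary schema languages can validate it directly, and the result is a good deal easier on the eyes.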

G. Ken Holman pointed out to me in private mail that the new standard ISO/IEC 19757-4 Namespace-based Validation Dispatching Language (NVDL) promotes the use of micro vocabularies (small, specialized XML formats). You can embed these in a host language and use NVDL to declare how validation is dispatched to different schemata based on namespace or other patterns. Ken has a good synopsis of this approach in this message on the UBL list.

It's too bad that microformats and RDF/A degenerate to such awful XML design in non-trivial use cases. Good XML design is not just a concern for purists. Readability and transparency matter, and they are fundamental goals of XML. XML support in browsers is just becoming respectable enough to use the technologies as they were meant to be used. There is really no practical reason why modules of specialized XML with associated modules of CSS could not be used with host languages. There is the problem that host languages are not always readily extensible; in the case of XHTML, to be technically correct you would have to go through the significant trouble of creating a DTD module that meets stringent standards. In practice, however, not much validation is done on the Web. If we could mix in profiles from GRDDL to support discovery, and beef the idea up so that one can express more types of links than transforms to RDF, there would be a solid bridge to semi-automated processing. Such a combination might be a real sweet spot where communities of practice can share modest and highly focused conventions while still propagating high-quality markup. It would require hardly anything in the way of new technology. It would just be a matter of top-notch salesmanship to the user community, something in which the microformats revolution has offered a great lesson.


