XML.com: XML From the Inside Out

Introduction to XFML
by Peter Van Dijck

Pages

Once you have defined some facets and topics, you will want to classify or index some web pages and add them to your XFML document so your indexing efforts can be shared. You can only classify things that have a URI. Each URI (we call them pages, but you can use other file types as well) can be classified under multiple topics. For example, the homepage of the B.B. King Blues Club and Grill in New York can be classified under the NY, bar, and blues topics. We say these topics occur on the page, and we call them topic occurrences:

<page url="http://bbkingblues.com/">
  <title>B.B. King Blues Club and Grill</title>
  <description>Conveniently located in the heart of Times Square near Penn Station and Port Authority, The B.B. King Blues Club and Grill offers music fans a unique experience. Owned by the Bensusan Family, proprietors of the world renowned Blue Note Jazz Club, the club features world-class musical talent and consists of two distinct spaces: the Showcase Room and Lucille’s Grill.</description>
  <occurrence topicid="bar" />
  <occurrence topicid="blues" />
  <occurrence topicid="ny" />
</page>
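
As a rough illustration of how software might read such a page element, the following Python sketch lists the topic occurrences of a page. It uses only the standard library; the helper name topic_occurrences is made up for this example, and the description element is left out for brevity.

```python
import xml.etree.ElementTree as ET

# The page element from the example above, without its description.
PAGE_XML = """<page url="http://bbkingblues.com/">
  <title>B.B. King Blues Club and Grill</title>
  <occurrence topicid="bar" />
  <occurrence topicid="blues" />
  <occurrence topicid="ny" />
</page>"""


def topic_occurrences(page):
    """Return the topic ids that occur on a <page> element, in document order."""
    return [occ.get("topicid") for occ in page.findall("occurrence")]


page = ET.fromstring(PAGE_XML)
print(page.get("url"), topic_occurrences(page))
```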

The mapInfo Element

mapInfo is an optional element containing administrative metadata about the map. Its usage is simple; check the spec for details. For our example, mapInfo could look something like this:

<mapInfo>
  <managingEditor>
    <name>Joe Blogs</name>
    <email>feedback@joeblogs.com</email>
    <url>http://joeblogs.com/</url>
  </managingEditor>
  <license>
    <name>GNU Free Documentation License</name>
    <url>http://www.gnu.org/licenses/fdl.html</url>
  </license>
</mapInfo>

The mapInfo element can also contain child elements describing additional editors, a technical contact, the owner of the map, and the software used to generate the map.

Distributed Metadata

What we have so far (facets, topics, pages, and occurrences) lets us build a file that provides some interesting metadata for others to reuse. Typically you will write some code that regularly downloads an updated XFML file from web sites with similar topics to yours, then takes all the topic occurrences that are relevant to your topics and copies those occurrences to your XFML document. That's how you can automate the reuse of indexing efforts.
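
The reuse step described above could be sketched roughly as follows in Python. This is only an illustration under several assumptions: the two maps are given as strings rather than downloaded, the xfml root element and the facet elements are a guess at the surrounding document structure (check the spec), topics are matched by identical id only, and import_occurrences is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Your own map: it only knows the "blues" topic. (Assumed document structure.)
MY_MAP = """<xfml version="1.0">
  <facet id="music">Music</facet>
  <topic id="blues" facetid="music"><name>blues</name></topic>
</xfml>"""

# Another site's map, which has already indexed a page.
THEIR_MAP = """<xfml version="1.0">
  <facet id="music">Music</facet>
  <facet id="venue">Venue</facet>
  <topic id="blues" facetid="music"><name>blues</name></topic>
  <topic id="bar" facetid="venue"><name>bar</name></topic>
  <page url="http://bbkingblues.com/">
    <title>B.B. King Blues Club and Grill</title>
    <occurrence topicid="bar" />
    <occurrence topicid="blues" />
  </page>
</xfml>"""


def import_occurrences(mine, theirs):
    """Copy occurrences of topics we also define from `theirs` into `mine`.

    Topics are matched by identical id. Returns the number of occurrences copied.
    """
    my_topics = {t.get("id") for t in mine.findall("topic")}
    my_pages = {p.get("url"): p for p in mine.findall("page")}
    copied = 0
    for their_page in theirs.findall("page"):
        url = their_page.get("url")
        for occ in their_page.findall("occurrence"):
            topic_id = occ.get("topicid")
            if topic_id not in my_topics:
                continue  # not one of our topics, so not relevant to us
            page = my_pages.get(url)
            if page is None:
                # First relevant occurrence for this URL: create the page.
                page = ET.SubElement(mine, "page", {"url": url})
                title = their_page.find("title")
                if title is not None:
                    ET.SubElement(page, "title").text = title.text
                my_pages[url] = page
            already = {o.get("topicid") for o in page.findall("occurrence")}
            if topic_id not in already:
                ET.SubElement(page, "occurrence", {"topicid": topic_id})
                copied += 1
    return copied


mine = ET.fromstring(MY_MAP)
theirs = ET.fromstring(THEIR_MAP)
print(import_occurrences(mine, theirs))
```

Running the import a second time copies nothing, because occurrences that already exist on a page are skipped.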

There is a problem, though. If site A wants to reuse the indexing work of site B, they have to use exactly the same topics. That's not how the world works. Site A might have topics "blues" and "latin", and site B might have topics "blues & jazz" and "Latino". They probably mean the same thing, and B might want to reuse the indexing of A, but how can your code know which topic occurrences to reuse?

XFML provides two answers. You can create direct connections between two topics in different maps, indicating that for example the topic "latin" in map A is equal to the topic "Latino" in map B. You can also create implicit connections by pointing a topic to a web page that describes that topic, for example a page with the dictionary definition for Latino. The software can then infer that any topics it finds that point to that same page are really the same topic, no matter what the topic is called.

These two approaches mean that you can create a web of loosely distributed metadata, which is how XFML attempts to address the problems with centralized hierarchies.

Connecting Topics

The first approach to reusing indexing efforts is to connect individual topics between maps. The connect element is a child of the topic element; its content is the concatenation of three strings: the URL of another map, the "#" character, and the id of a topic in that map:

<topic id="latin" facetid="music">
  <name>latin</name>
  <connect>http://domainb.com/mapb.xml#latino</connect>
</topic>

A topic can contain multiple connect elements.
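
Since the content of connect is just a map URL, a "#", and a topic id, software can take it apart with a simple string split. The Python sketch below shows one way to do that; the helper name connections is invented for this example, and silently skipping malformed values is a choice made here, not something the spec is known to require.

```python
import xml.etree.ElementTree as ET

TOPIC_XML = """<topic id="latin" facetid="music">
  <name>latin</name>
  <connect>http://domainb.com/mapb.xml#latino</connect>
</topic>"""


def connections(topic):
    """Return (map_url, topic_id) pairs, one per <connect> child of a topic."""
    pairs = []
    for connect in topic.findall("connect"):
        # Split on the last "#": the part before it is the URL of the other
        # map, the part after it is the id of the topic in that map.
        map_url, sep, topic_id = (connect.text or "").strip().rpartition("#")
        if sep and map_url and topic_id:
            pairs.append((map_url, topic_id))
        # Values without a "#" or with an empty half are skipped.
    return pairs


topic = ET.fromstring(TOPIC_XML)
print(connections(topic))
```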

Published Subject Indicators

The second approach to reusing indexing efforts is to point a topic to a resource on the web that describes it; in other words, to point to a published subject indicator represented by the psi element.

<topic id="latin" facetid="music">
  <name>latin</name>
  <psi>http://dictionary.reference.com/search?q=latino</psi>
</topic>

A topic can have multiple psi elements. It can even have multiple connect and psi elements: the more psi or connect elements it has, the higher the value of your XFML document. Also note that, once you have established a connection with a topic in another map (through <connect> or a common <psi>), your software can safely copy all of the <psi> and <connect> elements from that topic to your topic. Two topics in the same map are not allowed to have the same <psi> or <connect> elements. Some network effects can cause contradictions when automatically copying <connect> or <psi> elements, but those can be resolved by presenting a choice to the administrator when that happens.
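
To make the inference concrete, here is a minimal Python sketch that treats topics from two maps as the same when they share at least one psi, and then copies over the psi elements that are missing. The xfml root element, the second psi URL, and the helper names (psi_set, match_by_psi, copy_psis) are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

MAP_A = """<xfml version="1.0">
  <topic id="latin" facetid="music">
    <name>latin</name>
    <psi>http://dictionary.reference.com/search?q=latino</psi>
  </topic>
</xfml>"""

# The second psi below is a made-up URL, used only to show copying.
MAP_B = """<xfml version="1.0">
  <topic id="latino" facetid="genre">
    <name>Latino</name>
    <psi>http://dictionary.reference.com/search?q=latino</psi>
    <psi>http://example.org/subjects/latin-music</psi>
  </topic>
  <topic id="blues-jazz" facetid="genre">
    <name>blues &amp; jazz</name>
  </topic>
</xfml>"""


def psi_set(topic):
    """Return the set of psi URLs attached to a topic."""
    return {p.text.strip() for p in topic.findall("psi") if p.text}


def match_by_psi(map_a, map_b):
    """Return (topic_a_id, topic_b_id) pairs for topics sharing a psi."""
    matches = []
    for a in map_a.findall("topic"):
        for b in map_b.findall("topic"):
            if psi_set(a) & psi_set(b):
                matches.append((a.get("id"), b.get("id")))
    return matches


def copy_psis(target, source):
    """Add to `target` every psi of `source` it lacks; return the added URLs."""
    added = sorted(psi_set(source) - psi_set(target))
    for url in added:
        ET.SubElement(target, "psi").text = url
    return added


map_a = ET.fromstring(MAP_A)
map_b = ET.fromstring(MAP_B)
print(match_by_psi(map_a, map_b))
```

This sketch does not check whether a copied psi already belongs to another topic in the same map, which the rule above forbids. A real importer would need that check, and would hand contradictions to the administrator.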

Using XFML

Don't try to fit all your internal metadata into the XFML format. It's an export format like RSS, and your database will surely have more fields than XFML can handle. That's okay. If you want a format that can handle (almost) all your metadata, check out Topic Maps or RDF. When programming XFML support into your system, check the processing instructions in the spec. They are just recommendations, however; you may come up with better ways of doing things.

Exporting XFML is easy; often you can just add a template to your content management system and leave it at that. A (somewhat rough) example template for Movable Type took about half an hour to hack together. Most content management systems don't support faceted classification internally, so you are limited in the richness of metadata you can export. However, you can automatically generate data for facets like date of publication, length of entry, number of comments, and so on; or, if you have categories that don't change often, hardcode the facets and just generate occurrences.
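
The "hardcode the facets and just generate occurrences" approach might look something like the Python sketch below. Everything in it is hypothetical: the entries, the facet and topic tables, and the export_xfml function stand in for whatever your content management system provides, and the xfml root element and facet elements are an assumption about the overall document structure, so check the spec for the required attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical weblog entries: (permalink, title, category ids).
ENTRIES = [
    ("http://example.com/2003/01/bbking.html", "A night at B.B. King's", ["blues", "ny"]),
    ("http://example.com/2003/01/salsa.html", "Salsa in Brooklyn", ["latin", "ny"]),
]

# Hardcoded facets (id -> label) and topics (id -> facet id).
FACETS = {"music": "Music", "place": "Place"}
TOPICS = {"blues": "music", "latin": "music", "ny": "place"}


def export_xfml(entries):
    """Build an XFML-style document as a string from fixed facets and entries."""
    root = ET.Element("xfml", {"version": "1.0"})  # assumed root element
    for facet_id, label in FACETS.items():
        ET.SubElement(root, "facet", {"id": facet_id}).text = label
    for topic_id, facet_id in TOPICS.items():
        topic = ET.SubElement(root, "topic", {"id": topic_id, "facetid": facet_id})
        ET.SubElement(topic, "name").text = topic_id
    for url, title, categories in entries:
        page = ET.SubElement(root, "page", {"url": url})
        ET.SubElement(page, "title").text = title
        for category in categories:
            if category in TOPICS:  # only emit occurrences for known topics
                ET.SubElement(page, "occurrence", {"topicid": category})
    return ET.tostring(root, encoding="unicode")


print(export_xfml(ENTRIES))
```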

When you make XFML feeds available on your site, indicate them with an XFML button and add a link element to your HTML so that software can discover the feeds automatically.

Expect some experimentation when importing XFML and automating indexing work: you'll be traveling in unknown territory. Taxomita is currently the only tool under development that does advanced importing of XFML. However, importing is the cutting edge. This is where you take advantage of the real strength of XFML, namely, distributed metadata. Importing will allow you to use the information in the <connect> and <psi> elements to automatically expand your metadata without resorting to a central list of metadata. We expect exciting things to happen in this area in 2003.

The XFML.org website has a page with tools that support the standard. Livetopics (a plug-in for Radio Userland) and Drupal (a content management system) export XFML. Facetmap lets you import and browse XFML files, and Taxomita is an upcoming authoring tool built around XFML. Templates and code libraries are being developed for a variety of environments.

XFML Core (XFML version 1.0) is the first version of XFML. Work is being done on XFML 2.0, but that version won't be finished for at least another year. It may feature elements to describe controlled vocabularies and more ways to distribute metadata. Check the XFML mailing list for the latest developments.

Conclusion

XFML is a simple standard to exchange faceted, hierarchical metadata. What makes it different is the way it addresses specific problems with metadata authoring by allowing for distributed metadata through the <connect> and <psi> elements. It is designed to be easy to code for and is already supported by a number of tools.

To get started with XFML, I recommend writing an XFML file by hand and uploading it to Facetmap. There's nothing like seeing this in action to get your head around the possibilities. After that, try exporting your existing data (if you have a site with some existing metadata) as XFML or play around with some of the available tools.

The XFML site has a page with relevant links to learn more about XFML and faceted classification. Let me just highlight the Faceted Classification mailing list, an excellent (non-techie) list about faceted classification, as well as Mark Pilgrim's Really Understandable Introduction to XFML.


