XML.com: XML From the Inside Out

Introduction to XFML

January 22, 2003

XFML is a simple XML format for exchanging metadata in the form of faceted hierarchies, sometimes called taxonomies. Its basic building blocks are topics, also called categories. XFML won't solve all your metadata needs. It's focused on interchanging faceted classification and indexing data. XFML addresses the following problems with basic hierarchical classification:

  • Creating and maintaining a good topic hierarchy is a lot of work; ask any librarian.
  • Indexing (categorizing) large amounts of content consistently is even harder. See Cory Doctorow's "Metacrap".
  • Creating a centralized hierarchy to organize a large amount of information doesn't scale. (If you think Yahoo's hierarchy scales, ask yourself why you keep turning to Google.)

XFML provides a simple format to share classification and indexing data. It also provides two ways to build connections between topics, information that lets you write clever tools to automate the sharing of indexing efforts. It's based on the principles of faceted classification, addressing many of the scaling issues with simple hierarchies.

What is Faceted Classification?

Facets sound scary and librarian-like, but they are really just a common sense approach to classifying things. Instead of building one huge tree of topics, a faceted classification uses multiple smaller trees (each tree is called a facet) that can then be combined by the user to find things more easily.

Say you're building a travel site about the USA. You could build a hierarchy to browse it that looks something like this:

  • USA
    • New York
      • Bars
        • Blues music
        • Latin music
      • Restaurants
        • Blues music
        • Latin music
    • L.A.
      • Bars
        • Blues music
        • Latin music
      • Restaurants
        • Blues music
        • Latin music

If you're going to New York and want to find a blues bar, browsing this hierarchy will work just fine for you. That's because it's organized by city first, type of place second, and type of music third, which is exactly what you happen to need. But if you're about to visit the USA and want to decide which city to go to based on its blues bars, our classification breaks down. You first want to select your type of music, not your city. Unless the site has a good search function, you will have to browse every single city looking for blues bars, which is neither elegant nor user-friendly.

Combining different types of information (city, type of music, type of place) in one big hierarchy can never address all possible information needs. Faceted classification addresses this problem by providing separate facets that can be combined in the user interface. For example:

City (City is a facet)

  • New York (New York is a topic within the facet City)
  • L.A.

Type of place

  • Bars
  • Restaurants

Type of music

  • Blues
  • Latin

By combining these facets, a user could view all bars in New York, all places that have Latin music throughout the country, or any other combination. Things have suddenly become a lot more interesting. If you want to know what an interface for this can look like, check out Facetmap, a tool that automatically generates four ways of browsing the same faceted classification. You can even upload XFML files to it.
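
To make the idea of combining facets concrete, here is a minimal Python sketch. The place names and the select function are hypothetical and exist only for illustration; they are not part of XFML or Facetmap. Each place is tagged with one topic per facet, and any combination of facets can be used as a filter.

```python
# Hypothetical sample data: each place is indexed with one topic per facet.
PLACES = [
    {"name": "Place A", "city": "New York", "place": "Bars", "music": "Blues"},
    {"name": "Place B", "city": "New York", "place": "Restaurants", "music": "Latin"},
    {"name": "Place C", "city": "L.A.", "place": "Bars", "music": "Latin"},
    {"name": "Place D", "city": "L.A.", "place": "Bars", "music": "Blues"},
]


def select(places, **criteria):
    """Return the places whose topic matches in every facet named in criteria."""
    return [
        p for p in places
        if all(p.get(facet) == topic for facet, topic in criteria.items())
    ]


# Start from the music facet instead of the city facet.
blues_bars = select(PLACES, place="Bars", music="Blues")
cities_with_blues_bars = sorted({p["city"] for p in blues_bars})
```

Because no facet is privileged, the same function answers "all bars in New York" and "which cities have blues bars" without reorganizing the data.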

How XFML Works

The XFML core spec gives an introduction, defines the concepts, and specifies the XML format. The spec is stable and frozen, which means you can safely build applications that use it.

An empty XFML Core document looks like this:

<?xml version="1.0" ?>
<xfml version="1.0" url="http://domain.com/xfml/map1.xml" language="en-us">
</xfml>

It's a valid XML document and conforms to the XFML Core DTD. The url attribute is required; it gives the URL where the original XFML document can be found. To be nice, we add a comment pointing to the XFML Core spec:

<?xml version="1.0" ?>
<xfml version="1.0" url="http://domain.com/xfml/map1.xml" language="en-us">
<!-- This document conforms to XFML Core. See http://purl.oclc.org/NET/xfml/core/ -->
</xfml>
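
A tool that consumes XFML will probably want to read these root attributes first. The following is a rough sketch using Python's standard xml.etree.ElementTree module; the function name read_map_header is an assumption made for this example. Note that ElementTree only checks well-formedness, so this does not validate against the XFML Core DTD.

```python
import xml.etree.ElementTree as ET

# The empty XFML Core document from the article.
EMPTY_MAP = (
    '<?xml version="1.0" ?>\n'
    '<xfml version="1.0" url="http://domain.com/xfml/map1.xml" language="en-us">\n'
    '</xfml>'
)


def read_map_header(xml_text):
    """Parse an XFML document and return its root attributes as a dict.

    Hypothetical helper: it only checks the root element name and the
    presence of the required url attribute, nothing more.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "xfml":
        raise ValueError("root element is not <xfml>")
    url = root.get("url")
    if url is None:
        raise ValueError("missing required url attribute")
    return {
        "version": root.get("version"),
        "url": url,
        "language": root.get("language"),
    }


header = read_map_header(EMPTY_MAP)
```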

Facets and Topics

The building blocks of a faceted hierarchy in XFML are facets and topics. A facet is the top node of each tree. The nodes in the tree are called topics. XFML can define multiple hierarchies, and each hierarchy is a facet. Our hierarchy expressed in XFML looks like this:

<facet id="city">City</facet>
<facet id="place">Type of place</facet>
<facet id="music">Type of music</facet>
<topic id="ny" facetid="city"><name>New York</name></topic>
<topic id="la" facetid="city"><name>Los Angeles</name></topic>
<topic id="bar" facetid="place"><name>bar</name></topic>
<topic id="restaurant" facetid="place"><name>restaurant</name></topic>
<topic id="blues" facetid="music"><name>blues</name></topic>
<topic id="latin" facetid="music"><name>latin</name></topic>

Topics have a child element called <name>, and facets don't, because topics can have other child elements. We'll get to those later. Facet and topic IDs are declared in the DTD as type ID and therefore cannot contain spaces or start with a digit. The facetid attribute for topics is required.
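
As an illustration of how a program might read these elements, here is a small Python sketch that groups topic names by facet. It wraps the listing above in the <xfml> root element shown earlier and uses the standard xml.etree.ElementTree module; the variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

XFML = """<xfml version="1.0" url="http://domain.com/xfml/map1.xml" language="en-us">
<facet id="city">City</facet>
<facet id="place">Type of place</facet>
<facet id="music">Type of music</facet>
<topic id="ny" facetid="city"><name>New York</name></topic>
<topic id="la" facetid="city"><name>Los Angeles</name></topic>
<topic id="bar" facetid="place"><name>bar</name></topic>
<topic id="restaurant" facetid="place"><name>restaurant</name></topic>
<topic id="blues" facetid="music"><name>blues</name></topic>
<topic id="latin" facetid="music"><name>latin</name></topic>
</xfml>"""

root = ET.fromstring(XFML)

# A facet's name is its text content; a topic's name is in its <name> child.
facets = {f.get("id"): f.text for f in root.findall("facet")}

# Group topic names under the facet referenced by the required facetid attribute.
topics_by_facet = {facet_id: [] for facet_id in facets}
for topic in root.findall("topic"):
    topics_by_facet[topic.get("facetid")].append(topic.findtext("name"))
```

A topic whose facetid names an undeclared facet would raise a KeyError here, which is a reasonable failure mode for a sketch but would need friendlier handling in a real tool.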

You can add unlimited topic hierarchies within a facet, using the parentTopicid attribute:

<topic id="ny" facetid="city"><name>New York</name></topic>
<topic id="brooklyn" facetid="city" parentTopicid="ny"><name>Brooklyn</name></topic>
<topic id="brooklyn_heights" facetid="city" parentTopicid="brooklyn"><name>Brooklyn Heights</name></topic>
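
The parentTopicid links can be followed upward to recover a topic's full path within its facet. The sketch below shows one way to do that in Python with the standard xml.etree.ElementTree module. The topic_path function is hypothetical, and the cycle check is a defensive assumption of this example rather than something taken from the spec.

```python
import xml.etree.ElementTree as ET

XFML = """<xfml version="1.0" url="http://domain.com/xfml/map1.xml" language="en-us">
<facet id="city">City</facet>
<topic id="ny" facetid="city"><name>New York</name></topic>
<topic id="brooklyn" facetid="city" parentTopicid="ny"><name>Brooklyn</name></topic>
<topic id="brooklyn_heights" facetid="city" parentTopicid="brooklyn"><name>Brooklyn Heights</name></topic>
</xfml>"""

root = ET.fromstring(XFML)
topics = {t.get("id"): t for t in root.findall("topic")}


def topic_path(topic_id):
    """Return topic names from the top-level topic down to topic_id."""
    path = []
    seen = set()
    while topic_id is not None:
        if topic_id in seen:
            # Guard against a malformed map whose parent links form a loop.
            raise ValueError("cycle in parentTopicid links at " + topic_id)
        seen.add(topic_id)
        topic = topics[topic_id]
        path.append(topic.findtext("name"))
        # Top-level topics have no parentTopicid, so get() returns None.
        topic_id = topic.get("parentTopicid")
    return list(reversed(path))


path = topic_path("brooklyn_heights")
```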

So when do you make a hierarchy of topics become a facet? The spec says, when describing the facet concept, that "[f]acets are mutually exclusive containers that contain hierarchies of topics. Mutually exclusive means that a certain topic can only possibly belong to one facet". The mutual exclusivity requirement is semantic: it can't be (realistically) enforced by software. It means that you should separate out a new facet when you are describing topics that can be usefully combined. Type of music and city are mutually exclusive facets because a topic in type of music (Latin) can never be a topic in city (New York). Note that the mutual exclusivity requirement does not mean that pages (see next section) can only have occurrences in one facet.
