XML.com: XML From the Inside Out

Formal Taxonomies for the U.S. Government

January 26, 2005

Taxonomies have long played a central role in both medical and library science for the classification of medical terms and books. Recently, the U.S. federal government's Office of Management and Budget (OMB) released the Federal Enterprise Architecture (FEA) Data Reference Model (DRM). The FEA DRM specifies three abstract layers of an organization's information: business context, information exchange, and data element description. Business context specifies the use of a taxonomy to categorize government information. One definition of a taxonomy is "a scheme that partitions a body of knowledge and defines the relationships among the pieces. It is used for classifying and understanding the body of knowledge."

For federal agencies trying to learn how to implement taxonomies, most examples in portals and on public websites are informal taxonomies where neither the nodes nor the associations between them are formally defined. Examples of such taxonomies can be found on yahoo.com, froogle.com, and dmoz.org. Such informal taxonomies are only useful for browsing and not for automated techniques like query expansion, rule execution, taxonomy integration, faceted classification, and inference. This article will examine the requirements of formal taxonomies and provide examples of each construct.

Defining the Dots

The requirements for a formal taxonomy can be divided into two broad categories: node definition and link definition. A taxonomic node is a category. A node in a taxonomy cannot just be a label because labels have no predefined semantics and are inherently ambiguous; however, after defining a semantic structure for your taxonomy node, it is good practice to attach multiple synonymous labels as is supported by both the XML Topic Map Standard (XTM) and the W3C Web Ontology Language (OWL). At a minimum, you should choose one of three modeling constructs for a taxonomic node: a Collection, a Class, or an Instance. Here are definitions and examples of each construct:

Collection: A collection is any grouping of items into a single container. The membership criterion for the group is just being in the container. In other words, membership is expressed by enumerating the items in the collection. In set theory, this is called the extension of the set. A key differentiator for a collection is that it may contain heterogeneous items, whereas a class contains only homogeneous items as specified by the class constraints (OWL's anonymous classes blur this distinction in an elegant manner). Since a collection has no way to define formal membership criteria, membership is restricted to only its current contents.

Thus, for categorization it is a much weaker construct than a formal class definition. In the FEA DRM, such a "collection category" is called a "subject area"; the same construct can be expressed in a Topic Map by attaching occurrences to a topic and in UML as an aggregation. In the language of set theory, there is no way to calculate the "intension" of the set because it has none. A very common example of a "collection category" is "Favorites," where membership is purely subjective. The most common non-subjective example is a set of items that belong to a "collection category" because they are "part of" the parent taxonomic node. We will discuss the partOf association later, but for now, it is important to understand that the parts of a system are members not by intension but by extension. This may change as our powers of representation improve, but for now, a "collection category" is the best option for representing them and should not be excluded from a formal taxonomy.
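The contrast between membership by extension and membership by intension can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `favorites`, `is_person`, and `in_collection`, along with the sample items, are hypothetical and do not come from the FEA DRM or any standard.

```python
# A collection is defined by extension: its members are simply enumerated.
# The items may be heterogeneous; nothing about an item predicts membership.
favorites = {"dmoz.org", 42, ("Sedan", "blue")}


def in_collection(item, collection) -> bool:
    # The only possible test is a lookup in the current contents.
    return item in collection


# A class is defined by intension: a membership rule over characteristics.
# Hypothetical rule: anything carrying a name and a birth date is a Person.
def is_person(item: dict) -> bool:
    return "name" in item and "birth_date" in item


# An item the rule has never seen before can still be classified.
new_item = {"name": "Ada", "birth_date": "1815-12-10"}
print(is_person(new_item))

# An item that was never enumerated cannot be a member of the collection.
print(in_collection("yahoo.com", favorites))
```

The rule in `is_person` applies to items that did not exist when it was written, whereas the collection can only answer for what it already contains.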

It is worth noting that while this is a very common construct in programming languages (e.g., the Java Collection classes), it has a variety of different implementations and perspectives in the knowledge representation literature. Some diverse examples of where you can find collections are the Dublin Core (www.dublincore.org), the Simple Knowledge Organization System (SKOS) (www.w3.org/2004/02/skos/), RDF collections (www.w3.org/RDF/), and WordNet (www.cogsci.princeton.edu/~wn/). This topic could get some much-needed clarification from the Semantic Web Best Practices Working Group's notes on partOf and part-whole relations (http://www.w3.org/2001/sw/BestPractices/OEP/). Lastly, the study of parthood is called "mereology" (http://en.wikipedia.org/wiki/Mereology).

Class: A class represents a set of things with common characteristics. The characteristics are represented explicitly in a logical model (as opposed to the conceptual model of which a taxonomy is one representation). An example of a class would be a "Person" or "Persons." The issue of plural or singular names for a category is always a subject of lively debate; however, one should remember that both are just labels and thus hold no unambiguous semantics in and of themselves. It is important to be clear that a taxonomic node which is a class must have the ability to contain characteristics (also called facets, attributes, or properties) because it is the existence and examination of those properties that allow "faceted classification" or automated classification by the presence of facets or the value of facets. A "Person" class would have attributes such as name, birth date, eye color, hair color, etc. Classes are the core building block of Unified Modeling Language (UML) diagrams as shown in Figure 1.

Figure 1.
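As a rough sketch of why a class node needs facets, the Python below models the "Person" class with the attributes named above and groups instances by the value of a single facet. The function `classify_by_facet` and the three sample people are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    """A class node whose facets mirror the attributes named in the text."""
    name: str
    birth_date: date
    eye_color: str
    hair_color: str


def classify_by_facet(people, facet):
    """Group instances by the value of one facet (attribute)."""
    groups = {}
    for person in people:
        groups.setdefault(getattr(person, facet), []).append(person.name)
    return groups


# Hypothetical sample instances.
people = [
    Person("Alice", date(1970, 1, 1), "brown", "black"),
    Person("Bob", date(1980, 5, 17), "blue", "black"),
    Person("Carol", date(1990, 9, 30), "brown", "red"),
]

by_eye_color = classify_by_facet(people, "eye_color")
print(by_eye_color)
```

A bare label such as "Person" would give the classifier nothing to examine; the grouping is possible only because the node carries explicit properties.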

Instance: An instance (also called an Individual in the OWL specification) is a particular occurrence of a class or a specific member of a class. Using our "Person" example, I would be an instance of the Person class, as shown in Figure 2.

Figure 2.

Note that in order to specify an instance, you must specify the class to which it belongs.

Connecting the Dots

Now that we have defined the constructs for a taxonomic node, we must formally define how we connect our nodes. This, by far, is the biggest oversight in current taxonomy development. This is also the surest sign that a taxonomy is informal instead of being formally defined. We will discuss three formal associations between taxonomic nodes: subclassOf, partOf, and instanceOf. Here are definitions and examples of each association:

subclassOf: The subclassOf association is a transitive association that holds only between class nodes and enables us to create class hierarchies. It is an association specifying the specialization of a concept. Such class hierarchies are common in object-oriented programming and knowledge representation and are what most people have in mind when the subject of taxonomies is mentioned. Figure 3 depicts a simple transportation class hierarchy.

Figure 3.

Listing 1 is the OWL representation of the very simple example in Figure 3.


<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
    xmlns="http://www.owl-ontologies.com/unnamed.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
  xml:base="http://www.owl-ontologies.com/unnamed.owl">
  <owl:Ontology rdf:about=""/>
  <owl:Class rdf:ID="Transportation"/>
  <owl:Class rdf:ID="AirVehicle">
    <rdfs:subClassOf rdf:resource="#Transportation"/>
  </owl:Class>
  <owl:Class rdf:about="#GroundVehicle">
    <rdfs:subClassOf rdf:resource="#Transportation"/>
  </owl:Class>
  <owl:Class rdf:about="#Automobile">
    <rdfs:subClassOf>
      <owl:Class rdf:ID="GroundVehicle"/>
    </rdfs:subClassOf>
  </owl:Class>
  <owl:Class rdf:ID="Truck">
    <rdfs:subClassOf>
      <owl:Class rdf:about="#GroundVehicle"/>
    </rdfs:subClassOf>
  </owl:Class>
  <owl:Class rdf:ID="SportsCar">
    <rdfs:subClassOf>
      <owl:Class rdf:ID="Automobile"/>
    </rdfs:subClassOf>
  </owl:Class>
  <owl:Class rdf:ID="Sedan">
    <rdfs:subClassOf>
      <owl:Class rdf:about="#Automobile"/>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>

The subclassOf relation is also commonly referred to in programming circles as the "IS-A" relation. This nicely matches our intuitive understanding of the relation in that a "sports car is a(n) automobile," an "automobile is a ground vehicle," and so on. The transitive nature of the relation is also clearly evident in the preceding examples because it logically follows that a "sports car is also a ground vehicle." I personally consider transitive relations to be the great "undiscovered country" of information systems.
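To make the transitivity concrete, here is a minimal Python sketch that hand-copies the direct subclassOf links from Listing 1 into a dictionary and derives the indirect ones. It assumes single inheritance, which holds for this hierarchy but not for OWL in general, and the function names (`ancestors`, `descendants`, `expand_query`) are hypothetical. The last function illustrates one simple form of the query expansion mentioned in the introduction.

```python
# Direct subclassOf links from the transportation hierarchy in Listing 1.
# Assumption: each class has at most one direct superclass.
SUBCLASS_OF = {
    "AirVehicle": "Transportation",
    "GroundVehicle": "Transportation",
    "Automobile": "GroundVehicle",
    "Truck": "GroundVehicle",
    "SportsCar": "Automobile",
    "Sedan": "Automobile",
}


def ancestors(cls):
    """Follow subclassOf links upward; transitivity yields every superclass."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def descendants(cls):
    """All classes that are directly or indirectly a subclass of cls."""
    return sorted(c for c in SUBCLASS_OF if cls in ancestors(c))


def expand_query(term):
    """Query expansion: a search for a class also matches its subclasses."""
    return [term] + descendants(term)


print(ancestors("SportsCar"))
print(expand_query("GroundVehicle"))
```

Nothing in Listing 1 states that a sports car is a ground vehicle; the sketch infers it by walking two direct links, which is exactly what an informal taxonomy with undefined links cannot support.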

partOf: The partOf association is another transitive association, this time between collections, classes, or instances. It is an association specifying either a weak or strong form of membership. The Unified Modeling Language (UML) differentiates between these forms of membership with the "aggregation" and "composition" associations (UML 2 retains both but leaves the precise semantics of shared aggregation to the modeler). Figure 4 shows a portion of the previous transportation class hierarchy but now using the partOf association instead of subclassOf.

Figure 4.

Listing 2 is the OWL representation of Figure 4.


<?xml version="1.0"?>
<rdf:RDF
    xmlns:dctypes="http://purl.org/dc/dcmitype/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:protege="http://protege.stanford.edu/plugins/owl/protege#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns="http://www.owl-ontologies.com/unnamed.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
  xml:base="http://www.owl-ontologies.com/unnamed.owl">
  <owl:Ontology rdf:about="">
    <owl:imports rdf:resource="http://protege.stanford.edu/plugins/owl/protege"/>
  </owl:Ontology>
  <owl:Class rdf:ID="Seat"/>
  <owl:Class rdf:ID="Engine"/>
  <owl:Class rdf:ID="Piston"/>
  <owl:Class rdf:ID="Interior">
    <rdfs:subClassOf>
      <owl:Class rdf:about="http://purl.org/dc/dcmitype/Collection"/>
    </rdfs:subClassOf>
  </owl:Class>
  <owl:Class rdf:ID="Chassis"/>
  <owl:Class rdf:ID="Transmission"/>
  <owl:Class rdf:ID="Automobile"/>
  <owl:Class rdf:ID="Radio"/>
  <owl:Class rdf:ID="Exterior">
    <rdfs:subClassOf rdf:resource="http://purl.org/dc/dcmitype/Collection"/>
  </owl:Class>
  <owl:ObjectProperty rdf:ID="interior">
    <rdfs:range rdf:resource="#Interior"/>
    <rdfs:subPropertyOf>
      <owl:TransitiveProperty rdf:about="http://purl.org/dc/terms/hasPart"/>
    </rdfs:subPropertyOf>
    <rdfs:domain rdf:resource="#Automobile"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="pistons">
    <rdfs:domain rdf:resource="#Engine"/>
    <rdfs:subPropertyOf>
      <owl:TransitiveProperty rdf:about="http://purl.org/dc/terms/hasPart"/>
    </rdfs:subPropertyOf>
    <rdfs:range rdf:resource="#Piston"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="chassis">
    <rdfs:domain rdf:resource="#Exterior"/>
    <rdfs:subPropertyOf>
      <owl:TransitiveProperty rdf:about="http://purl.org/dc/terms/hasPart"/>
    </rdfs:subPropertyOf>
    <rdfs:range rdf:resource="#Chassis"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="seats">
    <rdfs:subPropertyOf>
      <owl:TransitiveProperty rdf:about="http://purl.org/dc/terms/hasPart"/>
    </rdfs:subPropertyOf>
    <rdfs:range rdf:resource="#Seat"/>
    <rdfs:domain rdf:resource="#Interior"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="exterior">
    <rdfs:domain rdf:resource="#Automobile"/>
    <rdfs:range rdf:resource="#Exterior"/>
    <rdfs:subPropertyOf>
      <owl:TransitiveProperty rdf:about="http://purl.org/dc/terms/hasPart"/>
    </rdfs:subPropertyOf>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:about="http://purl.org/dc/terms/isPartOf">
    <owl:inverseOf>
      <owl:TransitiveProperty rdf:about="http://purl.org/dc/terms/hasPart"/>
    </owl:inverseOf>
    <rdfs:range rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#TransitiveProperty"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="transmission">
    <rdfs:domain rdf:resource="#Engine"/>
    <rdfs:subPropertyOf>
      <owl:TransitiveProperty rdf:about="http://purl.org/dc/terms/hasPart"/>
    </rdfs:subPropertyOf>
    <rdfs:range rdf:resource="#Transmission"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="radio">
    <rdfs:domain rdf:resource="#Interior"/>
    <rdfs:range rdf:resource="#Radio"/>
    <rdfs:subPropertyOf>
      <owl:TransitiveProperty rdf:about="http://purl.org/dc/terms/hasPart"/>
    </rdfs:subPropertyOf>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="engine">
    <rdfs:range rdf:resource="#Engine"/>
    <rdfs:subPropertyOf>
      <owl:TransitiveProperty rdf:about="http://purl.org/dc/terms/hasPart"/>
    </rdfs:subPropertyOf>
    <rdfs:domain rdf:resource="#Exterior"/>
  </owl:ObjectProperty>
  <owl:TransitiveProperty rdf:about="http://purl.org/dc/terms/hasPart">
    <owl:inverseOf rdf:resource="http://purl.org/dc/terms/isPartOf"/>
    <rdfs:range rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#ObjectProperty"/>
  </owl:TransitiveProperty>
</rdf:RDF>

Listing 2 uses two techniques to implement the partOf relations in the diagram. First, we use the Dublin Core vocabulary for the Collection definition and the part-whole relations. Second, in order to define the parts in OWL, we make the specific "part-of" properties (with specific range constraints) sub-properties of "hasPart." It should be evident that if we created an informal taxonomy that mixed these parts with the subclass hierarchy of Figure 3, we would have a semantically ambiguous artifact.

Lastly, returning to our programming analogies, the inverse of this relation is often referred to as "HAS-A". An automobile has a(n) engine and an engine is part of an automobile.
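The HAS-A reading and its inverse can be sketched in Python as follows. The dictionary is read off the domain and range of each property in Listing 2 and treats those class-level declarations as direct hasPart links, which is a simplification of what OWL actually asserts; the function names `all_parts` and `is_part_of` are hypothetical.

```python
# Direct hasPart links read off Listing 2 (domain -> ranges of each property).
# Simplifying assumption: class-level domain/range pairs act as part links.
HAS_PART = {
    "Automobile": ["Interior", "Exterior"],
    "Interior": ["Seat", "Radio"],
    "Exterior": ["Chassis", "Engine"],
    "Engine": ["Piston", "Transmission"],
}


def all_parts(whole):
    """Transitive closure of hasPart: parts, parts of parts, and so on."""
    found = []
    stack = list(HAS_PART.get(whole, []))
    while stack:
        part = stack.pop()
        if part not in found:
            found.append(part)
            stack.extend(HAS_PART.get(part, []))
    return sorted(found)


def is_part_of(part, whole):
    """isPartOf is the inverse of hasPart."""
    return part in all_parts(whole)


print(all_parts("Automobile"))
print(is_part_of("Piston", "Automobile"))
```

A piston is part of an automobile by transitivity, yet it is in no sense a kind of automobile, which is why these links must be kept distinct from the subclassOf links of Figure 3.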

instanceOf: The instanceOf association is another membership relation, but one that holds exclusively between instances and classes. The instanceOf association specifies inherent belonging to a class of things because of common characteristics. Such "belonging" goes beyond membership to the thing's essence of being (in other words, a thing cannot choose to be an instance of a class; rather, it comes into existence as an instance). In programming languages, creating such an instance is called "instantiation." The instanceOf association is shown in Figure 5.

Figure 5.

Listing 3 is an OWL representation of Figure 5. Note that in OWL/RDF the instanceOf association is expressed via the rdf:type property. Listing 3 uses the shorthand notation in which the "Person" element stands for a resource with the ID "MichaelDaconta" whose rdf:type is the class Person.


<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns="http://www.owl-ontologies.com/unnamed.owl#"
  xml:base="http://www.owl-ontologies.com/unnamed.owl">
  <owl:Ontology rdf:about=""/>
  <owl:Class rdf:ID="Person"/>
  <owl:DatatypeProperty rdf:ID="birthDate">
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#date"/>
    <rdfs:domain rdf:resource="#Person"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="hairColor">
    <rdfs:domain rdf:resource="#Person"/>
    <rdfs:range>
      <owl:DataRange>
        <owl:oneOf rdf:parseType="Resource">
          <rdf:rest rdf:parseType="Resource">
            <rdf:first rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
            >Brown</rdf:first>
<!-- other colors removed for brevity -->
          </rdf:rest>
        </owl:oneOf>
      </owl:DataRange>
    </rdfs:range>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="name">
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
    <rdfs:domain rdf:resource="#Person"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="eyeColor">
    <rdfs:domain rdf:resource="#Person"/>
    <rdfs:range>
      <owl:DataRange>
        <owl:oneOf rdf:parseType="Resource">
          <rdf:first rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
          >Black</rdf:first>
<!-- other colors removed for brevity -->
        </owl:oneOf>
      </owl:DataRange>
    </rdfs:range>
  </owl:DatatypeProperty>
  <Person rdf:ID="MichaelDaconta">
    <hairColor rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >Black</hairColor>
    <eyeColor rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >Brown</eyeColor>
    <name rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >Michael Daconta</name>
    <birthDate rdf:datatype="http://www.w3.org/2001/XMLSchema#date"
    >1965-08-27</birthDate>
  </Person>
</rdf:RDF>
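The typed-node shorthand can be unpacked mechanically. The Python sketch below uses the standard library's ElementTree on a trimmed fragment modeled on Listing 3 and reports each instance together with the class URI implied by its element name. The helper `instance_of_pairs` is hypothetical, and the sketch deliberately ignores the long form (an rdf:Description with an explicit rdf:type), so it is not a general RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"

# Trimmed fragment modeled on Listing 3 (properties omitted for brevity).
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns="http://www.owl-ontologies.com/unnamed.owl#">
  <owl:Class rdf:ID="Person"/>
  <Person rdf:ID="MichaelDaconta"/>
</rdf:RDF>"""


def instance_of_pairs(xml_text):
    """Return (instance ID, class URI) pairs for typed-node elements."""
    root = ET.fromstring(xml_text)
    pairs = []
    for child in root:
        # ElementTree spells qualified names as "{namespace}localname".
        namespace = child.tag[1:].split("}")[0]
        # Elements in the RDF or OWL namespaces are vocabulary, not instances.
        if namespace in (RDF, OWL):
            continue
        # In the shorthand, the element name itself is the rdf:type.
        class_uri = child.tag[1:].replace("}", "")
        pairs.append((child.get(f"{{{RDF}}}ID"), class_uri))
    return pairs


print(instance_of_pairs(DOC))
```

The element name doubles as the class reference, which is why an instance cannot be written in this form without naming the class to which it belongs.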

In this article, I've described the characteristics of a formal taxonomy. In future articles, I'll examine more comprehensive examples and applications which apply the benefits of formal taxonomies. Hopefully, it has become apparent that formal taxonomies provide a roadmap to higher forms of semantic expression for the categorization schemes we create. In other words, we are creating taxonomies that are legal subsets of more formal ontologies. Without such discipline, we are dead-ending expensive knowledge acquisition efforts on ambiguous artifacts.


