Ontology Tools Survey, Revisited
July 14, 2004
A new survey of ontology editors was conducted as a follow-up to an initial survey conducted in 2002. The results of the survey are summarized in this article. The results of the original survey may be found at www.xml.com/pub/a/2002/11/06/ontologies.html.
Ontologies are a way of specifying the structure of domain knowledge in a formal logic designed for machine processing. The effect on information technology (IT) is to shift the burden of capturing the meaning of data content from the procedural operations of algorithms and rules to the representation of the data itself.
Opening the International Semantic Web Conference in 2003, the conference chair Jim Hendler declared that "a little semantics goes a long way." The belief is that infusing even a little semantic quality into our data (residing in web pages, database tables, electronic documents, or whatever) can make that data more immediately, broadly, and profoundly usable by all applications aware of the knowledge-representation scheme -- the ontology.
For such reasons, there is a growing sense among researchers and practitioners that ontologies will play an important role in forthcoming information-management solutions. Several conditions predicate this current state of affairs.
State of Ontologies
Practical ontology languages are being adopted. For example, the W3C recently recommended the OWL language and RDF for building web ontologies. These language specifications were developed over several years both within and outside of the organization, and OWL is rapidly replacing its predecessor DAML+OIL with the blessing of the DAML Office in the Department of Defense, which funded much of its early development. Commensurate W3C standardization activities are now underway to expand the development framework for building and using web ontologies with web services, deductive rules, and optimized query languages.
Numerous commercial and open-source software tools are available for building and deploying ontologies, and for integrating inference systems with web and database infrastructures. Increasingly, these tools directly support the emerging web ontology standards, as well as related standard-language efforts like Simple Common Logic (SCL, an offshoot of KIF) and ISO EXPRESS.
References to taxonomies and ontologies by vendors of mainstream enterprise-application-integration (EAI) solutions are becoming commonplace. Under the popular tag of semantic integration, vendors like Verity, Modulant, Unicorn, Semagix, and many more are offering platforms to interchange information among mutually heterogeneous resources including legacy databases, semi-structured repositories, industry-standard directories and vocabularies like ebXML, and streams of unstructured content such as text and media. Ontologies, for example, are being used to guide the extraction of semantic content from collections of plain-text documents describing medical research, consumer products, and business topics.
Government initiatives to strengthen information technology capabilities of federal agencies and services are integrating the use of ontologies with existing infrastructures to perform incisive and far-reaching assessments of information flowing from disparate sources. Anti-terrorism intelligence analysis and command-level, combat-decision support are typical examples.
Major web search services like Google and Yahoo are using ontology-based approaches to find and organize content on the Web. Google's acquisition of Applied Semantics, Inc. -- one of the leading vendors of semantic extraction tools -- portends an active role for ontologies in its technology solutions.
In April of this year, Gartner, the market research firm, identified taxonomies/ontologies as one of the leading IT technologies, ranking them third in its list of the top 10 technologies forecast for 2005.
Also, ontologies are being used by business and government to help define and implement enterprise-level architecture frameworks that can enable the coherent interplay of information systems within an enterprise environment. Approaches like the Federal Enterprise Architecture (FEA) and OMG's Model Driven Architecture, for example, may benefit from ontology-mediated specifications.
Building an Ontology
You don't author an ontology as much as you construct it. Ontology building is not a very linear process, and you may approach the task from several perspectives at once, both top-down and bottom-up. It is also a substantially iterative process. Skeleton structures of core concepts are extended with more refined and more peripheral concepts, and these are more tightly interwoven with additional elaborating relations. While parts of this may sound like conventional software development, there are fundamental differences.
Procedural and object-oriented software, regardless of whether it is being coded imperatively or declaratively, uses structural aspects of the software to control program flow and use. Ontology languages primarily use structure to specify semantics. For example, while subclass inheritance in object-oriented languages is a mechanism of convenience that enables code reuse, subclass inheritance in an ontology language enables semantic interpretation of the data through classification, entailment, and restriction.
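The contrast can be made concrete with a minimal sketch in Python. The class names, the `SUBCLASS_OF` table, and the `entailed_types` helper are hypothetical, chosen only for illustration, and a real OWL reasoner does far more than this. The point is that subclass statements let a program infer class memberships that were never explicitly stated.

```python
# Hypothetical toy ontology: each class maps to its directly asserted superclasses.
SUBCLASS_OF = {
    "Cardiologist": {"Physician"},
    "Physician": {"Person"},
    "Person": set(),
}

def superclasses(cls):
    """Return every class entailed as a superclass of cls (transitive closure)."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def entailed_types(asserted_type):
    """An individual asserted to be in one class is entailed to be in all its superclasses."""
    return {asserted_type} | superclasses(asserted_type)

# Only "Cardiologist" is asserted; "Physician" and "Person" are inferred.
print(sorted(entailed_types("Cardiologist")))
```

Here the subclass links carry no code to reuse; they exist only so that membership in `Physician` and `Person` follows from membership in `Cardiologist`.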
An ontology building process may span problem specification, domain knowledge acquisition and analysis, conceptual design and commitment to community ontologies, iterative construction and testing, publishing the ontology as a terminology, and possibly populating a conforming knowledge base with ontology individuals. While the process may be strictly a manual exercise, there are tools available that can automate portions of it.
For example, linguistic tools can analyze the content of domain documents in order to synthesize ontology terms themselves, or to extract content corresponding to a domain ontology as individuals forming a knowledge base. Even so, building complex ontologies today usually relies on the manual composition of the ontology using an ontology editor for the chosen ontology language(s).
The intent of this article is to summarize the manual editing tools currently available to practitioners interested in building structured ontologies suitable for information management and other applications. These tools may also have capabilities for automatically extracting information from domain documents. The article follows an earlier article (see Resources) summarizing some 56 ontology editors. That article also provides a useful introduction to building ontologies. Results from a new survey of ontology software providers were used to replace the original tool descriptions and add descriptions of 40 additional ontology editors. The descriptions identify tool characteristics in 13 categories as distinguished in Table 1.
The survey covers tools with ontology editing capabilities that can be used to build ontology schemas (terminologies) and/or instance data. These ontology editors may be available as standalone, plugin or online software, and need not be production level software with complete functionality and user support.
The survey results are presented in Table 1 as categorical descriptions of 94 ontology editors currently available to the ontology building community. The results include contact addresses for obtaining additional software information.
Room for Improvement
As part of the survey, each respondent was asked to answer the following question about what enhancement they would like to see in future ontology editors:
"What advancement in existing tools do you believe is needed most to improve our ability to build useful ontologies?"
Fifty-six percent of the respondents provided answers to this survey question. The results are summarized in Table 2 where individual answers are categorized by sorting them into 11 different areas of tool enhancement. The percentages appearing in the table indicate the proportion of respondents whose answer was categorized as relating to the indicated feature area.
Table 2. Top Tool Features to Enhance Ontology Editing
|Abstraction for knowledge modeling|18%|
|Visual/intuitive navigation of ontology|13%|
|Reasoning and problem solving facilities|12%|
|Ontology alignment and data resource integration|12%|
|Support of standard industry domain and core vocabularies|9%|
|Natural language processing|7%|
|Ontology language standardization|6%|
|Built-ins (wizards) for best practice methods|6%|
|Information extraction facilities|4%|
|Features to learn user's editing style and needs|3%|
|Collaborative development support|1%|
|Ontology support for contexts|1%|
The two most frequent answers call for higher-level abstraction for knowledge modeling and for visual, intuitive navigation of the ontology. The other top answers include: the use of reasoning facilities to help explore, compose and check ontologies; and the inclusion of facilities to help align ontologies with one another and integrate them with other data resources like enterprise databases. The remaining answers addressed enhanced support for industry domain standardization, natural language processing, collaborative development, and other enhancements, each mentioned by fewer than ten percent of respondents.
Collectively, the sentiment expressed by respondents centers on tool features to make building full-blown ontologies easier and more foolproof, especially for domain experts rather than ontologists. This sentiment echoes back a few decades to when practitioners were trying to use expert system shells productively. On the other hand, new tool features to help align domain and core ontologies including standard vocabularies are emerging as a more contemporary focus, more in concert with enterprise application integration and development trends.
One ontology building trend not articulated in the survey responses, but highlighted in a dedicated session of the recent WWW 2004 Conference, is support for ontology languages built on RDF and the use of URIs as identifiers for referring to unique entities. Ontologies for the Semantic Web are characterized as RDF ontologies, and are being built using OWL and other languages based on RDF. Current attention to the Semantic Web and the language standardization it offers has resulted in the single most prominent change in ontology editors since the original survey in 2002. This growth in direct support for RDF and various species of OWL has created some controversy.
The issue turns on three questions: whether RDF is the best base language for implementing ontologies on the Web or elsewhere, whether it affords the scalability necessary to implement very large ontologies and webs of ontologies, and whether it affords the representational power or expressiveness to build ontologies of the sophistication necessary for demanding applications.
Other ontology languages such as SCL, CycL, and LOOM, for example, arguably offer more power of expression and reasoning, but lack intimate support of RDF. The advantage offered by RDF that remains compelling for ontologies seems to be the universal use of URI and XML namespace protocols on the Web. This unifying aspect, for instance, may make it easier to establish, through collaboration and consensus, the utilitarian vocabularies (as ontologies) needed for far-flung cooperative and integrative applications using the Web.
Wearing the mantle of W3C standardization, OWL enjoys much more attention today than any other ontology language -- in or out of the Web world. Its detractors tend to single out its limits of expression, its inelegant syntax and, of course, its reliance on the RDF model of representation using triples. Some basic language constructs like lists and other collections are deemed cumbersome and in need of extension in new language implementations. These shortcomings, if one chooses to see them as such, clearly add more to the ontology toolmaker's plate. The successful ontology editor may be expected to mask these kinds of idiosyncrasies with higher level functionalities.
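To see why collections are considered cumbersome, consider how an ordered list is written in the RDF triple model. The sketch below, in plain Python with no RDF library, expands a three-item list into the linked `rdf:first`/`rdf:rest` cells that RDF uses for collections. The blank-node labels (`_:b0` and so on), the `list_to_triples` helper, and the `ex:` names are made up for illustration.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def list_to_triples(items):
    """Expand a Python list into RDF collection triples.

    Returns (head, triples), where head is the node standing for the whole list.
    """
    if not items:
        return RDF_NIL, []
    triples = []
    # One hypothetical blank node per list cell.
    nodes = [f"_:b{i}" for i in range(len(items))]
    for i, item in enumerate(items):
        next_node = nodes[i + 1] if i + 1 < len(items) else RDF_NIL
        triples.append((nodes[i], RDF_FIRST, item))
        triples.append((nodes[i], RDF_REST, next_node))
    return nodes[0], triples

head, triples = list_to_triples(["ex:Red", "ex:Green", "ex:Blue"])
# Attach the list to a subject through an illustrative property.
triples.append(("ex:TrafficLight", "ex:hasColors", head))
for triple in triples:
    print(triple)
```

Three items need six triples for the list cells alone, plus one more to attach the list to its subject. This is the sort of bookkeeping an editor is expected to hide from the ontology builder.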
Traditionally, an integrated development environment (IDE) for software is language specific and exploits underlying details and native capabilities of the language whether it is a programming language like Java or C++, or a design notational language like UML, or both. Such a consistent focus has not yet emerged in a suite of tools for building ontologies. Indeed, when this does happen it may be as part of a general enterprise level IDE. Conversion from UML to OWL and from OWL to Java is already under development in some tools.
As a precursor to such IDEs, ontology toolmakers are now paying attention to creating a coherent view of the ontology as the software is built. Results of the present survey speak to the advantage of a tool that promotes and maintains the user's apprehension of the ontology. Navigation features for easily drilling down and zooming out in an ontology structure, as well as jumping to semantically related elements in the structure, are important cognitive aids to productivity. Lexical support that helps find and organize terms within and across ontologies similarly buoys the editing task. Other tool features that may contribute to the builder's craftsmanship include:
- Simplified entry of higher level ontology constructs and principled patterns using generic and domain-specific templates/widgets and wizards.
- Diagrammatic presentation and manipulation of the ontology and its axioms.
- Automatic classification and integrity checking of ontology (via an inference engine) as specification statements are entered.
- Loading, viewing and editing multiple ontologies concurrently.
- Ability to modularize and arbitrarily partition parts of the ontology for building, merging and testing.
- Refactoring the ontology via name changes, subclass migration, property type migration, concept-individual migration, tree pruning, etc.
- Automatic and author-specific annotation of the ontology during development using metadata to record software evolution and provenance including rationales and proofs, revisions, comments, origins, natural language references, etc. For example, tracking the user's ordering of ontology terms and axioms, and allowing definable resorting of terms.
- Full support of the language's capabilities to import external ontologies.
- Software import/export capabilities in abstract syntax or canonical serializations to ensure reliable and complete round-trip development while using other ontology tools.
Achieving most of the ontology editing functionality suggested above is a ways off. For example, simply achieving reasonably complete support of the OWL language in an ontology editor is proving to be no mean feat for the OWL Plugin for Stanford's Protégé editor. As evidenced in the public view of its course of development, this yearlong endeavor has provided a very capable tool, but one that is still very much evolving in terms of both capabilities and the user interface.
While achieving full-range ontology editing functionality is a tall order for toolmakers, the capabilities called out above are not the only demands toolmakers face. The world of ontology application itself is changing in a way that is putting more pressure on ontology language implementations and editing tools to handle new tasks. Some see the gathering demands as an impending crisis for providing editing environments that can accommodate an expanding scope of ontology language responsibilities. Eventually, editors will have to address the ontology language and reasoner functions currently under development, including:
- Ontology rule languages like the Semantic Web Rule Language (SWRL), which combines RuleML with OWL.
- Probabilistic extensions for OWL like those being pursued at the University of Maryland, Baltimore County (UMBC).
- Defeasible or alternative logics to support forms of non-monotonic reasoning in closed-world and open-world ontologies that are being investigated.
- Complete reasoning capabilities of OWL Full and RDF, which are being investigated.
- Enforcing formal ontology principles on the design and implementation of ontologies as imposed by development environments like IODE and others.
- Semantic Web services ontology languages like OWL-S.
- Behavior-modeling capabilities of ontology languages like those present in The Discovery Machine and OPCAT.
- Facility for associating an ontology or parts of an ontology with specific problem-solving methods (PSMs).
- Means for ontology processes to request and access web-borne knowledge, irrespective of how that knowledge is organized, such as the URIQA scheme.
- Effective integration of RDF ontologies and XML Schema for full mapping between them using path-query mechanisms analogous to XPath.
- Generation of ontologies specifically suited for use by agents in a multiple-agent system implementation as in the DISCIPLE tool.
- Automatic updating of domain ontologies through built-in processes to acquire and analyze source domain information and identify modifications to the ontology as new or modified concepts and relations.
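As an illustration of the first item in the list above, the kind of rule that a language like SWRL is meant to express can be sketched as a single forward-chaining step in plain Python. This is not SWRL syntax and uses no rule engine; the `apply_uncle_rule` helper, the property names `hasParent`, `hasBrother`, and `hasUncle`, and the individuals are all illustrative assumptions.

```python
def apply_uncle_rule(facts):
    """Apply the rule: hasParent(x, y) and hasBrother(y, z) -> hasUncle(x, z).

    facts is a set of (subject, property, object) triples. The function
    returns only the newly inferred triples.
    """
    inferred = set()
    for (x, p1, y) in facts:
        if p1 != "hasParent":
            continue
        for (y2, p2, z) in facts:
            if p2 == "hasBrother" and y2 == y:
                new_fact = (x, "hasUncle", z)
                if new_fact not in facts:
                    inferred.add(new_fact)
    return inferred

facts = {
    ("Ann", "hasParent", "Bob"),
    ("Bob", "hasBrother", "Carl"),
}
print(apply_uncle_rule(facts))
```

A relation like this one, which chains two properties through a shared individual, is the sort of statement that class definitions alone do not capture, and an editor that supports a rule language has to let users enter and check it alongside the ontology.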
Choosing Your Editor
When comparing the ontology editors described in Table 1, it becomes clear that the tools offer a wide and varied range of capabilities. In the absence of an IDE for ontologies, tried and true or otherwise, the practical approach today is to rely on several ontology building tools to fashion different aspects of an ontology and manage the development process.
Recognizing that fashioning an effective representation of a problem can be a great part of its solution, one objective in choosing an ontology editor is to maximize the match between its potential output (as ontology content and structure) and the character and dynamics of the particular domain problem space that your ontology is intended to address. Thus, tools supporting specific industry vocabularies, linguistic capabilities, modeling styles, or ontology constructs (like instances and datatypes) may be better suited to the demands of particular problem applications. When assembling your own bench of tools to gain the convenience of a coordinated development practice, there are a number of factors to consider. Some of these were enumerated in the foregoing discussion on tool usability. A few more are:
- A common ontology specification interchange language is necessary to ensure that expressivity is not lost and consistency is not compromised when moving between tools. That is, you should require round-trip editing among all your ontology tools.
- Related to the preceding consideration is the rapid growth in widespread adoption of the OWL language. When editors do not natively support OWL import and export, specific translator tools should be identified to seamlessly bridge between the editor's native language(s) and OWL.
- As observed in the original survey, tools may differ markedly in their level of use and maturity. Some tools have very active development and user communities that increase the likelihood that the tool will continue to be available and kept up to date.
- Sometimes independent of a tool's level of community participation, the level of technical support and training available from the software provider is important to forming a productive user team.
- Editors with a software architecture that allows easy extension through added functionality and integration with other tools are advantageous. The use of common application frameworks, plug-in facilities, well-implemented and documented APIs, and the like may compensate for the lack of a true IDE.
- Product factors like licensing terms, purchase price, documentation, update policy and upgrade path are also productivity issues.
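The round-trip requirement in the first item above can be checked mechanically. The following sketch assumes that each tool's export has already been parsed into a set of (subject, property, object) triples; the `round_trip_diff` helper and the sample data are hypothetical and stand in for whatever parsers the tools at hand provide.

```python
def round_trip_diff(original, reimported):
    """Compare two sets of triples and report what a round trip lost or added."""
    return {
        "lost": original - reimported,
        "added": reimported - original,
    }

original = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
    ("ex:Dog", "rdfs:label", "dog"),
}
# Hypothetical result after exporting from one tool and importing into another.
reimported = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
}

diff = round_trip_diff(original, reimported)
print("lost:", diff["lost"])
print("added:", diff["added"])
```

An empty result in both directions suggests the pair of tools preserved the statements; any entry under `lost` points to expressivity that did not survive the exchange.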
Regardless of your choices, the ontology-building experience will be a challenging one, but one that holds the promise of solving real problems when key questions hinge on the operational semantics of the domain.
International Web Ontology Programs
Recent Conferences Covering Semantic Web Tools
Guidance on Building Web Ontologies