Topic Maps

June 16, 2000

Liora Alschuler

The Phenomena

Everyone knows that scalable location and navigation mechanisms for the Web and other hypertexts are a Big Problem. Still, you would expect a solution coming from the International Organization for Standardization (ISO) to be a hard sell to GenXML'ers, bred on the notion that the W3C liberated syntax from Standard Generalized Markup Language, ISO 8879:1986. If there is resistance to Topic Maps, ISO 13250:2000, based on their parentage, you wouldn't know it from the crowds attending the two tutorials, 14 presentations, and three startup technology booths at GCA's XML Europe 2000.

What follows is a quick look at the spec, the three vendors with topic map code, some early implementations, some early impressions of the issues topic maps will face in the next months, and a short soapbox assessment of what this all means for the Web.

The Spec

Topic Maps describe knowledge structures and the associations between those structures and resources. Structures, associations, and association types can all be "topics" that are mapped to each other and to resources, which are real-world media objects. Since this is all recursive, it is easiest to describe TMs (topic maps) with an example. In the domain of opera, using an example laid out by Steve Pepper in his paper The TAO of Topic Maps, "Tosca" is a topic, "is written by" is an association, and occurrences are the instantiations of this link in identified media. If it sounds like RDF, well, it is, but it's not (more on this below).
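To make the topic/association/occurrence vocabulary concrete, here is a minimal sketch in Python. It is not the ISO 13250 data model or any vendor's API: the class names, the directed source/target simplification of an association, the "Puccini" topic, and the example.org address are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Topic:
    name: str


@dataclass(frozen=True)
class Association:
    # The association type is itself a Topic, which is what makes
    # the model recursive. Source/target is a simplification.
    type: Topic
    source: Topic
    target: Topic


@dataclass(frozen=True)
class Occurrence:
    topic: Topic
    resource: str  # address of a media object (placeholder URL below)


@dataclass
class TopicMap:
    associations: list = field(default_factory=list)
    occurrences: list = field(default_factory=list)

    def related(self, topic, assoc_type):
        """Topics reached from `topic` through associations of `assoc_type`."""
        return [a.target for a in self.associations
                if a.source == topic and a.type == assoc_type]

    def resources_for(self, topic):
        """Addresses of the resources in which `topic` occurs."""
        return [o.resource for o in self.occurrences if o.topic == topic]


# Hypothetical opera data, loosely following the example in the text.
tosca = Topic("Tosca")
puccini = Topic("Puccini")
written_by = Topic("is written by")

opera_map = TopicMap()
opera_map.associations.append(Association(written_by, tosca, puccini))
opera_map.occurrences.append(
    Occurrence(tosca, "http://example.org/tosca/libretto.html"))

print(opera_map.related(tosca, written_by))
print(opera_map.resources_for(tosca))
```

Note that "is written by" is an ordinary Topic here, so it could carry its own associations and occurrences, which is the recursion the paragraph above alludes to.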

Topic maps separate semantic connectivity from content in a manner analogous to the way in which XML separates presentation from content. Like style sheets, topic maps can be layered or applied as alternative navigation systems.
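The layering idea can be sketched the same way. In the hypothetical Python below, the resources know nothing about topics, and two independent maps (invented here purely for illustration) offer alternative routes into the same content, much as two style sheets can render one document.

```python
# Content: resources addressed by ID. Nothing here mentions topics.
resources = {
    "doc1": "Libretto of Tosca",
    "doc2": "Review of a Tosca performance",
    "doc3": "Biography of Puccini",
}

# Two alternative topic maps over the same resources (both hypothetical).
by_work = {"Tosca": ["doc1", "doc2"], "Puccini": ["doc3"]}
by_genre = {"Texts": ["doc1"], "Criticism": ["doc2"], "Biography": ["doc3"]}


def navigate(topic_map, topic):
    """Return the resources a given map attaches to a topic."""
    return [resources[r] for r in topic_map.get(topic, [])]


print(navigate(by_work, "Tosca"))
print(navigate(by_genre, "Criticism"))
```

Swapping one map for the other changes how a reader gets to the material without touching the material itself.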

Technology Vendors

TM presenters and authors at XML Europe 2000 listed affiliations with nine technology companies and academic institutions; three vendors were showing TM products and services; and there was palpable, intense interest from developers and implementers.

InfoLoom

TMLoom and the Topic Map Loom XSL subset were developed by Michel Biezunski, co-editor of ISO 13250 and of the XML Topic Map (XTM) effort sponsored by GCA's IDEAlliance. Biezunski has created a startup in partnership with Dianne Kennedy, consultant and Chief Technical Consultant for IDEAlliance. The company is InfoLoom, and the technology has been in development alongside the spec for four years. It is middleware code that takes source data of any format plus SGML or XML Topic Maps and outputs new source data with semantic links in place. The user creates a TM and applies it to the resource; the middleware then creates browsable or printable output that, according to product literature, "can be used to dynamically build and maintain browsable, navigable, hyperlinked knowledge bases." Search and recognition engines can be plugged into the middleware to create the TM.

The technology will be provided as services offered through integrators. Biezunski and Kennedy have developed a program of workshops and tutorials, which they bundle as a "Certified Partners Program," the objective being to enlist qualified implementers who will interact with the market. From this perspective, that looks like a difficult strategy when two weeks' worth of consulting and use of the software are priced at $24,000, but if sufficient pent-up demand is there, integrators could get a return on the investment.

The software package will be available, bundled for integrators and developers, in July, and the Certified Partners Program launches the same month.

Ontopia

Ontopia is a six-employee company so new that they had only just got their domain name and the ink on their T-shirts was practically still wet. Most of the initial staff came from the Norwegian office of STEP Infotek, which is also developing TM technology. Unlike InfoLoom, both Ontopia and STEP are building TM engines. Steve Pepper, CTO and acting CEO of Ontopia, says such engines are key enabling components of TM implementation. Ontopia will soon have two engines available: Atlas, which is in Java and will be available under conventional license, and tmproc, which is an open source Python engine that will be available for download as soon as they get the site up.

Product literature describes Atlas as a "publishing and navigational framework" for fast development of custom web applications. It can be used in conjunction with different persistence mechanisms and different interchange syntaxes (practically speaking, that appears to mean XML, HTML and SGML). It has a general API, which could potentially be linked to a natural language processor or pattern matching code to generate new topic maps, but they haven't tried this out yet.

STEP UK

STEP UK is the lone non-startup in the group. It is one of the largest SGML/XML consultancies and technology vendors in Europe, with offices in five countries, plus the US. STEP is part of the Mohn Media Bertelsmann Group. According to Graham Moore, CTO, they have dedicated major resources to TM technology development and will release a Java engine, now in beta, and a set of classes and interfaces. Development is centered in STEP UK by Moore, who is the author of their XLink engine, X2X.

Implementations

InfoLoom has a reference implementation at Quid. In addition, Kennedy reports that they are prototyping a three-title topic map for a major publisher of technical books. A TMLoom-based implementation for Frank Russell is written up in the XML Handbook. The Quid site has more than 200,000 hyperlinks.

Ontopia reports interest from commercial publishers and technical documentation publishers in both the US and Europe. While Pepper confirms that the entire topic map development was done in Europe, interest is at least as high on the other side of the Atlantic.

STEP UK reports two sales of their XLink engine, X2X, but the topic map development has only recently been announced, and no implementations are yet up and running. They see three types of market emerging (portals, enterprise applications, and personal users) and have designed their product strategy to fit these three areas.

See the titles of XML Europe 2000 topic map track papers for references to further work in industry and academia.

To get a taste of what is in the pipeline for topic maps, I spoke with Suriya Narayanan, Senior VP and CTO of XCare.net, a newco with 120 employees and an eye on the healthcare ASP portal market. Narayanan says unabashedly that topic maps are the next wave of computing, comparable to the advent of the GUI. His illustration of TM-centric development comes from his own shop, where they are using the spec to define dynamically customizable workflow.

Narayanan says that a topic is not necessarily a noun; it can be a verb like "initiate patient visit". This could be the lynchpin for a semantic network tying together billing and reimbursement, ordering of further services, such as laboratory tests, and the documentation that describes these services. In the XCare.net model, all interactions are modeled as topic maps, and workflow is a rendered topic map assembled dynamically at runtime.

Issues

Impressive and ambitious as these early implementations appear, there are issues that need to be resolved before Topic Maps become the next new thing.

The relationship with W3C XML standards (XML, XLink, RDF)

Topic maps are specified in SGML with HyTime links, but can be implemented (and are implemented) in XML with XLinks. The constraints of XML and XLink, however, such as the lack of any way to express links between links, have introduced room for interpretation: currently four XML implementations (the three commercial ones described here and one done privately by Kal Ahmed of Chrystal on www.teequila.com) use three XML DTDs. Thus, the topic maps created for each implementation are not interchangeable. This is not as bad as it sounds, because while the standard was written under ISO with SGML syntax, its authors kept a constant eye on XLink. The issues to be resolved for a reference XML topic map implementation are well understood, and are the subject of a fast-track reconciliation effort called "XTM" (XML Topic Maps), under the auspices of GCA's IDEAlliance and topicmaps.org.

The relationship to RDF is both clear and fuzzy. Kennedy sees some potential conflict with RDF. Pepper sees similarities and differences: similarities in problem space and in the use of a model and implementation syntax; differences in the fuzzy (to this reporter) distinction between resource-centric and knowledge-centric metadata. Moore may have the last word on this--still in grad school, he has a gut feeling that you can implement topic maps in RDF, and plans a detailed analysis of the relationship between the implicit RDF metamodel and the implicit TM metamodel as the topic for his Ph.D. thesis.

Ontology, Authoring, Interface Design, Interoperability

Like XML schema design, TM presents an opportunity for garbage in, garbage out. The topic map ontology must be well designed, or the connections will be ISO-standard spaghetti. Beyond that, there are three potential areas of constraint on development: authoring tools, interface scalability, and interoperability. Creating semantic markup from narrative is not a trivial task, whether the target app is HREF, RDF, or TM. Topic maps can leverage existing markup and make it reusable across contexts in interesting and dynamic ways, but some algorithm or human hand has to make the initial connection between some point on the map and the target resource. Small hypertexts are notoriously sexy, but the inability to scale up killed HyperCard. The TM engines and environments must support scalable interfaces if they want to index the Web. As Graham Moore says, "once XTM has cleared up the interoperability issue for XML implementations, topic maps will be a much easier sell."

Bottom Line

It is not reasonable to assume, a priori, that the entire edifice of an open, semantic hypertext will be created under one host organization. Standards writing is, or should be, a creative activity--and creativity is just not always amenable to planned development. One advantage of well-designed, open standards, such as those coming from the W3C, is that they need not keep the whole playing field to themselves.

Widespread implementation of non-proprietary semantic navigation networks is good for the Web. ISO 13250:2000 topic maps should be given full consideration by developers, content providers, builders of portals, and supporters of open standards and an open Web.