WWW2004 Semantic Web Roundup

May 26, 2004

According to Tim Berners-Lee's WWW2004 keynote address, the Semantic Web is entering "phase II", a time of "less constraint" when Semantic Web developers are encouraged to build upon the foundations of RDF and OWL to create working applications on both the server and the desktop. And while other topics were discussed at WWW2004, such as mixed markup and XForms, this was definitely the Semantic Web's moment in the sun, with academic and corporate presentations alike focusing on the uses of RDF, triple stores, and data sharing.

The Semantic Web focus was not without its critics. Elliotte Rusty Harold posted the following to his site after listening to one of the many Semantic Web-related presentations at the conference:

I feel like I'm a mechanical engineer in 1904 listening to a bunch of other engineers talk about airplanes, but nobody's willing to show me how they actually expect to get their flying machines into the air. Maybe they can do it, but I won't believe it until I see a plane in the air, and even then I really want to take the machine apart before I believe it isn't a disguised hot air balloon. A lot of what I'm hearing this morning sounds like it could float a few balloons.

Both Berners-Lee and Harold are asking the same question from different vantages: where are the applications? There is a framework, not yet fully proven, for a massively distributed, world-wide database, glued together by ontologies -- and now what?

If the answer to "what can I do with the WWW?" was Mosaic 1.0, the question "what can I do with the Semantic Web?" has no corresponding killer app. Indeed, Berners-Lee asked the assembled group to forget about killer apps altogether; as reported last week, he said that the proof of the Semantic Web is when new connections are made, and new links between information emerge.

That said, there is a great deal of work going on within corporations and academic research groups, each of them trying to answer the question in its own way. Some are crafting better back-end storage and querying methods; others are attempting to give the end-user a better experience. Throughout WWW2004's Semantic Web track, managed by Eric Miller, the W3C's Semantic Web Activity Lead, the conversation shifted from theory to practice as betas and demonstrations of working products were shown.

The Server-Side Semantic Web

At the bottom of the Semantic Web there must be a means to store RDF, and for some time the leading storage framework has been Jena, from HP Labs. Many of the Semantic Web projects discussed during the conference used Jena as a backing store, but one contender to Jena's throne is Kowari, from Tucana Technologies.

Written in Java 1.4 to take advantage of native I/O support, Kowari was created from the ground up as a database for triples. This contrasts with Jena's reliance on back-end database engines (e.g., MySQL) for persistence. Kowari is available in three flavors: as a component of the Tucana Knowledge Server, an enterprise product focused on metadata analysis and knowledge discovery; as an open-sourced (Mozilla Public License 1.1) server with a long list of features, including SOAP bindings and "Descriptors" for transforming data using XSLT; and as a "Lite" version derived from the full version, which jettisons some features to allow for a smaller (11 MB) download size.

Rather than competing directly with Jena, Kowari includes support for the Jena API, as well as JRDF, an alternative RDF-management API, and adds to these APIs with a new SQL-like query language, iTQL, that can be used via a built-in interactive shell.
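For readers who have not worked with a triple store, the core idea can be sketched in a few lines. The following is a minimal in-memory illustration in plain Python, not Kowari's, Jena's, or JRDF's API; the `Triple` and `TripleStore` names, the `ex:` identifiers, and the sample facts are all hypothetical. A production store would keep indexed triples on disk rather than scanning a set.

```python
from typing import Iterator, NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Toy in-memory store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._triples = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add(Triple(subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for triple in self._triples:
            if subject is not None and triple.subject != subject:
                continue
            if predicate is not None and triple.predicate != predicate:
                continue
            if obj is not None and triple.obj != obj:
                continue
            yield triple


# Hypothetical sample data using made-up "ex:" identifiers.
store = TripleStore()
store.add("ex:kowari", "ex:developedBy", "ex:Tucana")
store.add("ex:jena", "ex:developedBy", "ex:HPLabs")
store.add("ex:kowari", "ex:writtenIn", "Java")

# Which subjects have an ex:developedBy statement?
developed = sorted(t.subject for t in store.match(predicate="ex:developedBy"))
```

A query language such as iTQL expresses the same kind of pattern declaratively, with variables in place of the `None` wildcards used here.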

Moving up from storage, the SIMILE (Semantic Interoperability of Metadata and Information in unLike Environments) project, jointly developed by the W3C, HP, and MIT, is focused on collecting and publishing Semantic Web data to the (non-Semantic) Web. SIMILE has two major components, both open-sourced: Longwell and Knowle, which work together to provide a user-friendly Web-based front-end to RDF. The SIMILE framework has been deployed in several projects and can be seen in a publicly available demo of the tools in operation, which allows users to traverse and compare W3C Technical Reports.

A more general-purpose framework for building Semantic Web applications, KAON (The KArlsruhe ONtology), is a tool suite and application server for developers seeking to build any sort of ontology-driven application. To that end, it provides a front-end to ontology development, called OI-Modeler, along with a number of APIs for querying ontologies and managing RDF, and code for generating a Web-based portal for exploring data managed by KAON.

The Client-Side Semantic Web

Sometimes the simplest software can do the most, and this was the case with Ralph Swick's talk on using the Zakim IRC bot as a support system for teleconferences. The bot, which is integrated with a teleconferencing system, serves as a kind of automated secretary and group notepad for conference calls, informing users of who is on the phone, generating meeting minutes, and recording action items.

A more complex, but equally promising, application for the Semantic Web is Bibster, a peer-to-peer framework for managing and sharing bibliographic data. Bibster allows users to seek out book and article references from an arbitrary number of P2P hosts, and also seeks to consolidate messy data to avoid redundant listings. In addition to allowing literature search on a number of fields (e.g., title, abstract, author), it can search for references using the ACM's topic hierarchy (a taxonomy of topics specific to computer science) and other taxonomies, and allows users to browse through that hierarchy, then search for references which cover a selected topic.
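The hierarchy-based search described above can be sketched roughly as follows. This is not Bibster's code: the `PARENT` table, the `REFERENCES` list, and both function names are hypothetical, and the topic labels are illustrative rather than a faithful copy of the ACM hierarchy. The point is simply that selecting a topic should also return references filed under any of its subtopics.

```python
# Hypothetical taxonomy: each topic maps to its parent (None for the root).
PARENT = {
    "Information Systems": None,
    "Database Management": "Information Systems",
    "Query Languages": "Database Management",
    "Information Storage and Retrieval": "Information Systems",
}

# Hypothetical bibliographic entries: (title, topic it is filed under).
REFERENCES = [
    ("A survey of query languages", "Query Languages"),
    ("Indexing for retrieval", "Information Storage and Retrieval"),
    ("Transaction basics", "Database Management"),
]


def is_under(topic, ancestor):
    """Return True if topic equals ancestor or sits below it in the tree."""
    while topic is not None:
        if topic == ancestor:
            return True
        topic = PARENT.get(topic)  # walk one level up; unknown topics end the walk
    return False


def references_for(topic):
    """List titles filed under the selected topic or any of its subtopics."""
    return [title for title, filed in REFERENCES if is_under(filed, topic)]


database_refs = references_for("Database Management")
```

Here `database_refs` picks up the query-language survey as well as the transaction paper, because "Query Languages" sits beneath "Database Management" in the sample taxonomy.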

Offering a different view of the Semantic Web was SWOOP (Semantic Web Ontology Overview and Perusal), from the University of Maryland Mindswap Lab. SWOOP is an ontology browser that uses a Web-browser metaphor to allow users to download OWL ontologies and edit them. While it is certainly not a tool that will bring the Semantic Web to everyone, its interface offers a marked improvement over older, more complex ontology management tools like Protégé.

Perhaps the most impressive client-side application built on a Semantic Web framework is Haystack, a "universal information client" that seeks to link together different kinds of user data (emails, addresses, Web bookmarks, etc.) with a consistent interface. Emerging from the Eclipse framework, Haystack uses the concept of the "collection" as an organizing principle. A collection might be a list of bookmarks or a set of email messages, which can then be displayed in different "views" or through different "lenses." Haystack is, ultimately, a tool for managing collections of these collections, all interlinked.

Like Eclipse, Haystack appears to be almost infinitely extensible, and one presentation by Dennis Quan of IBM showed it being applied to problems in bioinformatics, using LSID URNs instead of URIs, creating a unified view of multiple bioinformatics databases.

While Haystack shows much promise, it is also a large and slow application, written in Java -- over 40 megs to download, with 512 megabytes of memory required for use. Caveat downloader.

Java: the Semantic Web Language of Choice

Many of the Semantic Web applications demonstrated during the conference are written in Java, as is much of the publicly available code for working with the Semantic Web. While this may alienate some developers, it also demonstrates a commitment on the part of the presenters to create re-usable code, and this approach has paid off in tools like Kowari, which simply grafts the Jena API on top of its triple store, allowing existing Jena users to migrate with a minimum of pain. It may also indicate the desire of many developers to see the Semantic Web take root in the enterprise, where Java is an acceptable development tool.

One Java tool which was repeatedly mentioned at WWW2004 was Lucene, an API for full-text search. A conference presentation by Doug Cutting, who founded the Lucene project, described how Lucene has found its way into dozens of projects, including Nutch, an open-source search engine that hopes to one day compete with Google. At first, Lucene's relationship to the Semantic Web may seem unclear -- after all, the Semantic Web is about resource discovery by analyzing triples, not full-text search. However, along with URIs, literal values make up a good portion of RDF, and Lucene offers an easily embeddable means to provide for search within those literal values. Most notably, Lucene is integrated into Kowari, where it allows for combinations of graph-based querying and old-fashioned keyword lookup.
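To illustrate how keyword lookup and graph-based querying can be combined, here is a small sketch in plain Python. It does not use Lucene or Kowari; a simple inverted index over literal values stands in for the full-text engine, and the `TRIPLES` data, the `ex:` and `dc:` identifiers, and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample data: objects starting with "ex:" are resources,
# everything else is treated as a literal value.
TRIPLES = [
    ("ex:doc1", "dc:title", "Scalable storage for RDF triples"),
    ("ex:doc1", "dc:creator", "ex:alice"),
    ("ex:doc2", "dc:title", "Full-text search with inverted indexes"),
    ("ex:doc2", "dc:creator", "ex:bob"),
    ("ex:doc3", "dc:title", "Keyword search over RDF literals"),
    ("ex:doc3", "dc:creator", "ex:alice"),
]


def is_literal(value):
    """Crude stand-in for a real literal/URI distinction."""
    return not value.startswith("ex:")


def build_literal_index(triples):
    """Map each lower-cased word in a literal to the subjects that carry it."""
    index = defaultdict(set)
    for subject, _predicate, obj in triples:
        if is_literal(obj):
            for token in obj.lower().split():
                index[token].add(subject)
    return index


def search(triples, index, keyword, creator):
    """Intersect a keyword hit list with a graph pattern on dc:creator."""
    keyword_hits = index.get(keyword.lower(), set())
    graph_hits = {s for s, p, o in triples if p == "dc:creator" and o == creator}
    return sorted(keyword_hits & graph_hits)


index = build_literal_index(TRIPLES)
results = search(TRIPLES, index, "rdf", "ex:alice")
```

The keyword step narrows the candidates by the text of their literals, while the graph step narrows them by their relationships; the answer is the intersection of the two sets.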

While Java's success on the server is difficult to dispute, on the desktop the lead is not as clear. A tool like Haystack, while elegant in screenshots, appears to be staggering under its own weight when it's run on even a fairly powerful laptop (one attendee called it a "Shrek" -- sweet, but a monster). While SWOOP is a lighter-weight application, as is Bibster, both have simple GUIs, and don't provide the eye candy or visualization options that feature in Haystack and that end-users have come to expect. For the Semantic Web to succeed on the desktop, it may need to leave Java behind. One promising approach might be to focus energies on .NET/Mono implementations; alternatively, developers could consider using Mozilla's XUL, particularly given the fact that Mozilla already stores application data in RDF -- "triples all the way down."

Summing Up the Semantic Web

Returning to Elliotte Rusty Harold's quote regarding engineers and airplanes, while the Semantic Web applications shown at WWW2004 are not equivalent to large commercial jetliners, several applications seem to be self-propelled, running on more than hot air. But it is also clear that many are still waiting for a "conversion experience" regarding the Semantic Web.

At WWW2004, it seemed as if a gauntlet was thrown down, both by Semantic Web boosters like Berners-Lee and critics like Harold. Both are waiting for applications to emerge, for working code. Given the attention paid to the Semantic Web at the conference, and given that the W3C has invested a large portion of its influence and resources in promoting the Semantic Web to the Web community at large, it is clear that the RDF/OWL framework must continue to gain momentum and find its way into the hearts and minds of developers before long, so that it can avoid the fate of other well-considered and useful W3C specifications -- whither art thou, XInclude, XLink, and XPointer?

However, it does seem as if the Semantic Web has left its childhood and entered its adolescence, venturing out from the sheltering roof of the W3C and showing up in such dangerous places as Bristol, Maryland, and Brisbane. As with any adolescent, it is difficult to know exactly what sort of adult it will become: a set of interlinked desktop tools? A component on the server side? A tool for scientists, or one for publishing TV schedules? While the destiny of the Semantic Web is impossible to predict, one thing was made clear at WWW2004: the next 12 months will be the Semantic Web's chance to stand up and prove itself, if it is going to do so.