Uncle Sam's Semantic Web

September 15, 2004

Paul Ford

From the EPA to the Navy, the United States government is coming to see the Semantic Web as a solution to huge data-processing problems. Columnist Paul Ford gets the scoop at the 2004 Semantic Technologies for e-Government Conference.

What did the Semantic Web have to do with the war in Iraq? Not enough, said Jim Hendler, who heads the Mindswap Semantic Web research laboratory at the University of Maryland, speaking at the 2004 Semantic Technologies for e-Government Conference, held Sept. 8-9 in McLean, Virginia. "The beginning of the Iraqi operation was postponed for weeks because information systems couldn't be made interoperable in the time required," said Hendler. "Systems couldn't talk to one another." It was a problem, he said, that a Semantic Web framework could have solved.

Operation Infinite Triples

Hendler's assessment -- that the Semantic Web is the essential glue that will allow large systems to speak to one another, across organizational boundaries -- was shared by many in attendance. A number of agencies and corporations described Semantic Web projects in progress, for client organizations like the Navy, NIST, the Tennessee Valley Authority, the Office of Child Support Enforcement, the Office of Homeland Security, and others.

However, while many expressed their enthusiasm for the Semantic Web, actual working RDF-driven applications were in short supply, and much of the conference was given over to tutorials and introductions. To many of the attendees, the Semantic Web is clearly still an untested idea.

The W3C is working to make the Semantic Web a more consistent framework, said Eric Miller, the W3C Semantic Web Activity lead. Miller described two W3C working groups, each focused on clarifying what the Semantic Web is, and how it can be used. One, DAWG, the RDF Data Access Working Group, is on target to deliver a standard RDF query language, akin to SQL for relational databases, by January 2005. The second, the Semantic Web Best Practices and Deployment Working Group, was chartered, according to Miller, to "reduce the costs associated with implementation and sharing RDF data."
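The idea behind DAWG's standard query language can be sketched without any RDF machinery at all: an RDF store is a set of (subject, predicate, object) triples, and a query is a pattern matched against them, much as a SQL SELECT matches rows. A minimal illustration in Python — the triples, prefixes, and names here are invented for the example, not drawn from any agency's data:

```python
# A toy triple store: each statement is a (subject, predicate, object) tuple.
# All identifiers below are made up for illustration.
triples = [
    ("epa:Report42", "dc:creator", "EPA"),
    ("epa:Report42", "dc:date", "2004-09-08"),
    ("navy:Log7", "dc:creator", "Navy"),
]

def match(store, s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard,
    much like an unbound variable in an RDF query."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Who created epa:Report42?" -- the triple-pattern analogue of a SELECT.
print(match(triples, s="epa:Report42", p="dc:creator"))
# → [('epa:Report42', 'dc:creator', 'EPA')]
```

A real RDF query language adds variables shared across multiple patterns (joins), but pattern matching over triples is the core operation DAWG is standardizing.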

Ontologies Under Attack

While many are just learning about the Semantic Web, others are beginning to question it. Steven Ray, chief of the Manufacturing Systems Integration Division at NIST, questioned whether OWL goes far enough. Ray believes that Semantic Web toolkits should support richer ontology development, and called for more expressive formal logics (offering KIF as an example) to be brought into ontology languages.

"Even advanced technology like OWL doesn't go as far as you could in terms of nailing down definitions," Ray said. Ray focused his talk on PSL, the Process Exchange Language intended to describe and exchange manufacturing processes. PSL is based on KIF.

The PSL Ontology is a complex standard built around broad concepts like "occurrence tree automorphisms" and "envelopes and umbrae." This breadth is proving useful, according to Ray, but James Hendler questioned the need for high-level ontologies, and wondered whether a Semantic Web that seeks first to be an accurate map of reality is worth pursuing.

"Epistemology grows bottom up," he said, pointing out that while it is incredibly difficult to create a consistent model of time that works in all situations, it is possible to create a model that works in most circumstances.

The question of "how much reality do I need to model?" has plagued the Semantic Web for years, and if this conference is any indicator, it will continue to do so going forward. The Semantic Web, as a whole, can be seen in two ways: one, as an attempt to model reality on a computer, within the tradition of artificial-intelligence research that has taken place over the last 50 years; and two, as a means whereby data can be reliably created and exchanged with clearly defined semantics.

In truth, both of these conceptions of the Semantic Web are compatible. Work will continue on Standard Upper Ontologies; at the same time, a growing number of organizations are using RDF for more mundane data exchange and storage (RSS 1.0 is one example; Mozilla Firefox also uses RDF internally as a data storage model).
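RSS 1.0 shows what this mundane use looks like in practice: a feed is ordinary RDF/XML, so an item carries plain channel metadata while remaining a set of triples that any RDF tool can consume. A minimal item, with invented example.org URLs:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://example.org/news/2004-09-15">
    <title>Uncle Sam's Semantic Web</title>
    <link>http://example.org/news/2004-09-15</link>
    <description>Conference report from McLean, Virginia.</description>
  </item>
</rdf:RDF>
```

No upper ontology is involved; the payoff is simply that the title, link, and description are machine-readable statements about a resource.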

Ideally, if the upper ontology work bears fruit, and as a result computers can do a better job of understanding natural language, drawing conclusions, and scheduling tasks, the huge stores of RDF that are created in the meantime -- without much attention to upper ontologies -- can be analyzed and understood using the concepts defined in the upper ontology.

Billions and Billions

As more triples are created, checkbooks are coming out. TopQuadrant, one of the co-sponsors of the conference, has decided that the Semantic Web is a bankable technology, and predicts that a U.S. $63 billion market for "Semantic Technology" will emerge by 2010.

Miller, the W3C Semantic Web Activity lead, is not as confident in these numbers -- not from a lack of confidence in the technology, but from a suspicion of all predictions. "I am cautiously optimistic. I'm seeing an increase in tools and services," said Miller.

Miller would like to see the Semantic Web avoid the hype cycle that accompanies hot new technologies. "Companies are making money [using Semantic Web technologies], and seeing great returns on investment," said Miller, "but the Semantic Web is not a magic bullet. It is an enabling technology that solves certain problems in a very effective way."

Liberty and Justice for All?

Most of the applications under discussion did (or will do) specific, concrete things: they enabled industry and government to effectively share information; they tracked deadbeat dads; they looked through piles of data for terrorists. In contrast, the topic of privacy was rarely broached, and no projects were focused on putting data into the hands of citizens.

Speaking after a presentation by the Navy, one attendee, Denise Bedford, of the World Bank, questioned whether the Semantic Web is the right solution for the problem of identifying terrorists. "We'll never get there [to capturing terrorists] with just statistics," she said.

It's a valid concern: clearly, a number of government agencies feel that large triple stores and data mining will uncover terrorist activity and help keep America safe; solid evidence for this proposition, on the other hand, is hard to find. (Readers with examples of data mining/expert systems being used to capture terrorists and stop other crimes should leave a comment below this article.) Indeed, as Miller said, the Semantic Web is not a magic bullet, and as the technologies that make up the Semantic Web begin to leave the research laboratory and join the real world, their true shape and value will become more apparent. Is the Semantic Web the ultimate tool for scheduling a dentist's appointment, or an advanced, reliable way for manufacturers to exchange data? The next-generation terrorist-catcher? None of the above, or all of the above?

With DAWG's query language close to release, and reliable, open source triple stores like Kowari available to all, the next 12 months should provide some answers to those questions. In the conference's closing ceremonies, Brand Niemann, the co-chair of the conference, challenged the presenters "to come back next year and show us their progress." While the conference was an enthusiastic testament to interagency data sharing -- which is a laudable focus -- it will be interesting in 2005 to see what plans these agencies have to open their data to citizens.