The Semantic Web: A Primer
by Edd Dumbill
Semantic Web Technologies (cont'd)
The W3C's Resource Description Framework (RDF) is one of the cornerstones of Semantic Web work. While its somewhat unwieldy syntax often attracts negative attention from XML developers, the real value of RDF is its data model: a very simple model of triples (subject, predicate, object), where the subject and predicate are URIs and the object is either a URI or a literal. With this simple model, objects and their properties may be represented.
The XML serialization of RDF (the "Syntax" of the RDF Model & Syntax specification) is referred to as RDF/XML, but other syntaxes are being proposed to try to overcome the awkwardness of the existing syntax. For example, RDF models could just as easily be serialized using SOAP's serialization rules (see the presentation at WWW9 by Henrik Frystyk-Nielsen). The power of RDF truly lies in the data model: as long as information on the web can be reduced to triples like this, it doesn't really matter which XML serialization format is used. What isn't negotiable, though, is the role of the URI as a universal identifier.
The table below shows a hypothetical RDF/XML snippet and the triples it generates in the data model.
<contact rdf:about="edumbill">
  <name>Edd Dumbill</name>
  <role>Managing Editor</role>
  <organization>XML.com</organization>
</contact>
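To make the mapping from syntax to data model concrete, here is a minimal Python sketch that reads the snippet and lists (subject, predicate, object) triples. It is an illustration under stated assumptions, not a conforming RDF parser: the function name `snippet_to_triples` is hypothetical, the `xmlns:rdf` declaration is added so that the snippet parses, the element name is treated as the resource's type, and the short names are left as they are rather than expanded to full URIs as a real RDF processor would do.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The article's snippet, with a namespace declaration added so it is
# well-formed on its own (an assumption; the original omits it).
SNIPPET = """
<contact xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         rdf:about="edumbill">
  <name>Edd Dumbill</name>
  <role>Managing Editor</role>
  <organization>XML.com</organization>
</contact>
"""


def snippet_to_triples(xml_text):
    """Return a list of (subject, predicate, object) tuples.

    Simplification: names are kept as short strings instead of being
    resolved to URIs.
    """
    root = ET.fromstring(xml_text.strip())
    subject = root.attrib["{%s}about" % RDF_NS]
    # The element name stands in for the class of the resource.
    triples = [(subject, "rdf:type", root.tag)]
    # Each child element becomes one property with a literal value.
    for child in root:
        triples.append((subject, child.tag, child.text))
    return triples


if __name__ == "__main__":
    for triple in snippet_to_triples(SNIPPET):
        print(triple)
```

Each child element yields one triple whose subject is the value of rdf:about, which is why the choice of XML serialization matters less than the triples it produces.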
Once we have the data model, there's a need to describe the characteristics of the objects being modeled. For instance, we want to say that a "Contact" must have a name, role, and organization property. This is where RDF schemas come in -- they define an RDF vocabulary that can be used to express the "Contact" class. This allows all users of a resource of type "Contact" to have an agreed expectation of its properties and relationship to other resource types.
RDF schemas differ somewhat from XML schemas (such as DTDs or W3C XML Schemas) in that they do not define a permissible syntax but instead classes, properties, and their interrelation: they operate directly at the data model level, rather than the syntax level. Scaled up over the Web, RDF schemas are a key technology, as they allow machines to make inferences about the data collected from the web.
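As a rough illustration of the kind of inference a schema enables, the following Python sketch applies a single rule: if a resource has a given type, and that type is declared a subclass of another class, the resource also has the broader type. The function name `infer_types` and the "Editor" and "Person" classes are hypothetical, and short strings stand in for URIs; a real RDF Schema processor supports more rules than this one.

```python
def infer_types(triples):
    """Return the input triples plus rdf:type triples implied by
    rdfs:subClassOf declarations (repeated until nothing new appears)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"]
        typed = [(s, o) for s, p, o in inferred if p == "rdf:type"]
        for resource, cls in typed:
            for sub, sup in subclass:
                new = (resource, "rdf:type", sup)
                if cls == sub and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


# Hypothetical schema and data, for illustration only.
DATA = {
    ("Editor", "rdfs:subClassOf", "Contact"),
    ("Contact", "rdfs:subClassOf", "Person"),
    ("edumbill", "rdf:type", "Editor"),
}

if __name__ == "__main__":
    for triple in sorted(infer_types(DATA)):
        print(triple)
```

Nothing in the data states directly that "edumbill" is a "Contact" or a "Person"; both facts follow from the schema, which is the sense in which machines can make inferences about collected data.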
In fact, work is now underway to take RDF Schemas one step further in the description of ontologies. (An ontology is essentially a formal description of objects and their interrelationships.) The MIT/LCS has begun to define DAML (DARPA Agent Markup Language), a language for expressing ontologies. Although DAML is very much a work in progress, real work can be done now with RDF Schemas; see the section on Redfoot below.
The hardest problem in this area is not the infrastructure, but the actual ontologies themselves. Until an industry-wide ontology exists for, say, vehicle parts, there is a limit to the utility of the Semantic Web in the auto manufacturing industry. Organizations such as the Dublin Core Metadata Initiative have been developing such vocabularies for some time now, and they've made progress both in terms of the ontologies themselves and also tools to manage and maintain them.
Work on XML protocols -- the use of XML for messaging and remote procedure calls -- approaches the Semantic Web from the other end of the spectrum. Avoiding grand schemes for the classification of everything, it is focused on standardizing XML-based interactions between computers. A key component of XML protocol technology is the description and discovery of web services available via XML protocols such as SOAP, since systems require the ability to conduct electronic transactions with other systems of which they have no prior knowledge.
This requirement has led to the creation of technologies such as Web Services Description Language (WSDL), which describes the characteristics of the interface offered by a web service, and ADS, which allows the advertisement and discovery of such services. ADS, by offering techniques for embedding such descriptions inside normal web content, fits neatly into the Semantic Web vision. (For more on WSDL and ADS, see our XML Protocol Technology Reference.) The recently announced UDDI effort also provides an API for registries of web e-business services. Although the Semantic Web vision focuses on decentralized technology as opposed to centralized registries, the emphasis on machine discovery of resources is a common theme.
While the XML protocol-related technologies solve narrow problems in order to achieve results over the next year, they represent use cases for the Semantic Web, and one expects that mature Semantic Web technologies will address problems such as these.
The major center of Semantic Web-related development thus far has been in the area of RDF. The creation of semantically-richer documents is a relatively easy task, so most of the effort has been concentrated on accumulating the data, storing it and querying it. RDF/XML provides a useful intermediate syntax which, when combined with tools like XSLT, allows multiple data sources to be combined.
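The accumulate, store, and query cycle can be sketched in a few lines of Python once data has been reduced to triples. This is a toy in-memory store under assumed names (`merge`, `query`, `SITE_A`, `SITE_B` are all hypothetical), not the approach of any particular tool: merging sources is a set union, and a query is a triple pattern in which None matches anything.

```python
def merge(*sources):
    """Combine triples from several sources; duplicates collapse."""
    store = set()
    for source in sources:
        store.update(source)
    return store


def query(store, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    return sorted(
        triple for triple in store
        if all(want is None or want == have
               for want, have in zip(pattern, triple))
    )


# Two hypothetical sources describing the same resource.
SITE_A = [
    ("edumbill", "name", "Edd Dumbill"),
    ("edumbill", "role", "Managing Editor"),
]
SITE_B = [
    ("edumbill", "organization", "XML.com"),
    ("edumbill", "name", "Edd Dumbill"),
]

if __name__ == "__main__":
    store = merge(SITE_A, SITE_B)
    print(query(store, subject="edumbill"))
```

Because both sources use the same identifier for the subject, their statements combine without any mapping step, which is the practical payoff of treating the URI as a universal identifier.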
Further details on RDF tools and applications can be obtained from the W3C's RDF home page, and the RDF category of XMLhack. In this section I will concentrate on introducing tools useful for making a relatively speedy start with Semantic Web development.
Redland: Redland is an RDF application framework with C and Perl APIs. As a framework, most of its components are pluggable. For instance, you can choose which RDF parser you use (an important factor at this stage in RDF's development, where the emphasis on conformance for RDF parsers is not as high as it is for XML parsers). Storage mechanisms are also pluggable: currently, in-memory storage and Berkeley DB are supported. Beta-level software.
Redfoot: Redfoot is a 100% Python application framework for distributed RDF applications. It provides a web interface to its RDF import, editing, and viewing functions. It also has support for RDF Schemas. One of its more intriguing features is emerging support for peer-to-peer exchange of RDF data -- peered Redfoots (Redfeet?) will be able to discover the contents of each other's stores. Easy customization of the web interface makes this a good choice for experimentation with RDF. Alpha/beta-level software.
Wraf: The Web Resource Application Framework is another RDF application framework, this time written in Perl. It also offers a web interface to RDF storage, editing and querying. Alpha-level software.
RSS 1.0: This work on the next generation of web site metadata distribution employs RDF for its data model and syntax. Of particular interest is its use at the W3C, where XSLT is used to extract the RSS information from the front page. Dan Connolly has documented how this was done. If you want to experiment with scraping data from XHTML pages, this is an interesting starting point.
Describing and retrieving photos using RDF and HTTP: This note, written by W3C staff, describes the creation of a system allowing the description and retrieval of photographs using RDF. The RDF itself is embedded in the comment portion of JPEG files using a custom editor application, and it's retrieved through an extension to a web server. This illustrates another good starting point for doing Semantic Web development using existing web technologies: attempting to combine this work with a framework such as Redfoot would be an interesting line of investigation.
The Semantic Web has already been the subject of much bluster among the XML developer community and will doubtless continue to be so. Arguments rage over the usefulness of the technology, the difficulty of using RDF, and so on. However, the Semantic Web vision of a machine-readable web has possibilities for application in most web technology -- while some complain about its lack of definition, its broad scope properly reflects the quietly radical effect it will have on the Web.