Tomorrow's Web Today

June 9, 2004

Daniel Zambonini

Ten to midnight -- it's your partner's birthday tomorrow, and you've forgotten her gift! Suddenly, a thought comes to mind: she's always wanted to see a musical in the big city. "Barry," you say, for that is the name of your web agent, "buy two tickets for Cats in London for tomorrow." A short pause, then Barry responds: "Please confirm: Cats, the Andrew Lloyd Webber musical, in London, the capital of the UK?" "Correct," you reply. "Understood." The next morning a print-out awaits your attention, comprising a receipt for your tickets and a map with directions to the theatre.

This may conjure up scenes from science fiction novels, but the enabling technologies behind this vision of tomorrow are available right now. Let's take a quick tour through the processes in this scenario and the technologies that drive them.

Multi-Modal Interaction

We begin by talking to our web agent, a next-generation browser with the ability to act on our behalf. The agent is enabled for multi-modal interaction, providing a variety of means for input and output, including keyboard, screen, voice, and stylus.

Each form of interaction is governed by a separate set of XML technologies. In our example VoiceXML provides the core functionality, integrating closely with Automatic Speech Recognition (ASR) and Text To Speech (TTS) specifications. The following XML fragment shows an example of the Speech Recognition Grammar Specification, one such integrated technology. This provides the means to split our request into its constituent parts:

<grammar mode="voice" xml:lang="en-GB" version="1.0" root="buyCommand">

  <!-- Grammar to make a purchase, e.g. "buy some dog food" -->

  <rule id="buyCommand" scope="public">
    <ruleref uri="#action"/>
    <ruleref uri="#objSpecific"/>
    <ruleref uri="#object"/>
  </rule>

  <rule id="action">
    <one-of>
      <item>buy</item>
      <item>purchase</item>
      <item>order</item>
    </one-of>
  </rule>

  <rule id="objSpecific">
    <!-- optional qualifiers, e.g. "some", "two" -->
    <item repeat="0-">
      <one-of>
        <item>some</item>
        <item>two</item>
      </one-of>
    </item>
  </rule>

  <rule id="object">
    <one-of>
      <item>dog food</item>
      <item>tickets</item>
    </one-of>
  </rule>

</grammar>

VoiceXML provides the interface between speech and digital functionality, but it doesn't fully deliver the means for understanding the specifics of our request. From its knowledge of grammar, it will have calculated that a purchase is required. The purchase will be for "tickets" (quantity: 2), specifically for "Cats", which is situated in "London".
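As a sketch, a minimal VoiceXML form that could capture such an utterance might look like the following (the grammar filename and submission URL are illustrative, not part of any real application):

  <form id="purchase">
    <field name="request">
      <!-- reference the purchase grammar defined earlier -->
      <grammar src="buy-grammar.grxml" type="application/srgs+xml"/>
      <prompt>What would you like to buy?</prompt>
      <filled>
        <!-- pass the recognised request on for processing -->
        <submit next="processRequest.jsp" namelist="request"/>
      </filled>
    </field>
  </form>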

Semantic Web

Before progressing, our agent needs to resolve the ambiguities:

  • Tickets: Bus tickets, Cinema tickets, or Theatre tickets?
  • Cats: Domestic pet, Footwear, Sports team, or Theatre show?
  • London: Nightclub name, Hotel name, or Geographical location?

To refine these possibilities our agent queries a metadata repository. The repository is enabled through RDF (Resource Description Framework) and OWL (Web Ontology Language). By publishing metadata in RDF and OWL, web publishers can make precise, explicit statements about their content. This metadata can then be harvested into a central repository. The explicit nature of these statements is key: it eliminates ambiguity and allows exact cross-referencing of objects and terms.

All statements concerning London (the capital of the UK) will reference the same identifier (a URI, normally a URL), allowing all facts known about London to be easily extracted and deduced. Example RDF fragments referencing London (using illustrative example.org URIs and an illustrative "ex:" vocabulary) could be:

<rdf:Description rdf:about="http://example.org/place/London">

  <geo:lat>51 32 N</geo:lat>

  <geo:long>0 5 W</geo:long>

  <dc:description>London is the capital city of the

      United Kingdom</dc:description>

</rdf:Description>

<rdf:Description rdf:about="http://example.org/show/Cats">

  <dc:creator rdf:resource="http://example.org/person/AndrewLloydWebber" />

  <ex:genre
    rdf:resource="http://example.org/genre/Musical" />

  <ex:location
    rdf:resource="http://example.org/place/London" />

</rdf:Description>

By making links between these exact terms and the statements involving them, our agent can begin to make intelligent decisions based on inference.
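An ontology can make such inference possible by declaring the different senses of a term as instances of disjoint classes. A sketch in OWL (class and property names here are illustrative) might read:

  <owl:Class rdf:ID="TheatreShow">
    <!-- a theatre show can never also be an animal -->
    <owl:disjointWith rdf:resource="#Animal"/>
  </owl:Class>

  <TheatreShow rdf:ID="Cats">
    <performedIn rdf:resource="#London"/>
  </TheatreShow>

Given these statements, an agent asked to buy "tickets for Cats" can rule out the domestic pet, because tickets are sold for theatre shows and the show sense is explicitly disjoint from the animal sense.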

But what if our agent has decided that "The Lion King", the film, is also a valid choice -- and that it's currently showing in London, Ohio? The agent confirms the terms with the user (by emitting metadata descriptions of the most probable terms through a TTS interface), to ensure the correct unique identifiers are used for the remainder of the request.

Device Independent Content

Once our agent has identified the exact terms, it can again query the metadata repository, to find the URLs of content that match the request.

The content our agent accesses will be semantically marked up -- defined in terms of what it means, rather than how it is presented. Enabling technologies for this semantic content include XHTML, SVG and XForms. The Modularization of XHTML will ensure that specific semantics such as ticket prices can be marked up as required. As we will see later, it is no coincidence that all content is represented in an XML syntax.

By marking up content accurately, our agent can establish ticket prices and associated purchasing forms for each item of content.
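A fragment of such markup, combining a hypothetical ticketing vocabulary (the "tkt:" namespace below is purely illustrative) with an XForms control, might resemble:

  <tkt:ticket xmlns:tkt="http://example.org/ticketing">
    <tkt:show>Cats</tkt:show>
    <tkt:price currency="GBP">35.00</tkt:price>
  </tkt:ticket>

  <xforms:input ref="quantity">
    <xforms:label>Number of tickets</xforms:label>
  </xforms:input>

Because the price is labelled as a price (with an explicit currency) rather than simply styled to look like one, an agent can extract and compare it reliably.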

Content Suitability

The agent has found relevant content, but it might not all be suitable. Suitability checks ensure that the content/sites don't implement processes or conditions that you disagree with.

Our agent queries the metadata repository again, this time checking for P3P or PICS information. The first query to the repository examined the subject-based metadata. The agent is now looking for privacy, legal and other metadata that may discount a site from meeting the conditions that we demand, such as not selling personal information to third parties. An example P3P fragment, possibly specifying privacy details for an online shop, could be:

<STATEMENT>

  <PURPOSE><current/><admin/></PURPOSE>

  <RECIPIENT><ours/></RECIPIENT>

  <RETENTION><stated-purpose/></RETENTION>

  <DATA-GROUP>

    <DATA ref="#dynamic.miscdata">

      <CATEGORIES><purchase/></CATEGORIES>

    </DATA>

  </DATA-GROUP>

</STATEMENT>
Sources that are both relevant and suitable have now been identified.

Web Services

It turns out that our agent found three appropriate sources -- two local (British) and one American. Assuming the agent has been configured to always make the cheapest purchase, it now needs to convert the American ticket price (quoted in US dollars) to British pounds sterling. An accurate comparison of prices can then be made.

Our agent queries a UDDI (Universal Description, Discovery and Integration) registry to find a currency conversion web service. The UDDI registry contains a directory of online services that our agent can use.
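The inquiry itself is a simple XML message sent to the registry. A search for currency conversion services, using the UDDI version 2 inquiry API, could look like this (the search term is illustrative):

  <find_service generic="2.0" xmlns="urn:uddi-org:api_v2">
    <!-- find services whose names match this term -->
    <name>currency conversion</name>
  </find_service>

The registry responds with a list of matching services, each with the information needed to locate its technical description.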

Once a suitable currency conversion service is located, our agent asks for a WSDL (Web Services Description Language) description of the service. This description will inform our agent of how it must format the currency type(s) and amount when making its request. Finally, it can send the conversion request to the web service using SOAP. In return, the agent is provided with a real-time conversion of the ticket price. Example SOAP fragments for the request and response could be:


  <ns1:convert xmlns:ns1="urn:bankX-CurrencyConvert">

    <countryFrom xsi:type="xsd:string">USD</countryFrom>

    <countryTo   xsi:type="xsd:string">GBP</countryTo>

    <currencyAmount xsi:type="xsd:float">55.00</currencyAmount>

  </ns1:convert>

and in response:


  <ns1:convertResponse xmlns:ns1="urn:bankX-CurrencyConvert">

    <return xsi:type="xsd:float">35.62</return>

  </ns1:convertResponse>

The agent can now identify the cheapest source from which to make the purchase.

Device Independent Delivery

Unfortunately the chosen ticket vendor doesn't provide a web service for purchasing the tickets. However, the content on the site has been marked up with XHTML, XForms and SVG. Our agent can therefore complete the purchase form on our behalf, drawing on its stored configuration of our personal details and on the semantic nature of the form.

Our agent has also been configured to print receipts and related information for all purchases. When the agent submits the purchase order form, it also provides technical details for the preferred format of the receipt page. These details include page dimensions (A4) and colour depth (black and white), to ensure the returned receipt is compatible with our printer.

This information is provided with CC/PP (Composite Capabilities/Preference Profiles) -- an XML technology for representing device capabilities and user preferences. The "CC" defines the capabilities of the device -- its technical abilities and limitations. Although our printer is capable of printing in 64,000 colours, we can use the "PP" -- the preferences profile -- to specify a "user preference", e.g. that all colours should be reduced to black or white. An example fragment for the capabilities could be:

<rdf:Description ID="HardwarePlatform">

  <!-- attribute names follow the UAProf vocabulary; values are illustrative -->

  <prf:Model>LaserPrinter2000</prf:Model>

  <prf:ColorCapable>Yes</prf:ColorCapable>

  <prf:BitsPerPixel>16</prf:BitsPerPixel>

</rdf:Description>
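The preference side of the profile could then override that colour capability. A sketch (again with illustrative attribute names) might read:

  <rdf:Description ID="UserPreferences">
    <!-- print everything in black and white, on A4 paper -->
    <prf:ColorPreference>BlackAndWhite</prf:ColorPreference>
    <prf:PaperSize>A4</prf:PaperSize>
  </rdf:Description>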

The CC/PP is intercepted by the server, and all returned content passed through an "adaptation phase". As the content is all XML based (XHTML, SVG, etc.) the adaptation can be handled by XSLT, XSL-FO and CSS. This transforms and styles the content in line with the CC/PP request. Even the SVG images can be scaled and simplified for particular user and device needs.
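As a sketch of one such adaptation, an XSLT template could force every SVG fill and stroke to black when a black-and-white preference is detected:

  <!-- rewrite any fill or stroke attribute to black -->
  <xsl:template match="@fill | @stroke">
    <xsl:attribute name="{name()}">black</xsl:attribute>
  </xsl:template>

Combined with an identity transform copying everything else through unchanged, this single rule is enough to strip colour from an SVG receipt.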

For dynamically generated content, the CC/PP profile could be analyzed during generation, negating the need for the XSL phase.

If a map has not been returned with the receipt, our agent could once again query the metadata repository (with the theatre name) for a postcode. The postcode, in turn, could be sent to a relevant web service to retrieve a map and directions, and these formatted as required.

Barry has completed his task.

Looking to the future

These technologies are clearly applicable to users with disabilities, allowing content and services to be accessed through a wide variety of interfaces. However, the general increase in accessibility that entails is applicable to all -- from being able to ask for directions whilst driving, to making general web access on mobile phones more usable.

Although these core technologies are available today, the situation described is still idealistic. Widespread availability of processors for these technologies, intelligent web agents, and broad vendor adoption of interoperable standards are still hopes for the future. There are also related issues, such as security and trust, under careful consideration.

By implementing XML technologies now -- even just marking up your content in XHTML -- you can help make this exciting future a reality.