Adding SALT to HTML

May 14, 2003

Wireless applications are limited by their small device screens and cumbersome input methods. Consequently, many users are frustrated in their attempts to use these devices. Speech can help overcome these problems. It is the most natural way for humans to communicate. Speech technologies enable us to communicate with applications by using our voice. However, listening is slower than reading and callers have to remember all the information presented to them. Since our short-term memory is only capable of handling about 7 chunks of information, speech applications must be carefully designed.

Both wireless and speech applications have their benefits but also their limitations. Multimodal technologies attempt to leverage their respective strengths while mitigating their weaknesses. Using multimodal technologies, users can interact with applications in a variety of ways. They can provide input through speech, keyboard, keypad, touch-screen or mouse and receive output in the form of audio, video, text, or graphics.

The SALT Forum

The SALT Forum is a group of vendors that is creating multimodal specifications. It was formed in 2001 by Cisco, Comverse, Intel, Microsoft, Philips and SpeechWorks. They created the first version of the Speech Application Language Tags (SALT) specification as a standard for developing multimodal applications. In July 2002, the SALT specification was contributed to the W3C's Multimodal Interaction Activity (MMI). W3C MMI has published a number of related drafts, which are available for public review.

Objectives of SALT

The main objective of SALT is to create a royalty-free, platform-independent standard for creating multimodal applications. A whitepaper published by the SALT Forum further defines six design principles of SALT.

Clean integration of speech with web pages
There is a lot of knowledge, skill, and investment in the existing web-based infrastructure. SALT relies on this investment by specifying a small set of XML elements to add speech capabilities to existing markup languages.
Separation of the speech interface from business logic and data
SALT does not alter the processing logic of the existing markup languages. It defines an independent set of elements that can be used cohesively with the existing technology.
Power and flexibility of programming model
DOM events and scripting are used to integrate SALT with existing pages. The scripting programming model provides the flexibility to add speech processing logic.
Reuse existing standards for grammar, speech output, and semantic results
Instead of reinventing the wheel of existing technologies, SALT reuses many of the existing standards.
Support a range of devices
One of the main objectives of SALT is the ability to extend many of the existing markup languages such as HTML, XHTML, cHTML, and WML. It is not restricted to any particular type of device.
Minimal cost of authoring across modes and devices
The first five principles above result in minimizing the cost of developing, deploying and executing SALT applications.

A number of vendors, including HeyAnita, Intervoice, MayWeHelp.com, Microsoft, Philips, SandCherry and Kirusa, SpeechWorks, and VoiceWeb Solutions, have announced products, tools, and platforms that support SALT. There is also an open source project, OpenSALT, in the works to develop a SALT 1.0 compliant browser. Detailed information can be found at the SALT Forum's implementation page.

Microsoft .NET Speech SDK

Before diving into experimenting with HTML and SALT, we need to set up the appropriate development environment. I am going to use Microsoft's .NET Speech SDK 1.0. The SDK Beta 2 was released on October 30, 2002. It consists of the following components (a detailed description can be found in the Microsoft .NET Speech SDK and Platform Overview whitepaper):

Developer tools (for Visual Studio .NET) - Grammar Editor, Prompt Editor, ASP.NET Speech Control Editor and the Speech Debugging Console.
ASP.NET Speech Controls (for Visual Studio .NET)
Sample SALT applications
Documentation and tutorial on building SALT applications
Client add-on for Internet Explorer and Pocket IE, which can be used to run speech-enabled web pages.

The SDK can be downloaded or ordered by mail from the Microsoft Speech Technology site. Make sure that you meet the following requirements before beginning the installation.

Windows 2000 [Server] SP3, or Windows XP Pro SP1
Internet Information Server (IIS)
Internet Explorer 6.0 or later
.NET Framework 1.0 SP2 (install .NET Framework 1.0 first)
Visual Studio .NET (optional - if using the development tools)

Windows XP Home edition is not supported because IIS is not available for it. You will also need to install .NET Framework 1.0 and then SP2, separately and in that order. Both can be downloaded from the Microsoft .NET Framework site. Make sure you do not install .NET Framework 1.1 Beta, as the .NET Speech SDK 1.0 will not work with it.

If you do not have Visual Studio .NET installed, or if you are not planning to use the developer tools, you will need to disable the Visual Studio .NET Speech Tools through the Custom Setup option.

Figure 1. .NET Speech SDK Installation

Once the installation is completed, you will find Microsoft .NET Speech SDK Beta 2 and Microsoft Internet Explorer Speech Add-in in your Programs menu.

The installation was not without problems. After it completed, I ran into an error with the text-to-speech (TTS) engine. It returned an error code of "-3" with the reason "Internal SAPI/Prompt Engine error". After plowing through the documentation, I came across a resolution in the SDK's readme file. All I had to do was change the default voice to one that comes from Microsoft. There are a number of other "Known Issues" listed in the documentation, which you should familiarize yourself with.

Adding Speech to HTML

I am going to show how we can SALT-enable a simple HTML application by hand. The best place to start is by looking at some simple HTML code.

I created a directory called salt in the default document root directory, c:\Inetpub\wwwroot\salt\ and placed the following HTML file there:

    <html>
    <head>
      <title>My First HTML Application</title>
    </head>
    <body>
      <h3>This is my first HTML application!</h3>
    </body>
    </html>

Unsurprisingly, this yields the following page:

Figure 2. Simple HTML page

Now, let's add a SALT element to it. We want it to speak the sentence back to us through text-to-speech (TTS). We will use <prompt>, one of the top-level elements of SALT.

    1. <html xmlns:salt="http://www.saltforum.org/2002/SALT">
    2. <head>
    3.   <title>My First Multimodal Application</title>
    4. </head>
    5. <body onload="RunIt()">
    6.   <h3>This is my first Multimodal application!</h3>
    7.   <salt:prompt id="first">
    8.     This is my first Multimodal application!
    9.   </salt:prompt>
   10. </body>
   11. <script language="javascript">
   12.   function RunIt() {
   13.     first.Start();
   14.   }
   15. </script>
   16. </html>

In line 1, we added the SALT namespace. Lines 7-9 contain the <prompt> element. It can be used for speech synthesis or to play back a recorded audio file. The attribute id="first" gives us a reference to the <prompt> element, which we use in the JavaScript.

SALT relies on a scripting language to tie together events and logic between its elements and HTML elements. In our case, the function RunIt() is invoked when the page is loaded. All it does is start the prompt, which plays the sentence "This is my first Multimodal application!" through the text-to-speech engine. So far, so good. When I tried to run the page, however, I did not hear anything. Instead I got the following:

Figure 3. Unexpected result from HTML + SALT page

Clicking on IE's warning icon was no help. It turns out that the speech add-on for IE has to be enabled explicitly; otherwise, IE ignores all the SALT elements. All I needed to do was add two lines (lines 2 and 3):

    1. <html xmlns:salt="http://www.saltforum.org/2002/SALT">
    2.   <object id="k-tags"
           CLASSID="clsid:DCF68E5B-84A1-4047-98A4-0A72276D19CC"
           VIEWASTEXT></object>
    3.   <?import namespace="salt"
           implementation="#k-tags"/>
    4. <head>
    5.   <title>My First Multimodal Application</title>
    6. </head>
    7. <body onload="RunIt()">
    8.   <h3>This is my first Multimodal application!</h3>
    9.   <salt:prompt id="first">
   10.     This is my first Multimodal application!
   11.   </salt:prompt>
   12. </body>
   13. <script language="javascript">
   14.   function RunIt() {
   15.     first.Start();
   16.   }
   17. </script>
   18. </html>

Now, running the application again, you should get the desired behavior. The text is displayed and spoken.
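The silent failure described earlier suggests guarding the call to Start() so that the page still loads cleanly on a browser without the speech add-in. The following is only a minimal sketch under stated assumptions: startPrompt is a hypothetical helper (it is not part of SALT or the SDK), and it assumes that a missing or inactive prompt object surfaces as a script exception when Start() is called. It uses try/catch rather than a typeof check because Internet Explorer does not reliably report add-in methods as functions.

```javascript
// Hypothetical helper, not part of SALT or the .NET Speech SDK.
// Tries to start a prompt and reports whether the call succeeded.
function startPrompt(prompt) {
  try {
    // Throws if prompt is null or does not expose a Start() method.
    prompt.Start();
    return true;
  } catch (e) {
    // Assumption: without the speech add-in, the call above fails with a
    // script error, which is swallowed here so the page stays usable.
    return false;
  }
}

// Guarded variant of RunIt() for the page above.
// Assumption: the <salt:prompt id="first"> element is reachable through
// document.getElementById, just as it is through the global name "first".
function RunIt() {
  var spoken = startPrompt(document.getElementById("first"));
  if (!spoken) {
    // The sentence is still visible in the <h3>, so nothing else is needed.
  }
}
```

With this variant in place of lines 14-16, a visitor without the add-in would simply see the text, without IE's script warning.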

If you prefer to use a recorded audio file instead of the mechanical TTS voice, you just need to replace lines 9-11 with:

    <salt:prompt id="first">
      <salt:content href="hello.wav"/>
    </salt:prompt>

The <content> element specifies the URL of the audio file.

Summary

In this article I introduced multimodal XML technology and specifically SALT. Using Microsoft's .NET Speech SDK, you should now be able to add SALT elements to HTML web pages. Good luck with your further investigations with SALT.

Resources

W3C Multimodal Interaction Activity site
SALT Forum
SALT Technical Whitepaper (PDF)
SALT 1.0 specification (PDF)