Adding SALT to HTML
Wireless applications are limited by their small device screens and cumbersome input methods. Consequently, many users are frustrated in their attempts to use these devices. Speech can help overcome these problems: it is the most natural way for humans to communicate, and speech technologies let us interact with applications by voice. However, listening is slower than reading, and callers have to remember all the information presented to them. Since our short-term memory can hold only about seven chunks of information, speech applications must be carefully designed.
Both wireless and speech applications have their benefits but also their limitations. Multimodal technologies attempt to leverage their respective strengths while mitigating their weaknesses. Using multimodal technologies, users can interact with applications in a variety of ways. They can provide input through speech, keyboard, keypad, touch-screen, or mouse and receive output in the form of audio, video, text, or graphics.
The SALT Forum is a group of vendors creating multimodal specifications. It was formed in 2001 by Cisco, Comverse, Intel, Microsoft, Philips, and SpeechWorks. The forum created the first version of the Speech Application Language Tags (SALT) specification as a standard for developing multimodal applications. In July 2002, the SALT specification was contributed to the W3C's Multimodal Interaction Activity (MMI). W3C MMI has published a number of related drafts, which are available for public review.
The main objective of SALT is to create a royalty-free, platform-independent standard for building multimodal applications. A whitepaper published by the SALT Forum further defines six design principles of SALT.
A number of vendors, including HeyAnita, Intervoice, MayWeHelp.com, Microsoft, Philips, SandCherry and Kirusa, SpeechWorks, and VoiceWeb Solutions, have announced products, tools, and platforms that support SALT. There is also an open source project, OpenSALT, in the works to develop a SALT 1.0 compliant browser. Detailed information can be found at the SALT Forum's implementation page.
Before diving into experimenting with HTML and SALT, we need to set up the appropriate development environment. I am going to use Microsoft's .NET Speech SDK 1.0. The SDK Beta 2 was released on October 30, 2002. It consists of the following components (a detailed description can be found in the Microsoft .NET Speech SDK and Platform Overview whitepaper):
The SDK can be downloaded or ordered by mail from the Microsoft Speech Technology site. You should make sure that you meet the following requirements before beginning the installation.
Windows XP Home Edition is not supported because IIS is not available on it. You will also need .NET Framework 1.0 and its SP2, installed separately in that order. They can be downloaded from the Microsoft .NET Framework site. Make sure you do not install .NET Framework 1.1 Beta, as the .NET Speech SDK 1.0 will not work with it.
If you do not have Visual Studio .NET installed, or if you are not planning to use the developer tools, you will need to disable the Visual Studio .NET Speech Tools through the Custom Setup option.
Figure 1. .NET Speech SDK Installation
Once the installation is complete, you will find Microsoft .NET Speech SDK Beta 2 and Microsoft Internet Explorer Speech Add-in in your Programs menu.
The installation was not without problems. After it completed, I ran into an error with the text-to-speech (TTS) engine. It returned an error code of "-3" with the reason "Internal SAPI/Prompt Engine error". After plowing through the documentation, I came across a resolution in the SDK's readme file: all I had to do was change the default voice to one that comes from Microsoft. There are a number of other "Known Issues" listed in the documentation which you should familiarize yourself with.
I am going to show how we can SALT-enable a simple HTML application by hand. The best place to start is by looking at some simple HTML code.
I created a directory called salt in the default document
root directory, c:\Inetpub\wwwroot\salt\ and placed the
following HTML file there:
1. <html>
2. <head>
3. <title>My First HTML Application</title>
4. </head>
5. <body>
6. <h3>This is my first HTML application!</h3>
7. </body>
8. </html>
Unsurprisingly, this yields the following page:
Figure 2. Simple HTML page
Now, let's add a SALT element to it. We want it to speak the sentence
back to us through text-to-speech (TTS). We will use
<prompt>, one of the top-level elements of SALT.
1. <html xmlns:salt="http://www.saltforum.org/2002/SALT">
2. <head>
3. <title>My First Multimodal Application</title>
4. </head>
5. <body onload="RunIt()">
6. <h3>This is my first Multimodal application!</h3>
7. <salt:prompt id="first">
8. This is my first Multimodal application!
9. </salt:prompt>
10. </body>
11. <script language="javascript">
12. function RunIt() {
13. first.Start();
14. }
15. </script>
16. </html>
In line 1, we added the SALT namespace. Lines 7-9 contain the
<prompt> element. It can be used for speech synthesis
or to play back a recorded audio file. The attribute
id="first" gives us a reference to the
<prompt> element which we use in the JavaScript.
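The same object reference works from any DOM event handler, not just the page's onload. As a hedged sketch (assuming the prompt with id="first" and the salt namespace are declared as in the listing above), a button could replay the prompt on demand:

```html
<!-- Hypothetical addition: a button that replays the prompt on click.
     Assumes the salt namespace and <salt:prompt id="first"> are
     already declared as in the listing above. -->
<input type="button" value="Say it again" onclick="first.Start();" />
```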
SALT relies on a scripting language to tie together events and logic
between its elements and HTML elements. In our case the function
RunIt() is invoked when the page is loaded. All it does is
execute the prompt and play the sentence "This is my first Multimodal
application!" through the text-to-speech engine. So far, so good. When I
tried to run the page, however, I did not hear anything. Instead I got
the following:
Figure 3. Unexpected result from HTML + SALT page
Clicking on IE's warning icon was no help. It turns out that I needed to explicitly enable the Speech Add-in for IE; otherwise, it ignores all the SALT elements. All I needed to do was add two lines (lines 2 and 3):
1. <html xmlns:salt="http://www.saltforum.org/2002/SALT">
2. <object id="k-tags"
CLASSID="clsid:DCF68E5B-84A1-4047-98A4-0A72276D19CC"
VIEWASTEXT></object>
3. <?import namespace="salt"
implementation="#k-tags"/>
4. <head>
5. <title>My First Multimodal Application</title>
6. </head>
7. <body onload="RunIt()">
8. <h3>This is my first Multimodal application!</h3>
9. <salt:prompt id="first">
10. This is my first Multimodal application!
11. </salt:prompt>
12. </body>
13. <script language="javascript">
14. function RunIt() {
15. first.Start();
16. }
17. </script>
18. </html>
Now, running the application again, you should get the desired behavior. The text is displayed and spoken.
If you prefer to use a recorded audio file instead of the mechanical TTS voice, you just need to replace lines 9-11 with:
<salt:prompt id="first">
  <salt:content href="hello.wav"/>
</salt:prompt>
The <content> element specifies the URL of the audio
file.
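A prompt is not limited to a single source: SALT allows inline text and <content> elements to be mixed within one prompt, so recorded audio and synthesized speech can be combined. A hedged sketch (the file name and wording here are illustrative, not from the SDK samples):

```html
<!-- Sketch: mixing a recorded clip with TTS text in one prompt.
     hello.wav is a hypothetical audio file; the trailing text is
     rendered by the text-to-speech engine after the clip plays. -->
<salt:prompt id="mixed">
  <salt:content href="hello.wav"/>
  Welcome to my first Multimodal application!
</salt:prompt>
```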
In this article I introduced multimodal XML technology and specifically SALT. Using Microsoft's .NET Speech SDK, you should now be able to add SALT elements to HTML web pages. Good luck with your further investigations with SALT.
Resources
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.