Creating VoiceXML Applications With Perl

August 9, 2001

Kip Hampton

Introduction

VoiceXML is an XML-based language used to create Web content and services that can be accessed over the phone. Not just those nifty WAP-enabled "Web phones", mind you, but the plain old clunky home models that you might use to order a pizza or talk to your Aunt Mable. While HTML presumes a graphical user interface to access information, VoiceXML presumes an audio interface where speech and keypad tones take the place of the screen, keyboard, and mouse. This month we will look at a few samples that demonstrate how to create dynamic voice applications using VoiceXML, Perl, and CGI.

A rigorous introduction to VoiceXML and how it works is beyond the scope of this tutorial. For more complete introductions to VoiceXML's moving parts, see Didier Martin's Hello, Voice World or the VoiceXML Forum's FAQ.

Reach Out and Surf Somewhere

To demonstrate how easy it can be to make existing Web content available over the phone, we will create a simple Perl CGI script. It fetches an RSS channel file containing a list of the most recent uploads to CPAN and converts parts of it to VoiceXML so that it can be heard over the phone via a VoiceXML gateway.


use strict;

use XML::XPath;

use LWP::UserAgent;

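After loading the necessary modules, we begin our script by creating new HTTP::Request and LWP::UserAgent objects (loading LWP::UserAgent also loads HTTP::Request). We then call LWP::UserAgent's simple_request method, which sends the request without following redirects, to ask the remote server for the RSS file.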


my $news_url = 'http://search.cpan.org/recent.rdf';

my $request = HTTP::Request->new('GET', $news_url);

my $ua = LWP::UserAgent->new();

my $response = $ua->simple_request($request);

Now that the request has been made, we will begin the VoiceXML output. We start by creating the mandatory vxml root element and a minimal form that contains a single block element. Inside the block element we put an audio element that asks the user to be patient while the RSS file is processed and a goto element that tells the VoiceXML browser to jump to the section of the current document labeled "headlines".


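# A CGI response must start with an HTTP header and a blank line.

print "Content-type: text/xml\n\n";



# The XML declaration has to be the first thing in the document body.

print qq*<?xml version="1.0"?>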

<vxml>

  <form id="greeting">

    <block>

      <audio>

        Please wait while I process the c pan news feed.

      </audio>

      <goto next="#headlines"/>

    </block>

  </form>

*;

Next we test the response object to ensure that we received the remote RSS file. If the file was successfully fetched, we create a new XML::XPath instance and pass it the content of the response object for parsing. The call to find('/') appears to be there so that the document is actually parsed inside the eval rather than at the first later query. If anything goes awry during the request, or while parsing the returned content, we store a description of the problem in the scalar $error for later processing. Although the eval block that wraps the calls to XML::XPath adds some overhead to the script, it gives us a way to fail gracefully in the event of a parsing error. Without the surrounding eval, a parser error would cause the script to die partway through the response.


my ($error, $xp);



if ($response->is_success) {

    eval {

        $xp = XML::XPath->new(xml => $response->content);

        $xp->find('/');

    };

    $error = 'Error parsing RSS file ' . $@ if $@;

}

else {

   $error = 'Remote server returned ' . $response->message();

}
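
The eval-and-check pattern used above can be tried in isolation. The following is only a sketch, not part of the original script: the helper name try_call and the two sample code references are made up for illustration, and it uses core Perl only so that it runs without XML::XPath installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the article): run a code reference inside
# an eval block and return a (result, error) pair instead of dying.
sub try_call {
    my ($code) = @_;
    my $result = eval { $code->() };
    # Copy $@ right away; a later eval would overwrite it.
    my $error = $@;
    return (undef, $error) if $error;
    return ($result, undef);
}

# One call that succeeds and one that dies, standing in for a good and a
# malformed RSS file.
my ($ok_value,  $ok_error)  = try_call(sub { return 'parsed' });
my ($bad_value, $bad_error) = try_call(sub { die "not well-formed\n" });

print "first call returned: $ok_value\n";
print "second call failed: $bad_error";
```

Returning the error as a value, as $error does in the script, lets the caller decide how to report the failure instead of ending the call abruptly.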

If an error does occur along the way, we return a simple audio message that describes the error, disconnect (hang up on) the current user, and close the VoiceXML document.


if ( defined($error) ) {

    print qq*

   <form id="headlines">

     <block>

       <audio>

         I'm sorry. The following error occurred while fetching

         the headlines file. $error Please try again later.

       </audio>

       <disconnect/>

     </block>

  </form>

</vxml>

    *;

}
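
Because $error is read aloud to the caller, it may help to trim Perl's diagnostic suffix from it first. The sketch below is an assumption about how that could be done, not something the original script does: the helper name speakable_error and the sample message are hypothetical, and the pattern assumes the file path in the suffix contains no spaces. The result would still need XML escaping before being printed into the document.

```perl
use strict;
use warnings;

# Hypothetical helper: shorten a Perl error string into something that
# reads reasonably when spoken by a text-to-speech engine.
sub speakable_error {
    my ($error) = @_;
    # Drop the trailing " at FILE line N." that die appends.
    $error =~ s/ at \S+ line \d+\.?\s*\z//;
    # Collapse newlines and runs of whitespace, then trim both ends.
    $error =~ s/\s+/ /g;
    $error =~ s/^\s+|\s+$//g;
    return $error;
}

# A made-up message in the shape that a failed parse might produce.
my $raw = "Error parsing RSS file not well-formed at line 1, column 5 "
        . "at /tmp/feed.pl line 42.\n";

my $clean = speakable_error($raw);
print "$clean\n";
```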

If the RSS file has been fetched and parsed successfully we create a new form element; then, using an audio element inside a block wrapper, we tell the caller about the success and prepare them to hear the list of modules.


else {

    print qq*

  <form id="headlines">

    <block>

      <audio>

        The RSS file has been fetched and processed successfully. The

        following modules have recently been up loaded to c pan.

      </audio>

    </block>

    <block>

    *;

Next we loop through all the item elements in the RSS document. For each item element encountered we print a corresponding audio element for our VoiceXML document using the value of each item's title child element as the text.


    foreach my $news_item ($xp->findnodes('//item')) {

        print "<audio>" .

          $news_item->findvalue('title') .

          "</audio>\n";

    }
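
One thing the loop above does not handle is a title that contains an ampersand or an angle bracket, which would make the generated VoiceXML ill-formed. A minimal sketch of one way to guard against that follows. It assumes the title text comes back from findvalue with its entities already resolved; the helper name xml_escape and the sample titles are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are significant in XML
# text content so arbitrary strings can be printed inside an element.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # must come first, or the entities below get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Made-up titles standing in for the values returned by findvalue('title').
my @titles = ('Text-Balanced-1.89', 'AT&T-Tools-0.01', 'Less<More-2.0');

# Build one audio element per title, escaping each title on the way out.
my $audio = join '', map { '<audio>' . xml_escape($_) . "</audio>\n" } @titles;
print $audio;
```

The same helper could be applied to any other value that is interpolated into the output.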

Finally we signal the caller that the entire list has been read, invite them to call again the next day, disconnect, and close the VoiceXML document.


    print qq*

    <audio>

      This completes the latest c pan up loads. Please call again tomorrow.

    </audio>

    <disconnect/>

    </block>

  </form>

</vxml>

    *;

}

While this script is not terribly useful in and of itself, think for a moment about what we have done here. In a few lines of code we have taken a resource from a distant part of the Web, extracted the information that we care about, and made that information available from any phone anywhere in the world.

Creating Dynamic VoiceXML Applications

While the previous example hints at the potential of offering Web content and services over the phone, the resulting "conversation" between the caller and the VoiceXML application is too one-sided. Fortunately, VoiceXML offers several elements that are specially designed for capturing user input. In addition, VoiceXML forms, like the HTML forms that came before, can be used to capture data to pass to the server via the standard HTTP GET and POST methods.

For our final example we will create a dynamic VoiceXML document based on POSTed data: a small application that implements a limited (and somewhat dubious) "mystic oracle" that appears to be sensitive to the caller's concerns. To keep things simple, our oracle will be implemented in two parts: a plain VoiceXML document containing a form to capture the caller's questions, and a dynamic, CGI-created document that formulates the responses to those questions.

First let's create the form. We will begin with a simple greeting that will only be read the first time the user connects.


<?xml version="1.0"?>

<vxml>

  <form id="greeting">

    <block>

      <audio>

        Thank you for calling the mystic oracle!

      </audio>

      <goto next="#main_query"/>

    </block>

  </form>

Next we begin the main form. This form contains the sole "query_type" field that will be used to capture the caller's question. Pay special attention to the grammar element, which lets VoiceXML developers define exactly what input a given field will accept. In this case, for example, if the caller's question contains any of the words "career", "job", "boss", "coworker", or "department", the value of the "query_type" field will be set to "career".


  <form id="main_query">

      <field name="query_type">

      <grammar type="application/x-gsl" name="qtype">

      <![CDATA[

      [

         [romance love sex boyfriend girlfriend] {<query_type "romance">}

         [career job boss coworker department] {<query_type "career">}

         [family husband wife mother father son daughter] {<query_type "family">}

      ]

      ]]>

      </grammar>
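
If the form document were itself generated by a script, the grammar body could be built from a Perl data structure instead of being typed by hand. The sketch below only illustrates that idea. The helper name gsl_grammar is hypothetical, and the output simply mirrors the shape of the grammar shown above rather than following any formal GSL definition.

```perl
use strict;
use warnings;

# Hypothetical helper: build a grammar body like the one above from a
# slot name and a list of [value, word, word, ...] array references.
sub gsl_grammar {
    my ($slot, @categories) = @_;
    my @lines = map {
        my ($value, @words) = @$_;
        sprintf '   [%s] {<%s "%s">}', join(' ', @words), $slot, $value;
    } @categories;
    return join "\n", '[', @lines, ']';
}

# The same three categories and word lists used in the form above.
my $grammar = gsl_grammar(
    'query_type',
    [ romance => qw(romance love sex boyfriend girlfriend) ],
    [ career  => qw(career job boss coworker department) ],
    [ family  => qw(family husband wife mother father son daughter) ],
);

print "$grammar\n";
```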

The prompt element signals the VoiceXML gateway to read a bit of text to the caller and wait for a response. When the caller asks a question that contains one of the words defined in the earlier grammar element, the contents of the filled element are executed: the application thanks the caller and submits the data to the CGI portion of the application using the POST method. The submit element's namelist attribute specifies which fields or variables from the current document are submitted.


      <prompt>

        Clear your mind and concentrate on your question. <break/>

        You may ask your question now.

      </prompt>

      <filled>

        <audio>thank you.</audio>

        <submit next="http://mysite.tld/cgi-bin/mystic_response.cgi"

                method="POST"

                namelist="query_type"/>

      </filled>

Next we perform a bit of error-trapping to give the oracle a more reasonable interface. The text contained in the nomatch element will be read if the user asks a question that does not contain any of the words from our chosen grammar. The reprompt element tells the VoiceXML browser to loop back to and reread the previous prompt. If the caller fails to ask a question at all, the noinput elements will be read one at a time in the sequence defined by their individual count attributes. If the caller still has not said anything on the third attempt, the application disconnects.


      <nomatch>

        The mystic oracle can only answer questions about romance, career,

        or family matters. Please try again.

        <reprompt/>

      </nomatch>

      <noinput count="1">

        I can sense your apprehension.

        <reprompt/>

      </noinput>

      <noinput count="2">

        You must say something.

        <reprompt/>

      </noinput>

      <noinput count="3">

        Please call back when you are less stressed.

        <disconnect/>

      </noinput>

All that remains for this document is to close the "query_type" field, its parent form, and the top-level vxml element.


    </field>

  </form>

</vxml>

Now let's create the CGI script that responds to the caller's question. This script will be passed a single POSTed parameter named "query_type" that will have the value romance, career, or family.


use strict;

use CGI qw(:standard);



my $q = CGI->new();



my $query_type = $q->param('query_type');
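
Since $query_type ends up inside the generated VoiceXML, it may be worth checking it against the values the grammar can actually produce before using it. The following is a sketch under that assumption, not part of the original script. The helper name checked_query_type and the fallback phrase are made up for illustration.

```perl
use strict;
use warnings;

# The three values the grammar in the form document can produce.
my %known_type = map { $_ => 1 } qw(romance career family);

# Hypothetical helper: return the value unchanged if it is one of the
# expected ones, otherwise fall back to a neutral phrase (an assumption).
sub checked_query_type {
    my ($value) = @_;
    return 'such matters' unless defined $value && $known_type{$value};
    return $value;
}

print checked_query_type('career'), "\n";   # an expected value
print checked_query_type('<oops>'), "\n";   # unexpected input
print checked_query_type(undef),    "\n";   # parameter missing entirely
```

Falling back to a fixed phrase keeps the spoken sentence intact and keeps unexpected characters out of the markup.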

First we will generate a semi-random, characteristically vague answer to the caller's question.


my @intro_phrases = ('My sources say',

                     'All signs indicate that',

                     'Search your heart. You know');



my @responses = ('the answer is yes.',

                 'the answer is no.',

                 'it is too soon to tell.',

                 'the outlook is hazy. Please ask again later.');



my $response_text = $intro_phrases[int( rand ( scalar (@intro_phrases) ) )] .

                    ' ' .

                    $responses[int( rand ( scalar (@responses) ) )];
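
The random selection above can also be wrapped in a small helper. This is just a sketch of an equivalent formulation: the helper name pick is hypothetical and the lists are shortened versions of the ones above. It relies on Perl truncating a fractional array index to an integer, which makes the explicit int() optional.

```perl
use strict;
use warnings;

# Hypothetical helper: return one element of the argument list, chosen at
# random. rand @choices yields a number from 0 up to (but not including)
# the element count, and the array index truncates it to an integer.
sub pick {
    my @choices = @_;
    return $choices[ rand @choices ];
}

my @intro_phrases = ('My sources say', 'All signs indicate that');
my @responses     = ('the answer is yes.', 'the answer is no.');

my $response_text = pick(@intro_phrases) . ' ' . pick(@responses);
print "$response_text\n";
```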

Then we will create the VoiceXML output. Note how the inline Perl variables are used both to include the randomly-generated answer to the caller's question ($response_text) and to create an illusory sense of personalized context by repeating the general type of question that the caller asked ($query_type).


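# A CGI response must start with an HTTP header.

print $q->header(-type => 'text/xml');



# The XML declaration has to come first in the document body.

print qq*<?xml version="1.0"?>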

<vxml>

<form id="response">

  <block>

    <audio>

    I sense your consternation.

    Questions about $query_type can be very troublesome, indeed.

    $response_text

    </audio>

    <goto next="#new_or_exit"/>

  </block>

</form>

As a final feature, we will give the caller the opportunity to ask our mystic oracle another question. If they say "no" to the following prompt, the application will politely thank them and disconnect; otherwise, they will be redirected back to the main section of the previous document where they will be prompted for a new query.


<form id="new_or_exit">

  <field name="new_question">

    <prompt>

      Would you like to ask another question?

    </prompt>

    <grammar>

    <![CDATA[

    [

      [yes] {<new_question "yes">}

      [no] {<new_question "no">}

    ]

    ]]>

    </grammar>

    <filled>

    <if cond="new_question=='no'">

      <audio>Thank you for calling. Goodbye.</audio>

      <disconnect/>

    <else/>

    <goto next="http://mysite.tld/vxml/mystic_prompt.xml#main_query"/>

    </if>

    </filled>

  </field>

</form>

</vxml>

*;

A typical transcript of a call to the mystic oracle might look something like this:


Oracle: Thank you for calling the mystic oracle!

        Clear your mind and concentrate on your question.

        You may ask your question now.



Caller: Hmmmm. Should I take the new job I was just offered?



Oracle: Thank you.

        I sense your consternation.

        Questions about career can be very troublesome, indeed.

        All signs indicate that the answer is yes.

        Would you like to ask another question?



Caller: Yes, I would.



Oracle: Clear your mind and concentrate on your question.

...

Conclusions

VoiceXML is much more than an alternative interface to the Web. It allows developers to extend their existing applications in new and useful ways, and it offers many unique opportunities for new development. As you may have guessed, though, that power and flexibility come with a hefty price tag: VoiceXML gateways (the hardware and software that connect the Web to the phone system, translate text to speech, interpret the VoiceXML markup, etc.) are not cheap. The good news is that many prominent VoiceXML gateway providers offer free test and deployment environments to curious developers, so you can check out VoiceXML for yourself without breaking the bank.

Resources