WIDL: Application Integration with XML

October 2, 1997

Charles Allen

Abstract

The problem of direct access to Web data from within business applications has until recently been largely ignored. The Web Interface Definition Language (WIDL) is an application of the Extensible Markup Language (XML) which allows the resources of the World Wide Web to be described as functional interfaces that can be accessed by remote systems over standard Web protocols. WIDL provides a practical and cost-effective means for diverse systems to be rapidly integrated across corporate intranets, extranets, and the Internet.

Overview

The explosive growth of the World Wide Web is providing millions of end-users access to ever-increasing volumes of information. The resources of legacy systems, relational databases, and multi-tier applications have all been made available to the Web browser, which has been transformed from an occasionally informative accessory into an essential business tool for organizations large and small.

While the Web has achieved the extraordinary feat of providing ubiquitous accessibility to end-users, it has in many cases reinforced manual inefficiencies in business processes as repetitive tasks are required to transcribe or copy and paste data from browser windows into desktop and corporate applications. This is as true of Web data provided by remote business units and external (i.e., partner or supplier) organizations as it is of Web data accessible from both public and subscription based Web sites.

Business units that have previously been unable to agree on middleware and data interchange standards are (by default) agreeing on HTTP and HTML as data communication and presentation standards. Because of the overwhelming focus on the browser, almost all Web applications require interaction with a human user. The problem of direct access to Web data from within business applications has been largely ignored, as has the possibility of using the Web as a platform for automated information exchange between organizations. The debut of XML is set to change all this, and in the process spark a major Web revolution: Web Automation (see Figure 1).


Figure 1 The need for Web Automation

XML enables the creation of Web documents that preserve data structure and include "machine-readable" hooks to enable intelligent processing by client applications. It is not necessary, however, for Web content to exist as XML in order for XML to be used today to automate the Web. The use of XML to deliver metadata about existing Web resources can provide sufficient information to empower non-browser applications to automate interactions with Web servers.

XML metadata defining interfaces to Web-enabled applications can provide the basis for a common API across legacy systems, databases, and middleware infrastructures, effectively transforming the Web from an access medium into an integration platform.

Web Automation

Imagine everything a browser can do: sign-on to a secure Web site; query that site for data; download the results; upload a response. Now imagine that your business applications can do the same thing, automatically, without human intervention and without using a browser. This is the power of Web Automation.

The benefits of Web Automation are numerous:

  • Competitive intelligence--aggregate product pricing data, news reports
  • Application integration--leverage investments in Web data and infrastructure
  • Robust ecommerce--implement solutions without the expense and difficulty of EDI or CORBA
  • Realize a 100% Web-based alternative to EDI
  • Put Web site functionality in the heart of customers' and suppliers' IT infrastructures

The incredible diversity of Web resources presents significant challenges for the automation of arbitrary tasks on the Web.

A robust infrastructure for Web Automation needs to provide:

  • Full interaction with HTML forms
  • An ability to handle both HTTP Authentication and Cookies
  • Both on-demand and scheduled extraction of targeted Web data
  • Aggregation of data from a number of Web sources
  • Chaining of services across multiple Web sites
  • An ability to integrate easily with traditional application development languages and environments
  • A framework for managing change in both the locations and structures of Web documents

webMethods has defined the Web Interface Definition Language (WIDL) as an application of XML to lay the foundation for Web Automation.

WIDL

The goal of the Web Interface Definition Language is to enable automation of all interactions with HTML/XML documents and forms, providing a general method of representing request/response interactions over standard Web protocols, and allowing the Web to be utilized as a universal integration platform.

Where XML supports the creation of Web content that preserves data structure, and promises Web documents that are "machine-readable," WIDL is an application of XML that defines interfaces and services within and across HTML, XML, and text documents. As shown in Figure 2, services defined by WIDL map existing Web content into program variables, allowing the resources of the Web to be made available, without modification, in formats well-suited to integration with diverse business systems.


Figure 2 WIDL allows Web resources such as package tracking services to be accessed directly from business applications.

WIDL brings to the Web many features of the Interface Definition Language (IDL) concepts implemented in distributed computing and transaction processing platforms, including DCE and CORBA. A major part of the value of DCE and CORBA is that they can define services offered by applications in an abstract but highly usable fashion. WIDL describes and automates interactions with services hosted by Web servers on intranets, extranets, and the Internet; it provides a standard integration platform and a universal API for all Web-enabled systems.

A service defined by WIDL is equivalent to a function call in standard programming languages. At the highest level, WIDL files are collections of services. WIDL defines the locations (URLs) of each service, input parameters to be submitted (via GET or POST methods) to each service, and output parameters to be returned by each service.

WIDL provides the following features:

  • A browser is not required to drive Web applications
  • Service definitions are dynamically interpreted and can thus be centrally managed
  • Client applications are insulated from changes in service locations and data extraction methods
  • Developers are insulated from network programming concerns
  • Application resources can be integrated across firewalls and proxies

WIDL can be used to describe interfaces and services for:

  • Static documents (HTML, XML, and plain text files)
  • HTML forms
  • URL directory structures

WIDL also has the ability to specify conditions for successful processing and error messages to be returned to calling programs. Conditions further enable services to be defined that span multiple documents.

Applications of WIDL

The success of the Web has exposed the advantages of distributed information systems to a global audience. Around the world, IT organizations, regardless of industry, are searching for ways to connect the Internet with new or existing applications, to use Web technology to reduce development, deployment, and maintenance costs.

Using HTML, XML, and HTTP as corporate standards glue, application integration requires only that target systems be Web-enabled. There are hundreds of products in the market today which Web-enable existing systems, from mainframes to client/server applications. The use of standard Web technologies empowers various IT departments to make independent technology selections. This has the effect of lowering both the technical and "political" barriers that have typically derailed cross-organizational integration projects.

The use of proprietary middleware infrastructures to integrate applications requires not only that the same software product be purchased by both organizations and successfully installed in both target hardware environments, but also that both target applications be tailored to support the middleware API. This type of investment can be disastrous if one company spends six months designing a CORBA-based business system only to discover that one of their business units or business partners is unable to install CORBA because it conflicts with their existing infrastructure. Conflicts can arise because of hardware or software incompatibilities, or simply because of difficulties in acquiring appropriate development resources.

A number of analysts have already warned that proprietary ecommerce platforms could lock suppliers into relationships by forcing them to integrate their systems with one infrastructure for business-to-business integration, making it costly for them to switch to or integrate with other partners who have selected alternate ecommerce platforms. Buyer-supplier integration issues involve many-to-many relationships, and demand a standard platform for functional integration and data exchange.

Here is a brief overview of the types of applications that WIDL enables:

Manufacturers and distributors

  • Access supplier and competitor ecommerce systems automatically to check pricing and availability
  • Load product data (spec sheets) from supplier Web sites
  • Place orders automatically (e.g., when inventory drops below predetermined levels)
  • Integrate package tracking functionality for enhanced customer service

Human resources

  • Automated update of new employee information into multiple internal systems
  • Automated aggregation of benefits information from healthcare and insurance providers

Governments

  • Kiosk systems that aggregate data and integrate services across departments or state and local offices

Shipping and delivery services

  • Multi-carrier package tracking and shipment ordering
  • Access to currency rates, Customs regulations, etc.

Shipping companies were early leaders in bringing widely applicable functionality to the Web. Web-based package tracking services provide important logistics information to organizations large and small.

Many organizations employ people for the sole purpose of manually tracking packages to ensure customer satisfaction and to collect refunds for packages that are delivered late. Integrating package tracking functionality directly into warehouse management and customer service systems is a huge benefit, boosting productivity and enabling more efficient use of resources.

Using WIDL, the web-based package tracking services of numerous shipping companies can be described as common application interfaces, to be integrated with various internal systems. In almost all cases, programmatic interfaces to different package tracking services are identical, which means that WIDL can impose consistency in the representation of functionality across systems.

Example 1 illustrates the use of WIDL to define a package tracking service for Federal Express. Note that the WIDL specifies a "Shipping" template. This indicates that there is a general class of shipping services, and that this particular WIDL is one implementation of the shipping interface.

Example 1 The WIDL Representation of a Package Tracking Service

  <WIDL NAME="FedexShipping" TEMPLATE="Shipping" 
    BASEURL="http://www.fedex.com" VERSION="2.0"> 
    
    <SERVICE NAME="TrackPackage" METHOD="GET"  
      URL="/cgi-bin/track_it" 
      INPUT="TrackInput" OUTPUT="TrackOutput" /> 
    
    <BINDING NAME="TrackInput" TYPE="INPUT"> 
      <VARIABLE NAME="TrackingNum" TYPE="String" FORMNAME="trk_num" /> 
      <VARIABLE NAME="DestCountry" TYPE="String" FORMNAME="dest_cntry" /> 
      <VARIABLE NAME="ShipDate" TYPE="String" FORMNAME="ship_date" /> 
    </BINDING> 
    
    <BINDING NAME="TrackOutput" TYPE="OUTPUT"> 
      <CONDITION TYPE="FAILURE" REFERENCE="doc.title[0].text"  
        MATCH="FedEx Warning Form" 
        REASONREF="doc.p[0].text['&.*']" /> 
      <CONDITION TYPE="SUCCESS" REFERENCE="doc.title[0].text"  
        MATCH="FedEx Airbill:*"  
        REASONREF="doc.p[1].value" /> 
      <VARIABLE NAME="disposition" TYPE="String"  
        REFERENCE="doc.h[3].value" MASK="$*" /> 
      <VARIABLE NAME="deliveredOn" TYPE="String"  
        REFERENCE="doc.h[5].value" MASK="%%%$*" /> 
      <VARIABLE NAME="deliveredTo" TYPE="String"  
        REFERENCE="doc.h[7].value" MASK="*:" /> 
    </BINDING> 
    
  </WIDL> 

The FedexShipping interface in Example 1 contains one service (TrackPackage) which takes three input parameters (TrackingNum, DestCountry, ShipDate) and returns three output parameters (disposition, deliveredOn, deliveredTo). The WIDL definition describing the TrackPackage service is stored in an ASCII file, which is utilized by client programs at runtime to determine both the location of the service (URL) and the structure of documents that contain the desired data. Client programs access WIDL definitions from local files, naming services such as LDAP, HTTP servers, or other URL access schemes (see Figure 3).
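To make the runtime lookup concrete, here is a minimal Python sketch of how a client program might read a WIDL definition and turn a call to TrackPackage into an HTTP GET request. It is an illustration under stated assumptions, not the webMethods implementation: the helper name `build_request_url` is hypothetical, the WIDL text is a trimmed copy of Example 1 (service and input binding only), and the argument values are sample data.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urljoin

# Trimmed copy of Example 1: only the service and its input binding.
WIDL_TEXT = """
<WIDL NAME="FedexShipping" TEMPLATE="Shipping"
  BASEURL="http://www.fedex.com" VERSION="2.0">
  <SERVICE NAME="TrackPackage" METHOD="GET"
    URL="/cgi-bin/track_it"
    INPUT="TrackInput" OUTPUT="TrackOutput" />
  <BINDING NAME="TrackInput" TYPE="INPUT">
    <VARIABLE NAME="TrackingNum" TYPE="String" FORMNAME="trk_num" />
    <VARIABLE NAME="DestCountry" TYPE="String" FORMNAME="dest_cntry" />
    <VARIABLE NAME="ShipDate" TYPE="String" FORMNAME="ship_date" />
  </BINDING>
</WIDL>
"""


def build_request_url(widl_text, service_name, args):
    """Return the GET URL for a named service (hypothetical helper)."""
    root = ET.fromstring(widl_text)
    # Locate the service, then the input binding it designates.
    service = next(s for s in root.iter("SERVICE")
                   if s.get("NAME") == service_name)
    binding = next(b for b in root.iter("BINDING")
                   if b.get("NAME") == service.get("INPUT"))
    # Map caller-facing variable names to the form names the server expects.
    params = [(v.get("FORMNAME", v.get("NAME")), args[v.get("NAME")])
              for v in binding.iter("VARIABLE")]
    # Resolve the service URL relative to BASEURL, if one is given.
    base = urljoin(root.get("BASEURL", ""), service.get("URL"))
    return base + "?" + urlencode(params)


# Sample argument values, made up for illustration.
url = build_request_url(WIDL_TEXT, "TrackPackage", {
    "TrackingNum": "123456789",
    "DestCountry": "US",
    "ShipDate": "100297",
})
print(url)
```

The calling program refers only to TrackingNum, DestCountry, and ShipDate; the form names and the URL come from the WIDL file, so either can change without touching client code.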


Figure 3 WIDL files can be centrally managed with a well-known URL or via a directory service such as LDAP.

Unlike the way CORBA and DCE IDL are normally used, WIDL is interpreted at runtime. As a result, Service, Condition, and Variable definitions within WIDL files can be administered without requiring modification of client code. This usage model supports application-to-application linkages that are far more robust and maintainable than if they were coded by hand.

One of WIDL's most significant benefits is its ability to insulate client programs from changes in the format and location of Web documents. As long as the parameters of services do not change, Service URLs, object references in variables, regions, and conditions can all be modified without affecting applications that utilize WIDL to access Web resources.

There are three models for WIDL management:

  • Client side--where WIDL files are colocated with a client program
  • Naming service--where WIDL definitions are returned from directory services, e.g., LDAP
  • Server side--where WIDL files are referenced by, colocated with, or embedded within Web documents

WIDL does not require that existing Web resources be modified in any way. Flexible management models allow organizations to describe and integrate Web sites that are beyond their control, as well as to provide their business partners with interfaces to services that are controlled. The ability to seamlessly migrate from independent to shared management eases the transition from informal to formal business-to-business integration.

Elements of WIDL

The Web Interface Definition Language (WIDL) consists of six XML tags:

  • <WIDL> defines an interface, which can contain multiple services and bindings
  • <SERVICE/> defines a service, which consists of input and output bindings
  • <BINDING> defines a binding, which specifies input and output variables, as well as conditions for successful completion of a service
  • <VARIABLE/> defines input, output, and internal variables used by a service to submit HTTP requests, and to extract data from HTML/XML documents
  • <CONDITION/> defines success and failure conditions for the binding of output variables; specifies error messages to be returned upon service failure; enables alternate bindings attempts and the chaining of services
  • <REGION/> defines a region within an HTML/XML document; useful for extracting regular result sets which vary in size, such as the output of a search engine, or news stories

The complete WIDL DTD is included in Appendix A. In the next sections the attributes of each element of WIDL are presented and discussed by way of example.

<WIDL>

<WIDL> is the parent element for the Web Interface Definition Language; it defines an interface. Interfaces are groupings of related services and bindings. The following are attributes of the <WIDL> element:

NAME

Required. Establishes a name for an interface. The interface name is used in conjunction with a service name for naming or directory services.

VERSION

Optional. Specifies the version of WIDL. webMethods first implemented WIDL as HTML extensions. Experience with customers since late 1996 resulted in WIDL 2.0, an application of XML that is capable of automating complex interactions across multiple Web servers.

TEMPLATE

Optional. WIDL enables common interfaces to services provided by multiple sites. Templates allow the specification of interfaces, implementations of which may be available from multiple sources. A shipping template defines a functional interface for shipping services; various implementations can be provided for Federal Express, UPS, and DHL.

BASEURL

Optional. BASEURL is similar to the <BASE HREF=""> statement in HTML. Some of the services within a given WIDL may be hosted from the same Base URL. If BASEURL is defined, the URL for various services can be defined relative to BASEURL. This feature is useful for replicated sites which can be addressed by changing only the BASEURL, instead of the URL for each service.

OBJMODEL

Optional. Specifies an object model to be used for extracting data elements from HTML and XML documents. Object models are the result of parsing HTML or XML documents. The use of object models is central to the functionality of WIDL. Object references are used in <VARIABLE/>, <CONDITION/> and <REGION/> elements. For this reason, the object model will be briefly discussed before proceeding with the description of the element definitions that constitute WIDL.

Object model

Many of the features of WIDL require a capability to reliably extract specific data elements from Web documents and map them to output parameters.

Two candidate technologies for data extraction are pattern matching and parsing. Pattern matching extracts data based on regular expressions, and is well suited to raw text files and poorly constructed HTML documents. There is a lot of bad HTML in the world! Parsing, on the other hand, recovers document structure and exposes relationships between document objects, enabling elements of a document to be accessed with an object model.

Using an object model, an absolute reference to an element of an HTML document might be specified:

 doc.p[0].text 

This reference would retrieve the text of the first paragraph of a given document.
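As a rough illustration of how such a reference could be resolved, the Python sketch below parses an HTML document and returns the text of the nth element with a given tag. It is only a sketch: the class and function names are hypothetical, it assumes that elements are indexed per tag in document order, it handles only the single-step form `doc.<tag>[n].text`, and it expects reasonably well-formed HTML. The actual object model used with WIDL may differ on all of these points.

```python
import re
from html.parser import HTMLParser

# Elements that never have content or an end tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TagTextCollector(HTMLParser):
    """Collect the text of every element, grouped by tag, in document order."""

    def __init__(self):
        super().__init__()
        self.by_tag = {}   # tag name -> list of text buffers
        self.open = []     # stack of (tag, buffer) for open elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        buf = []
        self.by_tag.setdefault(tag, []).append(buf)
        self.open.append((tag, buf))

    def handle_endtag(self, tag):
        # Close the innermost open element with this tag name.
        for i in range(len(self.open) - 1, -1, -1):
            if self.open[i][0] == tag:
                del self.open[i:]
                break

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for _, buf in self.open:
            buf.append(data)


REFERENCE = re.compile(r"^doc\.(\w+)\[(\d+)\]\.text$")


def resolve(reference, html):
    """Return the text named by a doc.<tag>[n].text reference, or None."""
    match = REFERENCE.match(reference)
    if match is None:
        raise ValueError("unsupported reference: " + reference)
    tag, index = match.group(1), int(match.group(2))
    collector = TagTextCollector()
    collector.feed(html)
    collector.close()
    elements = collector.by_tag.get(tag, [])
    if index >= len(elements):
        return None
    return "".join(elements[index]).strip()


page = ("<html><head><title>Rates</title></head><body>"
        "<p>First <b>bold</b> paragraph.</p><p>Second.</p></body></html>")
print(resolve("doc.p[0].text", page))
```

Because the lookup works on parsed structure, the `<b>` tags inside the first paragraph do not disturb the result.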

From both a development and an administrative point of view, pattern matching is more labor intensive for establishing and maintaining relationships between data elements and program variables. Regular expressions are difficult to construct and prone to breakage as document structures change. For instance, the addition of formatting tags around data elements in HTML documents could easily derail the search for a pattern. An object model, on the other hand, can see through many such changes.

Patterns must also be carefully constructed to avoid unintentional matching. In complex cases, patterns must be nested within patterns. The process of mapping patterns to a number of output parameters can easily become unmanageable.

It is possible to achieve the best of both worlds by using pattern matching when necessary to match against the attributes of elements accessible via an Object Model. Using a hybrid model of pattern matching within parsed elements provides for the extraction of target information from preformatted text regions or text files.

A hybrid reference of this kind could, for example, retrieve the text of the first paragraph that contains 'Currency:' within a given document.

Various object models for working with HTML documents have been specified. The W3C has established a working group to define a standard Document Object Model (DOM). The WIDL specification allows for multiple object models. In implementing WIDL, we discovered many functional requirements not currently addressed by existing object models. These requirements will be demonstrated in various examples later in this article.

We now continue with a discussion of the attributes of the elements of the WIDL.

<SERVICE/>

The <SERVICE/> element describes a Web service, such as those provided by CGI scripts, or via NSAPI, ISAPI, or other back-end Web server programs. Services take a set of input parameters, perform some processing, then return a dynamically generated HTML, XML or text document.

The attributes of the <SERVICE/> element map an abstract service name into a service's actual URL, specify the HTTP method to be used to access the service, and designate "bindings" for input and output parameters.

NAME

Required. Establishes a name for a service. The service name is used in conjunction with an interface for naming or directory services.

URL

Required. Specifies the Uniform Resource Locator for the target document. A service URL can be either a fully qualified URL or a partial URL that is relative to the BASEURL provided as an attribute of the <WIDL> element.

METHOD

Required. Specifies the HTTP method (GET or POST) to be used to access the service.

INPUT

Required. Designates the <BINDING> to be used to define the input parameters for programs that call the service. The specified name must be that of a <BINDING> contained within the same <WIDL> as the service.

OUTPUT

Required. Designates the <BINDING> to be used to define the output parameters for programs that call the service. The specified name must be that of a <BINDING> contained within the same <WIDL> as the service.

AUTHUSER

Optional. Establishes the username for HTTP authentication.

AUTHPASS

Optional. Establishes the password for HTTP authentication.

TIMEOUT

Optional. Amount of time before service times out.

RETRIES

Optional. Number of times to retry the service before failing.

Typically the username/password combination is set independent of service definitions in WIDL. The AUTHUSER and AUTHPASS attributes allow a username and password to be defined outside of a calling program. This is useful in cases where multiple client programs use the same service.
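The TIMEOUT and RETRIES attributes suggest a simple retry loop around each request. The Python sketch below shows one way a client library might honor them. The function names are hypothetical, the timeout is assumed to be in seconds, and RETRIES is read as the number of additional attempts after the first; the source does not specify either detail.

```python
from urllib.request import urlopen


def http_fetch(url, timeout):
    """Real fetch: return the body of url, giving up after timeout seconds."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def call_service(fetch, url, timeout=30.0, retries=0):
    """Call fetch(url, timeout) up to retries + 1 times; re-raise the last error."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return fetch(url, timeout)
        except OSError as exc:  # URLError and socket timeouts derive from OSError
            last_error = exc
    raise last_error


def make_flaky_fetch(failures):
    """Build a stand-in fetch that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def fetch(url, timeout):
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TimeoutError("simulated timeout")
        return b"<html><title>ok</title></html>"

    return fetch, state


# Demonstration with the stand-in fetch; no network access is needed.
fetch, state = make_flaky_fetch(failures=2)
body = call_service(fetch, "http://www.example.com/cgi-bin/service",
                    timeout=10.0, retries=2)
print(state["calls"], body)
```

Passing the fetch function as a parameter keeps the retry policy separate from the transport, so `http_fetch` can be swapped in for real requests.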

<BINDING>

The <BINDING> element defines input and output variables for a service. Input bindings describe the data provided to a Web resource, and are analogous to the input fields in an HTML form. For a static HTML document no input variables are required. Output bindings describe which data elements are to be mapped from the output document returned as a result of accessing the Web resource with the given input variables. In most cases an output binding will map only a subset of the available elements in the output document.

NAME

Required. Identifies the binding for reference by service definitions and other binding definitions.

TYPE

Required. Specifies whether a binding defines input or output parameters.

<VARIABLE/>

The <VARIABLE/> element is used to describe both input and output binding parameters; different attributes are used depending on the type of parameter being described.

Common attributes are:

NAME

Required. Identifies the variable to calling programs.

VALUE

Optional. Designates a value to be assigned to the variable in HTTP transactions. For input variables this has the effect of rendering the variable invisible to calling programs; i.e., the specified value is submitted to the Web service without requiring an input from calling programs. For output variables this has the effect of hard-coding the value returned when the service is invoked.

USAGE

Optional. The default usage of variables is for specification of input and output parameters. Variables can also be used internally within WIDL, as well as to pass header information (e.g., USER-AGENT or REFERER) in an HTTP request. The USAGE attribute will be explored in Examples 2 and 3, which follow this <VARIABLE/> element overview.

TYPE

Required. Specifies both the data type and dimension of the variable.

The following attributes are specific to input variables:

FORMNAME

Optional. Specifies the variable name to be submitted via GET or POST methods. Obscure back-end variables can be given names that are more meaningful in the context of the service described by WIDL. Used in conjunction with WIDL Templates, FORMNAME permits the mapping of a single variable name across multiple service implementations. In the package tracking service in Example 1, the FORMNAME differs from the variable name. It is also possible to set FORMNAME="" to pass only the variable's value to the back-end program.

OPTIONS

Optional. Captures the options of list boxes, check boxes, and radio buttons. Useful for validating inputs prior to submitting input parameters to a service and for transforming input criteria into formats acceptable to back-end programs. For example, an options list could be used to translate a meaningful input of "full" to the "f" acceptable to a back-end program.

The following attributes are specific to output variables:

REFERENCE

Optional. Specifies an object reference to extract data from the HTML, XML, or text document returned as the result of a service invocation.

MASK

Optional. Masks permit the use of pattern matching and token collecting to easily strip away unwanted labels and other text surrounding target data items.

NULLOK

Optional. Overrides the implicit condition that all output variables return a non-null value.

Apart from the "default" behavior of variables defined in input bindings, there are two other usage models supported by WIDL: "internal" and "header." The USAGE attribute can define service inputs in place of or in addition to those required by a Web service's HTML form.

Internal variables enable variable substitution within input and output bindings. For instance, using internal variables, a portion of a service's URL or a pattern for matching within an object reference can be specified as a variable that is part of an input binding.

Header variables allow HTTP header information to be included as part of a service request. This is useful in many situations, including the passing of referrer information where required by back-end systems.
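The three usage models can be pictured as a routing step applied to an input binding before a request is sent. The following Python sketch is one possible reading, not the documented behavior: the function name is hypothetical, the spelling "HEADER" for the USAGE value is an assumption (the examples in this article show only "INTERNAL"), and taking the header name from FORMNAME is likewise assumed.

```python
def partition_inputs(variables, values):
    """Split input variables into form fields, internal values, and headers.

    variables: list of dicts holding <VARIABLE/> attributes.
    values:    dict of values supplied by the calling program.
    """
    form, internal, headers = [], {}, {}
    for var in variables:
        name = var["NAME"]
        # A VALUE in the definition hides the variable from callers.
        value = var["VALUE"] if "VALUE" in var else values[name]
        usage = var.get("USAGE", "DEFAULT").upper()
        if usage == "INTERNAL":
            internal[name] = value          # substituted inside WIDL itself
        elif usage == "HEADER":
            headers[var.get("FORMNAME", name)] = value
        else:
            form.append((var.get("FORMNAME", name), value))
    return form, internal, headers


# Hypothetical input binding mixing all three usage models.
variables = [
    {"NAME": "query", "FORMNAME": "q"},
    {"NAME": "state", "USAGE": "INTERNAL"},
    {"NAME": "referer", "FORMNAME": "Referer", "USAGE": "HEADER",
     "VALUE": "http://www.example.com/start.html"},
]
form, internal, headers = partition_inputs(
    variables, {"query": "auto loans", "state": "ca"})
print(form, internal, headers)
```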

In Example 2, an auto loan service is defined for a site that uses a directory structure to organize loan information for various states. Rather than using CGI scripts to access a database of high, low, and average loan rates, unique URLs which contain a state abbreviation as part of target document names are linked from a pick list. The use of internal variables enables the parameterization of a portion of the URL. In this fashion, WIDL is able to define an input binding even though no HTML forms are present to query the user for information. The input binding specifies a variable "state" that is referenced in the URL attribute of the service definition as %state%. At runtime the value passed into the "state" variable is used to complete the service URL.

Example 2 Using Internal Variables to Parameterize Directory Structures

  <WIDL NAME="autoLoan" VERSION="2.0">
    
    <SERVICE NAME="AutoLoan" METHOD="GET"
      URL="http://www.bankrate.com/autobytel/abt%state%a.htm"
      INPUT="AutoLoanInput" OUTPUT="AutoLoanOutput" />
      
      <BINDING NAME="AutoLoanInput" TYPE="INPUT">
        <VARIABLE NAME="state" TYPE="String" FORMNAME="state" USAGE="INTERNAL" />
      </BINDING>
      
      <BINDING NAME="AutoLoanOutput" TYPE="OUTPUT">   
        <CONDITION TYPE="Failure" REASONTEXT="State not found" />   
        <VARIABLE NAME="state" TYPE="String"    
          REFERENCE="doc.table[4].tr[1].th[0].text" />   
        <VARIABLE NAME="avgNew" TYPE="String"   
          REFERENCE="doc.table[4].tr[2].td[1].text" />   
        <VARIABLE NAME="highNew" TYPE="String"   
          REFERENCE="doc.table[4].tr[2].td[2].text" />   
        <VARIABLE NAME="lowNew" TYPE="String"   
          REFERENCE="doc.table[4].tr[2].td[3].text" />   
        <VARIABLE NAME="avgUsed" TYPE="String"   
          REFERENCE="doc.table[4].tr[3].td[1].text" />   
        <VARIABLE NAME="highUsed" TYPE="String"   
          REFERENCE="doc.table[4].tr[3].td[2].text" />   
        <VARIABLE NAME="lowUsed" TYPE="String"   
          REFERENCE="doc.table[4].tr[3].td[3].text" />   
      </BINDING>   
      
  </WIDL>  

Because the AutoLoan service uses a variable to complete the URL to access a static document, an invalid input parameter results in an invalid URL. The <CONDITION/> statement in the output binding traps the document not found condition and returns a sensible error message to client programs.
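The substitution step in Example 2 can be sketched in a few lines of Python. This is a minimal illustration with a hypothetical helper name; the input value "ca" is a sample, and the simple `%name%` pattern used here would also match percent-escaped sequences in a URL, which a real implementation would need to guard against.

```python
import re


def expand_url(template, internal):
    """Replace each %name% placeholder with the matching internal variable."""
    def replace(match):
        name = match.group(1)
        if name not in internal:
            raise KeyError("no value for internal variable " + name)
        return internal[name]

    return re.sub(r"%(\w+)%", replace, template)


template = "http://www.bankrate.com/autobytel/abt%state%a.htm"
url = expand_url(template, {"state": "ca"})
print(url)
```

An unknown state abbreviation still yields a syntactically valid URL, which is why the failure has to be caught on the output side, after the server reports that the document does not exist.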

Internal variables can also be used within object references that use pattern matching to index into the object tree.

Example 3 uses the currency exchange service provided by the Federal Reserve Bank to illustrate the use of internal variables to interactively query a single static document.

Example 3 Using Internal Variables to Input Criteria in Object References

  <WIDL NAME="FederalReserve" TEMPLATE="Currency"   
    BASEURL="http://www.ny.frb.org/" VERSION="2.0">   
    
    <SERVICE NAME="ExchangeRate" METHOD="GET"   
      URL="/pihome/mktrates/forex12.shtml"   
      INPUT="currencyInput" OUTPUT="currencyOutput" />   
    
    <BINDING NAME="currencyInput" TYPE="INPUT">   
      <VARIABLE NAME="Currency" TYPE="String"    
        FORMNAME="CURRENCY" USAGE="INTERNAL" />   
    </BINDING>   
    
    <BINDING NAME="currencyOutput" TYPE="OUTPUT">   
      <CONDITION TYPE="FAILURE" REASONTEXT="Currency not found" />   
      <VARIABLE NAME="rate" TYPE="String"   
        REFERENCE="doc.pre[0].line['*%Currency%*'].text[53-65]" />   
    </BINDING>   
    
  </WIDL> 

In this example currency rates for a number of countries are provided in a single document. The object reference for the 'rate' variable in the output binding uses an internal variable 'Currency' as part of the pattern that is matched to discover the current exchange rate.

The object reference used in this example also demonstrates two additional text manipulation features of the object model developed by webMethods. The .line[] construct allows access to individual lines of both preformatted text and text that has been formatted with the <br> line-break element. This greatly simplifies pattern matching expressions within object references.

The Federal Reserve Currency Exchange service returns rate information in a column from character position 53 to character position 65. This range of characters is specified by qualifying the .text[53-65] attribute of the line matching the input criteria.
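The combination of line matching and a character range can be sketched as follows in Python. The helper names are hypothetical, the rows are made-up sample data rather than real rates, and the positions are assumed to be zero-based and inclusive, which the source does not state. Shell-style wildcards from the standard `fnmatch` module stand in for the pattern syntax.

```python
from fnmatch import fnmatchcase


def extract_column(pre_text, pattern, start, end):
    """Return characters start..end (inclusive) of the first matching line."""
    for line in pre_text.splitlines():
        if fnmatchcase(line, pattern):
            return line[start:end + 1].strip()
    return None


def row(country, unit, rate):
    """Lay out one fixed-width row with the rate in columns 53 to 65."""
    return f"{country:<20}{unit:<33}{rate:>13}"


# Made-up sample rows, not real exchange rates.
sample = "\n".join([
    row("Canada", "Dollar", "1.3800"),
    row("Japan", "Yen", "121.0000"),
])

# Substitute the internal variable into the pattern, then extract.
pattern = "*%Currency%*".replace("%Currency%", "Japan")
rate = extract_column(sample, pattern, 53, 65)
print(rate)
```

A currency that appears on no line returns None, which corresponds to the case the FAILURE condition in Example 3 reports as "Currency not found".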

<CONDITION/>

The <CONDITION/> element is used in output bindings to specify success and failure conditions for the extraction of data to be returned to calling programs. Conditions enable branching logic within service definitions; they are used to attempt alternate bindings when initial bindings fail and to initiate service chains, whereby the output variables from one service are passed into the input bindings of a second service. Conditions also define error messages returned to calling programs when services fail.

TYPE

Required. Specifies whether a condition is checking for the "Success" or the "Failure" of a binding attempt.
Any variable that returns a NULL value will cause the entire binding to fail, unless the NULLOK attribute of that variable has been set to true. Conditions can catch the success or failure of either a specific object reference or of an entire binding. In the case where a condition initiates a service chain, it is important that all variables bind properly.

REFERENCE

Optional. Specifies an object reference which extracts data from the HTML or XML document returned as the result of a service invocation. The REFERENCE attribute for conditions is equivalent to the REFERENCE attribute used in variable definitions.

MATCH

Required. Specifies a text pattern that will be compared with the object property referenced by the REFERENCE attribute.

REBIND

Optional. Specifies an alternate output binding. Typically a failure condition indicates that the document returned cannot be bound properly. REBIND redirects the binding attempt. This is useful in situations where the documents returned by a service are dependent upon the input criteria that were submitted. For example, a retail Web site may return a different document structure for an SKU depending on whether the item requested is a shirt, a tie, or trousers. The use of REBIND allows a condition to determine the appropriate binding for extracting the desired data.

SERVICE

Optional. Specifies a service to invoke with the results of an output binding. Aside from the obvious benefit of chaining services to further automate the tasks that can be encapsulated for client programs, there are many cases when target documents can only be retrieved after visiting several Web pages in succession. In some instances cookies are issued by an entry page that must be visited prior to interacting with HTML forms; in others, URLs are dynamically generated from databases for specific user identities.

REASONTEXT

Optional. The text to be returned as an error message when a service fails.

REASONREF

Optional. Reference to an element's attribute to be returned as an error message when a service fails.

WAIT

Optional. Amount of time to wait before re-trying retrieval of a document after a server has returned a 'service busy' error.

RETRIES

Optional. Number of times to retry the service before failing.
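The interplay of WAIT and RETRIES can be pictured as a simple retry loop. The following Java sketch is an assumption about how a processor might behave, not a description of any shipping product: the names are hypothetical, an empty Optional stands in for a 'service busy' response, WAIT is taken to be in milliseconds, and RETRIES is taken to count attempts after the first one.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class RetrySketch {

    /**
     * Calls fetch until it yields a document or the retries are used up.
     * An empty Optional stands in for a 'service busy' response.
     */
    static Optional<String> fetchWithRetries(Supplier<Optional<String>> fetch,
                                             int retries, long waitMillis) {
        for (int attempt = 0; attempt <= retries; attempt++) {
            Optional<String> doc = fetch.get();
            if (doc.isPresent()) {
                return doc;
            }
            if (attempt < retries) {
                try {
                    Thread.sleep(waitMillis); // WAIT before the next attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt
                    return Optional.empty();
                }
            }
        }
        return Optional.empty(); // all attempts failed
    }
}
```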

Example 4 illustrates the use of conditions to specify alternate bindings. Alternate bindings can be used when documents returned by services are dependent upon the inputs submitted to the service. In some rare cases, such as the StockMarketInfo service defined in this example, a service occasionally returns different document formats for no apparent reason. Conditions and rebinding handle any such situations.

Example 4 Conditions Initiate Alternate Attempts for Extracting Output Values

  <WIDL NAME="Yahoo" VERSION="2.0">

    <SERVICE NAME="StockMarketInfo" METHOD="GET"
      URL="http://quote.yahoo.com/" OUTPUT="marketOut" />

    <BINDING NAME="marketOut" TYPE="Output">
      <CONDITION TYPE="Failure" REBIND="marketOut2" />
      <VARIABLE TYPE="String[][]" NAME="info"
        REFERENCE="doc.table[0].tr[0].td[].text" />
      <VARIABLE TYPE="String[]" NAME="links"
        REFERENCE="doc.table[0].tr[0].a[].href" />
    </BINDING>

    <BINDING NAME="marketOut2" TYPE="Output">
      <VARIABLE TYPE="String[][]" NAME="info"
        REFERENCE="doc.table[1].tr[0].td[].td[].text" />
      <VARIABLE TYPE="String[]" NAME="links"
        REFERENCE="doc.table[1].tr[0].a[].href" />
    </BINDING>

  </WIDL>
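The fallback behavior of a Failure condition with REBIND can be sketched in Java as follows. This is a simplified model under stated assumptions: a binding is represented as a map from variable names to hypothetical extractor functions, a null result fails the whole binding (as if NULLOK were not set), and only one alternate binding is tried.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

final class RebindSketch {

    /** Applies every extractor; a null result fails the whole binding. */
    static Optional<Map<String, String>> bind(
            String doc, Map<String, Function<String, String>> binding) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, Function<String, String>> v : binding.entrySet()) {
            String value = v.getValue().apply(doc);
            if (value == null) {
                return Optional.empty();
            }
            out.put(v.getKey(), value);
        }
        return Optional.of(out);
    }

    /** Tries the primary binding, then the alternate named by REBIND. */
    static Optional<Map<String, String>> bindWithRebind(
            String doc,
            Map<String, Function<String, String>> primary,
            Map<String, Function<String, String>> alternate) {
        Optional<Map<String, String>> first = bind(doc, primary);
        return first.isPresent() ? first : bind(doc, alternate);
    }
}
```

If both bindings fail, the empty result corresponds to the case where the service reports an error to the calling program.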

Example 5 illustrates the use of conditions to specify a service chain. Service chains pass the name-value pairs of an output binding into the input binding of the service specified by a <CONDITION/> statement. Any name-value pairs matching the variables of the chained service's input binding will be used as input parameters. In this example, the ProductSearch service returns a URL when it successfully finds a product matching the search criteria. The success condition on the productSearchOutput binding causes the ExtractPrices service to be called. Because the output variable of ProductSearch matches the input variable of ExtractPrices, its value is passed from one service into the other.

Example 5 Conditions Initiate Service Chains

  <WIDL NAME="EddieBauer" VERSION="2.0">

    <SERVICE NAME="ProductSearch" METHOD="GET"
      URL="http://www.ebauer.com/eb/ShopEB/prod_search_results.asp"
      INPUT="productSearchInput" OUTPUT="productSearchOutput" />

    <BINDING NAME="productSearchInput" TYPE="INPUT">
      <VARIABLE NAME="searchstring" FORMNAME="searchstring" />
    </BINDING>

    <BINDING NAME="productSearchOutput" TYPE="OUTPUT">
      <CONDITION TYPE="Failure" REFERENCE="doc.p['*Sorry*'].text"
        MATCH="*Sorry*" REASONREF="doc.p['*Sorry*'].text" />
      <CONDITION TYPE="Success" SERVICE="ExtractPrices" />
      <VARIABLE NAME="productURL" TYPE="String"
        REFERENCE="doc.table[0].tr[1].td[3].a[0].href" />
    </BINDING>

    <SERVICE NAME="ExtractPrices" METHOD="GET" URL="%productURL%"
      INPUT="ExtractPricesInput" OUTPUT="ExtractPricesOutput" />

    <BINDING NAME="ExtractPricesInput" TYPE="INPUT">
      <VARIABLE NAME="productURL" TYPE="String" USAGE="INTERNAL" />
    </BINDING>

    <BINDING NAME="ExtractPricesOutput" TYPE="OUTPUT">
      <VARIABLE NAME="Price" TYPE="String"
        REFERENCE="doc.table[1].strong[0].value['*$$']" />
    </BINDING>

  </WIDL>

It is important to note that the ExtractPrices service can be called independently of the ProductSearch service, and that the ExtractPrices service specifies productURL as an internal variable. The output variables from the ProductSearch service are not available to the ExtractPrices service except in the case where they have been passed via an input binding.
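The hand-off between chained services can be sketched in Java as a filter over name-value pairs followed by a placeholder substitution. The class and method names are hypothetical, and exact, case-sensitive name matching is an assumption of this sketch rather than a documented rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class ChainSketch {

    /** Keeps only the output values whose names appear in the next input binding. */
    static Map<String, String> chainInputs(Map<String, String> outputs,
                                           Set<String> inputNames) {
        Map<String, String> inputs = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : outputs.entrySet()) {
            if (inputNames.contains(e.getKey())) {
                inputs.put(e.getKey(), e.getValue());
            }
        }
        return inputs;
    }

    /** Replaces %name% placeholders in a service URL with input values. */
    static String expandUrl(String template, Map<String, String> inputs) {
        String url = template;
        for (Map.Entry<String, String> e : inputs.entrySet()) {
            url = url.replace("%" + e.getKey() + "%", e.getValue());
        }
        return url;
    }
}
```

Output variables that do not appear in the chained service's input binding are simply dropped, which mirrors the point that they are not otherwise visible to the second service.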

Service chains make it possible to interact with "shopping cart" services, where multiple service calls are required to add items, followed by a service call to submit an order.

<REGION/>

The <REGION/> element is used in output bindings to define targeted subregions of a document. This is useful in services that return variable-length arrays of information in structures that can be located between well-known elements of a page.

Regions are critical for poorly designed documents where it is otherwise impossible to differentiate between desired data elements (for instance, story links on a news page) and elements that also match the search criteria.

NAME

Required. Specifies the name for a region. This name can then be used as the root of an object reference. For instance, a region named foo can be used in object references such as:
 foo.p[0].text 

START

Required. An object reference that determines the beginning of a region.

END

Required. An object reference that determines the end of a region.

Example 6 demonstrates the use of regions in a news service, where the number of news stories varies from day to day. Regions permit the extraction of data elements relative to other features of a document. The tops region begins with a text object that matches the pattern 'Last?Updated*' and ends with an object that matches 'For?more*'.

Example 6 Regions Permit the Extraction of Data Elements

  <WIDL NAME="News" VERSION="2.0">

    <SERVICE NAME="Techweb" METHOD="GET"
      URL="http://www.techweb.com/" OUTPUT="techwebOut" />

    <BINDING NAME="techwebOut" TYPE="OUTPUT">
      <REGION NAME="tops" START="doc.font['Last?Updated*']"
        END="doc.b['For?more*']" />
      <VARIABLE NAME="service" TYPE="String" VALUE="TECHWEB Top Stories" />
      <VARIABLE NAME="url" TYPE="String" REFERENCE="doc.url" />
      <VARIABLE NAME="stories" TYPE="String[]" REFERENCE="tops.a[].text" />
      <VARIABLE NAME="links" TYPE="String[]" REFERENCE="tops.a[].href" />
    </BINDING>

  </WIDL>

Variable references into the tops region collect arrays of anchors and anchor text, regardless of the fact that the sizes of the arrays change throughout the day. The object references within tops are vastly simplified by the processing already provided by the region definition:

 tops.a[].text
 tops.a[].href

It is also worth noting that the news service in Example 6 has no input binding. Input bindings are not required for service definitions.
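The idea behind a region can be sketched in Java as a scan over the document's elements in order. This is a deliberately simplified model: elements are plain strings, the start and end tests are hypothetical predicates standing in for object references, and excluding the boundary elements from the region is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class RegionSketch {

    /** Returns the elements between the first start match and the next end match. */
    static List<String> region(List<String> elements,
                               Predicate<String> start, Predicate<String> end) {
        List<String> inside = new ArrayList<>();
        boolean open = false;
        for (String element : elements) {
            if (!open) {
                open = start.test(element); // region opens after the start element
            } else if (end.test(element)) {
                return inside;              // region closes before the end element
            } else {
                inside.add(element);
            }
        }
        return new ArrayList<>(); // no complete region was found
    }
}
```

Because the result is just a list, its size can vary from one invocation to the next without any change to the references that read from it.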

Object References

The default object model used by WIDL provides object references for accessing elements and properties of HTML and XML documents. This model is based on the JavaScript page object model, but without the JavaScript method definitions.

Using the default object model, all elements of HTML and XML documents can be addressed in the following ways:

  • By name, if the target element has a non-empty name attribute. For example, the value of an HTML element <a name="foo"> can be referenced:

 doc.foo.value

  • By absolute indexing, where each array of elements has a zero-based integer index, i.e.:

 doc.headings[0].text
 doc.p[1].text

  • By relative indexing, which directs the binding algorithm to search the VALUE attributes of each element in the array until a match is found. The match must be complete, which requires the use of wildcard metacharacters for partial string matches. Note that the search will return the first matching element, if any:

 doc.tr['*pattern*'].td[1].text

  • By region indexing, which directs the binding algorithm to search only within a region of a document:

 myregion.a[2].href

  • By attribute matching, which directs the binding algorithm to search an object's attributes until a match is found. Attribute matching is done with parentheses instead of square brackets:

 doc.a(name='foo').href
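The complete-match rule for relative indexing can be illustrated with a small Java sketch. It assumes that '*' matches any run of characters and '?' matches exactly one character, which is consistent with the patterns used in the examples but is not spelled out in the text; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.OptionalInt;
import java.util.regex.Pattern;

final class WildcardSketch {

    /** Translates a wildcard pattern into a regex: '*' = any run, '?' = one character. */
    static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Index of the first element whose text matches the whole pattern, if any. */
    static OptionalInt firstMatch(List<String> texts, String wildcard) {
        Pattern p = compile(wildcard);
        for (int i = 0; i < texts.size(); i++) {
            if (p.matcher(texts.get(i)).matches()) { // matches() requires a complete match
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }
}
```

Under these assumptions, a bare pattern such as 'pattern' only matches an element whose text is exactly that word, whereas '*pattern*' matches any element containing it.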

The following properties are available for all objects:

.text/.txt

Returns the text of a container

.value/.val

Returns the value of a container

.source/.src

Returns the source of a container

.index/.idx

Returns the index of a container

.reference/.ref

Returns the fully qualified object reference

Attributes of HTML containers take precedence over properties, which have alternate accessors.

.text/.txt and .value/.val are equivalent except when a document element has an identically named attribute.

Putting WIDL to Work

WIDL files can be hand-coded or developed interactively with command line or graphical tools, which provide aids for determining object references used in <VARIABLE/>, <CONDITION/>, and <REGION/> declarations.

Once a WIDL file has been created, its use depends upon the implementation of products that can process and understand WIDL services. A Web integration platform based on WIDL needs to provide:

  • A mechanism for retrieving WIDL files, either from a local file system, a directory service such as LDAP, or a URL
  • An HTML and XML parser, and text pattern matching capabilities, providing an object model for accessing elements of Web documents
  • HTTP and HTTPS support, to initiate requests and receive Web documents

Apart from these requirements, a WIDL processor could be delivered as a Java class or a Windows DLL, for integration directly with client applications, or as a standalone server with middleware interfaces, allowing thin-client access to Web automation functionality.

Generating Code

The primary purpose of WIDL is integration with corporate business applications. In much the same way that DCE or CORBA IDL is used to generate code fragments, or "stubs," to be included in development projects, WIDL provides the necessary ingredients for generating Java, JavaScript, C/C++, and even Visual Basic client code.

webMethods has developed a suite of Web Automation products for the development and management of WIDL files, as well as the generation of client code from WIDL files. Client stubs, which we affectionately call "Weblets," present developers with local function calls, and encapsulate all the methods required to invoke a service that has been defined by a WIDL file.

Example 7 Java Stub

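  import java.io.IOException;
  import watt.api.*;

  public class TrackPackage extends Object
  {
    // Inputs of the TrackPackage service
    public String TrackingNum;
    public String DestCountry;  // declared so the args table compiles; not set here
    public String ShipDate;     // declared so the args table compiles; not set here

    // Outputs bound from the returned document
    public String disposition;
    public String deliveredOn;
    public String deliveredTo;

    public TrackPackage(String TrackingNum)
      throws IOException, WattException, WattServiceException
    {
      this.TrackingNum = TrackingNum;

      // Name-value pairs submitted to the service
      String args[][] = {
        {"TrackingNum", TrackingNum},
        {"DestCountry", DestCountry},
        {"ShipDate", ShipDate}
      };

      // Handle to the Web automation runtime
      Context c = new Context();

      c.loadDocument("Shipping.widl");
      Result r = c.invokeService("FedexShipping",
        "TrackPackage", args);

      // Copy the bound output variables into the class variables
      disposition = r.getVariable("disposition");
      deliveredOn = r.getVariable("deliveredOn");
      deliveredTo = r.getVariable("deliveredTo");
    }
  }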

Example 7 features a Java class generated from the package tracking WIDL presented earlier in Example 1. This class demonstrates the following methods that are part of the API that webMethods has developed for processing WIDL:

  • Context
  • loadDocument
  • invokeService
  • getVariable

After declaring the variables that will be used by the TrackPackage class, a handle c to a new Context of the webMethods Web automation runtime is created. All API calls are then made against this handle.

loadDocument loads and parses the specified WIDL file, in this case Shipping.widl. Loading the WIDL defines the services of the Shipping interface to the runtime. invokeService actually submits the input parameters to the TrackPackage service, which makes the appropriate HTTP request and returns either a result set which contains the bound output variables or an error message specified by a <CONDITION/> statement within the <SERVICE/> definition. getVariable is then used to extract the values of the output variables and to assign them to class variables.

Within the Java application, the package tracking service looks like a simple instantiation of the TrackPackage class:

 TrackPackage p = new TrackPackage("12345678"); 

In short, an application makes a call to a local function that has been generated by WIDL. The local function encapsulates the API calls to the WIDL processor. The WIDL processor:

  • Loads the WIDL file from a local or remote file system
  • Passes the function's input parameters as an HTTP request
  • Parses the retrieved document to extract target data items
  • Executes any conditional logic for error checking or service chaining
  • Returns the extracted data into the output parameters of the calling function
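As an illustration of the second step in the list above, the following Java sketch turns a table of name-value pairs, shaped like the args array in Example 7, into the URL of an HTTP GET request. It is a guess at one reasonable approach, not the webMethods implementation: the names are hypothetical, and skipping unset inputs and using UTF-8 form encoding are assumptions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class QuerySketch {

    /** Appends name-value pairs to a base URL as an encoded GET query string. */
    static String buildGetUrl(String baseUrl, String[][] pairs) {
        StringBuilder url = new StringBuilder(baseUrl);
        char separator = baseUrl.contains("?") ? '&' : '?';
        for (String[] pair : pairs) {
            if (pair[1] == null) {
                continue; // skip inputs that were never set (an assumption)
            }
            url.append(separator)
               .append(URLEncoder.encode(pair[0], StandardCharsets.UTF_8))
               .append('=')
               .append(URLEncoder.encode(pair[1], StandardCharsets.UTF_8));
            separator = '&';
        }
        return url.toString();
    }
}
```

A service defined with the Post method would instead send the same encoded pairs in the request body.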

Generated Java classes can be incorporated in standalone Java applications, Java applets, JavaScript routines, or server-side Java "servlets." Generated C/C++ code encapsulating Web services can be deployed as DLLs, shared libraries, or standalone executables. The webMethods implementation, the Web Automation Platform, provides Java classes, a shared library, a Windows DLL, and an ActiveX control to support Visual Basic modules, which can be embedded in spreadsheets and other Microsoft Office applications.

Conclusion

Web technology is strong on interactivity but low on automation. The primary applications of the Web, including Push and Agent technologies, are almost exclusively focused on end users. Data that is being made available in HTML format is effectively inaccessible to business applications other than the Web browser.

On corporate intranets and extranets, the Web browser has enabled access to business systems, but has in many cases reinforced manual inefficiencies as data must be transcribed from browser windows into other application interfaces.

Electronic commerce on the Web is typically driven manually via a browser. In order to achieve business-to-business integration, organizations have resorted to proprietary protocols. The many-to-many nature of Web commerce demands a standard for automated integration.

Interactions normally performed manually in a browser, such as entering information into an HTML form, submitting the form, and retrieving HTML documents, can be automated by capturing details such as input parameters, service URLs, and data extraction methods for output parameters. Mechanisms for condition processing can also be provided to enable robust error handling.

The Web Interface Definition Language (WIDL) is an application of the Extensible Markup Language (XML), which allows the resources of the World Wide Web to be described as functional interfaces that can be accessed by remote systems over standard Web protocols. WIDL transforms the Web into a standards-based integration platform, providing a practical and cost-effective infrastructure for business-to-business electronic commerce over the Web.

Appendix A

Example 8 The WIDL DTD

    <!ELEMENT WIDL ( SERVICE | BINDING )* > 
    <!ATTLIST WIDL
    NAME       CDATA #IMPLIED
    VERSION (1.0 | 2.0 | ...) "2.0"
    TEMPLATE   CDATA #IMPLIED
    BASEURL    CDATA #IMPLIED
    OBJMODEL (wmobj | ...) "wmobj">
    
    <!ELEMENT SERVICE EMPTY>
    <!ATTLIST SERVICE
    NAME       CDATA #REQUIRED
    URL        CDATA #REQUIRED
    METHOD (Get | Post) "Get"
    INPUT      CDATA #IMPLIED
    OUTPUT     CDATA #IMPLIED
    AUTHUSER   CDATA #IMPLIED
    AUTHPASS   CDATA #IMPLIED
    TIMEOUT    CDATA #IMPLIED
    RETRIES    CDATA #IMPLIED>
    
    <!ELEMENT BINDING ( VARIABLE | CONDITION | REGION )* > 
    <!ATTLIST BINDING 
    NAME       CDATA #REQUIRED
    TYPE (Input | Output) "Output">
    
    <!ELEMENT VARIABLE EMPTY>
    <!ATTLIST VARIABLE
    NAME       CDATA #REQUIRED
    FORMNAME   CDATA #IMPLIED
    TYPE (String | String[] | String[][]) "String" 
    USAGE (Default | Header | Internal) "Function" 
    REFERENCE  CDATA #IMPLIED
    VALUE      CDATA #IMPLIED
    MASK       CDATA #IMPLIED
    NULLOK          #BOOLEAN>
    
    <!ELEMENT CONDITION EMPTY>
    <!ATTLIST CONDITION
    TYPE (Success | Failure | Retry) "Success" 
    REF        CDATA #REQUIRED
    MATCH      CDATA #REQUIRED
    REBIND     CDATA #IMPLIED
    SERVICE    CDATA #IMPLIED
    REASONREF  CDATA #IMPLIED
    REASONTEXT CDATA #IMPLIED
    WAIT       CDATA #IMPLIED
    RETRIES    CDATA #IMPLIED>
    
    <!ELEMENT REGION EMPTY>
    <!ATTLIST REGION
    NAME       CDATA #REQUIRED
    START      CDATA #REQUIRED
    END        CDATA #REQUIRED>

About the Author

Charles Allen
3975 University Drive
Suite 360
Fairfax, VA 22030
(703) 352-8345
charles@webMethods.com

Charles Allen is VP of Product Management for webMethods, Inc., the leading provider of Web Automation and integration solutions for the Global 2000. Prior to joining webMethods, Mr. Allen was a founding member of Open Environment Corporation. Most recently he was responsible for technology acquisitions and joint ventures in the Asia/Pacific region. An inveterate communicator, Mr. Allen has presented extensively on the Web and distributed systems technology at events around the world.