Web Services for Bioinformatics

May 14, 2002

Ethan Cerami

At the January 2002 O'Reilly Bioinformatics conference, Lincoln Stein delivered a keynote address on "Building a Bioinformatics Nation." In this talk, Lincoln argued that current biological databases are islands unto themselves, much like the Italian city-states of the Middle Ages. He also proposed that a more formalized Web Service model could link disparate systems and thereby create a more unified set of bioinformatics tools and databases. (For more on Lincoln's talk, see his recent Nature article.)

This article follows up on Lincoln's talk and explores two bioinformatic services you can try out today. By examining these specific services, we get a bird's eye view of the Web Service protocol stack, including WSDL and SOAP. Looking at working services also provides much food for thought. For example, the recently released Google API provides a glimpse of the future of business Web Services. In much the same vein, the two examples discussed here offer a glimpse of the future of bioinformatic services.

This article assumes you are familiar with the basic terminology of Web Services. If you need a quick introduction, check out my Web Services FAQ. For an introduction to Web Services for bioinformatics, take a look at Lincoln’s PowerPoint slides from the O'Reilly conference.

XEMBL

Our first example is the XEMBL service from the European Bioinformatics Institute. XEMBL provides complete access to the EMBL Nucleotide Sequence Database. This database is produced in collaboration with GenBank and the DNA Database of Japan, and currently provides access to over 16.8 million records, consisting of 19.6 billion nucleotides (see EMBL Database Stats). It also provides access to completed genomes, including those of the human, the fruit fly, and C. elegans.

XEMBL is a recently released interface that provides easy XML access to the complete EMBL database. Access is provided via two main methods. The first is a REST-like interface whereby users specify parameters within a URL, and XEMBL returns a complete XML document. The second is a SOAP interface whereby users specify parameters within SOAP messages and XEMBL returns a complete XML document within a SOAP response.

In the current debate between REST and SOAP, the XEMBL group has not taken sides; it simply supports both. This is in line with one of Lincoln's main points: databases should provide multiple modes of access to data, from HTML, XML, and SQL, all the way to SOAP.

For the REST-like or SOAP interfaces, XEMBL expects two main parameters: an ID and a format. The ID specifies a unique international accession code; for example, SC49845 specifies the AXL2 gene in baker's yeast. The format indicates the XML format of the returned document. Two format options are currently supported: BSML (Bioinformatics Sequence Markup Language) and AGAVE (Architecture for Genomic Annotation, Visualization and Exchange). Other formats, including GAME and BIOML, are planned for future releases.

Accessing the XEMBL REST Interface

To access the XEMBL REST interface, append the ID and format as parameters to the XEMBL URL. For example, this URL: http://www.ebi.ac.uk/cgi-bin/xembl/XEMBL.pl?id=SC49845&format=Bsml retrieves the SC49845 record in BSML format.
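A URL of this form can also be assembled programmatically. The following is a minimal sketch that uses only the standard Java library; the class and method names (XemblUrlBuilder, buildUrl) are hypothetical and chosen for illustration. It URL-encodes both parameters so that unexpected characters in an ID or format cannot break the query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class; not part of XEMBL or any toolkit.
public class XemblUrlBuilder {

    // Base URL as given in the article
    private static final String BASE_URL =
        "http://www.ebi.ac.uk/cgi-bin/xembl/XEMBL.pl";

    // Builds the REST-style URL from an accession ID and a format name
    public static String buildUrl(String id, String format) {
        return BASE_URL + "?id=" + encode(id) + "&format=" + encode(format);
    }

    // URL-encodes a single query parameter value
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://www.ebi.ac.uk/cgi-bin/xembl/XEMBL.pl?id=SC49845&format=Bsml
        System.out.println(buildUrl("SC49845", "Bsml"));
    }
}
```

For plain accession codes such as SC49845 the encoding step changes nothing; it only matters when a value contains spaces or reserved characters.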

To create a Java client to XEMBL, you can easily use any number of XML parsers. Example 1 below illustrates the use of JDOM. The program expects two command-line arguments: an ID followed by an XML format.

Example 1: XEMBLClient, Version 1: REST Interface


package com.ecerami.bio;

import java.lang.StringBuffer;
import org.jdom.input.SAXBuilder;
import org.jdom.JDOMException;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;

/**
* Sample XEMBL Client Program using JDOM
* For details regarding XEMBL, go to:  http://www.ebi.ac.uk/xembl/
**/
public class XEMBLClient1 {
	private String baseURL = "http://www.ebi.ac.uk/cgi-bin/xembl/XEMBL.pl?";

	public XEMBLClient1 (String id, String format) throws Exception {
		System.out.println ("Connecting to XEMBL...");
		System.out.println ("Retrieving ID:  "+id);
		System.out.println ("Format:  "+format);
		connect (id, format);
	}

	private void connect (String id, String format) throws Exception {
		//  Build document;  validation is turned off
		SAXBuilder builder = new SAXBuilder (false);

		//  Do not load External DTDs
		builder.setFeature(
			"http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

		//  Create XEMBL URL;  append id and format
		StringBuffer url = new StringBuffer (baseURL);
		url.append ("id="+id);
		url.append ("&format="+format);

		System.out.println ("Using URL:  "+url.toString());
		Document doc = builder.build (url.toString());
		XMLOutputter outputter = new XMLOutputter();
		outputter.output(doc, System.out);
	}

	public static void main (String[] args) throws Exception {
		if (args.length != 2) {
			System.out.println ("Usage:  XEMBLClient1 [ID] [Format]");
			System.out.println ("Where Format is:  Bsml or sciobj (for AGAVE)");
			return;
		}
		XEMBLClient1 client = new XEMBLClient1(args[0], args[1]);
	}
}

As you can see in Example 1, you access XEMBL by specifying the base URL and appending the id and format parameters. JDOM takes care of the rest by downloading the specified XML file, parsing its contents, and making the contents available to your application. In Example 1, the code simply outputs the contents of the XML file, but you can also use JDOM to extract any specific elements within the returned XML document.
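To illustrate that last point, here is a hedged sketch of pulling specific values out of a returned document. So that it runs without extra libraries, it uses the JDK's built-in DOM parser rather than JDOM, and it parses an inline sample instead of calling XEMBL. The sample is a simplified, hypothetical stand-in for a BSML response (the title text in particular is invented for illustration), and the class name is likewise an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; uses JDK DOM instead of JDOM.
public class SequenceTitleExtractor {

    // Simplified, hypothetical stand-in for a BSML document
    static final String SAMPLE =
        "<Bsml><Definitions><Sequences>"
        + "<Sequence id=\"SC49845\" title=\"AXL2 gene\"/>"
        + "</Sequences></Definitions></Bsml>";

    // Returns "id: title" for every Sequence element in the document
    public static List<String> extractTitles(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // As in Example 1, do not load external DTDs
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd",
                false);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            NodeList sequences = doc.getElementsByTagName("Sequence");
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < sequences.getLength(); i++) {
                Element sequence = (Element) sequences.item(i);
                titles.add(sequence.getAttribute("id") + ": "
                    + sequence.getAttribute("title"));
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        for (String title : extractTitles(SAMPLE)) {
            System.out.println(title);
        }
    }
}
```

With JDOM the same idea applies: navigate from the root element to the elements of interest and read their attributes or text.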

Accessing the XEMBL SOAP Interface

To access the XEMBL SOAP interface, you first need to understand the XEMBL WSDL file. The WSDL file begins by specifying two <message> elements:


<message name="getNucSeqRequest" xmlns:tns="http://www.ebi.ac.uk/XEMBL">
	<part name="format" type="xsd:string">
		<documentation>Input parameter that indicates the result format that should be
		returned. Legit values: Bsml or sciobj. Defaults to Bsml if format not recognised.
		</documentation>
	</part>
	<part name="ids" type="xsd:string">
		<documentation>A space delimited list of international Nucleotide Sequence
		accession numbers (IDs). For example: "HSERPG U83300 AC000057".
		Minimum number of IDs is 1.</documentation>
	</part>
</message>
<message name="getNucSeqResponse">
	<part name="result" type="xsd:string">
		<documentation>An XML formatted result in either Bsml or AGAVE format.
		</documentation>
	</part>
</message>

The first message is the getNucSeqRequest message, which takes two parameters: a format and a space delimited set of IDs. The second message is the getNucSeqResponse message, which returns an XML string specifying the return results.
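Because the ids part is a single space-delimited string, a client that holds several accession numbers has to join them before making the call. A minimal sketch follows, with a hypothetical class name; it enforces the minimum of one ID stated in the WSDL documentation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for preparing the "ids" parameter.
public class NucSeqIds {

    // Joins accession numbers into the space-delimited form the WSDL describes
    public static String joinIds(List<String> ids) {
        if (ids == null || ids.isEmpty()) {
            throw new IllegalArgumentException(
                "At least one accession number is required");
        }
        return String.join(" ", ids);
    }

    public static void main(String[] args) {
        // The three IDs are the ones used as an example in the WSDL itself
        List<String> ids = Arrays.asList("HSERPG", "U83300", "AC000057");
        System.out.println(joinIds(ids)); // prints HSERPG U83300 AC000057
    }
}
```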

These two messages are then combined into a single operation:


<portType name="XEMBLPortType">
	<operation name="getNucSeq">
		<input message="tns:getNucSeqRequest" name="getNucSeq"/>
		<output message="tns:getNucSeqResponse" name="getNucSeqResponse" />
	</operation>
</portType>

The getNucSeq operation therefore consists of a single request message (getNucSeqRequest) followed by a single response message (getNucSeqResponse).

Once you understand the WSDL file, you can interface with XEMBL via your favorite SOAP toolkit such as Apache SOAP, SOAP::Lite, or Apache AXIS. Example 2 shows an interface using Apache AXIS.

Example 2: XEMBLClient, Version 2: SOAP Interface


package com.ecerami.bio;

import org.apache.axis.client.Call;
import org.apache.axis.client.Service;
//  Later AXIS releases use javax.xml.namespace.QName instead
import javax.xml.rpc.namespace.QName;

/**
* Sample XEMBL Client Program using Apache AXIS
* For details regarding XEMBL, go to:  http://www.ebi.ac.uk/xembl/
* For the XEMBL WSDL File, go to:  http://www.ebi.ac.uk/xembl/XEMBL.wsdl
* For details regarding AXIS, go to:  http://xml.apache.org/axis/
**/
public class XEMBLClient2 {
	String baseURL = "http://www.ebi.ac.uk:80/cgi-bin/xembl/XEMBL-SOAP.pl";

	public XEMBLClient2 (String id, String format) throws Exception {
		System.out.println ("Connecting to XEMBL...");
		System.out.println ("Retrieving ID:  "+id);
		System.out.println ("Format:  "+format);
		connect (id, format);
	}

	private void connect (String id, String format) throws Exception {
		Service service = new Service();
		Call call = (Call) service.createCall();

		//  Set SOAP URL and Method Name
		call.setTargetEndpointAddress( new java.net.URL(baseURL) );
		call.setOperationName(new QName("http://www.ebi.ac.uk/XEMBL", "getNucSeq") );

		//  Set Method Parameters, in the order defined by the WSDL:  format, then ids
		String[] params = new String[2];
		params[0] = format;
		params[1] = id;

		//  Invoke Remote Method and print XML Response
		String response = (String) call.invoke(params);
		System.out.println(response);
	}

	public static void main (String[] args) throws Exception {
		if (args.length != 2) {
			System.out.println ("Usage:  XEMBLClient2 [ID] [Format]");
			System.out.println ("Where Format is:  Bsml or sciobj (for AGAVE)");
			return;
		}
		XEMBLClient2 client = new XEMBLClient2(args[0], args[1]);
	}
}

Example 2 illustrates how to interface with the AXIS API directly. AXIS also includes a source-code generator for converting WSDL files into Java source files (see the next example for details).

To invoke a SOAP service, AXIS requires that you first instantiate a Call object. The Call object specifies the URL for the target service, and the method name to invoke. You can then specify a series of parameters that are passed to the remote method. In this case, we pass two strings: format and ID. The invoke() method creates a SOAP request, sends it to the server, and returns the corresponding SOAP response. Much like the REST example, you can then take the XML response, parse it with your favorite XML parser, and extract those elements that are of most interest.
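To make the request side more concrete, the sketch below builds by hand a SOAP 1.1 envelope of roughly the shape such a call produces. The operation name, namespace, and part names (format, ids) come from the WSDL excerpt shown earlier; the class name is hypothetical. The message a real toolkit sends will typically carry additional attributes, such as type annotations and an encoding style, which depend on the WSDL binding section that is not reproduced in this article.

```java
// Hypothetical illustration of a getNucSeq request; not generated by AXIS.
public class NucSeqEnvelope {

    // Standard SOAP 1.1 envelope namespace
    private static final String SOAP_NS =
        "http://schemas.xmlsoap.org/soap/envelope/";

    // Namespace used for the getNucSeq operation in the article
    private static final String XEMBL_NS = "http://www.ebi.ac.uk/XEMBL";

    // Builds a simplified request envelope for the getNucSeq operation
    public static String build(String format, String ids) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<soap:Envelope xmlns:soap=\"" + SOAP_NS + "\">\n"
            + "  <soap:Body>\n"
            + "    <ns1:getNucSeq xmlns:ns1=\"" + XEMBL_NS + "\">\n"
            + "      <format>" + escape(format) + "</format>\n"
            + "      <ids>" + escape(ids) + "</ids>\n"
            + "    </ns1:getNucSeq>\n"
            + "  </soap:Body>\n"
            + "</soap:Envelope>\n";
    }

    // Escapes the characters that are not allowed in XML text content
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(build("Bsml", "SC49845"));
    }
}
```

A SOAP toolkit hides this plumbing, which is the main convenience it offers over posting XML by hand.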

BQS

Our second example is the Open Bibliographic Query System (BQS), also available from the European Bioinformatics Institute. BQS aims to provide unified access to life science publications. If you are already familiar with PubMed, you may be aware that PubMed provides an XML interface for retrieving individual publication summaries. The goal of BQS is to take this one step further and provide a richer interface for querying and retrieving publications. In the long term, BQS will provide access to several publication repositories, but the current implementation provides access to MEDLINE data only.

Much like XEMBL, BQS provides several options for accessing data. The first option is via CORBA. The second option is via SOAP/WSDL. Unlike XEMBL, however, BQS supports over 30 distinct operations -- the WSDL file for BQS actually runs to 20 pages. It therefore provides a unique opportunity to explore the automatic source generator provided with the AXIS toolkit.

The main AXIS WSDL tool is WSDL2Java, which takes a WSDL file and automatically generates Java source code for interfacing with the specified service. To generate source files for OpenBQS, just enter the following command:


java org.apache.axis.wsdl.WSDL2Java http://industry.ebi.ac.uk/openBQS/copies/BQSWebService.wsdl

AXIS will automatically download the WSDL file, and generate a total of four Java source code files. Each of these files will be automatically placed in a package corresponding to the base URL defined in the WSDL file. In this case, AXIS creates a new package called uk.ac.ebi.industry. The main file generated by AXIS is the EmblEbiBibServerBQSWebServiceSoap.java interface. This interface defines all the public methods defined within the WSDL file. A portion of this file is included below:

Example 3: EmblEbiBibServerBQSWebServiceSoap.java


package uk.ac.ebi.industry;

public interface EmblEbiBibServerBQSWebServiceSoap extends java.rmi.Remote {

...
    public int getBibRefCount() throws java.rmi.RemoteException;
    public byte[] getById(java.lang.String bibRefId) throws java.rmi.RemoteException;
...
}

The first method, getBibRefCount(), returns the total number of citations within the repository. The second method, getById(), performs a lookup on the specified PubMed ID and returns an XML summary of the citation. A number of other methods are also defined, but we focus on these two to get started.

To access the interface, you can write a small Java client program. Example 4 illustrates a basic client program. The program expects a PubMed ID as a command line argument.

Example 4: BQSClient.java


package com.ecerami.bio;

import uk.ac.ebi.industry.*;		//  Import classes autogenerated by AXIS

/**
* Sample BQS Client Program using Apache AXIS
* For details regarding BQS, go to:
*   http://industry.ebi.ac.uk/openBQS/
* For the BQS WSDL File, go to:
*   http://industry.ebi.ac.uk/openBQS/copies/BQSWebService.wsdl
* For details regarding AXIS, go to:  http://xml.apache.org/axis/
**/
public class BQSClient {

	public BQSClient (String pub_med_id) throws Exception {
		System.out.println ("Connecting to Open BQS...");
		System.out.println ("Retrieving Pub Med ID:  "+pub_med_id);
		connect (pub_med_id);
	}

	private void connect (String pub_med_id) throws Exception {
		//  Get SOAP Service for BQS
		BQSWebService service = new BQSWebServiceLocator();
		EmblEbiBibServerBQSWebServiceSoap soapService =
			service.getEmblEbiBibServerBQSWebServiceSoap();

		//  Get Reference Count
		int refCount = soapService.getBibRefCount();
		System.out.println ("Reference Count:  "+refCount);

		//  Get Reference by Pub Med ID
		byte bytes[] = soapService.getById(pub_med_id);
		String reference = new String (bytes);
		System.out.println ("Reference:  "+reference);
	}

	public static void main(String [] args) throws Exception {
		if (args.length != 1) {
			System.out.println ("Usage:  BQSClient [PUB_MED_ID]");
			return;
		}
		BQSClient client = new BQSClient (args[0]);
	}
}
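Example 4 decodes the byte array with new String(bytes), which relies on the platform default character encoding. The article does not say which encoding BQS uses, so one hedged alternative is to hand the raw bytes to an XML parser and let it honor the encoding named in the XML declaration. The sketch below uses the JDK's built-in DOM parser; the class name, the sample document, and its element names (Citation, Title) are hypothetical and do not represent the real BQS format.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical example class; the XML below is NOT the real BQS format.
public class CitationBytes {

    // Hypothetical citation, encoded as ISO-8859-1 to show a non-UTF-8 case
    public static byte[] sampleBytes() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
            + "<Citation><Title>Caf\u00e9 study</Title></Citation>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Parses raw bytes; the parser reads the encoding from the XML declaration
    public static Document parse(byte[] bytes) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the text of the first element with the given tag name
    public static String firstText(byte[] bytes, String tagName) {
        return parse(bytes).getElementsByTagName(tagName)
            .item(0).getTextContent();
    }

    public static void main(String[] args) {
        byte[] bytes = sampleBytes();
        System.out.println("Root:  "
            + parse(bytes).getDocumentElement().getNodeName());
        System.out.println("Title: " + firstText(bytes, "Title"));
    }
}
```

In a real client, the bytes would come from soapService.getById(pub_med_id) rather than from sampleBytes().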

For an alternative to AXIS, try the GLUE platform from The Mind Electric. The GLUE platform provides a set of integrated tools and a Java API for building Web Services. The platform is extremely elegant, includes an easy-to-read user guide, and provides excellent support for WSDL.

Looking Forward

A number of other bioinformatic services are currently available or in the works. For example, the OmniGene project from MIT aims to create an open source Web Services platform for bioinformatics. You can currently download the OmniGene browser to get a feel for the platform -- the OmniGene SOAP API should be available shortly. Additionally, the Distributed Annotation Service (DAS) provides a distributed platform for aggregating genome annotation data from multiple sources. DAS 1.52 is currently implemented as XML over HTTP, but DAS 2.0 may move to a SOAP interface (see RFC0 and RFC11 for details). Lastly, the BioMOBY project aims to provide distributed access to multiple bioinformatic services and a centralized registry for finding new services. All of these projects are likely to see much progress in the near future.

Acknowledgements

A number of people offered invaluable assistance in preparing this article. Jean-Jack Riethoven of the European Bioinformatics Institute provided help with the XEMBL interface and gave permission to reprint the XEMBL WSDL file. Martin Senger, also of the European Bioinformatics Institute, answered my questions regarding the BQS system. Brian Gilman of the OmniGene project helped answer my questions regarding OmniGene. Finally, Mark Wilkinson provided a summary of the BioMOBY project and explained its future direction.