XML.com: XML From the Inside Out

Parsing Protein Domains with Perl

Date: Nov. 15, 2002
Link: http://www.perl.com/pub/a/2001/11/16/perlbio2.html
Source Author or Organization: James Tisdall, Perl.com

In this article, James Tisdall, author of the O'Reilly & Associates book Beginning Perl for Bioinformatics, shows biologists with or without programming experience how to use Perl to apply the power of bioinformatics to biological research.

The Dictionary of Protein Sites and Patterns (PROSITE) database is founded on the discovery that proteins can be grouped by similarities in their sequences. The database is populated with definitions of these similarities, which are written in the PAttern (PA) "mini-language" and are known as PROSITE patterns. This tutorial shows how to build Perl code that translates PROSITE patterns into Perl regular expressions, which makes the pattern information usable for searching protein sequences in Perl. It is then possible to parse patterns from the PROSITE flat file database and to find and report pattern matches in a sequence.
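
The translation step can be illustrated with a minimal sketch. It is not the code from the article: the function name prosite_to_regex and the test sequence are made up for illustration, and only the core pattern syntax is handled (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repetition, < and > for the sequence ends).

```perl
use strict;
use warnings;

# Translate one PROSITE pattern into a Perl regular expression string.
# Hypothetical helper covering only the core pattern syntax.
sub prosite_to_regex {
    my ($pattern) = @_;
    my $regex = $pattern;

    $regex =~ s/\.\s*$//;                            # drop the terminating period
    $regex =~ s/-//g;                                # drop the element separators
    $regex =~ s/\{([A-Z]+)\}/'[^' . $1 . ']'/ge;     # {ED}  -> [^ED]
    $regex =~ s/\((\d+(?:,\d+)?)\)/'{' . $1 . '}'/ge; # (4)   -> {4}, (2,4) -> {2,4}
    $regex =~ tr/x/./;                               # x     -> . (any residue)
    $regex =~ s/^</^/;                               # <     -> start of sequence
    $regex =~ s/>$/\$/;                              # >     -> end of sequence

    return $regex;
}

my $pa    = '[AC]-x-V-x(4)-{ED}.';
my $regex = prosite_to_regex($pa);
print "$pa becomes $regex\n";

# Made-up sequence, used only to show a match being reported.
my $protein = 'MKAGVLLLLKQ';
if ($protein =~ /($regex)/) {
    printf "Found %s at position %d\n", $1, $-[0] + 1;
}
```

The exclusion braces are rewritten before the repetition counts so that the braces produced for {4} are not mistaken for an exclusion set. With the made-up sequence above, the translated pattern [AC].V.{4}[^ED] matches AGVLLLLK starting at position 3.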

The article includes links to the PROSITE database, the PROSITE FTP site for local installation of the database, an ASCII flat file containing the PROSITE data (prosite.dat, 4.1 MB), PROSITE documentation, the O'Reilly Bioinformatics Technology Conference, and Learning Perl, 3rd Edition.