Parsing Protein Domains with Perl
Date: Dec. 7, 2001

The Dictionary of Protein Sites and Patterns (PROSITE) database is founded on the discovery that proteins can be grouped by similarities in their sequences. The database is populated with definitions of these similarities, known as PROSITE patterns, which are written in the PAttern (PA) "mini-language". This tutorial shows how to build Perl code that translates PROSITE patterns into Perl regular expressions, so that a Perl program can search a protein sequence for them. With that translation in place, patterns can be parsed from the PROSITE flat-file database and used to find and report pattern matches in a sequence. The article includes links to the PROSITE database, the PROSITE FTP site for local installation of the database, an ASCII flat file containing the PROSITE data (prosite.dat, 4.1 Mb), the PROSITE documentation, the O'Reilly Bioinformatics Technology Conference, and Learning Perl, 3rd Edition.
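
The translation step described above can be illustrated with a minimal sketch. It is not the tutorial's own code: the function name prosite_to_regex is hypothetical, and the sketch assumes the standard PROSITE pattern conventions (elements separated by "-", "x" for any residue, "[...]" for allowed residues, "{...}" for excluded residues, "(n)" or "(n,m)" for repetition, "<" and ">" for the sequence ends, and a terminating period). It does not handle rarer forms such as ">" inside square brackets.

```perl
use strict;
use warnings;

# Translate one PROSITE pattern string into a Perl regular expression string.
# Hypothetical helper for illustration only.
sub prosite_to_regex {
    my ($pattern) = @_;
    $pattern =~ s/\s+//g;                     # drop any whitespace
    $pattern =~ s/\.$//;                      # drop the terminating period
    $pattern =~ s/-//g;                       # drop element separators
    $pattern =~ s/\{([A-Z]+)\}/[^$1]/g;       # {ED}  becomes [^ED]
    $pattern =~ s/\((\d+(?:,\d+)?)\)/{$1}/g;  # (4)   becomes {4}; (2,4) becomes {2,4}
    $pattern =~ s/x/./g;                      # x     becomes . (any residue)
    $pattern =~ s/</^/;                       # <     anchors at the N-terminus
    $pattern =~ s/>/\$/;                      # >     anchors at the C-terminus
    return $pattern;
}

# Example: a made-up pattern and a made-up sequence.
my $regex    = prosite_to_regex('[AC]-x-V-x(4)-{ED}.');
my $sequence = 'AAVKLMNPQ';
if ($sequence =~ /($regex)/) {
    print "Pattern $regex matches '$1' at position $-[1]\n";
}
```

The curly-brace exclusion is rewritten before the repetition counts so that the braces produced for the counts are not mistaken for PROSITE exclusions.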