Blocks In Sequences

A tool for the analysis of the coevolution of amino-acid fragments in proteins, and for the identification of their networks.

 

Overview:

Small protein fragments, not just individual residues, can serve as basic building blocks for reconstructing networks of coevolved amino acids in proteins. Fragments often come into physical contact with one another and play a major biological role in the protein. These interactions can be of several kinds and extend beyond binding specificity, allosteric regulation and folding constraints. Indeed, coevolving fragments carry important information about folding intermediates, peptide assembly, key mutations with known roles in genetic diseases, distinguished subfamily-dependent motifs and differentiated evolutionary pressures on protein regions. BIS coevolution analysis detects networks of interacting fragments and reveals a higher-order organization of fragments, which shows the value of studying this structure in more depth. BIS can be applied to protein families that are highly conserved or represented by few sequences, which enlarges the class of proteins on which coevolution analysis can be performed and makes large-scale coevolution studies a feasible goal.

 

Download:

The BIS package is available here.


System requirements:

Linux or Mac OS X.

The program BIS uses several external tools that should be installed:

  • java6
  • perl
  • PhyML v2.4.4: it builds the phylogenetic tree associated with the set of aligned sequences given as input; it is run with the Blosum62 matrix (Guindon S, Gascuel O, A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood, Systematic Biology, 52(5):696-704, 2003)
  • retree: it converts trees generated with PhyML into binary trees, if needed; it is part of the PHYLIP 3.67 package, downloadable at http://cmgm.stanford.edu/phylip/ (Felsenstein J, PHYLIP: Phylogeny Inference Package, Cladistics, 5:164-166, 1989).

BIS also uses the clustering program CLAG, which clusters matrices of coevolution scores (downloadable at http://www.ihes.fr/carbone/data11). CLAG is included in the package and need not be installed separately. To use CLAG independently, see the CLAG R package, which is designed to make CLAG simple to use.

Installation:

  • Unzip TOOLBIS.zip in a directory of your choice
  • Create a "Setup" directory in which you put PhyML and Phylip
  • PhyML must be in Setup/phyml_v2.4.4 and Phylip in Setup/phylip-3.67
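
The layout described above can be checked before a long run. The following is a minimal sketch, not part of BIS: the function name missing_setup_dirs is hypothetical, and only the two subdirectory names come from the installation notes.

```python
from pathlib import Path

# Subdirectory names taken from the installation notes above.
REQUIRED_SUBDIRS = ("phyml_v2.4.4", "phylip-3.67")


def missing_setup_dirs(setup_dir):
    """Return the required subdirectories that are absent from setup_dir.

    Hypothetical helper: it only checks that the directories exist,
    not that the tools inside them are built or executable.
    """
    root = Path(setup_dir)
    return [name for name in REQUIRED_SUBDIRS if not (root / name).is_dir()]
```

An empty list means both directories are in place; any name returned still has to be created and populated.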

 

Instructions for running BIS:

The instructions for running BIS and reading its output files are here. BIS can be run in different modes:

  • for fragment analysis (parameter "-h=y" in the command line)
  • for residue analysis (parameter "-h=n")
  • for analysis based on physico-chemical properties of amino acids (parameter "-pc=y")

In addition, fragment analysis can be performed on positions with at most d exceptions (for instance, parameter "-d=1" when 1 exception is tolerated) or on all positions (parameter "-a=y").
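
As an illustration of how these parameters combine, here is a minimal sketch that assembles the mode flags for a run. The function bis_mode_flags and its argument names are hypothetical; only the flag spellings come from the text above. Treating "-a=y" and "-d" as mutually exclusive is an assumption based on the "or" in the description, so check the BIS instructions before relying on it.

```python
def bis_mode_flags(fragments=True, physico_chemical=False,
                   exceptions=None, all_positions=False):
    """Build the list of BIS mode flags described above.

    Hypothetical helper. Assumption: "-a=y" and "-d=<n>" are
    alternatives, so "-d" is dropped when all_positions is set.
    """
    # Fragment analysis ("-h=y") versus residue analysis ("-h=n").
    flags = ["-h=y" if fragments else "-h=n"]
    if physico_chemical:
        flags.append("-pc=y")
    if all_positions:
        flags.append("-a=y")
    elif exceptions is not None:
        flags.append(f"-d={exceptions}")
    return flags
```

For example, bis_mode_flags(exceptions=1) gives the flags for a fragment analysis that tolerates one exception.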

 

Example of a BIS fragment analysis:

  • Create a directory named 452 and copy into it the file PF02216_initialAllignment.txt, provided in TOOLBIS/DATA/PABD/452, under the new name PF02216_full.txt
  • Run BIS like this:
    ./exe.pl -f=put-correct-path/452 -d=0 -h=y -e=put-correct-path/TOOLBIS -s=put-correct-path/Setup -o=linux
    Replace "put-correct-path" with the actual paths, and replace "linux" with "macOSX" if you run Mac OS X (the value must exactly match the suffix of the PhyML executable name).
    This will take several hours.
  • You will find the results in the PF02216-CLUSTERFILE-XXX.txt files.
  • To generate plots (optional) you may now run:
    ./exe-RCommand.pl -f=put-correct-path/452 -d=0 -e=put-correct-path/TOOLBIS
    This will take only a few seconds.
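
The preparation and launch steps of this example can be scripted. The sketch below is an illustration only: prepare_workdir and fragment_command are hypothetical helper names, and the file names, directory names and flags are copied from the example above. It builds the argument list without running anything, because the directory from which ./exe.pl must be started is not stated here.

```python
import shutil
from pathlib import Path


def prepare_workdir(toolbis_dir, parent_dir):
    """Create parent_dir/452 and copy the example alignment into it
    under the name BIS expects (PF02216_full.txt)."""
    src = (Path(toolbis_dir) / "DATA" / "PABD" / "452"
           / "PF02216_initialAllignment.txt")
    work = Path(parent_dir) / "452"
    work.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, work / "PF02216_full.txt")
    return work


def fragment_command(work_dir, toolbis_dir, setup_dir, os_suffix="linux"):
    """Return the argument list of the fragment-analysis example.

    os_suffix must match the suffix of the PhyML executable
    ("linux" or "macOSX" in the example above).
    """
    return [
        "./exe.pl",
        f"-f={work_dir}",
        "-d=0",
        "-h=y",
        f"-e={toolbis_dir}",
        f"-s={setup_dir}",
        f"-o={os_suffix}",
    ]
```

The returned list can be passed to subprocess.run once the working directory for exe.pl is known.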

Things to be careful about:

  • In the alignment, sequence names must be upper-case.
  • The input file must be named PFxxxxx_full.txt, where xxxxx stands for 5 digits (use PF00000_full.txt if no number is relevant). If nothing happens when you run BIS, the input file is not named correctly.
  • In the alignment input file, sequences will be reordered, so if you want to keep your original alignment unmodified you should make a backup of it.
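
Two of these points, the file name and the backup, are easy to automate. The following minimal sketch uses hypothetical helper names (is_valid_input_name, backup_alignment) and a ".bak" suffix chosen for illustration. It does not check that sequence names are upper-case, since that would require assumptions about the alignment format.

```python
import re
import shutil
from pathlib import Path

# "PF", exactly five digits, then "_full.txt", as required above.
NAME_PATTERN = re.compile(r"PF\d{5}_full\.txt")


def is_valid_input_name(path):
    """Return True if the file name follows the PFxxxxx_full.txt rule."""
    return NAME_PATTERN.fullmatch(Path(path).name) is not None


def backup_alignment(path):
    """Copy the alignment next to itself with a ".bak" suffix
    (an arbitrary choice) and return the backup path."""
    src = Path(path)
    dst = src.with_name(src.name + ".bak")
    shutil.copy2(src, dst)
    return dst
```

Running both before launching BIS catches a misnamed file early and keeps an untouched copy of the original alignment.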


Contacts:

For questions, comments or suggestions feel free to contact Alessandra Carbone or Linda Dib.

 

Citation:

Last Update May 2013