ProfileView

Functional characterization of protein sequence families using multiple single-domain probabilistic models.

Overview

Functional classification of sequences has become a fundamental bottleneck in understanding the myriad protein sequences accumulating in our databases as a result of recent progress in genomics and metagenomics. The large diversity of homologous sequences hides, in many cases, a variety of functional activities that cannot be anticipated. Identifying these activities is critical for a fundamental understanding of living organisms and for biotechnological applications.

ProfileView is a novel computational method designed to functionally classify sets of homologous sequences. It is based on a learning architecture and relies strongly on the structure of biological data, addressing the challenge of automatically partitioning large datasets of protein sequences into pertinent subfamilies in order to learn meaningful conservation patterns. It constructs a library of probabilistic models that accurately represent the functional variability of protein families, and it makes it possible to extract biologically interpretable information from the learning process. It applies to protein families that are not necessarily large or conserved, whose homologues may be very divergent, and whose functions remain to be discovered or characterized more precisely.
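The idea of describing sequences through a library of probabilistic models can be illustrated with a toy sketch. This is not ProfileView's implementation: it assumes simple position-specific log-odds profiles built from tiny gap-free alignments, a uniform background distribution, and made-up sequences. The names `build_profile`, `score`, `feature_vector` and `classify` are hypothetical. It only shows how scoring a sequence against every model in a library yields a vector that can then be compared or clustered.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_seqs, pseudocount=1.0):
    """Build a position-specific log-odds profile.

    Assumes gap-free sequences of equal length over the 20 standard
    amino acids and a uniform background (a simplification).
    """
    length = len(aligned_seqs[0])
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned_seqs) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned_seqs)
        profile.append({
            aa: math.log((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, seq):
    """Sum the per-position log-odds of seq (same length as the profile)."""
    return sum(column[aa] for column, aa in zip(profile, seq))


def feature_vector(library, seq):
    """Represent seq by its score against every model in the library."""
    return [score(library[name], seq) for name in sorted(library)]


def classify(library, seq):
    """Assign seq to the model that scores it highest."""
    return max(library, key=lambda name: score(library[name], seq))


# Hypothetical toy library: one profile per made-up subfamily.
library = {
    "subfamily_A": build_profile(["ACDKW", "ACDRW", "ACEKW"]),
    "subfamily_B": build_profile(["GHLMY", "GHLMF", "GHIMY"]),
}

print(classify(library, "ACDRW"))
print(feature_vector(library, "GHIMF"))
```

In a realistic setting the models would be built from real alignments and would handle insertions and deletions, and the score vectors would feed a clustering or tree-building step instead of a simple best-match rule.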

As a proof of concept, we apply ProfileView to the Cryptochrome/Photolyase family (CPF), a widespread class of proteins showing a large variety of functions. Decades of experimental studies on this family, which have functionally characterized sequences and highlighted constitutive motifs, allow us to validate the functional organisation obtained with the ProfileView approach. In addition, the method identifies a distinct functional group, which remains unresolved by distance tree analysis and previously characterized proteins, and which likely corresponds to novel photoreceptors. Structural modelling confirmed the plausibility of this hypothesis. Thus, ProfileView appears to be a powerful tool to classify protein sequences by function, to screen sequences for the design of accurate functional testing experiments and, possibly, to discover new functions of natural sequences.

ProfileView flowchart