Our group works on various problems connected with the functioning and evolution of biological systems. We use mathematical tools from statistics and combinatorics, together with algorithmic and molecular physics tools, to study the basic principles of cellular functioning, starting from genomic data. We run several closely linked projects in parallel, all aimed at understanding the basic principles of evolution and co-evolution of molecular structures in the cell. Applications are in medicine and the environment.
Domain annotation and metagenomics - We are developing a new approach to domain annotation that successfully identifies remote homology. Domains are modeled through a multitude of probabilistic models, in contrast to the usual approaches based on consensus sequences. This method is now being extended to metagenomic annotation.
Transcriptomics and sequence analysis - We combine statistical modeling with combinatorial optimization to provide solutions for each analysis step of a sequencing experiment, with a particular interest in transcriptome sequencing (reconstruction of the transcriptional landscape, enumeration of alternative splicing events).
Protein evolution and interactions - Protein-protein interactions (PPIs) are at the heart of the molecular processes that constitute life. We are creating a large-scale mapping of PPIs with information at the molecular level. We use sequence- and structure-based bioinformatics methods to predict the conformation of interacting proteins, their interaction sites, and also which proteins interact and how strongly.
Protein conformational dynamics - We study protein conformational dynamics to predict the effects of disease-associated mutations and to characterize alternative functional conformations to be targeted by drugs.
Our methods have multiple applications, playing a role in directed mutagenesis, synthetic biology, metagenomics and the environment, gene annotation, and the study of mutations in genetic diseases.
“Infostery” analysis of short molecular dynamics simulations identifies highly sensitive residues and predicts deleterious mutations. Karami Y, Bitard-Feildel T, Laine E, Carbone A. (2018) Scientific Reports 8: 16126
Characterizing a protein mutational landscape is a very challenging problem in biology. We present COMMA2, a method to automatically extract information from protein conformational ensembles that is relevant to the prediction and understanding of mutational outcomes. We perform simulations of the wild type and 175 mutants of PSD95’s third PDZ domain in complex with its cognate ligand. By recording residue displacement correlations and interactions, we identify “communication pathways” and quantify them to predict the severity of the mutations. Moreover, we show that by exploiting simulations of the wild type alone, one can detect 80% of the positions highly sensitive to mutations with a precision of 89%.
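To make the two figures in the last sentence concrete, here is a minimal sketch of how a detection rate and a precision can be computed for a set of predicted sensitive positions. It assumes the 80% figure is a recall (the fraction of truly sensitive positions that are detected). The function name and the residue indices are hypothetical and are not taken from the paper.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two collections of residue positions."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: fraction of predicted positions that are truly sensitive.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of truly sensitive positions that were predicted.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical residue indices, for illustration only (not data from the paper).
predicted_sensitive = {12, 15, 27, 33, 41}
truly_sensitive = {12, 15, 27, 33, 50}

precision, recall = precision_recall(predicted_sensitive, truly_sensitive)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Reporting both numbers matters because a method can reach high recall simply by flagging many positions, at the cost of precision.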
Jointly aligning a group of DNA reads improves accuracy of identifying large deletions. Shrestha AMS, Frith MC, Asai K, Richard H. (2018) Nucleic Acids Research 46: e18
Performing sequence alignment to identify structural variants, such as large deletions, from genome sequencing data is a fundamental task, but current methods are far from perfect. We present JRA, a method to jointly align reads to a genome, whereby the alignment ambiguity of one read can be resolved by other reads. We show this leads to a significant improvement in the accuracy of identifying large deletions (≥20 bases), while imposing minimal computational overhead and maintaining an overall running time that is on par with current tools.
Local Interaction Signal Analysis Predicts Protein-Protein Binding Affinity. Raucci R, Laine E, Carbone A. (2018) Structure 26: P905-915
LISA is an empirical scoring function that estimates the binding affinity between two proteins, given the 3D structure of their complex. LISA provides a fine description of the spatial distribution of favorable and unfavorable contacts on the interacting surface. It achieves a correlation of 0.81 on 125 complexes whose binding affinities were experimentally measured with selected reliable technologies. LISA compares favorably with 17 other state-of-the-art functions.
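As an illustration of how a scoring function can be compared with measured affinities, the sketch below computes a correlation coefficient between predicted scores and experimental values. It assumes a Pearson correlation, since the text does not say which coefficient is reported. The function name and the numbers are hypothetical and are not LISA outputs.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical values, for illustration only (not data from the paper).
predicted_scores = [-9.1, -7.4, -11.2, -6.0, -8.3]
measured_affinities = [-9.8, -7.0, -10.5, -6.9, -8.0]

print(f"r = {pearson(predicted_scores, measured_affinities):.2f}")
```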
A multi-source domain annotation pipeline for quantitative metagenomic and metatranscriptomic functional profiling. Ugarte A, Vicedomini R, Bernardes J, Carbone A. (2018) Microbiome 6: 149
Biochemical and regulatory pathways have until recently been thought of and modelled within one cell type, one organism and one species. This vision is being dramatically changed by the advent of whole microbiome sequencing studies, which reveal the role of symbiotic microbial populations in fundamental biochemical functions. We present MetaCLADE, a novel profile-based domain annotation pipeline built on a multi-source domain annotation strategy. It applies directly to reads and improves identification of the catalog of functions in microbiomes. MetaCLADE substantially improves on current domain annotation methods and reaches a fine degree of accuracy in the annotation of very different environments such as soil and marine ecosystems, ancient metagenomes and human tissues.
Meet-U: Educating through research immersion. Abdollahi N, Albani A, Anthony E,... Laine E, Lopes A. (2018) PLOS Computational Biology 14: e1005992
We present Meet-U, a new educational initiative that aims to train students for collaborative work in computational biology and to bridge the gap between education and research. Meet-U mimics the setup of collaborative research projects and takes advantage of the most popular tools for collaborative work and of cloud computing. Students are grouped in teams of 4–5 people and have to carry out a project from A to Z that answers a challenging question in biology. We report on our experience with Meet-U in two French universities with master’s students in bioinformatics and modeling, with protein–protein docking as the subject of the course. Meet-U is easy to implement and can be straightforwardly transferred to other fields and/or universities.
PureCLIP: Capturing Target-Specific protein-RNA Interaction Footprints from Single-Nucleotide CLIP-Seq Data. Krakau S, Richard H, Marsico A. (2017) Genome Biology 18: 240
The iCLIP and eCLIP techniques facilitate the detection of protein-RNA interaction sites at high resolution, based on diagnostic events at crosslink sites. However, previous methods do not explicitly model the specifics of iCLIP and eCLIP truncation patterns and possible biases. We developed PureCLIP, a hidden Markov model-based approach, which simultaneously performs peak calling and individual crosslink site detection. It explicitly incorporates RNA abundances and, for the first time, non-specific sequence biases. On both simulated and real data, PureCLIP is more accurate in calling crosslink sites than other state-of-the-art methods and has a higher agreement across replicates.
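For readers unfamiliar with hidden Markov models, the sketch below shows the general idea of decoding a hidden state at each position from a sequence of observations, using the standard Viterbi algorithm. It is a generic toy example and not PureCLIP's actual model: the two states, the "low"/"high" observation symbols, and all probabilities are assumptions chosen for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a non-empty observation sequence."""
    # best[t][s]: log-probability of the best path ending in state s at step t.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            scores[s] = score + math.log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy parameters, assumed for illustration only.
states = ("background", "crosslink")
start_p = {"background": 0.9, "crosslink": 0.1}
trans_p = {
    "background": {"background": 0.9, "crosslink": 0.1},
    "crosslink": {"background": 0.5, "crosslink": 0.5},
}
emit_p = {
    "background": {"low": 0.9, "high": 0.1},
    "crosslink": {"low": 0.2, "high": 0.8},
}

signal = ["low", "high", "high", "low"]
print(viterbi(signal, states, start_p, trans_p, emit_p))
```

Working in log space avoids numerical underflow on long sequences, which matters when the observations span many nucleotide positions.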