Our group works on various problems connected with the functioning and evolution of biological systems. We use mathematical tools from statistics and combinatorics, together with algorithmic and molecular physics tools, to study basic principles of cellular functioning starting from genomic data. We run several closely linked projects in parallel, all aiming at understanding the basic principles of evolution and co-evolution of molecular structures in the cell. Applications are in medicine and the environment.
Domain annotation and metagenomics - We are developing a new approach to domain annotation that successfully identifies remote homology. Domains are modeled through a multitude of probabilistic models, contrary to usual approaches based on consensus sequences. This method is now being extended to metagenomic annotation.
Transcriptomics and sequence analysis - We combine statistical modeling with combinatorial optimization to provide solutions that address each analysis step of a sequencing experiment, with a particular interest in transcriptome sequencing (reconstruction of the transcriptional landscape, enumeration of alternative splicing events).
Protein evolution and interactions - Protein-protein interactions (PPIs) are at the heart of the molecular processes that constitute life. We are creating a large-scale map of PPIs with information at the molecular level. We use sequence- and structure-based bioinformatics methods to predict the conformation of interacting proteins, their interaction sites, and also which proteins interact and how strongly.
Protein conformational dynamics - We study protein conformational dynamics to predict the effects of disease-associated mutations and to characterize alternative functional conformations to be targeted by drugs.
Our methods have applications in directed mutagenesis, synthetic biology, metagenomics and environmental studies, gene annotation, and the analysis of mutations in genetic diseases.
PureCLIP: Capturing Target-Specific protein-RNA Interaction Footprints from Single-Nucleotide CLIP-Seq Data. Krakau S, Richard H, Marsico A. (2017) Genome Biology In press
iCLIP and eCLIP techniques facilitate the detection of protein-RNA interaction sites at high resolution, based on diagnostic events at crosslink sites. However, previous methods do not explicitly model the specifics of iCLIP and eCLIP truncation patterns and possible biases. We developed PureCLIP, a hidden Markov model-based approach that simultaneously performs peak calling and individual crosslink site detection. It explicitly incorporates RNA abundances and, for the first time, non-specific sequence biases. On both simulated and real data, PureCLIP is more accurate in calling crosslink sites than other state-of-the-art methods and has a higher agreement across replicates. Link: https://github.com/skrakau/PureCLIP.
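To illustrate the general idea of segmenting a signal with a hidden Markov model, here is a minimal sketch of Viterbi decoding for a toy two-state model ("background" vs. "enriched") with Poisson emissions over per-position read-start counts. This is not PureCLIP's model, which is considerably richer (it accounts for RNA abundances and sequence biases). The function names, rates, and transition probabilities below are hypothetical values chosen only for illustration.

```python
import math

def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def viterbi(counts, rates, log_start, log_trans):
    """Most likely state path for a Poisson-emission HMM (toy example)."""
    n_states = len(rates)
    # score[s] = best log-probability of any path ending in state s
    score = [log_start[s] + poisson_logpmf(counts[0], rates[s])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for k in counts[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda p: prev[p] + log_trans[p][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s]
                         + poisson_logpmf(k, rates[s]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical read-start counts along a transcript (not real data).
counts = [0, 1, 0, 9, 12, 8, 0, 1, 0]
rates = [0.5, 10.0]  # assumed mean counts: state 0 = background, 1 = enriched
log_start = [math.log(0.9), math.log(0.1)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.2), math.log(0.8)]]
path = viterbi(counts, rates, log_start, log_trans)
print(path)
```

With these assumed parameters, the three high-count positions are assigned to the enriched state and the rest to background. In a real analysis, the emission and transition parameters would be learned from the data rather than fixed by hand.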
Plasmobase: a comparative database of predicted domain architectures for Plasmodium genomes. Bernardes JS, Vaquero C, Carbone A. (2017) Malar J. 16
Plasmobase is a unique database designed for the comparative study of Plasmodium genomes. Domain architecture reconstruction in Plasmobase relies on DAMA, the state-of-the-art method in architecture prediction, while domain annotation is realised with CLADE, a novel annotation tool based on a multi-source strategy. Plasmobase significantly increases the Pfam domain coverage of all Plasmodium genomes and proposes new domain architectures, as well as new domain families, that have never been reported before for these genomes. It provides a visualization of domain architectures and allows for an easy comparison among architectures within Plasmodium species and with other species described in UniProt. Plasmobase is accessible at http://genome.lcqb.upmc.fr/plasmobase/.
BIS2Analyzer: a server for coevolution analysis of conserved protein families. Oteri F, Nadalin F, Champeimont R, Carbone A. (2017) Nucleic Acids Research. 45: W307–W314
BIS2Analyzer is a web server, openly accessible at http://www.lcqb.upmc.fr/BIS2Analyzer/, for the online analysis of co-evolving amino-acid pairs in protein alignments. It is especially designed for vertebrate and viral protein families, which typically display a small number of highly similar sequences. It is based on BIS2, a fast re-implementation of the co-evolution analysis tool Blocks in Sequences (BIS). BIS2Analyzer provides a rich and interactive graphical interface to ease biological interpretation of the results.
Improvement in protein domain identification is reached by breaking consensus, with the agreement of many profiles and domain co-occurrence. Bernardes JS, Zaverucha G, Vaquero C, Carbone A. (2016) PLoS Computational Biology. 12: e1005038
We address the fundamental question of domain identification for highly divergent proteins. By using high performance computing, we demonstrate that the limits of state-of-the-art annotation methods can be bypassed. Our strategy is based on the observation that many structural and functional protein constraints are not globally conserved through all species but might be locally conserved in separate clades. We successfully predict at least one domain for 72% of P. falciparum proteins against 63% achieved previously, corresponding to a 30% improvement in the total number of Pfam domain predictions on the whole genome. Link: http://www.lcqb.upmc.fr/CLADE.
The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium inaugural meeting report. MetaSUB International Consortium (Richard H.) (2016) Microbiome. 4: 24
The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium is a novel, interdisciplinary initiative composed of experts across many fields, including genomics, data analysis, engineering, public health, and architecture. The ultimate goal of the MetaSUB Consortium is to improve city utilization and planning through the detection, measurement, and design of metagenomics within urban environments. We are developing new annotation methods that will be useful for the entire project. The data produced by the consortium can aid city planners, public health officials, and architectural designers and will lead to the discovery of new species, global maps of antimicrobial resistance (AMR) markers, and novel biosynthetic gene clusters (BGCs). Read more on Mapping the subway's microbiome.
JET2 Viewer: a database of predicted multiple, possibly overlapping, protein-protein interaction sites for PDB structures. Ripoche H, Laine E, Ceres N, Carbone A. (2016) Nucleic Acids Research. 45: D236-D242
We report predictions of protein-protein interfaces for the non-redundant set of all protein chains for which a structure is available in the Protein Data Bank. The predictions were made using JET2 and were evaluated on more than 15 000 experimentally characterized protein interfaces. This is, to our knowledge, the largest evaluation of a protein binding site prediction method. The overall performance of JET2 on all interfaces is: Sen = 52.52, PPV = 51.24, Spe = 80.05, Acc = 75.89. The knowledge base contains more than 20 000 entries and is freely accessible at: http://www.jet2viewer.upmc.fr.
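For readers unfamiliar with the abbreviations above, the following minimal sketch shows how sensitivity (Sen), positive predictive value (PPV), specificity (Spe), and accuracy (Acc) are conventionally computed from a confusion matrix. It assumes the metrics are expressed as percentages and counted per residue (predicted vs. experimentally observed interface residues). That counting unit is an assumption here, and the function name and example counts are hypothetical, not taken from the JET2 evaluation.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics, returned as percentages."""
    return {
        "Sen": 100.0 * tp / (tp + fn),                    # true interface residues recovered
        "PPV": 100.0 * tp / (tp + fp),                    # predicted residues that are correct
        "Spe": 100.0 * tn / (tn + fp),                    # non-interface residues correctly rejected
        "Acc": 100.0 * (tp + tn) / (tp + fp + tn + fn),   # all residues correctly classified
    }

# Hypothetical residue counts for a single protein chain (not data from the paper).
example = binary_metrics(tp=20, fp=10, tn=60, fn=10)
for name, value in example.items():
    print(f"{name} = {value:.2f}")
```

Because interface residues are usually a minority of the surface, accuracy and specificity tend to be higher than sensitivity and PPV, which is consistent with the pattern in the figures reported above.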