Filling annotation gaps in yeast genomes using genome-wide contact maps.

Title: Filling annotation gaps in yeast genomes using genome-wide contact maps.
Publication Type: Journal Article
Year of Publication: 2014
Authors: Marie-Nelly, H; Marbouty, M; Cournac, A; Liti, G; Fischer, G; Zimmer, C; Koszul, R
Date Published: 2014 Aug 1
Keywords: Centromere; Consensus Sequence; DNA, Ribosomal; Genetic Loci; Genome, Fungal; Genomics; Molecular Sequence Annotation; Saccharomycetales; Synteny

MOTIVATIONS: De novo sequencing of genomes is followed by annotation analyses that aim to identify functional genomic features such as genes, non-coding RNAs or regulatory sequences, taking advantage of diverse datasets. These steps sometimes fail to detect non-coding functional sequences: for example, origins of replication, centromeres and rDNA positions have proven difficult to annotate with high confidence. Here, we demonstrate an unconventional application of the Chromosome Conformation Capture (3C) technique, which typically aims at deciphering the average 3D organization of genomes, by showing how functional information about the sequence can be extracted solely from the chromosome contact map.

RESULTS: Specifically, we describe a combined experimental and bioinformatic procedure that determines the genomic positions of centromeres and ribosomal DNA clusters in yeasts, including species where classical computational approaches fail. For instance, we determined the centromere positions in Naumovozyma castellii, where these coordinates could not be obtained previously. Although computed centromere positions were characterized by conserved synteny with neighboring species, no consensus sequences could be found, suggesting that centromeric binding proteins or mechanisms have significantly diverged. We also used our approach to refine centromere positions in Kuraishia capsulata and to identify rDNA positions in Debaryomyces hansenii. Our study demonstrates how 3C data can be used to complete the functional annotation of eukaryotic genomes.
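To make the idea of reading centromere positions off a contact map concrete, here is a minimal Python sketch. It is not the authors' procedure, which is in the Supplementary Material. It assumes that centromeres cluster in the yeast nucleus, so that the bin holding a centromere shows the strongest contacts with other chromosomes. The function names, the toy map generator and all numeric parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def trans_contact_profile(contacts, chrom_bounds, chrom):
    """For each bin of `chrom`, sum its contacts with all other chromosomes."""
    start, end = chrom_bounds[chrom]
    is_trans = np.ones(contacts.shape[1], dtype=bool)
    is_trans[start:end] = False  # mask out intra-chromosomal columns
    return contacts[start:end][:, is_trans].sum(axis=1)


def estimate_centromere_bins(contacts, chrom_bounds, smooth=3):
    """Per chromosome, return the bin (relative to the chromosome start)
    where the smoothed trans-contact signal peaks."""
    kernel = np.ones(smooth) / smooth  # simple moving average
    estimates = {}
    for chrom in chrom_bounds:
        profile = trans_contact_profile(contacts, chrom_bounds, chrom)
        smoothed = np.convolve(profile, kernel, mode="same")
        estimates[chrom] = int(np.argmax(smoothed))
    return estimates


def toy_contact_map(chrom_sizes, centromeres, seed=0):
    """Build a synthetic symmetric contact map (illustrative values only):
    cis contacts decay with distance, trans contacts are enriched between
    bins that are both close to their own centromere."""
    rng = np.random.default_rng(seed)
    names = list(chrom_sizes)
    bounds, offset = {}, 0
    for name in names:
        bounds[name] = (offset, offset + chrom_sizes[name])
        offset += chrom_sizes[name]
    chrom_id = np.concatenate(
        [np.full(chrom_sizes[c], k) for k, c in enumerate(names)]
    )
    cen_dist = np.concatenate(
        [np.arange(chrom_sizes[c]) - centromeres[c] for c in names]
    )
    pos = np.arange(offset)
    same_chrom = chrom_id[:, None] == chrom_id[None, :]
    cis = 50.0 / (1.0 + np.abs(pos[:, None] - pos[None, :]))
    trans = 1.0 + 20.0 * np.exp(
        -(cen_dist[:, None] ** 2 + cen_dist[None, :] ** 2) / (2 * 2.0 ** 2)
    )
    counts = rng.poisson(np.where(same_chrom, cis, trans))
    # Mirror the upper triangle so the map is symmetric.
    return np.triu(counts) + np.triu(counts, 1).T, bounds


if __name__ == "__main__":
    sizes = {"chrA": 40, "chrB": 60, "chrC": 50}
    planted = {"chrA": 10, "chrB": 35, "chrC": 22}
    contacts, bounds = toy_contact_map(sizes, planted, seed=1)
    print(estimate_centromere_bins(contacts, bounds))
```

On real data the map would first need normalization, and a peak in the trans-contact signal only locates a centromere to the resolution of one bin.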

AVAILABILITY AND IMPLEMENTATION: The source code is provided in the Supplementary Material. This includes a zipped file with the Python code and a contact matrix of Saccharomyces cerevisiae.


SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Alternate Journal: Bioinformatics
PubMed ID: 24711652
