
Ulysses: accurate detection of low-frequency structural variations in large insert-size sequencing libraries.

Publication Type: Journal Article
Year of Publication: 2015
Authors: Gillet-Markowska, A, Richard, H, Fischer, G, Lafontaine, I
Journal: Bioinformatics
Volume: 31
Issue: 6
Pagination: 801-8
Date Published: 2015 Mar 15
ISSN: 1367-4811
Abstract

MOTIVATION: The detection of structural variations (SVs) in short-range Paired-End (PE) libraries remains challenging because SV breakpoints can involve large dispersed repeated sequences or carry inherent complexity that is hard to resolve with classical PE sequencing data. In contrast, large insert-size sequencing libraries (Mate-Pair libraries) provide higher physical coverage of the genome and give access to repeat-containing regions. Now that they are becoming routinely accessible, they can in theory overcome these limitations. Nevertheless, this type of library is usually associated with broad insert-size distributions and high rates of chimeric sequences, which make the accurate annotation of SVs challenging.

RESULTS: Here, we present Ulysses, a tool that achieves drastically higher detection accuracy than existing tools on both simulated and real mate-pair sequencing datasets from the 1000 Genomes Project. Ulysses achieves high specificity over the complete spectrum of variants (duplications, deletions, translocations, insertions and inversions) by assessing, in a principled manner, the statistical significance of each possible variant against an explicit model of how experimental noise is generated. This statistical model proves particularly useful for detecting low-frequency variants. SV detection performed on a large-insert Mate-Pair library from a breast cancer sample revealed a high level of somatic duplications in the tumor and, to a lesser extent, in the blood sample as well. Altogether, these results show that Ulysses is a valuable tool for characterizing somatic mosaicism in human tissues and in cancer genomes.
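The abstract does not specify Ulysses's noise model. The following is therefore only a minimal, generic sketch of the idea of scoring a candidate variant against an explicit noise model, not Ulysses's method. It assumes, hypothetically, that the number of mate pairs supporting a candidate link purely by noise (for example, chimeric pairs) follows a Poisson distribution with a known mean `expected_noise`, and it applies a Bonferroni correction over `n_tests` candidates. The names `poisson_sf` and `is_significant` are illustrative and are not part of Ulysses.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), computed with the standard library only."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k <= lam:
        # The tail is not small here, so the complement of the CDF is accurate enough.
        cdf = sum(
            math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1)) for i in range(k)
        )
        return max(0.0, 1.0 - cdf)
    # k > lam: sum the upper tail directly to avoid cancellation for tiny p-values.
    # Successive terms shrink by lam / i < 1, so the loop terminates.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while term > 0.0:
        total += term
        i += 1
        term *= lam / i
        if term < 1e-17 * total:
            break
    return total


def is_significant(k_support: int, expected_noise: float,
                   n_tests: int, alpha: float = 0.01) -> bool:
    """Hypothetical test: are k_support supporting pairs unlikely under noise alone?

    expected_noise is an assumed mean count of noise-only pairs for this candidate;
    the Bonferroni correction multiplies the p-value by the number of candidates tested.
    """
    p_value = poisson_sf(k_support, expected_noise)
    return p_value * n_tests < alpha


print(is_significant(8, 0.5, 1000))
print(is_significant(2, 0.5, 1000))
```

With an assumed background of 0.5 noise pairs per candidate link and 1000 candidates tested, 8 supporting pairs pass the corrected threshold (the first call prints `True`), while 2 supporting pairs do not (`False`). A Poisson tail test is only one simple choice; a realistic model would also need to account for the broad insert-size distribution and local coverage mentioned above.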

DOI: 10.1093/bioinformatics/btu730
Alternate Journal: Bioinformatics
PubMed ID: 25380961