OpenAssembler: assembly of reads from a mix of high-throughput sequencing technologies.
Sébastien Boisvert, François Laviolette, and Jacques Corbeil.
Robert Cedergren Bioinformatics Colloquium 2009 (Université de Montréal).
An accurate and complete genome sequence of a desired species, or of a phylogenetically close relative, is now
a basic prerequisite for advanced genomics research. A crucial step in obtaining a high-quality genome sequence is the ability to correctly assemble short individual sequence reads into longer contiguous sequences
accurately representing genomic regions that are much longer than any single contributing read. Current
sequencing technologies continue to offer increases in throughput and corresponding reductions in cost and
time. Unfortunately, the benefit of obtaining very large numbers of reads is tempered by a non-trivial
number of sequence errors, with each sequencing system showing its own types of errors and biases. Although software exists for assembling reads from each individual system, no
comprehensive procedure has been proposed for high-quality genome assembly based on mixes of reads from
different technologies. We describe an open-source software program called OpenAssembler,
which has been specifically developed to assemble reads obtained from a combination of sequencing systems,
and compare its performance with that of other assembly packages on simulated and real datasets. To illustrate the
value of OpenAssembler, we used a combination of Roche/454 and Illumina reads to assemble the 3.6 Mb
Acinetobacter baylyi ADP1 genome (NCBI/GenBank accession CR543861) into 119 contigs containing 26
mismatches and 7 indels. The Newbler assembler, using only the Roche/454 reads (the reads for which it was
designed), assembled the genome into 118 contigs with 64 mismatches and 356 indels.
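
The comparison above can be put on a common scale with a little arithmetic. The sketch below is illustrative only: it takes the contig, mismatch, and indel counts reported in this abstract and the rounded 3.6 Mb genome size, and it assumes (this is not stated in the abstract) that errors can be normalized by the full genome length rather than by the total length of the assembled contigs. The function and variable names are hypothetical.

```python
# Figures as reported in the abstract; 3.6 Mb is the rounded genome size given there.
GENOME_SIZE_MB = 3.6

assemblies = {
    "OpenAssembler (454 + Illumina)": {"contigs": 119, "mismatches": 26, "indels": 7},
    "Newbler (454 only)": {"contigs": 118, "mismatches": 64, "indels": 356},
}


def total_errors(stats):
    """Sum of mismatches and indels for one assembly."""
    return stats["mismatches"] + stats["indels"]


def errors_per_mb(stats, genome_size_mb=GENOME_SIZE_MB):
    """Errors per megabase, assuming normalization by the full genome length."""
    return total_errors(stats) / genome_size_mb


for name, stats in assemblies.items():
    print(f"{name}: {stats['contigs']} contigs, "
          f"{total_errors(stats)} errors, "
          f"{errors_per_mb(stats):.1f} errors per Mb")
```

Under these assumptions the two assemblies have nearly the same number of contigs, while the totals come to 33 errors for the mixed-read assembly and 420 for the Roche/454-only assembly, with most of the difference due to indels.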