Cerulean: A Hybrid Assembly Using High Throughput Short and Long Reads

Abstract

Genome assembly from high-throughput short reads arguably remains an unresolvable task for repetitive genomes: when a repeat is longer than the read length, it becomes difficult to unambiguously connect the flanking regions. The emergence of third-generation sequencing (Pacific Biosciences) with long reads offers the opportunity to resolve complicated repeats that short-read data cannot. However, these long reads have a high error rate, and assembling the genome without additional high-quality short reads is an uphill task. Recently, Koren et al. (2012) proposed using high-quality short reads to correct the long reads and thus make assembly from long reads possible. However, because both datasets (short and long reads) are large, error correction of the long reads requires excessive computational resources, even for small bacterial genomes. In this work, instead of error-correcting the long reads, we first assemble the short reads and then map the long reads onto the assembly graph to resolve repeats. Contribution: We present a hybrid assembly approach that is both computationally efficient and produces high-quality assemblies. Our algorithm first operates on a simplified version of the assembly graph consisting only of long contigs and gradually improves the assembly by adding smaller contigs in each iteration. In contrast to the state-of-the-art long-read error-correction technique, which requires substantial computational resources and a long running time on a supercomputer even for bacterial genome datasets, our software can produce a comparable assembly on a standard desktop in a short running time.
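The iterative idea in the abstract (start from a graph of long contigs only, then admit smaller contigs step by step) can be illustrated with a toy sketch. This is not the Cerulean implementation: the function names, the input shapes (a contig-length table and, per long read, the ordered list of contigs it maps to), and the length thresholds are all assumptions made for illustration, and contig orientation and alignment quality are ignored.

```python
from collections import Counter

def links_at_threshold(contig_len, read_paths, min_len):
    """Count long-read-supported adjacencies between contigs >= min_len.

    contig_len: dict of contig id -> length in bp (hypothetical input).
    read_paths: one list of contig ids per long read, in mapped order
                (hypothetical input; orientation is ignored here).
    """
    links = Counter()
    for path in read_paths:
        # Drop contigs below the threshold, so a read that spans a short
        # repeat contig directly links the long contigs flanking it.
        kept = [c for c in path if contig_len[c] >= min_len]
        for a, b in zip(kept, kept[1:]):
            links[(a, b)] += 1
    return links

def iterative_links(contig_len, read_paths, thresholds):
    """Yield (threshold, links) from the coarsest graph to the finest."""
    for min_len in sorted(thresholds, reverse=True):
        yield min_len, links_at_threshold(contig_len, read_paths, min_len)

# Toy data: R is a short repeat contig shared by several long reads.
contig_len = {"A": 50000, "R": 800, "B": 40000, "C": 30000}
read_paths = [["A", "R", "B"], ["A", "R", "B"], ["C", "R", "A"]]
stages = dict(iterative_links(contig_len, read_paths, [10000, 500]))
```

In this toy, the coarse stage (threshold 10000) sees only links among A, B, and C, while the finer stage (threshold 500) reintroduces the repeat contig R. A real pipeline would also have to decide which links are trustworthy before extending the assembly, which this sketch does not attempt.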


Bibliographic record

Source
International Workshop on Algorithms in Bioinformatics | 2013 | pp. 349-363 | 15 pages
Authors
Viraj Deshpande; Eric D.K. Fung; Son Pham; Vineet Bafna
Original format: PDF
Date indexed: 2022-08-26 15:13:59

Similar literature


1. Optimization of de novo transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms [J]. Berat Z Haznedaroglu, Darryl Reeves, Hamid Rismani-Yazdi, BMC Bioinformatics. 2012, No. 1

2. Development of high-throughput SNP-based genotyping in Acacia auriculiformis x A. mangium hybrids using short-read transcriptome data [J]. Melissa ML Wong, Charles H Cannon, Ratnam Wickneswari, BMC Genomics. 2012, No. 1

3. Hybrid assembly with long and short reads improves discovery of gene family expansions [J]. Jason R. Miller, Peng Zhou, Joann Mudge, BMC Genomics. 2017, No. 1

4. Cerulean: A Hybrid Assembly Using High Throughput Short and Long Reads [C]. Viraj Deshpande, Eric D.K. Fung, Son Pham, WABI (Workshop). 2013

5. Scaling short read de novo DNA sequence assembly to gigabase genomes [D]. Cook, Jeffrey J. 2011

6. Development of high-throughput SNP-based genotyping in Acacia auriculiformis x A. mangium hybrids using short-read transcriptome data [O]. Melissa ML Wong, Charles H Cannon, Ratnam Wickneswari. 2012

7. Cerulean: A hybrid assembly using high throughput short and long reads [O]. Viraj Deshpande, Eric D.K. Fung, Son Pham, 2016
