Conference: IEEE International Symposium on Parallel & Distributed Processing

Parallel de novo assembly of large genomes from high-throughput short reads

Abstract

The advent of high-throughput short read technology is revolutionizing life sciences by providing an inexpensive way to sequence genomes at high coverage. Exploiting this technology requires the development of a de novo short read assembler, which is an important open problem that is garnering significant research effort. Current methods are largely limited to microbial organisms, whose genomes are two to three orders of magnitude smaller than complex mammalian and plant genomes. In this paper, we present the design and development of a parallel de novo short read assembler that can scale to large genomes with high coverage. Our approach is based on the string graph formulation. Input reads are mapped to short paths, and the genome is reconstructed as a superpath anchored by distance constraints inferred from read pairs. Our method can handle a mixture of multiple read sizes and multiple paired read distances. We present parallel algorithms for string graph construction, string graph compaction, graph based error detection and removal, and computing aggregate summarization of paired read links across graph edges. Using this, we navigate the final graph structure to reproduce large contiguous sequences from the underlying genome. We present a validation of our framework on experimental and simulated data from multiple known genomes and present scaling results on IBM Blue Gene/L.
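
To make the string graph construction and compaction steps named in the abstract more concrete, here is a minimal serial sketch in Python. It is not the authors' parallel implementation: the function names, the naive all-pairs overlap search, and the simplified transitive reduction are assumptions made for illustration, and the sketch ignores reverse complements, sequencing errors, contained reads, and paired-read distance constraints.

```python
from collections import defaultdict


def overlap(a, b, min_len):
    """Longest proper suffix of a equal to a prefix of b, if >= min_len; else 0."""
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_overlap_edges(reads, min_len):
    """Naive all-pairs overlap graph: {(i, j): overlap length} (illustrative only)."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap(a, b, min_len)
                if k:
                    edges[(i, j)] = k
    return edges


def transitive_reduction(edges):
    """Drop u->w when some v gives u->v and v->w (simplified: ignores overlap lengths)."""
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
    return {
        (u, w): k
        for (u, w), k in edges.items()
        if not any(w in succ[v] for v in succ[u] if v != w)
    }


def compact(reads, edges):
    """Merge unbranched chains of reads into contigs (isolated cycles are skipped)."""
    succ, pred = defaultdict(dict), defaultdict(set)
    for (u, v), k in edges.items():
        succ[u][v] = k
        pred[v].add(u)
    contigs = []
    for start in range(len(reads)):
        # Skip nodes that are the unique continuation of a unique predecessor.
        if len(pred[start]) == 1 and len(succ[next(iter(pred[start]))]) == 1:
            continue
        seq, u, seen = reads[start], start, {start}
        while len(succ[u]) == 1:
            v, k = next(iter(succ[u].items()))
            if len(pred[v]) != 1 or v in seen:
                break
            seq += reads[v][k:]  # append only the non-overlapping tail
            seen.add(v)
            u = v
        contigs.append(seq)
    return contigs


def assemble(reads, min_len):
    edges = transitive_reduction(build_overlap_edges(reads, min_len))
    return compact(reads, edges)


if __name__ == "__main__":
    reads = ["ACGTTG", "GTTGCA", "TGCAAG", "CAAGGC"]
    print(assemble(reads, min_len=2))
```

In this toy input the reads tile a 12-base sequence, so the reduced graph is a single chain and compaction yields one contig. Real short-read data would produce branching at repeats, which is where the paired-read distance constraints described in the abstract come in.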
