Published in: 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Parallel de novo assembly of large genomes from high-throughput short reads

Abstract

The advent of high-throughput short read technology is revolutionizing life sciences by providing an inexpensive way to sequence genomes at high coverage. Exploiting this technology requires the development of a de novo short read assembler, which is an important open problem that is garnering significant research effort. Current methods are largely limited to microbial organisms, whose genomes are two to three orders of magnitude smaller than complex mammalian and plant genomes. In this paper, we present the design and development of a parallel de novo short read assembler that can scale to large genomes with high coverage. Our approach is based on the string graph formulation. Input reads are mapped to short paths, and the genome is reconstructed as a superpath anchored by distance constraints inferred from read pairs. Our method can handle a mixture of multiple read sizes and multiple paired read distances. We present parallel algorithms for string graph construction, string graph compaction, graph based error detection and removal, and computing aggregate summarization of paired read links across graph edges. Using this, we navigate the final graph structure to reproduce large contiguous sequences from the underlying genome. We present a validation of our framework on experimental and simulated data from multiple known genomes and present scaling results on IBM Blue Gene/L.
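To make the pipeline named in the abstract more concrete, here is a minimal serial sketch of three of its steps: building an overlap graph from reads, reducing it toward a string graph by dropping transitive edges, and compacting an unbranched chain into a contiguous sequence. It is a toy illustration only and not the paper's parallel algorithm. The function names, the `min_len` threshold, and the simplified one-hop transitive reduction are all assumptions made for this example, and it ignores reverse complements, sequencing errors, and paired-read distance constraints.

```python
from collections import defaultdict


def overlap_len(a, b, min_len):
    """Longest proper suffix of `a` that equals a prefix of `b`, or 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_overlap_graph(reads, min_len):
    """Directed graph: edge i -> j weighted by the suffix/prefix overlap length (all-pairs, O(n^2))."""
    graph = defaultdict(dict)
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap_len(a, b, min_len)
                if k:
                    graph[i][j] = k
    return dict(graph)


def remove_transitive_edges(graph):
    """Simplified reduction (assumption): drop i -> k whenever i -> j and j -> k both exist."""
    reduced = {i: dict(nbrs) for i, nbrs in graph.items()}
    for i, nbrs in graph.items():
        for j in nbrs:
            for k in graph.get(j, {}):
                if k != i and k in nbrs:
                    reduced[i].pop(k, None)
    return reduced


def compact_chain(reads, graph, start):
    """Merge reads along an unbranched path (out-degree 1, next in-degree 1) starting at `start`."""
    indegree = defaultdict(int)
    for nbrs in graph.values():
        for j in nbrs:
            indegree[j] += 1
    seq, node, seen = reads[start], start, {start}
    while len(graph.get(node, {})) == 1:
        nxt, k = next(iter(graph[node].items()))
        if indegree[nxt] != 1 or nxt in seen:
            break
        seq += reads[nxt][k:]  # append only the part of the next read past the overlap
        seen.add(nxt)
        node = nxt
    return seq


# Hypothetical toy input: error-free reads of length 6 sampled every 2 bases.
genome = "AGCTTGACCGTA"
reads = [genome[i:i + 6] for i in range(0, len(genome) - 5, 2)]
string_graph = remove_transitive_edges(build_overlap_graph(reads, min_len=2))
contig = compact_chain(reads, string_graph, start=0)
print(contig)
```

On this toy input the reduced graph is a single chain of four reads, so compaction spells the original sequence back out. A real assembler would replace the all-pairs overlap step with an indexed or distributed one, which is where the parallel algorithms described in the abstract come in.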
