Indexing Genomic Sequences on the IBM Blue Gene

Abstract

With advances in sequencing technology and through aggressive sequencing efforts, DNA sequence data sets have been growing at a rapid pace. To gain from these advances, it is important to provide life science researchers with the ability to process and query large sequence data sets. For the past three decades, the suffix tree has served as a fundamental data structure in processing sequential data sets. However, tree construction times on large data sets have been excessive. While parallel suffix tree construction is an obvious solution to reduce execution times, poor locality of reference has limited parallel performance. In this paper, we show that through careful parallel algorithm design, this limitation can be removed, allowing tree construction to scale to massively parallel systems like the IBM Blue Gene. We demonstrate that the entire Human genome can be indexed on 1024 processors in under 15 minutes.
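To make the indexing idea concrete, the following is a minimal illustrative sketch of a suffix-based index over a short DNA string. It is not the parallel construction algorithm described in the paper: it builds an uncompressed suffix trie serially in quadratic time and space, which is only practical for tiny inputs. The names `Node`, `build_suffix_trie`, and `find_occurrences` are hypothetical and chosen for this example.

```python
class Node:
    """One trie node (hypothetical helper for this sketch)."""

    def __init__(self):
        self.children = {}  # maps a character to the child Node
        self.starts = []    # start positions of all suffixes passing through this node


def build_suffix_trie(text):
    """Insert every suffix of `text` character by character (naive, O(n^2))."""
    root = Node()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            if ch not in node.children:
                node.children[ch] = Node()
            node = node.children[ch]
            node.starts.append(start)
    return root


def find_occurrences(root, pattern):
    """Return the start positions where `pattern` occurs in the indexed text."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []  # pattern does not occur
    return sorted(node.starts)


index = build_suffix_trie("GATTACA")
print(find_occurrences(index, "TA"))
print(find_occurrences(index, "A"))
```

Once the index is built, a query walks at most one node per pattern character, so lookup cost depends on the pattern length rather than the text length. A true suffix tree compresses single-child paths to bring space down to linear, and construction cost on genome-scale inputs is the problem the paper addresses.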
