ACM Transactions on Mathematical Software

Computing Petaflops over Terabytes of Data: The Case of Genome-Wide Association Studies



Abstract

In many scientific and engineering applications, one has to solve not one but multiple instances of the same problem. Often, these problems are linked in a way that allows intermediate results to be reused. A characteristic example of this class of applications is the Genome-Wide Association Study (GWAS), a widely used tool in computational biology. A GWAS entails the solution of up to trillions (10^12) of correlated generalized least-squares problems. This poses a daunting challenge: performing petaflops (10^15 floating-point operations) over terabytes (10^12 bytes) of data. In this article, we design an algorithm for performing GWAS on multicore architectures. This is accomplished in three steps. First, we show how to exploit the relation among successive problems, thus reducing the overall computational complexity. Then, through an analysis of the required data transfers, we identify how to eliminate any overhead due to input/output operations. Finally, we study how to decompose the computation into tasks to be distributed among the available cores, to attain high performance and scalability. With our algorithm, a GWAS that currently requires a supercomputer may now be performed in a matter of hours on a single multicore node. The discussion centers on the methodology used to develop the algorithm rather than on the specific application. We believe this article contributes valuable, generally applicable guidelines for computational scientists on how to develop and optimize numerical algorithms.
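The following is a minimal sketch of how intermediate results can be shared among correlated generalized least-squares (GLS) problems. It assumes, for illustration only, that all problems share the same covariance matrix M and the same covariate block X_L, and that they differ only in one extra column x_i. This may not be the article's exact formulation. Each estimate is beta_i = (X_i^T M^-1 X_i)^-1 X_i^T M^-1 y with X_i = [X_L | x_i]. If M = L L^T is factored once and L^-1 X_L and L^-1 y are reused, each additional problem costs one O(n^2) triangular solve plus small products instead of a fresh O(n^3) factorization. All function names are hypothetical. Pure-Python loops keep the sketch dependency-free; a real implementation would call LAPACK/BLAS kernels.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def cholesky(m: Matrix) -> Matrix:
    """Return lower-triangular L with m = L L^T (m symmetric positive definite)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(m[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def forward_solve(L: Matrix, b: Vector) -> Vector:
    """Solve L z = b for lower-triangular L in O(n^2)."""
    z = [0.0] * len(b)
    for i in range(len(b)):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    return z


def solve_small(A: Matrix, b: Vector) -> Vector:
    """Solve the small p x p SPD system A x = b via C C^T = A."""
    C = cholesky(A)
    z = forward_solve(C, b)
    p = len(b)
    x = [0.0] * p
    for i in reversed(range(p)):  # back substitution with C^T
        x[i] = (z[i] - sum(C[k][i] * x[k] for k in range(i + 1, p))) / C[i][i]
    return x


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def gls_many(M: Matrix, XL_cols: List[Vector], y: Vector,
             snps: List[Vector]) -> List[Vector]:
    """Solve one GLS problem per column x_i in `snps`, reusing shared work."""
    L = cholesky(M)                               # O(n^3), done once
    W = [forward_solve(L, c) for c in XL_cols]    # L^-1 X_L, done once
    yt = forward_solve(L, y)                      # L^-1 y, done once
    betas = []
    for x in snps:
        xt = forward_solve(L, x)                  # O(n^2) per problem
        cols = W + [xt]                           # whitened [X_L | x_i]
        A = [[dot(a, b) for b in cols] for a in cols]  # X_i^T M^-1 X_i
        rhs = [dot(a, yt) for a in cols]               # X_i^T M^-1 y
        betas.append(solve_small(A, rhs))
    return betas
```

This works because M^-1 = L^-T L^-1, so X^T M^-1 X = (L^-1 X)^T (L^-1 X). Only the whitened column L^-1 x_i has to be recomputed per problem.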
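A common way to hide input/output overhead behind computation is double buffering. While one block of data is being processed, the next block is already being read. The sketch below shows that overlap pattern with a reader thread and a bounded queue. It is not necessarily the article's scheme. The function read_block is a hypothetical stand-in for a disk read.

```python
import queue
import threading
import time
from typing import Callable, List


def read_block(k: int) -> List[float]:
    """Hypothetical stand-in for reading the k-th block of data from disk."""
    time.sleep(0.01)  # simulate I/O latency
    return [float(k)] * 4


def process_overlapped(n_blocks: int,
                       read: Callable[[int], List[float]],
                       compute: Callable[[List[float]], float]) -> List[float]:
    """Process blocks in order while a reader thread prefetches upcoming ones."""
    # The bounded queue limits how far the reader may run ahead of the computation.
    buffers = queue.Queue(maxsize=2)

    def reader() -> None:
        for k in range(n_blocks):
            buffers.put(read(k))  # waits while the queue is full
        buffers.put(None)         # sentinel: no more data

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    results = []
    while (block := buffers.get()) is not None:
        results.append(compute(block))  # runs while the next block is being read
    t.join()
    return results
```

For example, process_overlapped(5, read_block, sum) processes five blocks in order. The simulated reads happen in the background while earlier blocks are being summed.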
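One simple way to decompose the computation into tasks is to split the problems into contiguous fixed-size blocks and hand each block to a worker from a pool. In the sketch below, the per-block task solve_block and the block size are hypothetical placeholders, and the article's own task decomposition may differ. Threads keep the example portable. In CPython, pure-Python work does not run in parallel across threads because of the global interpreter lock, so a real code would use processes or native kernels that release it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def solve_block(block: Sequence[float]) -> List[float]:
    """Hypothetical per-block task; here it just squares each entry."""
    return [x * x for x in block]


def split(items: Sequence[float], block_size: int) -> List[Sequence[float]]:
    """Cut the list of problems into contiguous blocks, one task each."""
    return [items[i:i + block_size] for i in range(0, len(items), block_size)]


def run_tasks(items: Sequence[float], block_size: int, workers: int) -> List[float]:
    """Distribute the blocks over a pool of workers and reassemble the results."""
    blocks = split(items, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves block order, so the results line up with the input
        parts = list(pool.map(solve_block, blocks))
    return [r for part in parts for r in part]
```

The block size controls granularity. Larger blocks reduce scheduling overhead, while smaller blocks balance the load better across the available cores.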

