
Quantifying and Mitigating Computational Inefficiency of Genomics Data Analysis

Abstract

In this paper, we present a comprehensive study that quantifies and mitigates the computational inefficiency of current genomic analysis approaches. First, we found that current parallelization approaches have limited scalability due to either unexploited parallelism or low utilization of system resources. We therefore proposed Spark-Gene, which is built on the Spark in-memory programming model. To test the performance of Spark-Gene, we used whole-genome sequencing (WGS) analysis in the GATK as the test case. We show that Spark-Gene reduces the execution time of WGS analysis from 19 hours to 30 minutes, a speedup in excess of 37-fold at 256 CPU cores. Furthermore, we identified garbage collection as the major scalability bottleneck limiting the parallel efficiency of the native in-memory computing model. Second, we quantified the microarchitectural inefficiency of typical genomic applications and uncovered opportunities for microarchitectural optimization in the design of genomic domain-specific accelerators, especially in specializing concurrency, computation, and the memory hierarchy. This paper aims to leverage state-of-the-art big-data technologies to improve parallelization for genomics analysis and to motivate the integration of accelerators into genomic analysis computing systems.
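
As a rough check on the figures quoted in the abstract, the speedup and the implied parallel efficiency can be worked out in a few lines. This is only a sketch: the 19-hour and 30-minute times are the rounded values from the abstract, and the abstract does not say how many cores the 19-hour baseline used, so the single-core baseline below (`baseline_cores=1`) is an assumption made purely for illustration. The function names are hypothetical and not taken from the paper.

```python
def speedup(baseline_seconds: float, parallel_seconds: float) -> float:
    """Ratio of baseline run time to parallel run time."""
    return baseline_seconds / parallel_seconds


def parallel_efficiency(speedup_value: float, cores: int, baseline_cores: int = 1) -> float:
    """Speedup divided by the increase in core count.

    baseline_cores=1 is an ASSUMPTION; the abstract does not state
    the core count of the 19-hour baseline run.
    """
    return speedup_value / (cores / baseline_cores)


# Rounded times quoted in the abstract.
baseline_seconds = 19 * 3600   # 19 hours
parallel_seconds = 30 * 60     # 30 minutes

s = speedup(baseline_seconds, parallel_seconds)
e = parallel_efficiency(s, cores=256)

print(f"speedup: {s:.1f}x")                 # speedup: 38.0x
print(f"parallel efficiency: {e:.1%}")      # parallel efficiency: 14.8%
```

With the rounded times the speedup comes out at 38x, consistent with the "in excess of 37-fold" claim. Under the single-core assumption the efficiency at 256 cores would be roughly 15%, which fits the abstract's remark that a scalability bottleneck (garbage collection) limits parallel efficiency; a multi-core baseline would give a proportionally higher figure.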
