Quantifying and Mitigating Computational Inefficiency of Genomics Data Analysis


Abstract

In this paper, we present a comprehensive study that quantifies and mitigates the computational inefficiency of current genomic analysis approaches. First, we found that current parallelization approaches have limited scalability because of either unexploited parallelism or low utilization of system resources. We therefore propose Spark-Gene, which is built on the Spark in-memory programming model. To evaluate Spark-Gene, we used whole-genome sequencing (WGS) analysis in GATK as the test case. We show that Spark-Gene reduces the execution time of WGS analysis from 19 hours to 30 minutes, a speedup in excess of 37-fold at 256 CPU cores. Furthermore, we identified garbage collection as the major scalability bottleneck that limits the parallel efficiency of the native in-memory computing model. Second, we quantified the microarchitectural inefficiency of typical genomic applications and uncovered opportunities for microarchitectural optimization in the design of a genomics domain-specific accelerator, especially in specializing concurrency, computation, and the memory hierarchy. This paper aims to leverage state-of-the-art big-data technologies to improve parallelization for genomics analysis and to motivate the integration of accelerators into genomic analysis computing systems.
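The speedup figure quoted in the abstract can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the abstract does not state how many cores the 19-hour baseline used, so `baseline_cores=1` is an assumption made purely to show how parallel efficiency would be computed, not a result reported by the authors.

```python
def speedup(baseline_minutes: float, parallel_minutes: float) -> float:
    """Ratio of baseline run time to parallel run time."""
    return baseline_minutes / parallel_minutes


def parallel_efficiency(speedup_value: float, cores: int, baseline_cores: int = 1) -> float:
    """Speedup divided by the increase in core count.

    baseline_cores is an assumption: the abstract does not say how many
    cores the 19-hour baseline run used.
    """
    return speedup_value / (cores / baseline_cores)


# Figures taken from the abstract: 19 hours down to 30 minutes at 256 cores.
baseline_minutes = 19 * 60
spark_gene_minutes = 30

s = speedup(baseline_minutes, spark_gene_minutes)
e = parallel_efficiency(s, cores=256)  # assumes a single-core baseline

print(f"speedup: {s:.1f}x")
print(f"parallel efficiency (assumed 1-core baseline): {e:.1%}")
```

The computed ratio of 38 is consistent with the abstract's "in excess of 37-fold". The efficiency value depends entirely on the assumed baseline core count and should be recomputed once the actual baseline configuration is known.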


Bibliographic details

Source
IEEE International Conference on High Performance Computing and Communications; IEEE International Conference on Smart City; IEEE International Conference on Data Science and Systems | 2017 | pp. 262-269 | 8 pages
Authors
Xueqi Li; Guangming Tan; Chunming Zhang; Xu Li; Zhonghai Zhang; Ninghui Sun
Original format
PDF
Keywords
Genomics; Bioinformatics; Pipelines; Sequential analysis; Microarchitecture; Cancer; Tools
