Journal: Parallel Computing

Exploring efficient data parallelism for genome read mapping on multicore and manycore architectures

Abstract

Heterogeneous architectures that combine multicore and manycore systems have become attractive solutions for coping with the data boom in genomics-based studies. Our work explores the efficient use of heterogeneous architectures in this area. In particular, we have studied the use of manycore components such as the Xeon Phi accelerator, which has proved to be a convenient choice because it allows easy migration of applications developed for multicore servers based on the x86 architecture. Our study focuses on the problem of sequence alignment, one of the fundamental and most computationally costly stages in most genome variant studies. We concentrate on BWA, one of the most popular sequence aligners, and on three types of heterogeneous systems: one containing Intel multicore CPUs and accelerators, one made up of several multicore servers, and one large-scale system, each with different characteristics in terms of number of CPUs, number of cores, and system memory organization. Although sequence alignment fits the embarrassingly parallel pattern, achieving good performance and scalability in heterogeneous environments can be complex. We analyzed different strategies based on data distribution and the replication of certain data structures, and found that the MDPR (Multi-level Data Parallelization and Replication) strategy gave the best results on all the heterogeneous platforms tested. Its results surpassed those of other strategies proposed in the literature, and it proved malleable enough to be used in different heterogeneous environments without specific adjustments for the underlying architecture. In the design of MDPR, different static and dynamic data distribution strategies were also evaluated. The best results were obtained by the static strategy, which has a significant preprocessing cost. However, the dynamic data distribution strategy, which uses a round-robin mechanism, obtained similar times without the need for the preprocessing stage. Although our proposal was applied to BWA using human genome data samples, the strategy can be easily applied to other sequence datasets and to alignment tools whose operating principles are similar to those of BWA. (C) 2019 The Authors. Published by Elsevier B.V.
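
The contrast between static and round-robin data distribution can be illustrated with a minimal Python sketch. This is not the authors' MDPR implementation: the function names, the chunk size, and the assumption that the static strategy's preprocessing amounts to splitting the whole input up front are all hypothetical and chosen only for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunk_reads(reads: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield successive chunks of at most chunk_size reads from a stream."""
    it = iter(reads)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def round_robin_assign(reads: Iterable[str], n_workers: int,
                       chunk_size: int) -> List[List[List[str]]]:
    """Deal chunks to workers in cyclic order as they are read.

    The total number of reads does not need to be known in advance,
    so no separate pass over the input is required.
    """
    queues: List[List[List[str]]] = [[] for _ in range(n_workers)]
    for i, chunk in enumerate(chunk_reads(reads, chunk_size)):
        queues[i % n_workers].append(chunk)
    return queues


def static_assign(reads: Iterable[str], n_workers: int) -> List[List[str]]:
    """Split the input into n_workers contiguous, near-equal parts.

    Assumption: the whole input is materialized first, which stands in
    for the preprocessing cost mentioned in the abstract.
    """
    all_reads = list(reads)
    base, extra = divmod(len(all_reads), n_workers)
    parts, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        parts.append(all_reads[start:start + size])
        start += size
    return parts


if __name__ == "__main__":
    sample = [f"read{i}" for i in range(10)]  # placeholder read identifiers
    print(round_robin_assign(sample, n_workers=3, chunk_size=2))
    print(static_assign(sample, n_workers=3))
```

In a real setting each worker's share would be passed to an aligner process together with its copy of the index. The sketch only shows how the input is partitioned, not how alignment or data-structure replication is carried out.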
