Journal: Parallel Computing

Exploring efficient data parallelism for genome read mapping on multicore and manycore architectures



Abstract

Heterogeneous architectures that combine multicore and manycore systems have become attractive solutions for coping with the data boom in genomics-based studies. Our work explores the efficient use of heterogeneous architectures in this area. In particular, we have studied manycore components such as the Xeon Phi accelerator, which has proved to be a convenient choice because it allows easy migration of applications developed for multicore servers based on the x86 architecture. Our study focuses on the problem of sequence alignment, one of the fundamental and most computationally costly stages in most genome variant studies. We concentrate on BWA, one of the most popular sequence aligners, and on three types of heterogeneous systems: one containing Intel multicore CPUs and accelerators, one made up of several multicore servers, and one large-scale system. Each has different characteristics in terms of number of CPUs, number of cores, and system memory organization. Although sequence alignment fits the embarrassingly parallel pattern, achieving good performance and scalability in heterogeneous environments can be complex. We analyzed different strategies based on distributing data and replicating certain data structures, and found that the MDPR (Multi-level Data Parallelization and Replication) strategy gave the best results on all the heterogeneous platforms tested. Its results surpassed those of other strategies proposed in the literature and showed that it is malleable enough to be used in different heterogeneous environments without specific adjustments for the underlying architecture. In the design of MDPR, different static and dynamic data distribution strategies were also evaluated. The best results were obtained by the static strategy, which has a significant preprocessing cost. However, the dynamic data distribution strategy using a round-robin mechanism obtained similar times without the need for the preprocessing stage. Although our proposal was applied to BWA using human genome data samples, the strategy can be easily applied to other sequence datasets and to alignment tools whose operating principles are similar to those of BWA. © 2019 The Authors. Published by Elsevier B.V.
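
The contrast between the static and round-robin distribution strategies mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe how MDPR partitions reads, so the function names (`static_split`, `round_robin_split`), the chunk size, and the toy read list are all assumptions made for illustration. The sketch only shows why a static split needs the whole input up front, while a round-robin split can deal out chunks as the input is streamed.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(reads: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most chunk_size reads from a stream."""
    it = iter(reads)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def static_split(reads: List[str], n_workers: int) -> List[List[str]]:
    """Static distribution (hypothetical): requires the full input in advance,
    standing in for the preprocessing pass, and cuts it into n_workers
    contiguous blocks of near-equal size."""
    base, extra = divmod(len(reads), n_workers)
    parts, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        parts.append(reads[start:start + size])
        start += size
    return parts


def round_robin_split(reads: Iterable[str], n_workers: int,
                      chunk_size: int) -> List[List[str]]:
    """Round-robin distribution (hypothetical): chunk i goes to worker
    i % n_workers as the stream is consumed, so no pass over the whole
    input is needed before work can be handed out."""
    parts: List[List[str]] = [[] for _ in range(n_workers)]
    for i, chunk in enumerate(chunked(reads, chunk_size)):
        parts[i % n_workers].extend(chunk)
    return parts


# Toy input: ten placeholder read identifiers, three workers.
reads = [f"read{i}" for i in range(10)]
static_parts = static_split(reads, 3)
rr_parts = round_robin_split(iter(reads), 3, chunk_size=2)

print([len(p) for p in static_parts])  # block sizes under the static split
print([len(p) for p in rr_parts])      # block sizes under round-robin
```

In this toy setting the static split yields slightly better balance, while the round-robin split never needs to know the total number of reads. That mirrors the trade-off the abstract reports, although actual alignment times depend on the content of the reads and on the platform.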
