Conference: International Conference on High Performance Computing Simulation

Burrows-Wheeler Transform based indexed exact search on a multi-GPU OpenCL platform


Abstract

This paper proposes a multi-GPU parallelization of exact string matching based on the backward-search procedure over indexing structures such as the Burrows-Wheeler Transform (BWT) and the FM-Index. To execute efficiently on highly heterogeneous parallel platforms, the parallelization adopts a unified OpenCL implementation that runs both on CPUs and on multiple, possibly different, GPU devices (e.g., NVIDIA and AMD GPUs) integrated in the target platform. The implementation also incorporates load-balancing techniques that distribute the workload so as to minimize the overall processing time and allow throughput to scale with the number of GPUs used. Experimental results show speedups greater than 10× with a single GPU over a conventional mainstream multi-threaded CPU implementation (Bowtie, 8 threads), and between 5× and 30× over other popular BWT-based aligners, namely BWA and SOAP2, using their multi-threading options. Compared with state-of-the-art GPU implementations (e.g., SOAP3, HPG-BWT, Barracuda and CUSHAW2-GPU), the proposed implementation provides speedups between 2.5× and 5×. Running the proposed alignment platform on multiple, completely distinct GPU devices demonstrated that throughput scales efficiently, owing to the load balancing of the processing across the devices.
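
For readers unfamiliar with the backward-search procedure named in the abstract, the following is a minimal sequential sketch in Python of exact matching over a BWT/FM-Index. It is not the paper's OpenCL kernel: the function names (build_fm_index, backward_search, find_all), the naive suffix-array construction, and the uncompressed occurrence table are all assumptions made for illustration, and the sketch is only suited to short inputs.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-Index for `text` (illustrative, not memory-efficient)."""
    text += "$"  # sentinel; assumed to sort before every other symbol
    # Naive suffix array: acceptable for a sketch, far too slow for genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT = the character preceding each sorted suffix (wraps around for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # C[c] = number of characters in the text strictly smaller than c.
    counts = Counter(text)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, C, occ


def backward_search(pattern, C, occ, n):
    """Return the half-open suffix-array interval [lo, hi) matching `pattern`."""
    lo, hi = 0, n
    # Process the pattern from its last character to its first.
    for ch in reversed(pattern):
        if ch not in C:
            return 0, 0  # character never appears in the text
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0  # interval became empty: no exact match
    return lo, hi


def find_all(pattern, sa, C, occ):
    """Return the sorted text positions where `pattern` occurs exactly."""
    lo, hi = backward_search(pattern, C, occ, len(sa))
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    sa, C, occ = build_fm_index("ACGTACGT")
    print(find_all("ACG", sa, C, occ))  # [0, 4]
```

Each pattern is searched independently against a shared read-only index, which is the property that makes this kind of search a natural fit for distributing batches of queries across several devices.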
