首页> 外文期刊>Parallel Computing >Bit-parallel approximate pattern matching: Kepler GPU versus Xeon Phi
【24h】

Bit-parallel approximate pattern matching: Kepler GPU versus Xeon Phi

机译:位并行近似模式匹配:开普勒GPU与至强融核

获取原文
获取原文并翻译 | 示例
           

摘要

Approximate pattern matching (APM) targets to find the occurrences of a pattern inside a subject text allowing a limited number of errors. It has been widely used in many application areas such as bioinformatics and information retrieval. Bit-parallel APM takes advantage of the intrinsic parallelism of bitwise operations inside a machine word. This approach typically encodes non-deterministic finite automaton (NFA) states or value differences between adjacent cells of a dynamic programming matrix in the form of bit arrays. Wu-Manber (WM) is a well-known bit-parallel APM algorithm, which simulates an NFA and gains parallel efficiency by performing multiple state updates within a machine word. An important parameter is the machine word size (e.g. 32 or 64 bits for CPUs). Due to increasing vector capabilities, efficient mapping of bit-parallel APM algorithms onto modern high performance computing architectures is an interesting research topic. Prominent examples are Xeon Phi coprocessors and CUDA-enabled GPUs, which provide words of size 512 bits (by means of vector registers) and 1024 bits (by means of warps), respectively. In this paper, we investigate mappings of the WM algorithm onto these two accelerator types. Both architectures are able to achieve around two orders-of-magnitude speedups compared to a single-threaded CPU implementation. Moreover, our tile-based implementation on a GeForce Titan graphics card runs up to 2.9 x faster than our implementation on an Intel Xeon Phi 5110P. Source code is available at http://xbitpar.sourceforge.net. (C) 2015 Elsevier B.V. All rights reserved.
机译:近似模式匹配(APM)的目标是在允许有限数量错误的主题文本中查找模式的出现。它已被广泛应用于许多领域,例如生物信息学和信息检索。位并行APM利用了机器字内部按位运算的固有并行性。这种方法通常以位阵列的形式编码非确定性有限自动机(NFA)状态或动态编程矩阵的相邻单元之间的值差。 Wu-Manber(WM)是一种著名的位并行APM算法,它模拟NFA并通过在一个机器字内执行多个状态更新来获得并行效率。一个重要的参数是机器字的大小(例如CPU的32或64位)。由于矢量功能的增强,将位并行APM算法有效映射到现代高性能计算体系结构是一个有趣的研究主题。 Xeon Phi协处理器和支持CUDA的GPU是突出的例子,它们分别提供大小为512位(通过向量寄存器)和1024位(通过warp)的字。在本文中,我们研究了WM算法在这两种加速器类型上的映射。与单线程CPU实施相比,这两种架构都可以实现大约两个数量级的加速。此外,与在Intel Xeon Phi 5110P上的实现相比,我们在GeForce Titan显卡上基于图块的实现运行速度快2.9倍。源代码可从http://xbitpar.sourceforge.net获得。 (C)2015 Elsevier B.V.保留所有权利。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号