首页> 外文期刊>The VLDB journal >GPU-accelerated string matching for database applications
【24h】

GPU-accelerated string matching for database applications

机译:GPU加速的字符串匹配,适用于数据库应用程序

获取原文
获取原文并翻译 | 示例
       

摘要

Implementations of relational operators on GPU processors have resulted in order of magnitude speedups compared to their multicore CPU counterparts. Here we focus on the efficient implementation of string matching operators common in SQL queries. Due to different architectural features the optimal algorithm for CPUs might be suboptimal for GPUs. GPUs achieve high memory bandwidth by running thousands of threads, so it is not feasible to keep the working set of all threads in the cache in a naive implementation. In GPUs the unit of execution is a group of threads and in the presence of loops and branches, threads in a group have to follow the same execution path; if some threads diverge, then different paths are serialized. We study the cache memory efficiency of single- and multi-pattern string matching algorithms for conventional and pivoted string layouts in the GPU memory. We evaluate the memory efficiency in terms of memory access pattern and achieved memory bandwidth for different parallelization methods. To reduce thread divergence, we split string matching into multiple steps. We evaluate the different matching algorithms in terms of average- and worst-case performance and compare them against state-of-the-art CPU and GPU libraries. Our experimental evaluation shows that thread and memory efficiency affect performance significantly and that our proposed methods outperform previous CPU and GPU algorithms in terms of raw performance and power efficiency. The Knuth-Morris-Pratt algorithm is a good choice for GPUs because its regular memory access pattern makes it amenable to several GPU optimizations.
机译:与多核CPU同类产品相比,关系处理器在GPU处理器上的实现已实现了数量级的加速。在这里,我们重点介绍SQL查询中常见的字符串匹配运算符的有效实现。由于不同的架构功能,CPU的最佳算法可能不适用于GPU。 GPU通过运行数千个线程来实现高内存带宽,因此在幼稚的实现中将所有线程的工作集保留在缓存中是不可行的。在GPU中,执行单元是一组线程,并且在存在循环和分支的情况下,一组线程必须遵循相同的执行路径。如果某些线程发散,则将不同的路径序列化。我们研究了GPU内存中常规和枢轴化字符串布局的单模式和多模式字符串匹配算法的高速缓存存储效率。我们根据内存访问模式评估内存效率,并针对不同的并行化方法评估内存带宽。为了减少线程分歧,我们将字符串匹配分为多个步骤。我们根据平均性能和最坏情况的性能来评估不同的匹配算法,并将它们与最新的CPU和GPU库进行比较。我们的实验评估表明,线程和内存效率会显着影响性能,并且在原始性能和功耗效率方面,我们提出的方法优于以前的CPU和GPU算法。 Knuth-Morris-Pratt算法是GPU的不错选择,因为它的常规内存访问模式使其可以进行多种GPU优化。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号