GPU-accelerated string matching for database applications

Sitaridi Evangelia A.; Ross Kenneth A.

Abstract

Implementations of relational operators on GPU processors have resulted in order of magnitude speedups compared to their multicore CPU counterparts. Here we focus on the efficient implementation of string matching operators common in SQL queries. Due to different architectural features the optimal algorithm for CPUs might be suboptimal for GPUs. GPUs achieve high memory bandwidth by running thousands of threads, so it is not feasible to keep the working set of all threads in the cache in a naive implementation. In GPUs the unit of execution is a group of threads and in the presence of loops and branches, threads in a group have to follow the same execution path; if some threads diverge, then different paths are serialized. We study the cache memory efficiency of single- and multi-pattern string matching algorithms for conventional and pivoted string layouts in the GPU memory. We evaluate the memory efficiency in terms of memory access pattern and achieved memory bandwidth for different parallelization methods. To reduce thread divergence, we split string matching into multiple steps. We evaluate the different matching algorithms in terms of average- and worst-case performance and compare them against state-of-the-art CPU and GPU libraries. Our experimental evaluation shows that thread and memory efficiency affect performance significantly and that our proposed methods outperform previous CPU and GPU algorithms in terms of raw performance and power efficiency. The Knuth-Morris-Pratt algorithm is a good choice for GPUs because its regular memory access pattern makes it amenable to several GPU optimizations.
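The abstract's closing point about Knuth-Morris-Pratt can be illustrated with a minimal CPU-side sketch in Python. It is not the authors' GPU implementation. The function names `build_failure_table`, `kmp_contains`, and `pivot` are hypothetical, and the character-level pivot is only an assumed simplification of a pivoted string layout. The sketch shows that KMP reads the text strictly left to right, which is the regular access pattern the abstract refers to.

```python
from typing import List


def build_failure_table(pattern: str) -> List[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_contains(text: str, pattern: str, fail: List[int]) -> bool:
    """Return True if pattern occurs in text.

    The text is consumed one character at a time and never re-read;
    only the pattern position k moves backward on a mismatch.
    """
    if not pattern:
        return True
    k = 0
    for ch in text:
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
            if k == len(pattern):
                return True
    return False


def pivot(strings: List[str], pad: str = "\0") -> List[str]:
    """Assumed character-level pivot: row j holds the j-th character of
    every string, so neighbouring workers would read adjacent positions."""
    width = max((len(s) for s in strings), default=0)
    padded = [s.ljust(width, pad) for s in strings]
    return ["".join(s[j] for s in padded) for j in range(width)]
```

For example, with `fail = build_failure_table("gpu")`, calling `kmp_contains(s, "gpu", fail)` on each string of a column evaluates a predicate similar to SQL `LIKE '%gpu%'`. The failure table is built once per pattern and shared across all strings.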


Bibliographic information

Source: The VLDB Journal, 2016, no. 5, pp. 719-740 (22 pages)
Authors: Sitaridi Evangelia A.; Ross Kenneth A.
Affiliation (both authors): Columbia Univ, Dept Comp Sci, 1214 Amsterdam Ave, New York, NY 10027 USA
Indexed in: Science Citation Index (SCI), USA
Original format: PDF
Language: English
Keywords: Text queries; String matching; GPU Processing; Thread divergence; Cache efficiency
Record added: 2022-08-17 13:21:10

Similar literature

1. An Optimal Algorithm for Matching String Patterns in Large Text Databases [J]. K.s.m.v.Kumar, S.Viswanadha Raju, KA.Govardha. International Journal of Computer Science and Network Security. 2013, no. 6
2. An Artificial Neural Network Based Approach for Online String Matching/Filtering of Large Databases [J]. Tatiana Tambouratzis. International Journal of Intelligent Systems. 2010, no. 4
3. A Privacy-Preserving Multi-Pattern Matching Scheme for Searching Strings in Cloud Database [C]. Meiqi He, Jun Zhang, Gongxian Zeng. Annual Conference on Privacy, Security and Trust. 2017
4. Multi-filter String Matching and Human-centric Entity Matching for Information Extraction [D]. Sun, Chong. 2012
5. Next generation sequencing (NGS) database for tandem repeats with multiple pattern 2°-shaft multicore string matching [O]. Chinta Someswara Rao, S. Viswanadha Raju. 2016
6. Supporting Uncertain Predicates in DBMS Using Approximate String Matching and Probabilistic Databases [O]. Amol S. Jumde, Ravindra B. Keskar. 2020
7. Tree Matching Problems with Applications to Structured Text Databases [R]. Kilpelaeinen, P. 1992
