GPU Based N-Gram String Matching Algorithm with Score Table Approach for String Searching in Many Documents

Srinivasa Krishnarajanagar G.; Shree Devi B. N.

首页> 外文期刊>Journal of The Institution of Engineers (India): Series B >GPU Based N-Gram String Matching Algorithm with Score Table Approach for String Searching in Many Documents

【24h】

GPU Based N-Gram String Matching Algorithm with Score Table Approach for String Searching in Many Documents

机译：基于GPU的带分数表方法的N-Gram字符串匹配算法，用于在许多文档中进行字符串搜索

获取原文

获取原文并翻译 | 示例

掌桥外文数据库（机构版） >>

开具论文收录证明 >>

页面导航

摘要
著录项
相似文献
相关主题

摘要

String searching in documents has become a tedious task with the evolution of Big Data. Generation of large data sets demand for a high performance search algorithm in areas such as text mining, information retrieval and many others. The popularity of GPU’s for general purpose computing has been increasing for various applications. Therefore it is of great interest to exploit the thread feature of a GPU to provide a high performance search algorithm. This paper proposes an optimized new approach to N-gram model for string search in a number of lengthy documents and its GPU implementation. The algorithm exploits GPGPUs for searching strings in many documents employing character level N-gram matching with parallel Score Table approach and search using CUDA API. The new approach of Score table used for frequency storage of N-grams in a document, makes the search independent of the document’s length and allows faster access to the frequency values, thus decreasing the search complexity. The extensive thread feature in a GPU has been exploited to enable parallel pre-processing of trigrams in a document for Score Table creation and parallel search in huge number of documents, thus speeding up the whole search process even for a large pattern size. Experiments were carried out for many documents of varied length and search strings from the standard Lorem Ipsum text on NVIDIA’s GeForce GT 540M GPU with 96 cores. Results prove that the parallel approach for Score Table creation and searching gives a good speed up than the same approach executed serially.

机译：随着大数据的发展，在文档中进行字符串搜索已成为一项繁琐的任务。大数据集的产生要求在文本挖掘，信息检索等许多领域需要高性能的搜索算法。对于各种应用，用于通用计算的GPU的普及程度正在不断提高。因此，利用GPU的线程功能来提供高性能的搜索算法非常重要。本文提出了一种优化的N-gram模型新方法，用于在许多冗长的文档中进行字符串搜索及其GPU实现。该算法利用GPGPU在许多文档中使用字符级N-gram匹配和并行计分表方法搜索字符串，并使用CUDA API进行搜索。分数表的新方法用于在文档中以N-gram的频率存储，使搜索独立于文档的长度，并允许更快地访问频率值，从而降低了搜索复杂度。利用GPU中广泛的线程功能，可以对文档中的三字母组进行并行预处理，以创建分数表并在大量文档中进行并行搜索，从而即使在模式较大的情况下也可以加快整个搜索过程。在NVIDIA具有96个核心的GeForce GT 540M GPU上，对来自Lorem Ipsum标准文本的各种长度不同的文档和搜索字符串进行了实验。结果证明，与串行执行相同的方法相比，并行创建和查找表的方法可以提供更快的速度。

著录项

来源
《Journal of The Institution of Engineers (India): Series B》 |2017年第5期|467-476|共10页
作者
Srinivasa Krishnarajanagar G.; Shree Devi B. N.;
展开▼
作者单位

Department of Computer Science and Engineering, M.S. Ramaiah Institute of Technology, Bengaluru, Karnataka, India;

Department of Computer Science and Engineering, M.S. Ramaiah Institute of Technology, Bengaluru, Karnataka, India;

展开▼
收录信息
原文格式 PDF
正文语种 eng
中图分类
关键词
CUDA; GPGPU; N-gram; NVIDIA; Score table; String matching; String searching; Trigram;

机译：CUDA;GPGPU;N-克;NVIDIA;分数表;字符串匹配;字符串搜索;特里格拉姆;

相似文献

外文文献
中文文献
专利

1. A Memory-Access-Efficient Implementation for Computing the Approximate String Matching Algorithm on GPUs [J] . Lucas Saad Nogueira NUNES, Jacir Luiz BORDIM, Yasuaki ITO, IEICE transactions on information and systems . 2016,第12期

机译：一种在GPU上计算近似字符串匹配算法的内存访问有效实现
2. New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance [J] . Ho ThienLuan, Oh Seung-Rohk, Kim HyunJin Journal of supercomputing . 2018,第5期

机译：海明距离下定长近似字符串匹配和近似圆字符串匹配的新算法
3. Correction to: New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance [J] . Ho ThienLuan, Oh Seung-Rohk, Kim HyunJin Journal of supercomputing . 2018,第5期

机译：更正为：在汉明距离下用于定长近似字符串匹配和近似圆形字符串匹配的新算法
4. Chinese Spelling Check based on N-gram and String Matching Algorithm [C] . Jui-Feng Yen, Li-Ting Chang, Chan-Yi Liu, Workshop on natural language processing techniques for educational applications . 2017

机译：基于N-gram和字符串匹配算法的中文拼写检查
5. Hardware Implementation of a String Matching Algorithm Based on the FM-Index [D] . Fernandez, Edward Bryann Cabanayan 2013

机译：基于FM-Index的字符串匹配算法的硬件实现
6. Entropy-Based Approach in Selection Exact String-Matching Algorithms [O] . Ivan Markić, Maja Štula, Marija Zorić, 2021

机译：基于熵的选择精确串匹配算法
7. Study on the Searching Mongolian Based on String Matching Algorithms [O] . 花菊 2013

机译：基于串匹配算法的搜索蒙古的研究

GPU Based N-Gram String Matching Algorithm with Score Table Approach for String Searching in Many Documents

摘要

著录项

相似文献

相关主题

期刊订阅