Efficient Regular Expression Matching on Compressed Strings

Abstract

Existing methods for regular expression matching on LZ78-compressed strings are inefficient. Moreover, LZ78 has shortcomings compared with LZ77, a related Lempel-Ziv scheme: a higher compression ratio (that is, larger compressed output) and slower decompression. In this paper, we study regular expression matching on LZ77-compressed strings. We propose an efficient algorithm, RELZ, that prunes candidates using two kinds of factors of the regular expression: positive factors (a prefix and a suffix) and negative factors (substrings that cannot appear in any answer). To locate both kinds of factors quickly on the compressed string without decompressing it, we design a variant of the suffix trie index, called SSLZ. In addition, we construct bitmaps for the factors of the regular expression to detect potential regions, and we propose block filtering to further reduce the candidates. Finally, we conduct a comprehensive performance evaluation on five real datasets to validate our ideas and the proposed algorithms. The experimental results show that RELZ significantly outperforms existing algorithms.
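The factor-based pruning described in the abstract can be illustrated with a minimal sketch. It is not the RELZ algorithm: it runs on an uncompressed string and uses no SSLZ index, bitmaps, or block filtering. The function names (`find_all`, `candidate_windows`, `filtered_matches`) and the hand-picked factors are assumptions made for illustration. The sketch shows only how prefix and suffix occurrences delimit candidate windows and how a negative factor discards a window before the regular expression is verified.

```python
import re

def find_all(text, factor):
    """Return every start offset of `factor` in `text` (overlaps allowed)."""
    hits = []
    i = text.find(factor)
    while i != -1:
        hits.append(i)
        i = text.find(factor, i + 1)
    return hits

def candidate_windows(text, prefix, suffix, negatives):
    """Pair each prefix occurrence with each later suffix occurrence and
    drop every window that contains a negative factor."""
    starts = find_all(text, prefix)
    ends = [j + len(suffix) for j in find_all(text, suffix)]
    min_len = max(len(prefix), len(suffix))
    windows = []
    for s in starts:
        for e in ends:
            if e - s < min_len:
                continue  # suffix starts before the prefix, or the window is too short
            window = text[s:e]
            if any(neg in window for neg in negatives):
                continue  # pruned: a negative factor cannot occur in an answer
            windows.append((s, e))
    return windows

def filtered_matches(text, pattern, prefix, suffix, negatives):
    """Verify the surviving windows with the full regular expression."""
    regex = re.compile(pattern)
    return [
        (s, e)
        for s, e in candidate_windows(text, prefix, suffix, negatives)
        if regex.fullmatch(text, s, e)
    ]

# Assumed example: every match of ab[cd]*ef starts with "ab", ends with "ef",
# and can never contain "fa", because "a" only occurs as its first character.
text = "abcdefabef"
matches = filtered_matches(text, r"ab[cd]*ef", "ab", "ef", ["fa"])
print(matches)
```

In this example the window covering the whole string is discarded by the negative factor "fa" without running the regular expression on it, so only the two short windows reach verification.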


Bibliographic details

Source
International Conference on Database Systems for Advanced Applications, 2017, pp. 219-234 (16 pages)
Authors
Yutong Han; Bin Wang; Xiaochun Yang; Huaijie Zhu
Original format: PDF
Keywords
Regular expression; LZ77; String matching; Self-index

