首页> 外文会议>International conference on database systems for advanced applications >Efficient Regular Expression Matching on Compressed Strings
【24h】

Efficient Regular Expression Matching on Compressed Strings

机译:压缩字符串上的有效正则表达式匹配

获取原文

摘要

Existing methods for regular expression matching on LZ78 compressed strings do not perform efficiently. Moreover, LZ78 compression has some shortcomings, such as high compression ratio and slower decompression speed than LZ77 (a variant of LZ78). In this paper, we study regular expression matching on LZ77 compressed strings. To address this problem, we propose an efficient algorithm, namely, RELZ, utilizing the positive factors, i.e., a prefix and a suffix, and negative factors (Negative factors are substrings that cannot appear in an answer.) of the regular expression to prune the candidates. For the sake of quickly locating these two kinds of factors on the compressed string without decompression, we design a variant suffix trie index, called SSLZ. In addition, we construct bitmaps for factors of regular expression to detect potential region and propose block filtering to reduce candidates. At last, we conduct a comprehensive performance evaluation using five real datasets to validate our ideas and the proposed algorithms. The experimental result shows that our RELZ algorithm outperforms the existing algorithms significantly.
机译:现有的用于LZ78压缩字符串的正则表达式匹配的方法无法有效执行。而且,LZ78压缩比LZ77(LZ78的一种变体)具有较高的压缩比和较低的解压缩速度等缺点。在本文中,我们研究了LZ77压缩字符串上的正则表达式匹配。为了解决这个问题,我们提出了一种有效的算法,即RELZ,它利用正则表达式(例如前缀和后缀)和负因素(负因素是不能出现在答案中的子字符串)来修剪正则表达式。候选人。为了在不进行解压缩的情况下快速定位压缩字符串上的这两种因素,我们设计了一个变体后缀特里索引,称为SSLZ。此外,我们为正则表达式的因素构造位图以检测潜在区域,并提出块过滤以减少候选对象。最后,我们使用五个真实的数据集进行了全面的性能评估,以验证我们的想法和提出的算法。实验结果表明,我们的RELZ算法明显优于现有算法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号