首页> 外文会议>IEEE International Conference on Adaptive Science and Technology >Efficient multi-word parameterized matching on compressed text
【24h】

Efficient multi-word parameterized matching on compressed text

机译:在压缩文本上有效的多字参数化匹配

获取原文

摘要

Searching set of patterns {P, P, P, ....P}, r≥1, inside body of a text T[1...n] is called multi-pattern matching problem. This matching is said to be parameterized match (p-match), if one can be transformed into the other via some bijective mapping. It is mainly used in software maintenance, plagiarism detection and detecting isomorphism in a graph. In the compressed parameterized matching problem, our task is to find all the parameterized occurrences of a pattern (set of patterns) in the compressed text, without decompressing it. Compressing the text before matching reduces the size and minimizes the matching time also. In this paper, we develop an efficient algorithm for parameterized multi-word matching problem on the compressed text, where both patterns and text are compressed before actual matching is performed and pattern is treated as word. For compressing the pattern and text, we use efficient compression code: Word Based Tagged Code (WBTC) and bit-parallel algorithm is used for searching purpose. Experimental results show that our algorithm is up to three times faster than the search on the uncompressed text.
机译:搜索模式{p,p,p,.... p},r≥1,文本t [1 ... n]的身体内部称为多模式匹配问题。据说该匹配是参数化匹配(P-Match),如果可以通过一些基辅映映射将另一个转换为另一个匹配。它主要用于软件维护,抄袭检测和检测图中的同构。在压缩的参数化匹配问题中,我们的任务是在压缩文本中找到模式(图案集)的所有参数化发生,而不会解压缩它。在匹配之前压缩文本减少了大小并最大限度地减少匹配时间。在本文中,我们在压缩文本上开发了一种有效的参数化多字匹配问题的算法,其中在执行实际匹配之前压缩了两种模式和文本,并且将模式被视为Word。为了压缩模式和文本,我们使用有效的压缩代码:基于Word的标记代码(WBTC)和位并行算法用于搜索目的。实验结果表明,我们的算法比未压缩文本上的搜索速度快三倍。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号