...
首页> 外文期刊>The Computer journal >Fast Pattern-Matching via K-bit Filtering Based Text Decomposition
【24h】

Fast Pattern-Matching via K-bit Filtering Based Text Decomposition

机译:通过基于K位过滤的文本分解进行快速模式匹配

获取原文
获取原文并翻译 | 示例

摘要

This study explores an alternative way of storing text files to answer exact match queries faster. We decompose the original file into two parts as filter and payload. The filter part contains the most informative k bits of each byte, and the remaining bits of the bytes are concatenated in the order of appearance to generate the payload. We refer to this structure as k-bit filtered format. When an input pattern is to be searched on the k-bit filtered structure, the same decomposition is performed on the pattern. The k bits from each byte of the pattern form the pattern filter bit sequence, and the rest is the payload. The pattern filter is first scanned on the filter part of the file. At each match position detected in the filter part, the pattern payload is verified against the corresponding location in the payload part of the text. Thus, instead of searching an m-byte pattern on an n-byte text, first k • m bits are scanned on k•n bits, followed by a verification of (8 - k) • m bits on the respective locations of the matching positions. Experiments conducted on natural language texts, plain ASCII DNA sequences and random byte sequences showed that the search performance with the proposed scheme is on average two times faster than the tested best exact pattern-matching algorithms. The highest gain is obtained on plain ASCII DNA sequences. We also developed an effective bitwise pattern-matching algorithm of possible independent interest within this study.
机译:这项研究探索了存储文本文件以更快地回答完全匹配查询的另一种方法。我们将原始文件分解为两个部分:过滤器和有效负载。过滤器部分包含每个字节中信息最丰富的k位,然后按出现顺序将字节的其余位连接起来以生成有效负载。我们将此结构称为k位过滤格式。当要在k位过滤结构上搜索输入模式时,对该模式执行相同的分解。模式每个字节中的k位形成模式过滤器位序列,其余为有效载荷。首先在文件的过滤器部分上扫描模式过滤器。在过滤器部分中检测到的每个匹配位置处,针对文本中有效载荷部分中的相应位置验证模式有效载荷。因此,不是在n字节的文本上搜索m字节的模式,而是在k•n的位上扫描前k•m位,然后在匹配的各个位置上验证(8-k)•m位职位。在自然语言文本,纯ASCII DNA序列和随机字节序列上进行的实验表明,提出的方案的搜索性能平均比经过测试的最佳精确模式匹配算法快两倍。在纯ASCII DNA序列上可获得最高增益。在这项研究中,我们还开发了一种有效的按位模式匹配算法,可能具有独立的兴趣。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号