首页> 外文期刊>Algorithms >An Algorithm to Compute the Character Access Count Distribution for Pattern Matching Algorithms
【24h】

An Algorithm to Compute the Character Access Count Distribution for Pattern Matching Algorithms

机译:模式匹配算法中字符访问次数分布的计算算法

获取原文
           

摘要

We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer–Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we develop an algorithm that efficiently computes the distribution of a pattern matching algorithm's running time cost (such as the number of text character accesses) for any given pattern in a random text model. Text models range from simple uniform models to higher-order Markov models or hidden Markov models (HMMs). Furthermore, we provide an algorithm to compute the exact distribution of differences in running time cost of two pattern matching algorithms. Methodologically, we use extensions of finite automata which we call deterministic arithmetic automata (DAAs) and probabilistic arithmetic automata (PAAs) [1]. Given an algorithm, a pattern, and a text model, a PAA is constructed from which the sought distributions can be derived using dynamic programming. To our knowledge, this is the first time that substring- or suffix-based pattern matching algorithms are analyzed exactly by computing the whole distribution of running time cost. Experimentally, we compare Horspool's algorithm, Backward DAWG Matching, and Backward Oracle Matching on prototypical patterns of short length and provide statistics on the size of minimal DAAs for these computations.
机译:我们为基于窗口的模式匹配算法(如Boyer-Moore,Horspool,Backward DAWG Matching,Backward Oracle Matching等)的精确概率分析提出了一个框架。特别是,我们开发了一种算法,可以针对随机文本模型中任何给定的模式有效地计算模式匹配算法的运行时间成本(例如文本字符访问次数)的分布。文本模型的范围从简单的统一模型到高阶马尔可夫模型或隐马尔可夫模型(HMM)。此外,我们提供了一种算法来计算两种模式匹配算法的运行时间成本差异的精确分布。从方法上讲,我们使用有限自动机的扩展,我们称其为确定性算术自动机(DAA)和概率算术自动机(PAA)[1]。给定一个算法,一个模式和一个文本模型,就可以构建一个PAA,可以使用动态编程从中得出所需的分布。据我们所知,这是第一次通过计算运行时间成本的整体分布来精确分析基于子字符串或后缀的模式匹配算法。在实验上,我们在较短的原型模式上比较了Horspool的算法,后向DAWG匹配和后向Oracle匹配,并提供了用于这些计算的最小DAA大小的统计信息。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号