【24h】

Matching a Set of Patterns with Wildcards

机译:将一组模式与通配符匹配

获取原文

摘要

Multi-pattern matching with wildcards is to find all the occurrences of a set of patterns with wildcards in a text. This problem arises in various fields, such as computational biology and network security. But the problem is not extensively studied as the single pattern case and there is no efficient algorithm for this problem. In this paper, we present efficient algorithms based on fast Fourier transforms. Let $P={p_1,ldots,p_k}$ be a set of patterns with wildcards where the total length of patterns is $|P|$, and a text $t$ of length $n$ over alphabet $a_1,ldots,a_{sigma}$. We present two algorithms for this problem where patterns are matched simultaneously. The first algorithm finds the matches of a small set of patterns in the text in $O(nlog |P|+nk)$ time. The words used in the algorithm are of size $klceil2lgsigmarceil+sum_{i=1}^k lceillg|p_i|rceil$ bits. The second one finds the matchings of patterns in the text in time $O(nlog |P|logsigma+nk)$ by computing the Hamming distance between the patterns and the text. The algorithm uses the words with $sum_{i=1}^k lceillg|p_i|rceil$ bits. We also demonstrate an FFT implementation based on the modular arithmetic for machines with word size of 64 bits. Finally, we show that both algorithms can be easily parallelized and the parallelized algorithms are given as well.
机译:带通配符的多模式匹配用于查找文本中带有通配符的一组模式的所有出现。这个问题出现在各个领域,例如计算生物学和网络安全。但是,作为单模式情况,该问题尚未得到广泛研究,并且没有针对该问题的有效算法。在本文中,我们提出了基于快速傅立叶变换的高效算法。假设$ P = {p_1,ldots,p_k} $是一组带有通配符的模式,其中模式的总长度为$ | P | $,并且文本$ t $的长度为$ n $,字母为$ a_1,ldots, a_ {sigma} $。我们针对模式匹配的两个问题提出了两种算法。第一种算法在$ O(nlog | P | + nk)$的时间内找到文本中一小部分模式的匹配项。该算法中使用的字的大小为$ klceil2lgsigmarceil + sum_ {i = 1} ^ k lceillg | p_i | rceil $位。第二个通过计算模式与文本之间的汉明距离,在时间$ O(nlog | P | logsigma + nk)$中找到文本中模式的匹配。该算法使用具有$ sum_ {i = 1} ^ k lceillg | p_i | rceil $位的字。我们还演示了基于模块化算法的FFT实现,适用于字长为64位的机器。最后,我们证明了两种算法都可以轻松并行化,并给出了并行化算法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号