...
【24h】

Strict approximate pattern matching with general gaps

机译:严格的近似模式匹配与一般差距

获取原文
获取原文并翻译 | 示例
   

获取外文期刊封面封底 >>

       

摘要

Pattern matching with gap constraints is one of the essential problems in computer science such as music information retrieval and sequential pattern mining. One of the cases is called loose matching, which only considers the matching position of the last pattern substring in the sequence. One more challenging problem is considering the matching positions of each character in the sequence, called strict pattern matching which is one of the essential tasks of sequential pattern mining with gap constraints. Some strict pattern matching algorithms were designed to handle pattern mining tasks, since strict pattern matching can be used to compute the frequency of some patterns occurring in the given sequence and then the frequent patterns can be derived. In this article, we address a more general strict approximate pattern matching with Hamming distance, named SAP (Strict Approximate Pattern matching with general gaps and length constraints), which means that the gap constraints can be negative. We show that a SAP instance can be transformed into an exponential amount of the exact pattern matching with general gaps instances. Hence, we propose an effective online algorithm, named SETA (SubnETtree for sAp), based on the subnettree structure (a Nettree is an extension of a tree with multi-parents and multi-roots) and show the completeness of the algorithm. The space and time complexities of the algorithm are O(m x Maxlen x W x d) and O(Maxlen x W x m (2) x n x d), respectively, where m, Maxlen, W, and d are the length of pattern P, the maximal length constraint, the maximal gap length of pattern P and the approximate threshold. Extensive experimental results validate the correctness and effectiveness of SETA.
机译:具有间隙约束的模式匹配是计算机科学中的重要问题之一,例如音乐信息检索和顺序模式挖掘。一种情况称为松散匹配,它仅考虑序列中最后一个模式子字符串的匹配位置。一个更具挑战性的问题是考虑序列中每个字符的匹配位置,称为严格模式匹配,这是具有间隙约束的顺序模式挖掘的基本任务之一。设计了一些严格的模式匹配算法来处理模式挖掘任务,因为可以使用严格的模式匹配来计算在给定序列中出现的某些模式的频率,然后才能得出频繁的模式。在本文中,我们解决了一个更普遍的,具有汉明距离的严格近似模式匹配,称为SAP(具有一般间隙和长度约束的严格近似模式匹配),这意味着间隙约束可以为负。我们展示了可以将SAP实例转换为与常规缺口实例匹配的指数量的精确模式。因此,我们基于子网树结构(Nettree是具有多个父级和多个根的树的扩展),提出了一种有效的在线算法,名为SETA(snap的SubnETtree),并显示了该算法的完整性。该算法的空间和时间复杂度分别为O(mx Maxlen x W xd)和O(Maxlen x W xm(2)xnxd),其中m,Maxlen,W和d是模式P的长度,最大值长度限制,图案P的最大间隙长度和近似阈值。大量的实验结果验证了SETA的正确性和有效性。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号