
Mining sequential patterns with periodic wildcard gaps


Abstract

Mining frequent patterns with periodic wildcard gaps is a critical data mining problem with applications to complex real-world problems. The problem can be stated as follows: given a subject sequence, a pre-specified threshold, and a variable gap length with wildcards between every two consecutive letters, find all frequent patterns with periodic wildcard gaps. State-of-the-art mining algorithms, which use matrices or other linear data structures, not only consume a large amount of memory but also run slowly. In this study, we use the Incomplete Nettree structure (the last layer of a Nettree, which is an extension of a tree) of a sub-pattern P to efficiently create the Incomplete Nettrees of all its super-patterns with prefix pattern P and to compute their supports in a one-way scan. We propose two new algorithms, MAPB (Mining sequentiAl Pattern using incomplete Nettree with Breadth first search) and MAPD (Mining sequentiAl Pattern using incomplete Nettree with Depth first search), that solve the problem effectively with low memory requirements. Furthermore, we design a heuristic algorithm, MAPBOK (MAPB for tOp-K), based on MAPB to mine the Top-K frequent patterns of each length. Experimental results on real-world biological data demonstrate the superiority of the proposed algorithms in running time and space consumption, and also show that the pattern matching approach can be employed to mine special frequent patterns effectively.
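
To make the "last layer" idea concrete, the following Python sketch counts occurrences of a pattern whose consecutive letters are separated by between `min_gap` and `max_gap` wildcards. It is an illustration under stated assumptions, not the authors' MAPB or MAPD: the function names (`first_layer`, `extend_all`, `support`), the dictionary representation of a layer, and the definition of support as a raw occurrence count are all hypothetical choices made here. The abstract does not say how support is compared against the threshold, so that step is left out.

```python
from collections import defaultdict


def first_layer(seq, ch):
    """Layer for the length-1 pattern `ch` (hypothetical representation).

    Maps each position where `ch` occurs to 1, the number of occurrences
    of the pattern that end at that position.
    """
    return {i: 1 for i, c in enumerate(seq) if c == ch}


def extend_all(seq, layer, min_gap, max_gap):
    """From the last layer of pattern P, build the last layers of every
    one-letter extension P+c in a single pass over P's end positions.

    Returns {letter: {end_position: occurrence_count}}.
    """
    children = defaultdict(lambda: defaultdict(int))
    for i, count in layer.items():
        # The next letter sits min_gap..max_gap wildcards after position i.
        lo = i + min_gap + 1
        hi = min(i + max_gap + 1, len(seq) - 1)
        for j in range(lo, hi + 1):
            children[seq[j]][j] += count
    return {ch: dict(ends) for ch, ends in children.items()}


def support(seq, pattern, min_gap, max_gap):
    """Number of occurrences of `pattern` in `seq` under the gap constraint."""
    if not pattern:
        return 0
    layer = first_layer(seq, pattern[0])
    for ch in pattern[1:]:
        layer = extend_all(seq, layer, min_gap, max_gap).get(ch, {})
    return sum(layer.values())


if __name__ == "__main__":
    # 'a' at position 0 can be followed by 'b' at positions 1, 2, or 3.
    print(support("abbb", "ab", 0, 2))  # 3
```

Only the layer of the current pattern is kept while extending, which is the memory-saving property the abstract attributes to the Incomplete Nettree. A breadth-first or depth-first miner would call `extend_all` repeatedly on the layers it retains, in level order or along one branch at a time.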
