Journal: Knowledge and Information Systems

CloFAST: closed sequential pattern mining using sparse and vertical id-lists



Abstract

Sequential pattern mining is computationally challenging because algorithms must generate and/or test a combinatorially explosive number of intermediate subsequences. To reduce this complexity, some researchers focus on mining closed sequential patterns. This not only improves efficiency but also yields a more compact result set while preserving the expressive power of the patterns extracted by traditional (non-closed) sequential pattern mining algorithms. In this paper, we present CloFAST, a novel algorithm for mining closed frequent sequences of itemsets. It combines a new representation of the dataset, based on sparse id-lists and vertical id-lists, whose theoretical properties are studied in order to count the support of sequential patterns quickly, with a novel one-step technique that both checks sequence closure and prunes the search space. Unlike almost all existing algorithms, which iteratively alternate itemset extension and sequence extension, CloFAST proceeds in two steps. First, all closed frequent itemsets are mined to obtain an initial set of sequences of size 1. Then, new sequences are generated by working directly on these sequences, without mining additional frequent itemsets. A thorough performance study on both real-world and artificially generated datasets shows empirically that CloFAST outperforms state-of-the-art algorithms in both time and memory consumption, especially when mining long closed sequences.
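
The abstract does not spell out the id-list structures, so the following is only a minimal sketch of the general idea of counting support from per-sequence position lists. It assumes that a sparse id-list records, for each input sequence, the positions of the transactions containing a given itemset, and that a vertical id-list records, for each input sequence, the position where the first occurrence of a pattern ends. The function names (`sparse_id_list`, `vertical_id_list`, `support`) and the toy database are hypothetical, and the code is not the authors' implementation; it also omits the closure check and pruning.

```python
from typing import FrozenSet, List, Optional

Itemset = FrozenSet[str]
SequenceDB = List[List[Itemset]]  # each sequence is an ordered list of itemsets


def sparse_id_list(db: SequenceDB, itemset: Itemset) -> List[List[int]]:
    """For each sequence, the ordered positions of transactions containing `itemset`."""
    return [[pos for pos, t in enumerate(seq) if itemset <= t] for seq in db]


def vertical_id_list(db: SequenceDB, pattern: List[Itemset]) -> List[Optional[int]]:
    """For each sequence, the position where the first occurrence of `pattern`
    ends, or None if the sequence does not contain the pattern."""
    sils = [sparse_id_list(db, itemset) for itemset in pattern]
    vil: List[Optional[int]] = []
    for sid in range(len(db)):
        last = -1  # position matched by the previous itemset of the pattern
        end: Optional[int] = None
        for sil in sils:
            # Greedily take the earliest position strictly after the previous match.
            nxt = next((p for p in sil[sid] if p > last), None)
            if nxt is None:
                end = None
                break
            last = end = nxt
        vil.append(end)
    return vil


def support(vil: List[Optional[int]]) -> int:
    """Support is the number of sequences with a non-empty entry."""
    return sum(1 for entry in vil if entry is not None)


# Hypothetical toy database of four sequences.
db: SequenceDB = [
    [frozenset("a"), frozenset("b"), frozenset("c")],
    [frozenset("ab"), frozenset("c")],
    [frozenset("a"), frozenset("c"), frozenset("b")],
    [frozenset("b"), frozenset("a")],
]

pattern_ac = [frozenset("a"), frozenset("c")]
pattern_ab = [frozenset("a"), frozenset("b")]
print(vertical_id_list(db, pattern_ac), support(vertical_id_list(db, pattern_ac)))
print(vertical_id_list(db, pattern_ab), support(vertical_id_list(db, pattern_ab)))
```

In this sketch the support of a pattern is read directly off its vertical id-list, without rescanning the transactions of each sequence once the sparse id-lists exist. A pattern would then be closed if no super-sequence has the same support; that check is not shown here.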
