Conference: SIAM International Conference on Data Mining

CloSpan: Mining Closed Sequential Patterns in Large Datasets



Abstract

Previous sequential pattern mining algorithms mine the full set of frequent subsequences satisfying a min_sup threshold in a sequence database. However, since a frequent long sequence contains a combinatorial number of frequent subsequences, such mining will generate an explosive number of frequent subsequences for long patterns, which is prohibitively expensive in both time and space. In this paper, we propose an alternative but equally powerful solution: instead of mining the complete set of frequent subsequences, we mine frequent closed subsequences only, i.e., those containing no super-sequence with the same support (i.e., occurrence frequency). By exploring novel global optimization techniques, an efficient algorithm, called CloSpan (Closed Sequential pattern mining), is developed, which outperforms the previous work by one order of magnitude. Moreover, CloSpan can mine really long sequences, which, to the best of our knowledge, are un-minable by previous algorithms. Finally, CloSpan produces significantly fewer discovered sequences than the traditional (i.e., full-set) methods while preserving the same expressive power, since the whole set of frequent subsequences, together with their supports, can be derived easily from our mining results.
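To make the notion of a closed subsequence concrete, here is a minimal brute-force sketch in Python. It is not the CloSpan algorithm and uses none of its optimizations; it only illustrates the definition and the claim that the full set of frequent subsequences, with supports, can be recovered from the closed ones. As simplifying assumptions, each sequence is a tuple of single items (not itemsets), and the function names and the toy database are hypothetical.

```python
from itertools import combinations

def is_subsequence(small, big):
    """True if `small` can be obtained from `big` by deleting items."""
    it = iter(big)
    return all(item in it for item in small)

def support(pattern, database):
    """Number of database sequences that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in database)

def frequent_subsequences(database, min_sup):
    """Brute force: enumerate every subsequence, keep those with support >= min_sup."""
    candidates = set()
    for seq in database:
        for k in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), k):
                candidates.add(tuple(seq[i] for i in idx))
    supports = {p: support(p, database) for p in candidates}
    return {p: s for p, s in supports.items() if s >= min_sup}

def closed_only(frequent):
    """Keep patterns that have no proper super-sequence with the same support."""
    return {
        p: s
        for p, s in frequent.items()
        if not any(
            len(q) > len(p) and t == s and is_subsequence(p, q)
            for q, t in frequent.items()
        )
    }

def derived_support(pattern, closed):
    """Support of a frequent pattern, recovered from the closed set alone.

    Returns 0 when no closed pattern contains `pattern`, which here means
    "below min_sup", not necessarily a true support of zero.
    """
    return max(
        (s for q, s in closed.items() if is_subsequence(pattern, q)),
        default=0,
    )

# Toy database (hypothetical), min_sup = 2.
database = [("a", "b", "c"), ("a", "b", "c"), ("a", "c")]
frequent = frequent_subsequences(database, min_sup=2)
closed = closed_only(frequent)
print(len(frequent), "frequent,", len(closed), "closed")
```

In this toy database the closed set is much smaller than the full set, yet `derived_support` returns the same support as `frequent` for every frequent pattern, because a pattern's support equals the largest support among the closed patterns that contain it.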
