FAST Sequence Mining Based on Sparse Id-Lists

Abstract

Sequential pattern mining is an important data mining task with applications in basket analysis, the World Wide Web, medicine, and telecommunications. The task is challenging because sequence databases are usually large, containing many long sequences, and the number of possible sequential patterns to mine can be exponential. We propose a new sequential pattern mining algorithm called FAST, which represents the dataset with indexed sparse id-lists in order to count the support of sequential patterns quickly. We also use a lexicographic tree to make candidate generation more efficient. FAST mines the complete set of patterns while greatly reducing the effort spent on support counting and candidate sequence generation. Experimental results on artificial and real data show that our method outperforms existing methods in the literature by up to one or two orders of magnitude on large datasets.
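
The abstract does not spell out how the sparse id-lists are laid out, so the following Python sketch is only one plausible reading of the idea, not the authors' implementation. It assumes that each item maps to the sequences containing it together with the positions of its occurrences, so that support is simply the number of entries and no sequence without the item is ever stored or scanned. All names here (`build_sparse_id_lists`, `support`, `first_positions`, `sequence_extend`) are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_sparse_id_lists(database):
    """Map each item to {sequence id: ascending positions of itemsets containing it}.

    Assumed layout: sequences that do not contain the item get no entry at all,
    which is what makes the list "sparse" in this sketch.
    """
    id_lists = defaultdict(dict)
    for sid, sequence in enumerate(database):
        for pos, itemset in enumerate(sequence):
            for item in itemset:
                id_lists[item].setdefault(sid, []).append(pos)
    return id_lists


def support(id_list):
    """Support is the number of sequences with an entry; the database is not rescanned."""
    return len(id_list)


def first_positions(item_id_list):
    """Id-list of a one-item pattern: its earliest occurrence in each sequence."""
    return {sid: positions[0] for sid, positions in item_id_list.items()}


def sequence_extend(pattern_id_list, item_id_list):
    """Id-list of the pattern followed by `item` in a strictly later itemset."""
    extended = {}
    for sid, end in pattern_id_list.items():
        positions = item_id_list.get(sid)
        if positions is None:
            continue  # the item never occurs in this sequence
        i = bisect_right(positions, end)  # first occurrence strictly after `end`
        if i < len(positions):
            extended[sid] = positions[i]
    return extended


# Toy database (made up for illustration): each sequence is a list of itemsets.
db = [
    [{"a"}, {"b"}, {"a", "c"}],
    [{"b"}, {"a"}],
    [{"a"}, {"c"}, {"b"}],
]
id_lists = build_sparse_id_lists(db)
a_then_b = sequence_extend(first_positions(id_lists["a"]), id_lists["b"])
print(support(id_lists["c"]), support(a_then_b))
```

In this sketch, extending a pattern touches only the sequences that already support the prefix, which reflects the abstract's claim that support counting becomes cheap. The lexicographic tree used for candidate generation is not modeled here.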
