Conference: Foundations of Intelligent Systems

FAST Sequence Mining Based on Sparse Id-Lists

Abstract

Sequential pattern mining is an important data mining task with applications in basket analysis, the World Wide Web, medicine, and telecommunications. The task is challenging because sequence databases are usually large, containing many long sequences, and the number of possible sequential patterns can be exponential. We propose a new sequential pattern mining algorithm called FAST, which represents the dataset with indexed sparse id-lists so that the support of sequential patterns can be counted quickly. We also use a lexicographic tree to make candidate generation more efficient. FAST mines the complete set of patterns while greatly reducing the effort spent on support counting and candidate sequence generation. Experimental results on artificial and real data show that our method outperforms existing methods in the literature by up to one or two orders of magnitude on large datasets.
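The abstract does not spell out the data structure, so the following is only a minimal sketch of the general idea behind id-list-based support counting, not the authors' FAST implementation. It assumes each sequence is a plain list of single items, and the names `build_id_lists` and `support` are hypothetical. Each item maps to the sequences that contain it (hence "sparse") together with its sorted positions, so support can be counted without rescanning the database.

```python
from bisect import bisect_right
from collections import defaultdict


def build_id_lists(database):
    """Map each item to {sequence_id: sorted positions}.

    Only sequences that actually contain the item get an entry,
    which is what keeps the lists sparse. (Hypothetical helper.)
    """
    id_lists = defaultdict(dict)
    for sid, sequence in enumerate(database):
        for pos, item in enumerate(sequence):
            id_lists[item].setdefault(sid, []).append(pos)
    return id_lists


def support(pattern, id_lists):
    """Count sequences that contain `pattern` as a subsequence.

    Items must appear in order; gaps between them are allowed.
    """
    if not pattern or any(item not in id_lists for item in pattern):
        return 0
    # Only sequences containing every item of the pattern can support it.
    candidates = set(id_lists[pattern[0]])
    for item in pattern[1:]:
        candidates.intersection_update(id_lists[item])
    count = 0
    for sid in candidates:
        last = -1
        for item in pattern:
            positions = id_lists[item][sid]
            # Earliest occurrence of `item` strictly after the previous match.
            i = bisect_right(positions, last)
            if i == len(positions):
                break
            last = positions[i]
        else:
            count += 1
    return count


database = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "a", "c"],
    ["c", "b"],
]
id_lists = build_id_lists(database)
print(support(["a", "c"], id_lists))
print(support(["a", "b"], id_lists))
```

Matching each item at its earliest position after the previous match is sufficient for deciding subsequence containment, so one binary search per item per candidate sequence is enough in this simplified setting.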