Journal: Data Mining and Knowledge Discovery

Efficient algorithms for mining and incremental update of maximal frequent sequences


Abstract

We study two problems: (1) mining frequent sequences from a transactional database, and (2) incremental update of frequent sequences when the underlying database changes over time. We review existing sequence mining algorithms including GSP, PrefixSpan, SPADE, and ISM. We point out the large memory requirement of PrefixSpan, SPADE, and ISM, and evaluate the performance of GSP. We discuss the high I/O cost of GSP, particularly when the database contains long frequent sequences. To reduce the I/O requirement, we propose an algorithm MFS, which could be considered as a generalization of GSP. The general strategy of MFS is to first find an approximate solution to the set of frequent sequences and then perform successive refinement until the exact set of frequent sequences is obtained. We show that this successive refinement approach results in a significant improvement in I/O cost. We discuss how MFS can be applied to the incremental update problem. In particular, the result of a previous mining exercise can be used (by MFS) as a good initial approximate solution for the mining of an updated database. This results in an I/O efficient algorithm. To improve processing efficiency, we devise pruning techniques that, when coupled with GSP or MFS, result in algorithms that are both CPU and I/O efficient.
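To make the I/O argument concrete, here is a minimal Python sketch of a level-wise, GSP-style miner. It is not the authors' GSP or MFS implementation. It assumes each sequence element is a single item rather than an itemset, holds the database in memory, and uses hypothetical helper names (`contains`, `mine_levelwise`, `maximal`). It shows only that a level-wise search makes one full database scan per candidate length, so long frequent sequences mean many scans.

```python
from typing import List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def contains(data_seq: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if pattern is a (not necessarily contiguous) subsequence of data_seq."""
    it = iter(data_seq)
    # `item in it` advances the iterator, so matches must appear in order.
    return all(item in it for item in pattern)


def mine_levelwise(db: List[Pattern], min_sup: float) -> Tuple[Set[Pattern], int]:
    """Return (frequent sequences, number of database scans).

    Simplified level-wise search: candidates of length k+1 are built by
    joining frequent length-k sequences, then counted in one scan.
    """
    scans = 0
    frequent: Set[Pattern] = set()
    candidates: Set[Pattern] = {(x,) for s in db for x in s}
    while candidates:
        counts = dict.fromkeys(candidates, 0)
        for s in db:  # one full pass over the database per level
            for c in candidates:
                if contains(s, c):
                    counts[c] += 1
        scans += 1
        level = [c for c, n in counts.items() if n / len(db) >= min_sup]
        frequent.update(level)
        # Join step: p and q overlap on all but p's first and q's last item.
        candidates = {p + (q[-1],) for p in level for q in level if p[1:] == q[:-1]}
    return frequent, scans


def maximal(frequent: Set[Pattern]) -> Set[Pattern]:
    """Keep only frequent sequences not contained in another frequent sequence."""
    return {p for p in frequent if not any(p != q and contains(q, p) for q in frequent)}


db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "c")]
frequent, scans = mine_levelwise(db, min_sup=0.5)
print(sorted(maximal(frequent)))  # [('a', 'b', 'c')]
print(scans)                      # 3
```

In this toy run the longest frequent sequence has length 3, and the sketch scans the database three times. The abstract's successive-refinement strategy aims to cut that scan count by starting from an approximate set of frequent sequences instead of building up one length at a time.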
