首页> 中文期刊>计算机科学 >在线挖掘数据流闭频繁项集的高效算法

在线挖掘数据流闭频繁项集的高效算法

     

摘要

Mining closed frequent itemsets from data streams has been extensively studied,in which NewMoment is regarded as a typical algorithm. However, there is the problem of its big search space causing bad performance in time in NewMoment. This paper presented an algorithm called A-NewMoment which ameliorates NewMoment to mine closed frequent itemsets. Firstly.it designed a combinative data structure which uses an effective bit-victor to represent items and an extended frequent item list to record the current closed frequent information in streams. Secondly, the new pruning strategies called WSS and CSS were proposed to avoid a large number of intermediate results generated, so the search space is reduced greatly. Finally,the pruning strategy called DNFIPS was also proposed to delete no closed frequent itemsets from HTC. At the same time,it also designed a novel strategy called DNSS to efficiently and dynamically maintain these operations that all closed frequent itemsedts are added and deleted. Theoretical analysis and experimental results show that the proposed method is efficient.%数据流闭频繁项集挖掘算法得到了广泛的研究,其中一个典型的工作就是NewMoment算法.针对NewMoment算法存在搜索空间大而造成算法时间效率低的问题,提出了一种改进的数据流闭频繁项集挖掘算法A-New-Moment.它设计了一个二进制位表示项目与扩展的频繁项目列表相结合的数据结构,来记录数据流信息及闭频繁项集.在窗体初始阶段,首先挖掘频繁1-项集所产生的支持度为最大的最长闭频繁项集,接着提出新的“不需扩展策略”和“向下扩展策略”来避免生成大量中间结果,快速发现其余闭频繁项集,达到极大缩小搜索空间的目的.在窗体滑动阶段,提出“动态不频繁剪枝策略”来从已生成的闭频繁项集中快速删除非闭频繁项集,并提出“动态不搜索策略”来动态维护所有闭频繁项集的生成,以降低闭频繁项集的维护代价,提高算法的效率.理论分析与实验结果表明,A-New-Moment算法具有较好的性能.

著录项

相似文献

  • 中文文献
  • 外文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号