首页> 外文期刊>International Journal of Electrical Engineering: Transactions of the Chinese Institute of Engineers, Series E >EFFICIENT SUBSET-LATTICE ALGORITHMS FOR MINING CLOSED FREQUENT ITEMSETS AND MAXIMAL FREQUENT ITEMSETS IN DATA STREAMS
【24h】

EFFICIENT SUBSET-LATTICE ALGORITHMS FOR MINING CLOSED FREQUENT ITEMSETS AND MAXIMAL FREQUENT ITEMSETS IN DATA STREAMS

机译:高效的子格算法,用于挖掘数据流中的封闭频率项和最大频率项

获取原文
获取原文并翻译 | 示例
       

摘要

There are multiple applications for using association rules in data streams, such as market analysis, sensor networks and web tracking. Because data streams are continuous, high speed, unbounded and in real time, we can only scan once for data streams. Mining closed frequent itemsets is a further work of mining association rules, where a closed frequent itemset is a frequent itemset which has no superset with the same support. One well-known algorithm for mining closed frequent itemsets based on the sliding window model is the NewMoment algorithm. However, the NewMoment algorithm could not efficiently mine closed frequent itemsets in data streams, since they will generate closed frequent itemsets and many unclosed frequent itemsets. Moreover, when data in the sliding window is incrementally updated, the NewMoment algorithm needs to reconstruct the whole tree structure. On the other hand, a frequent itemset is called maximal, if it is not a subset of any other frequent itemset. One well-known algorithm for mining maximal frequent itemsets based on the sliding window model is called the MFIoSSW algorithm. The MFIoSSW algorithm uses a compact structure to mine the maximal frequent itemsets. However, when the new transaction comes, the number of comparisons between the new transaction and the old transactions is too much. Therefore, in this paper, we propose the Subset-Lattice algorithm, which embed the property of subsets into the lattice structure to efficiently mine closed frequent itemsets and maximal frequent itemsets over a data stream sliding window. Moreover, when data in the sliding window is incrementally updated, our Subset-Lattice algorithms will not reconstruct the whole lattice structure. From our simulation results, we show that our algorithm for mining closed frequent itemsets outperforms the NewMoment algorithm, and our algorithm for mining maximal frequent itemsets also outperforms the MFIoSSW algorithm.
机译:有多种应用程序在数据流中使用关联规则,例如市场分析,传感器网络和Web跟踪。因为数据流是连续的,高速的,无限制的并且是实时的,所以我们只能扫描一次数据流。挖掘封闭的频繁项集是挖掘关联规则的进一步工作,其中封闭的频繁项集是在相同支持下没有超集的频繁项集。一种基于滑动窗口模型的频繁闭合项目集挖掘算法是NewMoment算法。但是,NewMoment算法无法有效地挖掘数据流中的封闭频繁项目集,因为它们将生成封闭频繁项目集和许多未封闭频繁项目集。此外,当滑动窗口中的数据被增量更新时,NewMoment算法需要重建整个树结构。另一方面,如果频繁项集不是任何其他频繁项集的子集,则称为最大项集。一种基于滑动窗口模型的挖掘最大频繁项集的著名算法称为MFIoSSW算法。 MFIoSSW算法使用紧凑的结构来挖掘最大频繁项集。但是,当新事务到来时,新事务和旧事务之间的比较次数太多了。因此,在本文中,我们提出了Subset-Lattice算法,该算法将子集的属性嵌入晶格结构中,以有效地挖掘数据流滑动窗口上的封闭频繁项集和最大频繁项集。此外,当滑动窗口中的数据被增量更新时,我们的子集格算法将不会重建整个晶格结构。从仿真结果可以看出,我们的封闭频繁项目集挖掘算法优于NewMoment算法,而最大频繁项目集挖掘算法也优于MFIoSSW算法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号