Published in: Information Sciences: An International Journal

Real-time stream data mining based on CanTree and Gtree


Abstract

We face an increasing need to discover knowledge from data streams in real time. Real-time stream data mining needs a compact data structure that stores the transactions of the most recent sliding window in one scan, and an efficient algorithm that discovers frequent itemsets from that structure. In this paper, we propose a novel data mining algorithm, called CanTree-GTree, which discovers the complete set of frequent itemsets from real-time transactions based on sliding windows. The algorithm uses two data structures: CanTree and GTree. CanTree compactly represents all transactions in a sliding window in one scan and serves as the base-tree. The algorithm maintains the base-tree efficiently by adding new transactions and removing old ones without any reconstruction phase. A novel data structure, called GTree (Group Tree), serves as a projection-tree for each data item. The algorithm traverses each node of the base-tree only once, using a top-down tree traversal to build the projection-tree, and discovers frequent itemsets at low processing cost. The proposed algorithm is therefore effective for discovering frequent itemsets in real-time stream data. Performance evaluation experiments against algorithms based on CPSTree and CanTree-FPTree show that our algorithm outperforms them on the synthetic data set by about 35% and 26% in run-time cost, respectively. We also confirm that the proposed algorithm shows excellent results on real-world data sets. (C) 2016 Elsevier Inc. All rights reserved.
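The base-tree maintenance described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class names `Node` and `SlidingCanTree`, the lexicographic canonical order, the window measured in number of transactions (assumed to be at least 1), and the `projections` helper are all assumptions made for illustration. In particular, `projections` is a generic stand-in that collects prefix paths per item in one top-down pass; the abstract does not specify how GTree is structured, so it is not reproduced here.

```python
from collections import deque


class Node:
    """One tree node: an item, a count, and children keyed by item."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


class SlidingCanTree:
    """Prefix tree over transactions sorted in a fixed canonical order.

    Because the item order never depends on frequencies, transactions can
    be added and removed without rebuilding the tree.
    """

    def __init__(self, window_size):
        self.window_size = window_size  # assumed >= 1 (hypothetical parameter)
        self.root = Node()
        self.window = deque()  # transactions in the window, oldest first

    def add(self, transaction):
        """Insert a transaction; evict the oldest one if the window is full."""
        items = tuple(sorted(set(transaction)))  # canonical order: lexicographic
        if len(self.window) == self.window_size:
            self._remove(self.window.popleft())
        self.window.append(items)
        node = self.root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
            child.count += 1
            node = child

    def _remove(self, items):
        """Decrement counts along the path; prune once a count reaches zero."""
        node = self.root
        for item in items:
            child = node.children[item]
            child.count -= 1
            if child.count == 0:
                # Only this transaction used the subtree, so drop it whole.
                del node.children[item]
                return
            node = child

    def projections(self):
        """Visit each node once, top-down; map item -> [(prefix path, count)]."""
        result = {}
        stack = [(self.root, ())]
        while stack:
            node, prefix = stack.pop()
            for item, child in node.children.items():
                result.setdefault(item, []).append((prefix, child.count))
                stack.append((child, prefix + (item,)))
        return result

    def support(self, item):
        """Support of a single item within the current window."""
        return sum(count for _, count in self.projections().get(item, []))
```

Calling `add` on a full window first removes the oldest transaction by walking its path and decrementing counts, then inserts the new one, so the tree always reflects exactly the transactions in the window.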
