Real-time stream data mining based on CanTree and Gtree

Kim Jaein; Hwang Buhyun

Abstract

We face an increasing need to discover knowledge from data streams in real time. Real-time stream data mining needs a compact data structure that stores the transactions of the most recent sliding window in a single scan, and an efficient algorithm that discovers frequent itemsets from that structure. In this paper, we propose a novel data mining algorithm, called CanTree-GTree, which discovers the complete set of frequent itemsets from real-time transactions based on sliding windows. The algorithm uses two data structures: CanTree and GTree. CanTree compactly represents all transactions in a sliding window in one scan and serves as the base tree. The algorithm efficiently maintains the base tree by adding new transactions and removing old ones without any reconstruction phase. A novel data structure, called GTree (Group Tree), serves as a projection tree for each data item. The algorithm traverses each node of the base tree only once, using a top-down tree traversal to build the projection tree, and discovers frequent itemsets at low processing cost. The proposed algorithm is therefore effective for discovering frequent itemsets in real-time stream data. Our performance evaluation against algorithms based on CPSTree and CanTree-FPTree shows that our algorithm outperforms them on the synthetic data set by about 35% and 26% in run-time cost, respectively. We also confirm that the proposed algorithm shows excellent results on real-world data sets. (C) 2016 Elsevier Inc. All rights reserved.
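
The abstract's description of the base tree (a canonically ordered prefix tree that absorbs new transactions and drops expired ones without being rebuilt) can be illustrated with a short Python sketch. This is not the authors' implementation: the class name `SlidingCanTree`, the count-based window, the use of lexicographic order as the canonical order, and the `support` helper are all assumptions made for illustration. GTree is not reproduced here, since the abstract does not specify its layout; support is counted with a plain top-down traversal instead.

```python
from collections import deque


class Node:
    """One prefix-tree node: an item, a count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


class SlidingCanTree:
    """Hypothetical sketch of a CanTree-style base tree over a sliding window.

    Items are stored in a fixed canonical order that never depends on
    frequencies, so adding or removing a transaction only touches one
    root-to-node path and the tree never needs to be reconstructed.
    """

    def __init__(self, window_size):
        self.window_size = window_size
        self.root = Node()
        self.window = deque()  # transactions in the window, oldest first

    def _canonical(self, transaction):
        # Assumption: lexicographic order serves as the canonical order.
        return tuple(sorted(set(transaction)))

    def add(self, transaction):
        """Insert a transaction, evicting the oldest one if the window is full."""
        if len(self.window) == self.window_size:
            self._remove(self.window.popleft())
        items = self._canonical(transaction)
        self.window.append(items)
        node = self.root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item)
            child.count += 1
            node = child

    def _remove(self, items):
        """Decrement counts along the transaction's path, pruning empty nodes."""
        node = self.root
        for item in items:
            child = node.children[item]
            child.count -= 1
            if child.count == 0:
                # Everything below belonged only to this transaction.
                del node.children[item]
                return
            node = child

    def support(self, itemset):
        """Count window transactions that contain every item of `itemset`."""
        wanted = self._canonical(itemset)
        if not wanted:
            return len(self.window)

        def walk(node, i):
            total = 0
            for item, child in node.children.items():
                if item == wanted[i]:
                    if i + 1 == len(wanted):
                        total += child.count
                    else:
                        total += walk(child, i + 1)
                elif item < wanted[i]:
                    total += walk(child, i)
                # item > wanted[i]: paths are sorted, so wanted[i] cannot
                # occur deeper on this branch and the branch is skipped.
            return total

        return walk(self.root, 0)


if __name__ == "__main__":
    tree = SlidingCanTree(window_size=3)
    for tx in (["b", "a"], ["a", "c"], ["a", "b", "c"]):
        tree.add(tx)
    print(tree.support(["a", "b"]))  # 2
    tree.add(["c"])                  # evicts the oldest transaction, {a, b}
    print(tree.support(["a", "b"]))  # 1
```

Because the canonical order is fixed, eviction is just the insertion walk run in reverse, which is the property that lets a tree of this kind avoid reconstruction as the window slides.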


Bibliographic details

Source
Information Sciences: An International Journal, 2016 (issue not listed), 17 pages
Authors
Kim Jaein; Hwang Buhyun
Original format: PDF
Language: English
Chinese Library Classification: automatic information theory; computer applications; information and knowledge dissemination; automation and computer technology
Keywords
Real-time data mining; Frequent itemsets; GTree; CanTree; CPSTree

