A New Decision Tree Classification Method for Mining High-Speed Data Streams Based on Threaded Binary Search Trees

Abstract

One of the most important algorithms for mining data streams is VFDT. It uses the Hoeffding inequality to achieve a probabilistic bound on the accuracy of the constructed tree. Gama et al. have extended VFDT in two directions: their system, VFDTc, can deal with continuous data and uses more powerful classification techniques at the tree leaves. In this paper, we revisit this problem and implement a system, VFDTt, on top of VFDT and VFDTc. We make the following three contributions: 1) We present a threaded binary search tree (TBST) approach for efficiently handling continuous attributes. It builds a threaded binary search tree, and its processing time for inserting values is O(n log n), while VFDT's processing time is O(n^2). When a new example arrives, VFDTc needs to update O(log n) attribute tree nodes, but VFDTt needs to update only the one necessary node. 2) We improve the method of finding the best split-test point of a given continuous attribute. Compared to the method used in VFDTc, the processing time improves from O(n log n) to O(n). 3) Compared to VFDTc, the number of candidate split tests in VFDTt decreases from O(n) to O(log n). Compared to VFDT, the most relevant property of our system is an average reduction of 25.53% in processing time, while keeping the same tree size and accuracy. Overall, the techniques introduced here significantly improve the efficiency of decision tree classification on data streams.
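
As a rough illustration of the threaded binary search tree idea in contribution 1, the Python sketch below keeps the observed values of one continuous attribute in a right-threaded BST. It is not the authors' implementation: the names Node, ThreadedBST, insert and in_order, and the choice of a single occurrence counter per value, are assumptions made for this example, since the abstract does not describe the node contents. The sketch shows two properties the abstract relies on: an arriving value modifies exactly one node, and the threads allow all values to be visited in sorted order in a single O(n) pass without recursion or a stack, which is the kind of scan a split-point search needs.

```python
class Node:
    """One distinct attribute value (hypothetical layout, not from the paper)."""
    __slots__ = ("key", "count", "left", "right", "right_is_thread")

    def __init__(self, key):
        self.key = key
        self.count = 1               # how many times this value was seen
        self.left = None
        self.right = None            # real right child OR in-order successor
        self.right_is_thread = True  # True: `right` is a successor thread


class ThreadedBST:
    """Right-threaded binary search tree over attribute values."""

    def __init__(self):
        self.root = None

    def insert(self, key):
        if self.root is None:
            self.root = Node(key)
            return
        cur = self.root
        while True:
            if key == cur.key:
                cur.count += 1       # only this one node is modified
                return
            if key < cur.key:
                if cur.left is None:
                    new = Node(key)
                    new.right = cur  # successor of a new left leaf is its parent
                    cur.left = new
                    return
                cur = cur.left
            else:
                if cur.right_is_thread:
                    new = Node(key)
                    new.right = cur.right    # inherit the parent's successor
                    cur.right = new
                    cur.right_is_thread = False
                    return
                cur = cur.right

    def in_order(self):
        """Yield (value, count) in ascending order by following threads."""
        cur = self.root
        while cur is not None and cur.left is not None:
            cur = cur.left           # start at the smallest value
        while cur is not None:
            yield cur.key, cur.count
            if cur.right_is_thread:
                cur = cur.right      # jump straight to the successor
            else:
                cur = cur.right      # enter right subtree, then go leftmost
                while cur.left is not None:
                    cur = cur.left


tree = ThreadedBST()
for value in [5, 3, 8, 3, 7, 1]:
    tree.insert(value)
print(list(tree.in_order()))  # [(1, 1), (3, 2), (5, 1), (7, 1), (8, 1)]
```

This sketch does no rebalancing, so a single insertion costs O(log n) only when the tree stays reasonably balanced; sorted input would degrade it to O(n) per insertion.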

Bibliographic details

Source
PAKDD (Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining) 2007 International Workshops; 2007-05-22; Nanjing (CN) | 2007 | pp. 256-267 | 2 pages in total
Conference location: Nanjing (CN)
Authors
Tao Wang; Zhoujun Li; Xiaohua Hu; Yuejin Yan; Huowang Chen
Author affiliation

Computer School, National University of Defense Technology, Changsha, 410073, China;

Original format: PDF
Language: English
Chinese Library Classification: Programming, software engineering
Keywords
data streams; VFDT; continuous attribute; threaded binary search tree;
