Venue: Annual International Workshop on Frontiers in Algorithmics

A New Fuzzy Decision Tree Classification Method for Mining High-Speed Data Streams Based on Binary Search Trees

Abstract

Decision tree construction is a well-studied problem in data mining, and mining data streams has recently attracted much interest. Domingos and Hulten presented a one-pass algorithm for decision tree construction. Their system, VFDT, uses the Hoeffding inequality to achieve a probabilistic bound on the accuracy of the constructed tree. Gama et al. extended VFDT in two directions: their system, VFDTc, can handle continuous data and uses more powerful classification techniques at the tree leaves. Peng et al. presented a soft discretization method for handling continuous attributes in data mining. In this paper, we revisit these problems and implement sVFDT, a system for data stream mining. We make the following contributions: 1) We present a binary search tree (BST) approach for efficiently handling continuous attributes. Its processing time for inserting values is O(n log n), whereas VFDT's is O(n^2). 2) We improve the method for finding the best split-test point of a given continuous attribute. Compared with the method used in VFDTc, the processing time decreases from O(n log n) to O(n). 3) Compared with VFDTc, sVFDT's number of candidate split tests decreases from O(n) to O(log n). 4) We improve the soft discretization method to increase classification accuracy in data stream mining.
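
The BST idea in contributions 1) and 2) can be illustrated with a minimal sketch. This is not the sVFDT implementation, which the abstract does not describe in detail. It assumes one unbalanced BST per continuous attribute, with each node holding a distinct value and the class counts seen at that value. The names `Node`, `AttributeBST`, `insert`, and `split_candidates` are hypothetical. Each insertion descends a single root-to-leaf path (logarithmic on average for randomly ordered input, linear in the worst case because the tree is unbalanced), and one in-order pass yields the class distribution on both sides of every candidate split `value <= v`.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterator, Optional, Tuple


@dataclass
class Node:
    """One distinct attribute value and the class counts observed at it."""
    value: float
    counts: Counter = field(default_factory=Counter)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class AttributeBST:
    """Unbalanced BST over the observed values of one continuous attribute."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None
        self.totals: Counter = Counter()  # class counts over all examples

    def insert(self, value: float, label: str) -> None:
        """Record one (value, label) example by walking a single path."""
        self.totals[label] += 1
        if self.root is None:
            self.root = Node(value, Counter({label: 1}))
            return
        node = self.root
        while True:
            if value == node.value:
                node.counts[label] += 1
                return
            branch = "left" if value < node.value else "right"
            child = getattr(node, branch)
            if child is None:
                setattr(node, branch, Node(value, Counter({label: 1})))
                return
            node = child

    def in_order(self) -> Iterator[Node]:
        """Yield nodes in ascending value order (iterative traversal)."""
        stack, node = [], self.root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node
            node = node.right

    def split_candidates(self) -> Iterator[Tuple[float, Counter, Counter]]:
        """Yield (v, counts with value <= v, counts with value > v)."""
        below: Counter = Counter()
        for node in self.in_order():
            below.update(node.counts)
            # Counter subtraction drops classes whose count reaches zero.
            yield node.value, Counter(below), self.totals - below


if __name__ == "__main__":
    tree = AttributeBST()
    for v, y in [(2.0, "a"), (1.0, "a"), (3.0, "b"), (2.0, "b")]:
        tree.insert(v, y)
    for v, le, gt in tree.split_candidates():
        print(v, dict(le), dict(gt))
```

A split heuristic such as information gain could be evaluated from the two count vectors at each candidate. How sVFDT reduces the number of candidates to O(log n) is not stated in the abstract and is not modeled here.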