Conference: 7th IEEE International Conference on e-Business Engineering

Trees Weighting Random Forest Method for Classifying High-Dimensional Noisy Data

Abstract

Random forest is an ensemble learning method composed of multiple decision trees, each grown on a random sample of the input data and splitting its nodes on a random subset of features. Owing to its good classification and generalization ability, random forest has been applied successfully in various domains. However, when it learns from a high-dimensional data set with many noise features, random forest generates many noisy trees. These noisy trees reduce classification accuracy and can even lead to wrong decisions for new instances. In this paper, we present a new approach to this problem, named Trees Weighting Random Forest (TWRF), which weights the trees according to their classification ability. The out-of-bag (OOB) data, the subset of the training data that Bagging leaves out when building a given decision tree, is used to evaluate that tree. For simplicity, we choose accuracy as the index of a tree's classification ability and set it as the tree's weight. Experiments show that TWRF performs better than the original random forest and other traditional methods, such as C4.5 and Naïve Bayes.
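The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no details on tree construction, so the Gini-based splitting, the depth limit, the square-root feature subset size, and every function name here (`build_tree`, `fit_twrf`, `predict_twrf`) are assumptions made for illustration. Only the two steps stated in the abstract are taken from the source: each tree is scored by its accuracy on its own out-of-bag samples, and that accuracy is used as the tree's weight when votes are combined.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, n_feats, rng, depth=0, max_depth=4):
    """Grow a small tree; each node considers a random subset of features.
    Internal nodes are tuples, leaves are labels (labels must not be tuples)."""
    if len(set(y)) == 1 or depth == max_depth:
        return Counter(y).most_common(1)[0][0]
    best = None
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in set(row[f] for row in X):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no usable split among the sampled features
        return Counter(y).most_common(1)[0][0]
    _, f, t, left, right = best
    grow = lambda ids: build_tree([X[i] for i in ids], [y[i] for i in ids],
                                  n_feats, rng, depth + 1, max_depth)
    return (f, t, grow(left), grow(right))

def predict_tree(node, row):
    """Follow the splits until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_twrf(X, y, n_trees=15, n_feats=None, seed=0):
    """Return a list of (tree, weight) pairs; weight = out-of-bag accuracy."""
    rng = random.Random(seed)
    n = len(X)
    n_feats = n_feats or max(1, int(len(X[0]) ** 0.5))  # assumed default
    forest = []
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]       # bootstrap sample
        in_bag = set(bag)
        oob = [i for i in range(n) if i not in in_bag]   # samples the tree never saw
        tree = build_tree([X[i] for i in bag], [y[i] for i in bag], n_feats, rng)
        hits = sum(predict_tree(tree, X[i]) == y[i] for i in oob)
        weight = hits / len(oob) if oob else 0.0         # assumed fallback for empty OOB
        forest.append((tree, weight))
    return forest

def predict_twrf(forest, row):
    """Weighted vote: each tree adds its OOB accuracy to the class it predicts."""
    votes = Counter()
    for tree, weight in forest:
        votes[predict_tree(tree, row)] += weight
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Synthetic data (an assumption): 2 informative features plus 10 pure-noise features.
    rng = random.Random(1)
    X, y = [], []
    for _ in range(120):
        label = rng.randrange(2)
        X.append([rng.gauss(2.0 * label, 1.0) for _ in range(2)]
                 + [rng.gauss(0.0, 1.0) for _ in range(10)])
        y.append(label)
    forest = fit_twrf(X[:80], y[:80])
    correct = sum(predict_twrf(forest, r) == lab for r, lab in zip(X[80:], y[80:]))
    print("hold-out accuracy:", correct / 40)
```

A tree whose splits landed mostly on noise features tends to score poorly on its out-of-bag samples, so it contributes less to the final vote. Setting every weight to 1.0 in `predict_twrf` gives plain majority voting, which makes a side-by-side comparison straightforward.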