IEEE International Conference on e-Business Engineering

Trees Weighting Random Forest Method for Classifying High-Dimensional Noisy Data



Abstract

Random forest is an excellent ensemble learning method composed of multiple decision trees, each grown on a random sample of the input data and split at every node on a random subset of features. Owing to its strong classification and generalization ability, random forest has been applied successfully in many domains. However, when it learns from high-dimensional data containing many noisy features, random forest generates many noisy trees. These noisy trees reduce classification accuracy and can even cause wrong decisions on new instances. In this paper, we present a new approach to this problem that weights each tree according to its classification ability; we call it Trees Weighting Random Forest (TWRF). Each tree is evaluated on its out-of-bag (OOB) data, i.e., the subset of the training data that Bagging leaves out and that is not used to build that tree. For simplicity, we use a tree's accuracy on this data as the measure of its classification ability and set it as the tree's weight. Experiments show that TWRF performs better than the original random forest and other traditional methods such as C4.5 and Naïve Bayes.
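The weighting scheme can be illustrated with a minimal, standard-library-only Python sketch. The abstract fixes only two things: trees are grown on bagged samples, and each tree's weight is its accuracy on its own out-of-bag rows. Everything else here is an assumption made for illustration: CART-style trees with Gini splits, ⌊√d⌋ candidate features per node, a depth limit of 8, and a weighted majority vote, predicted class = argmax_c Σ_t w_t·[h_t(x) = c], where w_t is the OOB accuracy of tree t. The names `fit_twrf` and `predict_twrf` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict, namedtuple

# Internal tree node: go left if x[feature] <= threshold, else right.
Node = namedtuple("Node", "feature threshold left right")

def gini(counts, n):
    """Gini impurity from a Counter of class counts over n samples."""
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(X, y, idx, feats):
    """Return the (score, feature, threshold) with the lowest weighted Gini, or None."""
    best, n = None, len(idx)
    for f in feats:
        pairs = sorted(((X[i][f], y[i]) for i in idx), key=lambda p: p[0])
        left, right = Counter(), Counter(y[i] for i in idx)
        for k in range(n - 1):
            value, label = pairs[k]
            left[label] += 1
            right[label] -= 1
            if value == pairs[k + 1][0]:
                continue  # only split between distinct feature values
            nl, nr = k + 1, n - k - 1
            score = (nl * gini(left, nl) + nr * gini(right, nr)) / n
            if best is None or score < best[0]:
                best = (score, f, value)
    return best

def build_tree(X, y, idx, n_feats, rng, depth=0, max_depth=8):
    """Grow a CART-style tree on rows idx, sampling n_feats features at every node."""
    labels = [y[i] for i in idx]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority  # a leaf is just a class label
    split = best_split(X, y, idx, rng.sample(range(len(X[0])), n_feats))
    if split is None:
        return majority  # the sampled features are constant on these rows
    _, f, t = split
    left = [i for i in idx if X[i][f] <= t]
    right = [i for i in idx if X[i][f] > t]
    return Node(f, t, build_tree(X, y, left, n_feats, rng, depth + 1, max_depth),
                build_tree(X, y, right, n_feats, rng, depth + 1, max_depth))

def predict_tree(node, x):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(node, Node):
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node

def fit_twrf(X, y, n_trees=25, seed=0):
    """Bag trees and weight each one by its accuracy on its own out-of-bag rows."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    n_feats = max(1, int(math.sqrt(d)))
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]
        tree = build_tree(X, y, boot, n_feats, rng)
        correct = sum(predict_tree(tree, X[i]) == y[i] for i in oob)
        weight = correct / len(oob) if oob else 0.0  # OOB accuracy becomes the weight
        forest.append((tree, weight))
    return forest

def predict_twrf(forest, x):
    """Weighted majority vote: each tree adds its weight to the class it predicts."""
    votes = defaultdict(float)
    for tree, weight in forest:
        votes[predict_tree(tree, x)] += weight
    return max(votes, key=votes.get)
```

If every weight in `predict_twrf` is replaced by 1.0, the function reduces to the plain majority vote of an ordinary random forest. Both variants can then be compared on the same trees. Trees that grow mostly on noise features tend to score lower on their OOB rows, so their votes count for less.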
