
Variable importance-weighted Random Forests



Abstract

Background: Random Forests is a popular classification and regression method that has proven powerful for various prediction problems in biological studies. However, its performance often deteriorates when the number of features increases. To address this limitation, feature elimination Random Forests was proposed, which only uses the features with the largest variable importance scores. Yet the performance of this method is not satisfying, possibly due to its rigid feature selection and the increased correlations between trees of the forest. Methods: We propose variable importance-weighted Random Forests, which, instead of sampling features with equal probability at each node to build up trees, samples features according to their variable importance scores and then selects the best split from the randomly selected features. Results: We evaluate the performance of our method through comprehensive simulation and real data analyses, for both regression and classification. Compared to the standard Random Forests and the feature elimination Random Forests methods, our proposed method has improved performance in most cases. Conclusions: By incorporating the variable importance scores into the random feature selection step, our method can better utilize more informative features without completely ignoring less informative ones, and hence has improved prediction accuracy in the presence of weak signals and large noise. We have implemented an R package "viRandomForests" based on the original R package "randomForest"; it can be freely downloaded from http://zhaocenter.org/software.
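
To make the sampling idea from the Methods section concrete, the following is a minimal sketch in base R of how candidate features at a single tree node could be drawn with probability proportional to their variable importance scores rather than uniformly. It is not the authors' viRandomForests implementation; the function name sample_features_weighted, the mtry argument, and the fallback to uniform sampling when all scores are non-positive are illustrative assumptions.

    # Sketch only: importance-weighted candidate-feature sampling at one node.
    # Not the viRandomForests code; names and the uniform fallback are assumed.
    sample_features_weighted <- function(importance, mtry) {
      w <- pmax(importance, 0)                 # negative importance scores get zero weight
      if (sum(w) == 0) w <- rep(1, length(w))  # fall back to uniform sampling
      sample(seq_along(importance), size = mtry, replace = FALSE, prob = w)
    }

    # Example: 10 features, the first three carrying most of the importance.
    set.seed(1)
    imp <- c(5, 4, 3, rep(0.2, 7))
    candidates <- sample_features_weighted(imp, mtry = 3)
    # The best split is then chosen among these candidates, as in standard
    # Random Forests; informative features are favored, but less informative
    # ones can still be selected, which is the behavior the abstract describes.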

Bibliographic Information

  • Source
    Quantitative Biology | 2017, Issue 4 | pp. 338-351 | 14 pages
  • Authors

    Yiyi Liu; Hongyu Zhao;

  • Author Affiliations

    Department of Biostatistics, School of Public Health, Yale University, New Haven, CT 06511, USA;

    Department of Biostatistics, School of Public Health, Yale University, New Haven, CT 06511, USA; Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA;

  • Indexing Information
  • Original Format: PDF
  • Language: English
  • CLC Classification
  • Keywords

    Random Forests; variable importance score; classification; regression;

  • Date Added: 2022-08-17 23:18:21
