
Variable importance-weighted Random Forests



Abstract

Background: Random Forests is a popular classification and regression method that has proven powerful for various prediction problems in biological studies. However, its performance often deteriorates when the number of features increases. To address this limitation, feature elimination Random Forests was proposed, which only uses the features with the largest variable importance scores. Yet the performance of this method is not satisfying, possibly due to its rigid feature selection and the increased correlations between trees of the forest. Methods: We propose variable importance-weighted Random Forests, which, instead of sampling features with equal probability at each node to build up trees, samples features according to their variable importance scores and then selects the best split from the randomly selected features. Results: We evaluate the performance of our method through comprehensive simulation and real data analyses, for both regression and classification. Compared to the standard Random Forests and the feature elimination Random Forests methods, our proposed method has improved performance in most cases. Conclusions: By incorporating the variable importance scores into the random feature selection step, our method can better utilize more informative features without completely ignoring less informative ones, and hence has improved prediction accuracy in the presence of weak signals and large noise. We have implemented an R package "viRandomForests" based on the original R package "randomForest"; it can be freely downloaded from http://zhaocenter.org/software.
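
To make the sampling idea from the Methods section concrete, the following is a minimal sketch in base R of how candidate features at a single tree node could be drawn with probability proportional to their variable importance scores rather than uniformly. It is not the authors' viRandomForests implementation; the function name sample_features_weighted, the mtry argument, and the fallback to uniform sampling when all scores are non-positive are illustrative assumptions.

    # Sketch only: importance-weighted candidate-feature sampling at one node.
    # Not the viRandomForests code; names and the uniform fallback are assumed.
    sample_features_weighted <- function(importance, mtry) {
      w <- pmax(importance, 0)                 # negative importance scores get zero weight
      if (sum(w) == 0) w <- rep(1, length(w))  # fall back to uniform sampling
      sample(seq_along(importance), size = mtry, replace = FALSE, prob = w)
    }

    # Example: 10 features, the first three carrying most of the importance.
    set.seed(1)
    imp <- c(5, 4, 3, rep(0.2, 7))
    candidates <- sample_features_weighted(imp, mtry = 3)
    # The best split is then chosen among these candidates, as in standard
    # Random Forests; informative features are favored, but less informative
    # ones can still be selected, which is the behavior the abstract describes.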

Bibliographic Information

  • Source
    Quantitative Biology | 2017, Issue 4 | pp. 338-351 | 14 pages
  • Authors

    Yiyi Liu; Hongyu Zhao;

  • Author Affiliations

    Department of Biostatistics, School of Public Health, Yale University, New Haven, CT 06511, USA;

    Department of Biostatistics, School of Public Health, Yale University, New Haven, CT 06511, USA; Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA;

  • Indexing Information
  • Original Format: PDF
  • Language: English
  • CLC Classification
  • Keywords

    Random Forests; variable importance score; classification; regression;

  • Date Added: 2022-08-17 23:18:21
