Random Forest Robustness, Variable Importance, and Tree Aggregation

Abstract

Random forest methodology is a nonparametric machine learning approach capable of strong performance in regression and classification problems involving complex datasets. In addition to making predictions, random forests can be used to assess the relative importance of explanatory variables. In this dissertation, we explore three topics related to random forests: tree aggregation, variable importance, and robustness. In Chapter 2, we show that the method of tree aggregation used in one popular random forest implementation can lead to biased class probability estimates and that it is often beneficial to combine the tree partitioning algorithm used in one implementation with the aggregation scheme used in another. In Chapter 3, we show that imputing missing values prior to assessing variable importance often leads to inaccurate variable importance measures. Using simulation studies, we investigate the impact on variable importance of six random-forest-based imputation techniques and find that some techniques are prone to overestimating the importance of variables whose values have been imputed, while other techniques tend to underestimate the importance of such variables. In Chapter 4, we propose a new robust approach for random forest regression. Adapted from a popular approach used in polynomial regression, our method uses residual analysis to modify the weights associated with training cases in random forest predictions, so that outlying training cases have less impact. We show, using simulation studies, that this approach outperforms existing robust techniques on noisy, contaminated datasets.
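The abstract does not name the implementations or aggregation schemes it compares. As a rough illustration only, the sketch below assumes the two schemes are (a) the proportion of trees whose hard vote goes to a class and (b) the average of the per-tree leaf class proportions. The function names and the example numbers are hypothetical and are not taken from the dissertation.

```python
from typing import List


def vote_aggregate(tree_probs: List[float]) -> float:
    """Share of trees that cast a hard vote for class 1.

    Each tree votes for class 1 when its own leaf proportion exceeds 0.5
    (a tie at exactly 0.5 counts as a vote for class 0 in this sketch).
    """
    votes = [1 if p > 0.5 else 0 for p in tree_probs]
    return sum(votes) / len(votes)


def average_aggregate(tree_probs: List[float]) -> float:
    """Mean of the per-tree leaf proportions for class 1 (soft votes)."""
    return sum(tree_probs) / len(tree_probs)


# Hypothetical class-1 leaf proportions from five trees for one test case.
leaf_props = [0.6, 0.6, 0.6, 0.6, 0.4]

vote_est = vote_aggregate(leaf_props)    # four of the five trees vote for class 1
avg_est = average_aggregate(leaf_props)  # mean of the five leaf proportions

print("vote-based estimate:", vote_est)
print("averaged estimate:  ", round(avg_est, 2))
```

In this made-up case every tree is only mildly confident, yet the vote-based estimate is much more extreme than the averaged one. That gap is one way the choice of aggregation scheme can shift class probability estimates, under the assumptions stated above.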

Bibliographic details

Author: Sage, Andrew John
Author affiliation: Iowa State University
Degree-granting institution: Iowa State University
Discipline: Statistics
Degree: Ph.D.
Year: 2018
Pages: 120 p.
Total pages: 120
Original format: PDF
Language: English

Similar literature

1. Important variable assessment and electricity price forecasting based on regression tree models: classification and regression trees, Bagging and Random Forests [J]. González Camino, Mira-McWilliams José, Juárez Isabel. IET Generation, Transmission & Distribution. 2015, No. 11
2. Tree aggregation for random forest class probability estimation [J]. Andrew J. Sage, Ulrike Genschel, Dan Nettleton. Statistical Analysis and Data Mining. 2020, No. 2
3. Tree aggregation for random forest class probability estimation [J]. Industrial and organizational psychology. 2020, No. 2
4. Understanding variable importances in forests of randomized trees [C]. Gilles Louppe, Louis Wehenkel, Antonio Sutera. Annual Conference on Neural Information Processing Systems. 2013
5. Exploiting random walks for robust, scalable, structure-free data aggregation and routing in mobile ad-hoc networks (MANETs) [D]. Nakagawa, Masahiro. 2016
6. Using Decision Tree Aggregation with Random Forest Model to Identify Gut Microbes Associated with Colorectal Cancer [O]. Dongmei Ai, Hongfei Pan, Rongbao Han. 2019
7. Figure 2: Average partial dependence plots for the four most influential variables in the 20 randomized runs of boosted classification trees models analysing habitat suitability of the nesting location of successful breeding pairs of blue chaffinches against the same number of pixels of the same size randomly obtained from the pine forests of Inagua reserve. [O]
