International Journal of Environmental Impacts: Management, Mitigation and Recovery

APPLYING HYBRID FEATURE SELECTION METHODS FOR STATISTICAL MODELLING OF ROADSIDE PARTICLE CONCENTRATIONS (PM2.5 AND PNC)


Abstract

Selecting which predictor variables to include in a statistical model is a substantial task. A model built with fewer predictor variables can be more interpretable and less expensive than one built with many input variables. This study investigates the effect of hybrid feature selection methods, namely genetic algorithms (GA) and simulated annealing (SA), each combined with random forests (RF), on the efficiency of five variants of multiple linear regression models for predicting roadside PM2.5 and particle number count (PNC) concentrations. Of the 27 predictor variables in the PM2.5 training data, GA-RF selected 9 and SA-RF selected 16. Of the 25 candidate variables in the PNC training data, GA-RF and SA-RF each selected 13. The two methods selected nearly the same variables, especially for PNC, whereas for the PM2.5 models SA-RF selected 16 variables and GA-RF only 10. The hybrid feature selection methods eliminated most of the correlated variables, especially the background pollutant and traffic variables, whereas the temporal and meteorological variables were selected in every case considered. The statistical performance of the linear models built with the selected variables is similar to that of models developed using the full set of predictor variables. The main benefit of this study is that the number of predictor variables was reduced by more than half in most of the cases considered. This reduction should ultimately lower the operational and computational cost of the models, possibly without compromising their predictive performance, and it will also make the models easier to interpret.
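The abstract does not describe how the SA-RF search is carried out. The following is a minimal sketch of simulated-annealing feature selection under the usual wrapper formulation, offered as an illustration rather than the authors' implementation: the current subset is perturbed by toggling one predictor in or out, and a worse subset is accepted with probability exp(Δ/T) under a decaying temperature T. The function name `anneal_feature_subset`, the geometric cooling schedule, and all parameter values are assumptions. The `score` callback is a hypothetical placeholder; in an SA-RF setup it would fit a random forest on the selected columns and return a validation score, one plausible choice being the out-of-bag R² from scikit-learn's `RandomForestRegressor(oob_score=True)`.

```python
import math
import random
from typing import Callable


def anneal_feature_subset(
    n_features: int,
    score: Callable[[frozenset[int]], float],
    n_iter: int = 2000,
    t_start: float = 1.0,
    t_end: float = 1e-3,
    seed: int = 0,
) -> tuple[frozenset[int], float]:
    """Simulated-annealing search over feature subsets (higher score is better).

    `score` is a hypothetical callback; in an SA-RF setup it would fit a
    random forest on the chosen columns and return a validation score.
    """
    rng = random.Random(seed)
    # Start from a random subset of the candidate predictors.
    current = frozenset(i for i in range(n_features) if rng.random() < 0.5)
    current_score = score(current)
    best, best_score = current, current_score

    # Geometric cooling from t_start down to t_end over n_iter steps.
    cooling = (t_end / t_start) ** (1.0 / max(n_iter - 1, 1))
    temp = t_start
    for _ in range(n_iter):
        # Neighbour move: toggle one randomly chosen predictor in or out.
        candidate = current ^ {rng.randrange(n_features)}
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse subsets with prob exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
        temp *= cooling
    return best, best_score


if __name__ == "__main__":
    # Toy score (hypothetical): features 0, 2 and 5 are informative, and every
    # extra predictor costs a small penalty, mimicking "fewer variables,
    # similar accuracy".
    informative = {0, 2, 5}

    def toy_score(subset: frozenset[int]) -> float:
        return len(subset & informative) - 0.1 * len(subset - informative)

    subset, value = anneal_feature_subset(10, toy_score)
    print(sorted(subset), value)
```

In this sketch the returned subset would then be passed to the multiple linear regression models. A GA-RF variant could reuse the same `score` callback and replace the single-toggle move with a population of subsets evolved by crossover and mutation.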