Journal: Atmospheric Environment

Advancing methodologies for applying machine learning and evaluating spatiotemporal models of fine particulate matter (PM2.5) using satellite data over large regions

Abstract

Reconstructing the distribution of fine particulate matter (PM2.5) in space and time, even far from ground monitoring sites, is an important exposure-science contribution to epidemiologic analyses of the health impacts of PM2.5. Flexible statistical prediction methods have shown how satellite observations can be integrated with other predictors, yet these algorithms are susceptible to overfitting the spatiotemporal structure of the training datasets. We present a new approach for predicting PM2.5 with machine-learning methods and for evaluating prediction models when the goal is to make predictions where none were previously available. We apply extreme gradient boosting (XGBoost) to predict daily PM2.5 at 1 × 1 km² resolution for a 13-state region in the Northeastern USA for the years 2000–2015 using satellite-derived aerosol optical depth, and we implement recursive feature selection to develop a parsimonious model. We demonstrate excellent predictions of withheld observations. We also contrast an RMSE of 3.11 µg/m³ under our spatial cross-validation, which withholds nearby sites, with an overfit RMSE of 2.10 µg/m³ under a more conventional random ten-fold split of the dataset. Exposure science is moving toward advanced machine-learning approaches for the spatiotemporal modeling of air pollutants. Our results show the importance of three issues in that work: addressing data leakage in training, avoiding overfitting to spatiotemporal structure, and accounting for how the predominance of ground monitoring sites in dense urban sub-networks affects model evaluation. For exposure assessment in epidemiologic studies of PM2.5, the resulting approach offers improved efficiency, parsimony, and interpretability with robust validation, while still accommodating complex spatiotemporal relationships.
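The key methodological point above is the gap between a random ten-fold split and a spatial split. Randomly assigning site-day observations to folds lets the model see the same monitors in both training and testing, and that inflates apparent skill. The plain-Python sketch below contrasts the two splitting schemes. It assumes square spatial blocks with a hypothetical 50 km side as a stand-in for "withholding nearby sites." The function names, block size, and toy monitor layout are all illustrative assumptions rather than the authors' procedure, and the model-fitting step (XGBoost in the paper) is omitted.

```python
import math
import random
from collections import defaultdict


def random_kfold(n_obs, k, seed=0):
    """Conventional split: shuffle individual observations (site-days) into k folds."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def spatial_block_kfold(coords, k, block_km=50.0, seed=0):
    """Spatial split: all observations whose monitor lies in the same square block
    (block_km on a side, a hypothetical default) land in the same fold, so nearby
    monitors are withheld together. coords[i] is the (x_km, y_km) of observation i's site."""
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[(math.floor(x / block_km), math.floor(y / block_km))].append(i)
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)
    folds = [[] for _ in range(k)]
    for j, key in enumerate(keys):
        folds[j % k].extend(blocks[key])  # a whole block goes to a single fold
    return folds


def shared_sites(sites, folds):
    """Count (fold, site) pairs where a test-fold site also appears in that fold's
    training data; any nonzero count means test monitors were seen during training."""
    leaks = 0
    for test in folds:
        test_idx = set(test)
        test_sites = {sites[i] for i in test_idx}
        train_sites = {s for i, s in enumerate(sites) if i not in test_idx}
        leaks += len(test_sites & train_sites)
    return leaks


def rmse(y_true, y_pred):
    """Root-mean-square error (e.g. in µg/m³ for PM2.5)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    # Hypothetical toy network: 40 monitors scattered over 400 km x 400 km, 30 days each.
    rng = random.Random(1)
    site_xy = {s: (rng.uniform(0, 400), rng.uniform(0, 400)) for s in range(40)}
    sites = [s for s in site_xy for _ in range(30)]
    coords = [site_xy[s] for s in sites]
    print("random 10-fold shared sites:", shared_sites(sites, random_kfold(len(sites), 10)))
    print("spatial 10-fold shared sites:", shared_sites(sites, spatial_block_kfold(coords, 10)))
```

With this toy layout, the random split should put nearly every monitor in both the training and test data of every fold. The blocked split reports zero shared sites. In practice, one would fit the regression model on each training fold and compute `rmse` on the held-out fold, and the difference between the two schemes corresponds to the contrast the abstract reports (2.10 versus 3.11 µg/m³). Grid blocking is only one way to group nearby monitors. Distance-based clustering of sites is a common alternative.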