
Identification of high impact factors of air quality on a national scale using big data and machine learning techniques


Abstract

To effectively control and prevent air pollution, it is necessary to study the factors that influence air quality. A number of previous studies have explored the relationships between air pollution and related factors. However, the methods currently used either do not adequately address multicollinearity or fail to explain the importance of the influential factors. Moreover, most of the existing literature limits its study area to a city or a small region and examines factors of only one type. There is a lack of studies that analyze the influential factors at the scale of a country or that take multiple kinds of variables into consideration. To fill this research gap, this paper proposes a multivariate analysis at the national scale to investigate the most important factors of air quality. In order to study as many influential factors as possible, 171 features spanning environmental, demographic, economic, meteorological, and energy categories were collected and analyzed. To tackle such a "big data" problem, a non-linear machine learning algorithm, Extreme Gradient Boosting (XGBoost), is used to model the relationship and measure variable importance. A Geographical Information System (GIS) is employed to preprocess the diverse variables and visualize the results. The performance of XGBoost is compared with that of other models, and its parameters are tuned using Bayesian Optimization. Experimental results of a case study in the U.S. show that our methodological framework can effectively uncover the important factors of air quality. Six kinds of factors are found to have the largest impact on air quality. Practical suggestions for controlling and preventing air pollution are also proposed for each of the six. (C) 2019 Elsevier Ltd. All rights reserved.
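
The idea of ranking factors by how much a boosted tree model relies on them can be illustrated with a small sketch. The code below is not the authors' pipeline and does not use the XGBoost library: it is a plain-Python gradient boosting of depth-1 trees (stumps) with squared loss, where a feature's importance is taken to be its share of the total split gain. The feature names, the synthetic data, and the helper names `fit_stump` and `boost` are all assumptions made for illustration. XGBoost itself adds regularization, second-order gradient information, and deeper trees.

```python
import random


def fit_stump(X, residuals):
    """Find the single-feature threshold split that most reduces squared error."""
    n = len(X)
    total = sum(residuals)
    one_leaf_term = total * total / n  # baseline: all rows in a single leaf
    best = None  # (gain, feature index, threshold, left value, right value)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot place a threshold between equal values
            right_sum = total - left_sum
            # Reduction in sum of squared errors achieved by this split.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - one_leaf_term
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best


def boost(X, y, n_rounds=50, learning_rate=0.3):
    """Boost stumps on residuals; return the stumps and normalized gain per feature."""
    mean_y = sum(y) / len(y)
    pred = [mean_y] * len(y)
    gains = [0.0] * len(X[0])
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no valid split exists
            break
        gain, j, thr, left_val, right_val = stump
        gains[j] += gain  # credit the split's gain to the feature it used
        stumps.append((j, thr, left_val, right_val))
        pred = [p + learning_rate * (left_val if row[j] <= thr else right_val)
                for p, row in zip(pred, X)]
    total_gain = sum(gains) or 1.0
    return stumps, [g / total_gain for g in gains]


random.seed(0)
# Hypothetical feature names; the data are synthetic, not from the paper.
feature_names = ["traffic_density", "wind_speed", "noise_feature"]
X = [[random.random() for _ in feature_names] for _ in range(300)]
# Target depends strongly on feature 0, weakly on feature 1, not at all on feature 2.
y = [3.0 * row[0] - 1.0 * row[1] + random.gauss(0, 0.1) for row in X]

stumps, importance = boost(X, y)
ranking = sorted(zip(feature_names, importance), key=lambda t: -t[1])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

On this synthetic data the ranking should follow the strength of each feature's effect on the target. Gain-based importance is only one possible definition; which measure the paper used is not stated in the abstract.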

Bibliographic details

  • Source
    Journal of Cleaner Production | 2020, Issue 1 | 118955.1-118955.13 | 13 pages
  • Authors

  • Author affiliations

    Hong Kong Univ Sci & Technol Dept Civil & Environm Engn Hong Kong Peoples R China|Big Bay Innovat Res & Dev Ltd Dept Res & Dev Hong Kong Peoples R China;

    Big Bay Innovat Res & Dev Ltd Dept Res & Dev Hong Kong Peoples R China;

    Hong Kong Univ Sci & Technol Dept Civil & Environm Engn Hong Kong Peoples R China;

    City Univ Hong Kong Dept Architecture & Civil Engn Hong Kong Peoples R China;

    Shenzhen Univ Coll Civil Engn Shenzhen Guangdong Peoples R China;

    Hong Kong Univ Sci & Technol Sch Engn Hong Kong Peoples R China;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Air quality index; Big data; GIS; National scale; Variable importance; XGBoost;

