Atmospheric Measurement Techniques

Machine learning calibration of low-cost NO₂ and PM₁₀ sensors: non-linear algorithms and their impact on site transferability



Abstract

Low-cost air pollution sensors often fail to attain sufficient performance compared with state-of-the-art measurement stations, and they typically require expensive laboratory-based calibration procedures. A repeatedly proposed strategy to overcome these limitations is calibration through co-location with public measurement stations. Here we test the idea of using machine learning algorithms for such calibration tasks using hourly-averaged co-location data for nitrogen dioxide (NO₂) and particulate matter of particle sizes smaller than 10 μm (PM₁₀) at three different locations in the urban area of London, UK. We compare the performance of ridge regression, a linear statistical learning algorithm, to two non-linear algorithms in the form of random forest regression (RFR) and Gaussian process regression (GPR). We further benchmark the performance of all three machine learning methods relative to the more common multiple linear regression (MLR). We obtain very good out-of-sample R² scores (coefficient of determination) > 0.7, frequently exceeding 0.8, for the machine learning calibrated low-cost sensors. In contrast, the performance of MLR is more dependent on random variations in the sensor hardware and co-located signals, and it is also more sensitive to the length of the co-location period. We find that, subject to certain conditions, GPR is typically the best-performing method in our calibration setting, followed by ridge regression and RFR. We also highlight several key limitations of the machine learning methods, which will be crucial to consider in any co-location calibration. In particular, all methods are fundamentally limited in how well they can reproduce pollution levels that lie outside those encountered at the training stage. We find, however, that the linear ridge regression outperforms the non-linear methods in extrapolation settings. GPR can allow for a small degree of extrapolation, whereas RFR can only predict values within the training range.
This algorithm-dependent ability to extrapolate is one of the key limiting factors when the calibrated sensors are deployed away from the co-location site itself. Consequently, we find that ridge regression often performs as well as, or even better than, GPR after sensor relocation. Our results highlight the potential of co-location approaches paired with machine learning calibration techniques to reduce the costs of air pollution measurements, subject to careful consideration of the co-location training conditions, the choice of calibration variables and the features of the calibration algorithm.
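
The extrapolation behaviour described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic, there is a single calibration variable, and the helper names `fit_ridge_1d` and `fit_tree_1d` are hypothetical. A single regression tree stands in for RFR here; a random forest averages many such trees, so its predictions are likewise bounded by the range of the training targets, whereas a linear ridge fit continues its trend beyond that range.

```python
# Illustrative sketch only: synthetic data and hypothetical helper names,
# not the calibration pipeline used in the paper.

def fit_ridge_1d(x, y, alpha=1.0):
    """Closed-form ridge regression with one feature and an unpenalised intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / (sxx + alpha)  # alpha shrinks the slope towards zero
    intercept = y_mean - slope * x_mean
    return lambda x_new: intercept + slope * x_new


def fit_tree_1d(x, y, min_leaf=2):
    """Tiny regression tree on one feature; each leaf predicts the mean target."""
    pairs = sorted(zip(x, y))

    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    def build(block):
        ys = [p[1] for p in block]
        best = None
        # Try every split that leaves at least min_leaf points on each side.
        for i in range(min_leaf, len(block) - min_leaf + 1):
            if block[i - 1][0] == block[i][0]:
                continue  # cannot split between identical feature values
            cost = sse(ys[:i]) + sse(ys[i:])
            if best is None or cost < best[0]:
                best = (cost, i)
        if best is None:
            return sum(ys) / len(ys)  # leaf: mean of the training targets
        i = best[1]
        threshold = (block[i - 1][0] + block[i][0]) / 2
        return (threshold, build(block[:i]), build(block[i:]))

    root = build(pairs)

    def predict(x_new):
        node = root
        while isinstance(node, tuple):
            node = node[1] if x_new <= node[0] else node[2]
        return node

    return predict


# Synthetic co-location data: raw sensor signal 0..20 and a reference value
# that is linear in the raw signal plus a small deterministic wiggle.
raw = [float(i) for i in range(21)]
reference = [2.0 * r + 5.0 + 0.5 * (-1) ** i for i, r in enumerate(raw)]

ridge = fit_ridge_1d(raw, reference)
tree = fit_tree_1d(raw, reference)

# Query far outside the training range (raw signal of 40, underlying trend 85).
print("training target range:", min(reference), "to", max(reference))
print("ridge prediction at 40:", ridge(40.0))
print("tree prediction at 40:", tree(40.0))
```

In this toy setting the ridge model follows the linear trend beyond the training data, while the tree returns the mean of its outermost leaf and therefore never exceeds the largest reference value seen during training. Real sensor responses need not be linear, so this shows only the mechanism, not the relative accuracy reported in the abstract.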
