Published in: The Science of the Total Environment

Machine learning methods as a tool to analyse incomplete or irregularly sampled radon time series data

Abstract

Machine learning is a class of statistical techniques that has proven to be a powerful tool for modelling the behaviour of complex systems in which response quantities depend on assumed controls, or predictors, in a complicated way.

In this paper, as our first purpose, we propose applying machine learning to reconstruct incomplete or irregularly sampled indoor radon (²²²Rn) time series. The physical assumption underlying the modelling is that the Rn concentration in air is controlled by environmental variables such as air temperature and pressure. The algorithms "learn" from complete sections of the multivariate series, derive a dependence model, and apply it to sections where the controls are available but the response (Rn) is not, thereby completing the Rn series. Three machine learning techniques are applied in this study: random forest, its extension the gradient boosting machine, and deep learning. For comparison, we apply classical multiple regression in a generalized linear model version. Model performance is evaluated with several metrics, and the gradient boosting machine is found to outperform the other techniques.

By applying learning machines, we show, as our second purpose, that missing data or missing periods of an Rn series can be reconstructed, and the series resampled on a regular grid, reasonably well, provided that data on appropriate physical controls are available. The techniques also identify to what degree the assumed controls contribute to imputing missing Rn values.

Our third purpose, no less important from the viewpoint of physics, is to identify to what degree physical (in this case, environmental) variables are relevant as Rn predictors; in other words, which predictors explain most of the temporal variability of Rn. We show that the variables contributing most to the reconstruction of the Rn series are temperature, relative humidity and day of the year. The first two are physical predictors, while "day of the year" is a statistical proxy, or surrogate, for missing or unknown predictors.
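The gap-filling workflow described above (learn from complete sections, predict Rn where only the controls exist, then rank the controls) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It uses only the Python standard library, so it substitutes a hand-rolled gradient boosting of depth-1 trees (stumps) for a full gradient boosting machine. It runs on synthetic, hypothetical data whose coefficients are invented. It ranks predictors by permutation importance, which is one common choice; the abstract does not say which importance measure the paper used. The names `fit_stump`, `StumpBoost` and `permutation_importance` are hypothetical.

```python
# Illustrative sketch only: hypothetical names and synthetic data, not the paper's code.
import math
import random


def fit_stump(X, r, orders):
    """Best single split for residuals r under squared error.

    orders[j] holds the row indices sorted by feature j. Returns
    (feature j, threshold t, value for x[j] <= t, value for x[j] > t).
    """
    n, total = len(r), sum(r)
    best = (-math.inf, 0, math.inf, total / n, total / n)  # fallback: no split
    for j, order in enumerate(orders):
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            a, b = X[order[k - 1]][j], X[order[k]][j]
            if a == b:
                continue  # no threshold separates identical values
            right_sum = total - left_sum
            # Maximising this score is equivalent to minimising the split's SSE.
            score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if score > best[0]:
                best = (score, j, (a + b) / 2, left_sum / k, right_sum / (n - k))
    return best[1:]


class StumpBoost:
    """Gradient boosting with depth-1 regression trees under squared-error loss."""

    def __init__(self, n_rounds=150, learning_rate=0.1):
        self.n_rounds, self.lr = n_rounds, learning_rate

    def fit(self, X, y):
        orders = [sorted(range(len(X)), key=lambda i: X[i][j]) for j in range(len(X[0]))]
        self.base = sum(y) / len(y)
        pred = [self.base] * len(y)
        self.stumps = []
        for _ in range(self.n_rounds):
            # For squared loss the negative gradient is simply the residual.
            resid = [yi - pi for yi, pi in zip(y, pred)]
            j, t, lv, rv = fit_stump(X, resid, orders)
            self.stumps.append((j, t, lv, rv))
            pred = [p + self.lr * (lv if x[j] <= t else rv) for p, x in zip(pred, X)]
        return self

    def predict(self, X):
        return [self.base + sum(self.lr * (lv if x[j] <= t else rv)
                                for j, t, lv, rv in self.stumps) for x in X]


def rmse(y, p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def permutation_importance(model, X, y, names, seed=0):
    """Increase in RMSE when one predictor column is shuffled (larger = more relevant)."""
    rng = random.Random(seed)
    base = rmse(y, model.predict(X))
    out = {}
    for j, name in enumerate(names):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        out[name] = rmse(y, model.predict(X_perm)) - base
    return out


# Hypothetical hourly series (synthetic, NOT the paper's data): Rn depends on
# air temperature, relative humidity and a slow day-of-year trend.
rng = random.Random(42)
names = ["temperature", "rel_humidity", "day_of_year"]
X, radon = [], []
for h in range(24 * 30):
    temp = 10 + 8 * math.sin(2 * math.pi * (h % 24) / 24) + rng.gauss(0, 1)
    rh = 60 + rng.gauss(0, 10)
    doy = 1 + h // 24
    X.append([temp, rh, doy])
    radon.append(100 - 4 * temp + 0.5 * rh + 0.5 * doy + rng.gauss(0, 3))

# Simulated detector outage: Rn missing on days 20-22 while the controls were still logged.
gap = sorted(range(24 * 19, 24 * 22))
gap_set = set(gap)
train = [i for i in range(len(X)) if i not in gap_set]
model = StumpBoost().fit([X[i] for i in train], [radon[i] for i in train])

X_gap = [X[i] for i in gap]
y_gap = [radon[i] for i in gap]  # known here only because the data are synthetic
filled = model.predict(X_gap)
mean_fill = [sum(radon[i] for i in train) / len(train)] * len(X_gap)
gbm_rmse, mean_rmse = rmse(y_gap, filled), rmse(y_gap, mean_fill)
importance = permutation_importance(model, X_gap, y_gap, names)
print(f"RMSE boosting={gbm_rmse:.1f}  mean-fill={mean_rmse:.1f}")
print(importance)
```

Because the fitted model needs only the control variables, the same `predict` call could in principle be evaluated at any timestamps where controls were recorded, for example a regular hourly grid, which is the resampling idea mentioned in the abstract. A production version would more likely use an established library's gradient boosting and permutation-importance implementations.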