Journal: Journal of Hydrology

A statistical learning framework for groundwater nitrate models of the Central Valley, California, USA



Abstract

We used a statistical learning framework to evaluate the ability of three machine-learning methods to predict nitrate concentration in shallow groundwater of the Central Valley, California: boosted regression trees (BRT), artificial neural networks (ANN), and Bayesian networks (BN). Machine-learning methods can learn complex patterns in the data but, because of overfitting, may not generalize well to new data. The statistical learning framework involves cross-validation (CV) training and testing data and a separate hold-out data set for model evaluation, with the goal of optimizing predictive performance by controlling for model overfit. The order of prediction performance according to both CV testing R² and that for the hold-out data set was BRT > BN > ANN. For each method we identified two models based on CV testing results: that with maximum testing R² and a version with R² within one standard error of the maximum (the 1SE model). The former yielded CV training R² values of 0.94-1.0. Cross-validation testing R² values indicate predictive performance, and these were 0.22-0.39 for the maximum R² models and 0.19-0.36 for the 1SE models. Evaluation with hold-out data suggested that the 1SE BRT and ANN models predicted better for an independent data set compared with the maximum R² versions, which is relevant to extrapolation by mapping. Scatterplots of predicted vs. observed hold-out data obtained for final models helped identify prediction bias, which was fairly pronounced for ANN and BN. Lastly, the models were compared with multiple linear regression (MLR) and a previous random forest regression (RFR) model. Whereas BRT results were comparable to RFR, MLR had low hold-out R² (0.07) and explained less than half the variation in the training data. Spatial patterns of predictions by the final, 1SE BRT model agreed reasonably well with previously observed patterns of nitrate occurrence in groundwater of the Central Valley. Published by Elsevier B.V.
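The choice between the maximum-R² model and the 1SE model can be illustrated with a minimal sketch. It is not the authors' code: the function names, the candidate complexities, and the per-fold scores are hypothetical, and it assumes the common convention that the 1SE model is the least complex candidate whose mean CV testing R² lies within one standard error of the maximum. The abstract does not state how the authors ranked model complexity.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(fold_scores):
    """Return (mean, standard error) of per-fold CV testing R² values."""
    m = mean(fold_scores)
    se = stdev(fold_scores) / sqrt(len(fold_scores))
    return m, se


def select_models(candidates):
    """Pick the maximum-R² model and the 1SE model.

    candidates: list of (complexity, fold_scores) tuples, where a larger
    complexity value means a more flexible model (an assumption of this sketch).
    Returns (max_r2_complexity, one_se_complexity).
    """
    stats = [(c, *summarize(scores)) for c, scores in candidates]
    best_c, best_mean, best_se = max(stats, key=lambda t: t[1])
    threshold = best_mean - best_se
    # Assumed convention: the simplest candidate at or above the threshold.
    eligible = [c for c, m, _ in stats if m >= threshold]
    return best_c, min(eligible)


# Hypothetical per-fold CV testing R² values, for illustration only.
candidates = [
    (100, [0.20, 0.26, 0.22, 0.24, 0.23]),
    (500, [0.30, 0.36, 0.33, 0.35, 0.31]),
    (2000, [0.31, 0.38, 0.33, 0.37, 0.32]),
]
max_model, one_se_model = select_models(candidates)
print(max_model, one_se_model)
```

With these made-up scores the most complex candidate has the highest mean testing R², while the middle candidate falls within one standard error of it and is therefore chosen as the 1SE model. In the framework described above, both selections would then be checked against the separate hold-out data set.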
