
A benchmarking approach for comparing data splitting methods for modeling water resources parameters using artificial neural networks


Abstract

Data splitting is an important step in the artificial neural network (ANN) development process, whereby the available data are divided into training, testing, and validation subsets to ensure good generalization ability of the model. Considering that only one split of the data is typically used when developing ANN models, data splitting has a significant impact on model performance, depending on which data are allocated to the three subsets. Therefore, it is important to find a data splitting method that consistently results in predictive validation errors that are representative of the predictive errors obtained over the full range of the available data. This paper addresses this issue by introducing a benchmarking approach for comparing different data splitting methods in terms of (1) bias, which is the difference between the expected validation performance over the entire data set and that obtained using a particular data splitting method and (2) variability, which is the spread of the validation errors obtained by repeated implementation of that method. The utility of the proposed approach is assessed on a number of well-known data splitting methods in the context of four water resources ANN modelling problems. The results obtained indicate that the proposed approach for comparing data splitting methods is more representative than the previous approach where a value of zero is used as the predictive performance benchmark, as it can avoid the selection of an over-optimistic data splitting method that under-represents extreme data in the validation set.

Bibliographic details

  • Source
    Water Resources Research | 2013, Issue 11 | pp. 7598-7614 | 17 pages
  • Author affiliations

    School of Civil, Environmental and Mining Engineering, University of Adelaide, Adelaide, SA 5005, Australia;

    School of Civil, Environmental and Mining Engineering, University of Adelaide, Adelaide, South Australia, Australia, Veolia Water Asia-Pacific, Technical Department, Shanghai, China;

    School of Civil, Environmental and Mining Engineering, University of Adelaide, Adelaide, South Australia, Australia;

    School of Civil, Environmental and Mining Engineering, University of Adelaide, Adelaide, South Australia, Australia;

  • Original format: PDF
  • Language: English
