A benchmarking approach for comparing data splitting methods for modeling water resources parameters using artificial neural networks

Wenyan Wu; Robert J. May; Holger R. Maier; Graeme C. Dandy

Abstract

Data splitting is an important step in the artificial neural network (ANN) development process, whereby the available data are divided into training, testing, and validation subsets to ensure good generalization ability of the model. Considering that only one split of the data is typically used when developing ANN models, data splitting has a significant impact on model performance, depending on which data are allocated to the three subsets. Therefore, it is important to find a data splitting method that consistently results in predictive validation errors that are representative of the predictive errors obtained over the full range of the available data. This paper addresses this issue by introducing a benchmarking approach for comparing different data splitting methods in terms of (1) bias, which is the difference between the expected validation performance over the entire data set and that obtained using a particular data splitting method and (2) variability, which is the spread of the validation errors obtained by repeated implementation of that method. The utility of the proposed approach is assessed on a number of well-known data splitting methods in the context of four water resources ANN modelling problems. The results obtained indicate that the proposed approach for comparing data splitting methods is more representative than the previous approach where a value of zero is used as the predictive performance benchmark, as it can avoid the selection of an over-optimistic data splitting method that under-represents extreme data in the validation set.
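The two comparison criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the benchmark is computed, so `benchmark_error` is simply passed in as a number. The function names, the 60/20/20 split fractions, the use of the mean and sample standard deviation, and the sign convention (method minus benchmark, so a negative bias indicates an over-optimistic split) are all assumptions made for illustration.

```python
import random
import statistics


def random_split(n_samples, fractions=(0.6, 0.2, 0.2), rng=None):
    """Randomly partition indices into training, testing, and validation subsets.

    Hypothetical helper: the 60/20/20 fractions are an assumption,
    not values taken from the paper.
    """
    rng = rng or random.Random()
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(fractions[0] * n_samples)
    n_test = int(fractions[1] * n_samples)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # whatever remains
    return train, test, validation


def bias_and_variability(validation_errors, benchmark_error):
    """Summarize repeated applications of one data splitting method.

    bias        : mean validation error minus the benchmark error
                  (assumed sign convention; negative = over-optimistic)
    variability : sample standard deviation of the validation errors
                  (assumed measure of spread)
    """
    bias = statistics.mean(validation_errors) - benchmark_error
    variability = statistics.stdev(validation_errors)
    return bias, variability


# Illustrative numbers only; these are not results from the paper.
errors_from_repeated_splits = [0.8, 1.0, 1.2]
bias, variability = bias_and_variability(errors_from_repeated_splits,
                                         benchmark_error=1.5)
print(f"bias={bias:.3f}, variability={variability:.3f}")
```

In practice, each entry of `validation_errors` would come from training an ANN on one split and evaluating it on that split's validation subset. A method with a strongly negative bias under this convention would be one that under-represents extreme data in the validation set.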

Bibliographic details

Source
Water Resources Research, 2013, Issue 11, pp. 7598-7614 (17 pages)
Authors
Wenyan Wu; Robert J. May; Holger R. Maier; Graeme C. Dandy
Author affiliations

School of Civil, Environmental and Mining Engineering, University of Adelaide, Adelaide, SA 5005, Australia

School of Civil, Environmental and Mining Engineering, University of Adelaide, Adelaide, South Australia, Australia; Veolia Water Asia-Pacific, Technical Department, Shanghai, China

School of Civil, Environmental and Mining Engineering, University of Adelaide, Adelaide, South Australia, Australia

School of Civil, Environmental and Mining Engineering, University of Adelaide, Adelaide, South Australia, Australia

Original format: PDF
Language: English
