Hydrology and Earth System Sciences

Identifying rainfall-runoff events in discharge time series: a data-driven method based on information theory



Abstract

In this study, we propose a data-driven approach for automatically identifying rainfall-runoff events in discharge time series. The core of the concept is to construct and apply discrete multivariate probability distributions to obtain probabilistic predictions of each time step that is part of an event. The approach permits any data to serve as predictors, and it is non-parametric in the sense that it can handle any kind of relation between the predictor(s) and the target. Each choice of a particular predictor data set is equivalent to formulating a model hypothesis. Among competing models, the best is found by comparing their predictive power in a training data set with user-classified events. For evaluation, we use measures from information theory such as Shannon entropy and conditional entropy to select the best predictors and models and, additionally, measure the risk of overfitting via cross entropy and Kullback–Leibler divergence. As all these measures are expressed in “bit”, we can combine them to identify models with the best tradeoff between predictive power and robustness given the available data. We applied the method to data from the Dornbirner Ach catchment in Austria, distinguishing three different model types: models relying on discharge data, models using both discharge and precipitation data, and recursive models, i.e., models using their own predictions of a previous time step as an additional predictor. In the case study, the additional use of precipitation reduced predictive uncertainty only by a small amount, likely because the information provided by precipitation is already contained in the discharge data. 
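
As an illustration of the information measures named above, the following minimal sketch computes Shannon entropy, conditional entropy, and Kullback–Leibler divergence in bits from discrete distributions. It is not the authors' implementation: the function names, the binning into three discharge classes, and the count table are hypothetical and chosen only for demonstration.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution (counts or probabilities)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p / p.sum()          # normalise so counts are accepted too
    p = p[p > 0]             # terms with p = 0 contribute nothing
    return float(-np.sum(p * np.log2(p)))


def conditional_entropy_bits(joint):
    """H(target | predictor) in bits.

    `joint` is a 2-D table: rows are predictor bins, columns are target classes.
    Uses the chain rule H(Y|X) = H(X, Y) - H(X).
    """
    joint = np.asarray(joint, dtype=float)
    return entropy_bits(joint) - entropy_bits(joint.sum(axis=1))


def kl_divergence_bits(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Assumes q > 0 wherever p > 0; otherwise the divergence is infinite.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


# Hypothetical counts: rows = discharge bins (low, mid, high),
# columns = user classification (no event, event).
counts = np.array([[40, 2],
                   [10, 8],
                   [1, 19]])

h_target = entropy_bits(counts.sum(axis=0))      # uncertainty without any predictor
h_given_q = conditional_entropy_bits(counts)     # uncertainty once discharge bin is known
print(f"H(event) = {h_target:.3f} bit, H(event | discharge bin) = {h_given_q:.3f} bit")
```

Because every quantity is in bits, the conditional entropy of a candidate model (its remaining predictive uncertainty) and a divergence-based measure of overfitting can be added or compared directly, which is the property the abstract relies on when trading predictive power against robustness.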
More generally, we found that the robustness of a model quickly dropped with the increase in the number of predictors used (an effect well known as the curse of dimensionality) such that, in the end, the best model was a recursive one applying four predictors (three standard and one recursive): discharge from two distinct time steps, the relative magnitude of discharge compared with all discharge values in a surrounding 65 h time window and event predictions from the previous time step. Applying the model reduced the uncertainty in event classification by 77.8 %, decreasing conditional entropy from 0.516 to 0.114 bits. To assess the quality of the proposed method, its results were binarized and validated through a holdout method and then compared to a physically based approach. The comparison showed similar behavior of both models (both with accuracy near 90 %), and the cross-validation reinforced the quality of the proposed model. Given enough data to build data-driven models, their potential lies in the way they learn and exploit relations between data unconstrained by functional or parametric assumptions and choices. And, beyond that, the use of these models to reproduce a hydrologist's way of identifying rainfall-runoff events is just one of many potential applications.
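
The windowed predictor and the reported uncertainty reduction can be sketched as follows. The abstract does not give the exact definition of "relative magnitude", so this sketch assumes a min-max scaling of each discharge value within a centred window, and it assumes hourly data so that a 65 h window corresponds to 32 steps on either side. The function names `relative_magnitude` and `relative_reduction` are hypothetical.

```python
import numpy as np


def relative_magnitude(q, half_window):
    """Position of each discharge value between the min and max of its window.

    Assumption: min-max scaling within a centred window of
    2 * half_window + 1 steps, truncated at the series boundaries.
    Returns 0.0 where the window is constant.
    """
    q = np.asarray(q, dtype=float)
    out = np.empty_like(q)
    for i in range(q.size):
        lo = max(0, i - half_window)
        hi = min(q.size, i + half_window + 1)
        window = q[lo:hi]
        span = window.max() - window.min()
        out[i] = 0.0 if span == 0 else (q[i] - window.min()) / span
    return out


def relative_reduction(h_before, h_after):
    """Fraction of the initial uncertainty removed by the model."""
    return (h_before - h_after) / h_before


# Tiny synthetic hydrograph (illustrative values only, not catchment data).
q = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
print(relative_magnitude(q, half_window=2))

# Entropies as quoted (rounded) in the abstract, in bits.
print(f"{relative_reduction(0.516, 0.114):.3f}")
```

With the rounded entropies quoted above, the second call evaluates to roughly 0.78, in line with the reported 77.8 %; the small difference is presumably due to rounding of the published values. For hourly data, the 65 h window would correspond to `half_window=32` under the centred-window assumption.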
