Journal: Hydrology and Earth System Sciences Discussions

Identifying rainfall-runoff events in discharge time series: a data-driven method based on information theory



Abstract

In this study, we propose a data-driven approach for automatically identifying rainfall-runoff events in discharge time series. The core of the concept is to construct and apply discrete multivariate probability distributions to obtain, for each time step, a probabilistic prediction of whether it is part of an event. The approach permits any data to serve as predictors, and it is non-parametric in the sense that it can handle any kind of relation between the predictor(s) and the target. Each choice of a particular predictor data set is equivalent to formulating a model hypothesis. Among competing models, the best is found by comparing their predictive power in a training data set with user-classified events. For evaluation, we use measures from information theory such as Shannon entropy and conditional entropy to select the best predictors and models and, additionally, measure the risk of overfitting via cross entropy and Kullback–Leibler divergence. As all these measures are expressed in bits, we can combine them to identify models with the best tradeoff between predictive power and robustness given the available data. We applied the method to data from the Dornbirner Ach catchment in Austria, distinguishing three different model types: models relying on discharge data, models using both discharge and precipitation data, and recursive models, i.e., models using their own predictions of a previous time step as an additional predictor. In the case study, the additional use of precipitation reduced predictive uncertainty only by a small amount, likely because the information provided by precipitation is already contained in the discharge data.
More generally, we found that the robustness of a model quickly dropped with the increase in the number of predictors used (an effect well known as the curse of dimensionality) such that, in the end, the best model was a recursive one applying four predictors (three standard and one recursive): discharge from two distinct time steps, the relative magnitude of discharge compared with all discharge values in a surrounding 65 h time window and event predictions from the previous time step. Applying the model reduced the uncertainty in event classification by 77.8%, decreasing conditional entropy from 0.516 to 0.114 bits. To assess the quality of the proposed method, its results were binarized and validated through a holdout method and then compared to a physically based approach. The comparison showed similar behavior of both models (both with accuracy near 90%), and the cross-validation reinforced the quality of the proposed model. Given enough data to build data-driven models, their potential lies in the way they learn and exploit relations between data unconstrained by functional or parametric assumptions and choices. And, beyond that, the use of these models to reproduce a hydrologist's way of identifying rainfall-runoff events is just one of many potential applications.
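The information measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy labels and the two-bin "low"/"high" discharge predictor are all hypothetical, chosen only to show how Shannon entropy, conditional entropy and Kullback–Leibler divergence are computed in bits from discrete (binned) data, and how a relative uncertainty reduction follows from two entropy values.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y) in bits of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(predictors, labels):
    """Conditional entropy H(Y | X) in bits; each distinct predictor value is one bin."""
    n = len(labels)
    groups = {}
    for x, y in zip(predictors, labels):
        groups.setdefault(x, []).append(y)
    # Weight the entropy of the labels within each bin by the bin's probability.
    return sum(len(ys) / n * entropy(ys) for ys in groups.values())


def kl_divergence(p, q):
    """D_KL(p || q) in bits for discrete distributions given as dicts over the same keys."""
    return sum(pi * math.log2(pi / q[k]) for k, pi in p.items() if pi > 0)


def relative_reduction(h_before, h_after):
    """Fraction of uncertainty removed when entropy drops from h_before to h_after."""
    return (h_before - h_after) / h_before


# Hypothetical toy data: 1 = time step flagged as event, 0 = no event.
events = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical binned discharge predictor for the same time steps.
discharge_bin = ["low", "low", "low", "high", "high", "high", "high", "low"]

h_y = entropy(events)
h_y_given_x = conditional_entropy(discharge_bin, events)
print(f"H(Y) = {h_y:.3f} bits, H(Y|X) = {h_y_given_x:.3f} bits")
print(f"relative reduction = {relative_reduction(h_y, h_y_given_x):.1%}")
```

Applied to the rounded entropy values quoted in the abstract, `relative_reduction(0.516, 0.114)` gives roughly 0.78, in line with the reported 77.8% (the small difference presumably stems from rounding of the published values). In the same spirit, `kl_divergence` could compare a distribution estimated from a training subset against one estimated from the full data set, which is one way to read the abstract's use of divergence as an overfitting measure.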
