Hydrology and Earth System Sciences
Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling


Abstract

Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the predictive capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation and the tendency to overfit of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and (iii) allows the relative importance of the input variables to be inferred, which might help in the ex-post physical interpretation of the model. The potential of Extra-Trees is analysed on two real-world case studies, the Marina catchment (Singapore) and the Canning River (Western Australia), which represent two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirements when applied to large datasets. In addition, the ranking of the input variables that the method provides can be given a physically meaningful interpretation.
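
To make the idea concrete, the following is a minimal pure-Python sketch of an Extra-Trees style regressor, not the authors' implementation. It assumes the commonly described form of the method: every tree is grown on the whole learning sample, each node draws `k` candidate inputs with one uniformly random cut point each, and the candidate with the largest variance reduction is kept. The function names (`sse`, `build_tree`, `fit_extra_trees`, `predict`), the parameter defaults, and the two-input synthetic dataset are all hypothetical and chosen only for illustration. Ranking inputs by accumulated variance reduction is one common choice and is an assumption here; the abstract does not state how the ranking is computed.

```python
import random

def sse(y):
    """Sum of squared deviations from the mean (variance times sample count)."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)

def build_tree(X, y, k, n_min, rng, importance):
    """Grow one randomized regression tree; leaves are floats, nodes are dicts."""
    if len(y) < n_min:
        return sum(y) / len(y)  # node too small to split: predict the mean
    best = None
    for j in rng.sample(range(len(X[0])), k):  # k random candidate inputs
        lo = min(row[j] for row in X)
        hi = max(row[j] for row in X)
        if lo == hi:
            continue  # input is constant in this node
        cut = rng.uniform(lo, hi)  # cut point drawn at random, not optimised
        left = [i for i, row in enumerate(X) if row[j] < cut]
        right = [i for i, row in enumerate(X) if row[j] >= cut]
        if not left or not right:
            continue
        # Variance reduction achieved by this candidate split (sample-weighted).
        gain = sse(y) - sse([y[i] for i in left]) - sse([y[i] for i in right])
        if best is None or gain > best[0]:
            best = (gain, j, cut, left, right)
    if best is None:
        return sum(y) / len(y)  # no usable split was drawn
    gain, j, cut, left, right = best
    importance[j] += gain  # credit the chosen input with the reduction
    return {
        "feature": j,
        "cut": cut,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           k, n_min, rng, importance),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            k, n_min, rng, importance),
    }

def predict_tree(node, row):
    """Route one input vector down a single tree to its leaf value."""
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] < node["cut"] else node["right"]
    return node

def fit_extra_trees(X, y, n_trees=50, k=None, n_min=5, seed=0):
    """Fit the ensemble on the full sample (no bootstrap); return trees and ranking."""
    rng = random.Random(seed)
    k = k or len(X[0])
    importance = [0.0] * len(X[0])
    trees = [build_tree(X, y, k, n_min, rng, importance) for _ in range(n_trees)]
    total = sum(importance) or 1.0
    return trees, [v / total for v in importance]  # normalised to sum to 1

def predict(trees, row):
    """Ensemble prediction: average of the individual tree outputs."""
    return sum(predict_tree(t, row) for t in trees) / len(trees)

# Synthetic stand-in data (NOT streamflow): the output depends on input 0 only.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [3.0 * x0 + data_rng.gauss(0.0, 0.1) for x0, _ in X]

trees, importance = fit_extra_trees(X, y, n_trees=30, seed=1)
print("relative importance:", [round(v, 3) for v in importance])
print("prediction at x0=0.9:", round(predict(trees, [0.9, 0.5]), 2))
```

In this toy setting the informative input should collect most of the accumulated variance reduction, which is the kind of ranking the abstract refers to. Because the cut points are drawn at random rather than searched for, each node costs only one pass over the data per candidate input, which is consistent with the abstract's claim of computational efficiency.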
