Journal of Hydroinformatics

Tree-based fitted Q-iteration for multi-objective Markov decision processes in water resource management

Abstract

Multi-objective Markov decision processes (MOMDPs) provide an effective modeling framework for decision-making problems involving water systems. The traditional approach is to define many single-objective problems (resulting from different combinations of the objectives), each solvable by standard optimization. This paper presents an approach based on reinforcement learning (RL) that can learn the operating policies for all combinations of objectives in a single training process. The key idea is to enlarge the approximation of the action-value function, which is performed by single-objective RL over the state-action space, to the space of the objectives' weights. The batch-mode nature of the algorithm allows for enriching the training dataset without further interaction with the controlled system. The approach is demonstrated on a numerical test case study and evaluated on a real-world application, the Hoa Binh reservoir, Vietnam. Experimental results on the test case show that the proposed approach (multi-objective fitted Q-iteration; MOFQI) becomes computationally preferable over the repeated application of its single-objective version (fitted Q-iteration; FQI) when evaluating more than five weight combinations. In the Hoa Binh case study, the operating policies computed with MOFQI and FQI have comparable efficiency, while MOFQI provides a continuous
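
The key idea described in the abstract (approximating the action-value function over state, action, and objective weights together) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hypothetical two-state toy system (DRY and WET), two hypothetical objectives, and a linear weighting of the reward vector. A tabular averager stands in for the tree-based regressor, so unlike the paper's method it only covers the sampled weight vectors and does not generalize between them. All names (mofqi, greedy_action, the transitions) are illustrative assumptions.

```python
def mofqi(transitions, weights, actions, gamma=0.9, n_iterations=30):
    """Batch-mode fitted Q-iteration over weight-augmented inputs (s, a, w).

    transitions: list of (s, a, reward_vector, s_next) collected once.
    weights: list of weight tuples, one per objective combination.
    The "regressor" here is a tabular average (an assumption of this
    sketch), standing in for a tree-based ensemble.
    """
    q = {}  # Q_0 is zero everywhere (missing keys read as 0.0)
    for _ in range(n_iterations):
        sums, counts = {}, {}
        for s, a, r_vec, s_next in transitions:
            # Enrich the dataset: reuse each transition for every weight
            # vector, with no further interaction with the system.
            for w in weights:
                scalar_r = sum(wi * ri for wi, ri in zip(w, r_vec))
                best_next = max(q.get((s_next, b, w), 0.0) for b in actions)
                key = (s, a, w)
                sums[key] = sums.get(key, 0.0) + scalar_r + gamma * best_next
                counts[key] = counts.get(key, 0) + 1
        # "Fit" step: average the regression targets per (s, a, w) input.
        q = {k: sums[k] / counts[k] for k in sums}
    return q


def greedy_action(q, s, w, actions):
    """Return the action with the highest Q-value for state s and weights w."""
    return max(actions, key=lambda a: q.get((s, a, w), 0.0))


# Hypothetical toy data: actions 0 = hold, 1 = release.
ACTIONS = (0, 1)
DRY, WET = 0, 1
transitions = [
    (DRY, 0, (0.0, 1.0), WET),
    (DRY, 1, (1.0, 0.0), WET),
    (WET, 0, (0.0, 0.0), DRY),
    (WET, 1, (1.0, 1.0), DRY),
]
weights = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1)]

q = mofqi(transitions, weights, ACTIONS)
for w in weights:
    print(w, [greedy_action(q, s, w, ACTIONS) for s in (DRY, WET)])
```

A single call to mofqi yields one Q-table from which the greedy policy for any of the sampled weight vectors can be read, whereas the single-objective variant would be rerun once per weight vector. Replacing the tabular average with a regressor that takes the weights as continuous inputs is what would allow querying weight combinations not seen during training.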
