2011 IEEE International Conference on Systems, Man, and Cybernetics

Optimality principle broken by considering structured plant variation and relevant robust reinforcement learning


Abstract

In a general reinforcement learning problem, a plant (the state transition probabilities) is estimated, and a policy learned for the estimated plant is applied to the real plant. If the estimated plant differs from the real plant, the obtained policy may not work on the real plant. Therefore, a set of plants with variations is used for learning in order to obtain a policy that is robust against those variations. Bellman's principle of optimality does not hold when such a set of plants is used, so a typical dynamic programming algorithm cannot solve the problem. This study shows why the principle of optimality does not hold. It then formulates relaxed problems whose solutions can be obtained. Moreover, this study proposes methods to learn feasible policies efficiently. The effectiveness of the proposed approach is demonstrated by applying it to simple examples.
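The abstract names, but does not spell out, the robust objective: maximize the worst-case return over a set of plants rather than the return on a single estimated plant. The sketch below is a hypothetical NumPy illustration of that objective on a toy 3-state MDP; it is not the paper's algorithm, and the plant set, rewards, and all names are assumptions. Because the worst case is taken over one shared plant for the whole trajectory, the decision at each state can no longer be optimized independently by a per-state Bellman backup, which is the sense in which the principle of optimality breaks; the sketch therefore falls back on brute-force policy enumeration.

    # Minimal sketch (assumed, not the paper's method): robust policy search
    # over a finite set of plants. The toy MDP and all numbers are illustrative.
    import itertools
    import numpy as np

    n_states, n_actions, gamma = 3, 2, 0.9

    def make_plant(eps):
        """Transition tensor P[s, a, s'] for one structured variation eps.

        The same eps perturbs every state simultaneously, so the plant set
        is 'structured' rather than varying independently per state.
        """
        rng = np.random.default_rng(0)            # shared base dynamics
        base = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
        return (1 - eps) * base + eps / n_states  # rows still sum to 1

    plants = [make_plant(e) for e in (0.0, 0.3, 0.6)]    # the plant set
    R = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])   # reward R[s, a]

    def policy_return(policy, P):
        """Exact discounted return of a deterministic policy on plant P."""
        Ppi = P[np.arange(n_states), policy]      # P(s' | s) under the policy
        Rpi = R[np.arange(n_states), policy]
        v = np.linalg.solve(np.eye(n_states) - gamma * Ppi, Rpi)
        return v.mean()                           # uniform initial state

    # Robust objective: max over policies of the min over plants. Enumeration
    # is used because a per-state Bellman backup cannot decouple the states
    # once the worst case is taken over a single shared plant.
    best_policy, best_worst = None, -np.inf
    for policy in itertools.product(range(n_actions), repeat=n_states):
        worst = min(policy_return(np.array(policy), P) for P in plants)
        if worst > best_worst:
            best_policy, best_worst = policy, worst

    print("robust policy:", best_policy, f"worst-case return: {best_worst:.3f}")

Brute-force enumeration of deterministic policies is feasible only for toy problems like this one, which is consistent with the abstract's motivation for relaxed problems whose solutions can be obtained efficiently.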

