IFAC Conference on Intelligent Control and Automation Sciences

Policy Derivation Methods for Critic-Only Reinforcement Learning in Continuous Action Spaces



Abstract

State-of-the-art critic-only reinforcement learning methods can deal with a small discrete action space. The most common approach to real-world problems with continuous actions is to discretize the action space. In this paper, a method is proposed to derive a continuous-action policy based on a value function that has been computed for discrete actions by using any known algorithm, such as value iteration. Several variants of the policy-derivation algorithm are introduced and compared on two continuous state-action benchmarks: double pendulum swing-up and 3D mountain car.
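The idea can be illustrated with a minimal Python sketch under stated assumptions: a hypothetical one-step dynamics model f(x, a), reward rho(x, a), discount gamma, and a critic V(x) assumed to have been computed offline over a discretized action grid (e.g., by value iteration). The quadratic-interpolation step below is one simple way to obtain a continuous action from discrete-action lookahead values; it is a sketch of the general approach, not the paper's exact algorithm.

```python
# Sketch: deriving a continuous-action policy from a critic V(x) that was
# computed for a discrete action set. f, rho, V below are illustrative
# placeholders, not the paper's actual model or API.
import numpy as np

gamma = 0.99  # discount factor (assumed)

def f(x, a):
    # Hypothetical one-step dynamics (placeholder linear system).
    return x + 0.1 * np.array([x[1], a])

def rho(x, a):
    # Hypothetical quadratic reward.
    return -(x[0] ** 2 + 0.1 * x[1] ** 2 + 0.01 * a ** 2)

def V(x):
    # Stand-in for a value function learned offline, e.g. by value iteration.
    return -(x[0] ** 2 + x[1] ** 2)

def q(x, a):
    # One-step lookahead Q-value built from the critic V.
    return rho(x, a) + gamma * V(f(x, a))

def discrete_policy(x, actions):
    # Baseline: pick the best action from the discretized set.
    return max(actions, key=lambda a: q(x, a))

def continuous_policy(x, actions):
    # Continuous variant: fit a parabola through the Q-values of the best
    # grid action and its two neighbours, then take the vertex analytically.
    qs = np.array([q(x, a) for a in actions])
    i = int(np.clip(np.argmax(qs), 1, len(actions) - 2))
    coeffs = np.polyfit(actions[i - 1:i + 2], qs[i - 1:i + 2], 2)
    if coeffs[0] >= 0:  # degenerate (non-concave) fit: fall back to grid action
        return float(actions[i])
    a_star = -coeffs[1] / (2 * coeffs[0])
    return float(np.clip(a_star, actions[0], actions[-1]))

actions = np.linspace(-2.0, 2.0, 9)   # discretized action space
x = np.array([0.5, -0.3])
print(discrete_policy(x, actions), continuous_policy(x, actions))
```

Here the continuous variant refines the best grid action by interpolating between neighbouring discrete-action values; other variants could instead optimize q(x, a) numerically over the whole action interval.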


