Efficient exploration through active learning for value function approximation in reinforcement learning.

Neural Networks: The Official Journal of the International Neural Network Society

Abstract

Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. We then propose a method for designing good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. The effectiveness of the proposed method, which we call active policy iteration (API), is demonstrated through simulations with a batting robot.
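
The two ingredients named in the abstract can be sketched concretely. Below is a minimal Python sketch, assuming a polynomial feature map and a trace-of-the-inverse-design-matrix variance proxy as the active-learning score; the paper derives its own generalization-error estimator for LSPI, so the feature map, the `sampling_score` criterion, and all names here are illustrative assumptions, not the authors' method.

```python
# Minimal sketch (not the paper's exact method): linear least-squares
# Q-function fitting as in LSPI, plus a simple variance-based score for
# comparing candidate sampling policies before spending the reward budget.
# The feature map and the trace criterion are illustrative assumptions.
import numpy as np

N_ACTIONS, DEGREE = 2, 2

def features(s, a):
    """Polynomial state features, one block per discrete action."""
    phi = np.zeros((DEGREE + 1) * N_ACTIONS)
    phi[a * (DEGREE + 1):(a + 1) * (DEGREE + 1)] = [s**d for d in range(DEGREE + 1)]
    return phi

def lstd_q(samples, policy, gamma=0.95, reg=1e-3):
    """LSTD-Q: fit Q(s, a) = phi(s, a)^T theta from (s, a, r, s') samples."""
    dim = features(0.0, 0).size
    A, b = reg * np.eye(dim), np.zeros(dim)
    for s, a, r, s_next in samples:
        phi = features(s, a)
        phi_next = features(s_next, policy(s_next))
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A, b)

def sampling_score(samples, reg=1e-3):
    """Proxy for the variance of the least-squares estimate under this
    sample set: trace of the inverse design matrix (smaller is better).
    The paper instead uses a generalization-error estimator for LSPI."""
    X = np.array([features(s, a) for s, a, _, _ in samples])
    dim = X.shape[1]
    return np.trace(np.linalg.inv(X.T @ X + reg * np.eye(dim)))
```

Under this sketch, one would roll out a small pilot set under each candidate sampling policy, keep the policy with the lowest `sampling_score`, collect the full (reward-expensive) batch with it, and fit the Q-function with `lstd_q` inside the usual policy-iteration loop.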