Published in: International Conference on Neural Information Processing

A Linear Online Guided Policy Search Algorithm

Abstract

In reinforcement learning (RL), guided policy search (GPS), a variant of the policy search method, encodes the policy directly and searches for optimal solutions in the policy space. Although the algorithm comes with asymptotic local convergence guarantees, it cannot operate online in complex environments, because it is trained in batch mode and therefore requires all training samples to be available at the same time. In this paper, we propose an online version of the GPS algorithm that learns policies incrementally, without complete knowledge of the initial positions used for training. Experiments on a peg insertion task demonstrate its effectiveness at handling sequentially arriving training samples.
