Chinese Automation Congress

Effective Policy Adjustment via Meta-Learning for Complex Manipulation Tasks


Abstract

The ability to adjust a policy is key to learning decision making for agents completing complex manipulation tasks. To address this problem while balancing exploration and exploitation, we propose a novel deep reinforcement learning algorithm that combines Hindsight Experience Replay (HER) with Model-Agnostic Meta-Learning (MAML). For complex manipulation tasks in environments where rewards are sparse and binary, HER provides relatively effective exploration by converting the single-goal task into a multi-goal one, so that better policies can be searched for using not only successful transition trajectories but also failures. MAML, in turn, promotes exploitation: the proposed algorithm can learn faster and adjust the policy model from limited experience within a few iterations. Extensive simulations on complex object-manipulation tasks with a robotic arm show that HER integrated with MAML accelerates fine-tuning of the original policy-gradient reinforcement learning with a neural-network policy and also improves the success rate.
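
As context for the HER mechanism described in the abstract, the following is a minimal Python sketch of hindsight goal relabeling under a "future" strategy. The transition format, the helper names (compute_reward, her_relabel), and the k_future parameter are illustrative assumptions, not details taken from the paper.

    import random

    def compute_reward(achieved_goal, desired_goal, tol=0.05):
        # Sparse, binary reward: 0 when the achieved goal is close enough to
        # the desired goal, -1 otherwise (an assumed convention).
        dist = sum((a - d) ** 2 for a, d in zip(achieved_goal, desired_goal)) ** 0.5
        return 0.0 if dist < tol else -1.0

    def her_relabel(episode, k_future=4):
        # episode: list of dicts with keys obs, action, next_obs,
        # achieved_goal (goal reached in next_obs), desired_goal, reward.
        # Each original transition is kept, and up to k_future copies are added
        # whose desired goal is an achieved goal from a later step, so that
        # failed trajectories still yield rewarded, useful experience.
        relabeled = []
        for t, tr in enumerate(episode):
            relabeled.append(tr)
            future = episode[t + 1:]
            for _ in range(min(k_future, len(future))):
                new_goal = random.choice(future)["achieved_goal"]
                relabeled.append({
                    "obs": tr["obs"],
                    "action": tr["action"],
                    "next_obs": tr["next_obs"],
                    "achieved_goal": tr["achieved_goal"],
                    "desired_goal": new_goal,
                    "reward": compute_reward(tr["achieved_goal"], new_goal),
                })
        return relabeled

In the combined method described by the abstract, such relabeled transitions would serve as the limited experience from which a MAML-style inner loop adapts the policy in a few gradient steps; that adaptation step is not shown here.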
