Journal: Artificial Life and Robotics

EM-based policy hyper parameter exploration: application to standing and balancing of a two-wheeled smartphone robot



Abstract

This paper proposes a novel policy search algorithm called EM-based Policy Hyper Parameter Exploration (EPHE), which integrates two reinforcement learning algorithms: Policy Gradients with Parameter-based Exploration (PGPE) and EM-based Reward-Weighted Regression. Like PGPE, EPHE evaluates a deterministic policy in each episode, with the policy parameters sampled from a prior distribution defined by the policy hyper parameters (mean and variance). Following EM-based Reward-Weighted Regression, the policy hyper parameters are updated by reward-weighted averaging, so neither gradient calculation nor learning-rate tuning is required. The proposed method is tested on the pendulum swing-up and cart-pole balancing benchmarks and on a simulation of standing and balancing of a two-wheeled smartphone robot. Experimental results show that EPHE achieves efficient learning without learning-rate tuning, even for a task with discontinuities.
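The update scheme the abstract describes can be illustrated compactly: sample policy parameters from a Gaussian prior, run each deterministic policy for one episode, then refit the prior's mean and variance by reward-weighted averaging. The sketch below is an assumption-laden illustration of that idea, not the authors' implementation; the toy `episode_return` function, the sample counts `N` and `K`, and the top-K weighting are all placeholders standing in for real episode rollouts.

```python
import numpy as np

rng = np.random.default_rng(0)

def episode_return(theta):
    # Toy stand-in for a rollout of the deterministic policy theta:
    # reward peaks at theta = [1, -2] (a real task would run the robot/simulator).
    target = np.array([1.0, -2.0])
    return float(np.exp(-np.sum((theta - target) ** 2)))

# Policy hyper parameters: per-dimension mean and std of the parameter prior.
mu = np.zeros(2)
sigma = np.full(2, 2.0)

N, K = 20, 10  # episodes per iteration, best episodes kept for the update
for _ in range(100):
    thetas = rng.normal(mu, sigma, size=(N, 2))       # sample deterministic policies
    returns = np.array([episode_return(t) for t in thetas])
    best = np.argsort(returns)[-K:]                   # keep the K highest-return episodes
    w = returns[best] / returns[best].sum()           # normalized reward weights
    # EM-style reward-weighted averaging: no gradient, no learning rate.
    mu = w @ thetas[best]
    sigma = np.sqrt(w @ (thetas[best] - mu) ** 2)
```

After the loop, `mu` concentrates near the high-reward region and `sigma` shrinks, mirroring how the hyper parameters replace an explicit learning rate.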
