Frontiers in Neurorobotics

Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer



Abstract

EM-based policy search methods estimate a lower bound of the expected return from the histories of episodes and iteratively update the policy parameters by maximizing that lower bound, which makes gradient calculation and learning-rate tuning unnecessary. Previous algorithms such as Policy learning by Weighting Exploration with the Returns, Fitness Expectation Maximization, and EM-based Policy Hyperparameter Exploration discard useless low-return episodes either implicitly or using a fixed baseline set by the experimenter. In this paper, we propose an adaptive baseline method that discards worse samples from the reward history, and we examine different baselines, including the mean and multiples of the SD from the mean. Simulation results on the benchmark tasks of pendulum swing-up and cart-pole balancing, and on the standing-up and balancing task of a two-wheeled smartphone robot, showed improved performance. We further implemented the adaptive baseline with the mean on our two-wheeled smartphone robot hardware to test its performance in the standing-up and balancing task and a view-based approaching task. Our results showed that with the adaptive baseline, the method outperformed the previous algorithms and achieved faster and more precise behaviors at a higher success rate.
