Frontiers in Neurorobotics

Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer


Abstract

EM-based policy search methods estimate a lower bound of the expected return from the histories of episodes and iteratively update the policy parameters to the maximizer of that lower bound, which makes gradient calculation and learning-rate tuning unnecessary. Previous algorithms such as Policy learning by Weighting Exploration with the Returns, Fitness Expectation Maximization, and EM-based Policy Hyperparameter Exploration discard useless low-return episodes either implicitly or by using a fixed baseline determined by the experimenter. In this paper, we propose an adaptive baseline method that discards the worse samples from the reward history, and we examine different baselines, including the mean and multiples of the standard deviation from the mean. Simulation results on the benchmark tasks of pendulum swing-up and cart-pole balancing, and on standing up and balancing of a two-wheeled smartphone robot, showed improved performance. We further implemented the adaptive baseline with the mean on our two-wheeled smartphone robot hardware to test its performance in the standing-up and balancing task and in a view-based approaching task. Our results showed that, with the adaptive baseline, the method outperformed the previous algorithms and achieved faster and more precise behaviors at a higher success rate.
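
The idea of an adaptive baseline can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `adaptive_baseline_update`, the Gaussian search distribution over policy parameters, and the choice of weighting each kept episode by how far its return exceeds the baseline are all assumptions made for illustration. The baseline is recomputed from the current batch of returns as the mean plus `k` standard deviations, and episodes at or below it receive zero weight.

```python
import numpy as np


def adaptive_baseline_update(thetas, returns, k=0.0):
    """One reward-weighted update of a Gaussian search distribution.

    thetas:  (N, D) array, one sampled parameter vector per episode
    returns: (N,) array of episode returns
    k:       baseline offset, in standard deviations from the mean return
    Returns (mu, sigma, baseline), or None if no episode beats the baseline.
    """
    thetas = np.asarray(thetas, dtype=float)
    returns = np.asarray(returns, dtype=float)

    # Adaptive baseline: recomputed from the current batch of returns.
    baseline = returns.mean() + k * returns.std()

    # Assumption: the weight is the margin above the baseline, so episodes
    # at or below the baseline are discarded (weight zero).
    weights = np.clip(returns - baseline, 0.0, None)
    total = weights.sum()
    if total <= 0.0:
        # Nothing exceeds the baseline; the caller keeps the old parameters.
        return None

    w = weights / total
    mu = w @ thetas                          # weighted mean, shape (D,)
    sigma = np.sqrt(w @ (thetas - mu) ** 2)  # weighted std, shape (D,)
    return mu, sigma, baseline
```

With `k=0.0` the baseline is the batch mean; positive `k` keeps only the stronger episodes, and negative `k` keeps more of them.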
