Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer

Wang; Jiexin

Abstract

EM-based policy search methods estimate a lower bound of the expected return from the histories of episodes and iteratively update the policy parameters by maximizing that lower bound, which makes gradient calculation and learning-rate tuning unnecessary. Previous algorithms such as Policy learning by Weighting Exploration with the Returns, Fitness Expectation Maximization, and EM-based Policy Hyperparameter Exploration discard useless low-return episodes either implicitly or by using a fixed baseline determined by the experimenter. In this paper, we propose an adaptive baseline method that discards worse samples from the reward history, and we examine different baselines, including the mean and multiples of the standard deviation (SD) from the mean. Simulation results on the benchmark tasks of pendulum swing-up and cart-pole balancing, and on standing up and balancing of a two-wheeled smartphone robot, showed improved performance. We further implemented the adaptive baseline with the mean on our two-wheeled smartphone robot hardware to test its performance in the standing-up and balancing task and in a view-based approaching task. Our results showed that, with the adaptive baseline, the method outperformed the previous algorithms and achieved faster and more precise behaviors at a higher success rate.
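
The sample-discarding idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the diagonal Gaussian over policy parameters, weighting each surviving sample by its return minus the baseline, the fallback used when no sample exceeds the baseline, and the names `adaptive_baseline`, `em_update`, `k`, and `min_sigma`. The only part taken from the abstract is that the baseline is recomputed from the reward history as the mean, or the mean plus a multiple of the SD, and that samples below it are discarded.

```python
import math
import random
from statistics import fmean, pstdev


def adaptive_baseline(returns, k=0.0):
    """Baseline recomputed from the current batch: mean + k * SD."""
    return fmean(returns) + k * pstdev(returns)


def em_update(thetas, returns, k=0.0, min_sigma=1e-6):
    """One return-weighted update of a diagonal Gaussian over parameters.

    thetas  : list of N parameter vectors (each a list of D floats)
    returns : list of N episode returns, one per parameter vector
    Returns (mu, sigma, baseline).
    """
    b = adaptive_baseline(returns, k)
    # Discard samples at or below the baseline. The weight (return - baseline)
    # is an illustrative assumption, not taken from the paper.
    kept = [(th, r - b) for th, r in zip(thetas, returns) if r > b]
    if not kept:
        # Assumption: if nothing beats the baseline (e.g. all returns equal),
        # fall back to the best-returning samples with equal weight.
        best = max(returns)
        kept = [(th, 1.0) for th, r in zip(thetas, returns) if r == best]
    total = sum(w for _, w in kept)
    dim = len(thetas[0])
    # Weighted mean and weighted SD of the surviving samples; no gradient
    # and no learning rate are involved.
    mu = [sum(w * th[d] for th, w in kept) / total for d in range(dim)]
    var = [sum(w * (th[d] - mu[d]) ** 2 for th, w in kept) / total
           for d in range(dim)]
    sigma = [max(math.sqrt(v), min_sigma) for v in var]
    return mu, sigma, b


if __name__ == "__main__":
    # Toy 1-D problem (hypothetical): the return peaks at theta = 3.
    random.seed(0)
    mu, sigma = [0.0], [2.0]
    for _ in range(20):
        thetas = [[random.gauss(mu[0], sigma[0])] for _ in range(20)]
        returns = [-(th[0] - 3.0) ** 2 for th in thetas]
        mu, sigma, b = em_update(thetas, returns, k=0.0)
    print(mu, sigma)
```

Setting `k=0.0` corresponds to the mean baseline. A positive `k` raises the baseline by that many SDs and keeps fewer samples, while a negative `k` keeps more.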

Bibliographic details

Source
Frontiers in Neurorobotics | 2017, Issue 2009 | 15 pages
Author
Wang, Jiexin
Original format: PDF
Chinese Library Classification: Neurology and Psychiatry

Similar literature

1. EM-based policy hyper parameter exploration: application to standing and balancing of a two-wheeled smartphone robot [J]. Jiexin Wang, Eiji Uchibe, Kenji Doya. Artificial Life and Robotics. 2016, No. 1
2. Compliant skills acquisition and multi-optima policy search with EM-based reinforcement learning [J]. Sylvain Calinon, Petar Kormushev, Darwin G. Caldwell. Robotics and Autonomous Systems. 2013, No. 4
3. Adaptive Task Pools: Efficiently Balancing Large Number of Tasks on Shared-address Spaces [J]. Ralf Hoffmann, Thomas Rauber. International Journal of Parallel Programming. 2011, No. 5
4. EM-based policy search for learning foraging and mating behaviors [C]. Eiji Uchibe, Jiexin Wang. ロボティクス·メカトロニクス講演会2018. 2018
5. CONTEXTUAL EFFECTS IN TAX RESEARCH: AN EXPERIMENTAL INVESTIGATION OF ADAPTIVITY AND EXPERT PERFORMANCE IN AN INFORMATION-SEARCH TASK. [D]. Magro, Anne Marie. 1998
6. Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer [O]. Jiexin Wang, Eiji Uchibe, Kenji Doya. 2017
7. Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer [O]. Wang, Jiexin, Uchibe, Eiji, Doya, Kenji. 2017
8. A Validation of the Spatial Variant of the Sternberg Memory Search Task: Search Rate, Response Hand, & Task Interference [R]. Wickens, C. D., Sandry, D., Micalizzi, J. 1981
