Frontiers in Neurorobotics

Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer



Abstract

EM-based policy search methods estimate a lower bound of the expected return from the histories of episodes and iteratively update the policy parameters by maximizing that lower bound, which makes gradient calculation and learning-rate tuning unnecessary. Previous algorithms such as Policy learning by Weighting Exploration with the Returns, Fitness Expectation Maximization, and EM-based Policy Hyperparameter Exploration discard useless low-return episodes either implicitly or using a fixed baseline set by the experimenter. In this paper, we propose an adaptive baseline method that discards worse samples from the reward history, and we examine different baselines, including the mean and multiples of the SD from the mean. Simulation results on the benchmark tasks of pendulum swing-up and cart-pole balancing, and on the standing-up and balancing task of a two-wheeled smartphone robot, showed improved performance. We further implemented the adaptive baseline with the mean on our two-wheeled smartphone robot hardware to test its performance in the standing-up and balancing task and a view-based approaching task. Our results showed that with the adaptive baseline, the method outperformed the previous algorithms and achieved faster and more precise behaviors at a higher success rate.
