A Linear Online Guided Policy Search Algorithm


Abstract

In reinforcement learning (RL), guided policy search (GPS), a variant of the policy search method, encodes the policy directly and searches for optimal solutions in policy space. Although this algorithm comes with asymptotic local convergence guarantees, it cannot work online when carrying out tasks in complex environments, because it is trained in batch mode, which requires all training samples to be available at the same time. In this paper, we propose an online version of the GPS algorithm that learns policies incrementally, without complete knowledge of the initial positions used for training. Experiments on a peg insertion task demonstrate its efficacy in handling sequentially arriving training samples.


Bibliographic record

Source: International Conference on Neural Information Processing | 2017 | 930p | 8 pages
Authors: Biao Sun; Fangzhou Xiong; Zhiyong Liu; Xu Yang; Hong Qiao
Original format: PDF
CLC classification: TP183-53
Keywords: Reinforcement learning; Policy search; Online learning


Similar literature


1. Application of the composite nonlinear feedback control method to linear and nonlinear systems [J]. H Ebrahimi MOLLABASHI, A H MAZINAN, H HAMIDI. Journal of Central South University (English Edition). 2019, No. 1
2. Online policy iterative-based H-infinity optimization algorithm for a class of nonlinear systems [J]. He Shuping, Fang Haiyang, Zhang Maoguang. Information Sciences: An International Journal. 2019
3. Online adaptive policy iteration based fault-tolerant control algorithm for continuous-time nonlinear tracking systems with actuator failures [J]. Zhang Kun, Zhang Huaguang, Gao Zhiyun. Journal of the Franklin Institute. 2018, No. 15
4. Online fault compensation control based on policy iteration algorithm for a class of affine non-linear systems with actuator failures [J]. Bo Zhao, Derong Liu, Yuanchun Li. Control Theory & Applications, IET. 2016, No. 15
5. A Linear Online Guided Policy Search Algorithm [C]. Biao Sun, Fangzhou Xiong, Zhiyong Liu. International Conference on Neural Information Processing. 2017
6. Algorithms for automatic guided vehicle navigation and guidance based on Linear Image Array sensor data [D]. Alley, Daniel Milton. 1988
7. Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns [O]. Fernando Meyer, Stefan Kurtz, Michael Beckstette. 2013
8. Isolation, and Identification of Diesel Oil Degrading Bacteria in Water Contamination Site and Preliminary analysis with Potential Bacterial Gordonia terrae [O]. Ainun Ramadhani Tri Wahyuni, Endang Yuli Herawati, Andi Kurniawan. 2019
