Learning Upper-Level Policy using Importance Sampling-based Policy Search Method

Abstract

Policy search methods are a successful approach to reinforcement learning. They make it possible to learn upper-level policies, i.e., distributions over policy parameters, whose main advantage is that they explore directly in parameter space. The contribution of this paper is an algorithm based on importance sampling and local linear regression that uses samples efficiently. To this end, we propose using importance sampling to include information from all past samples in the learning process. Additionally, we use the gradient direction of the local linear reward model to explore regions where the reward prediction could be better.
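
The abstract combines two ingredients: reusing every past sample through importance weights, and following the gradient of a local linear reward model. The Python sketch below illustrates both in a generic form; it is not the authors' algorithm. The isotropic Gaussian upper-level policy, the balance-heuristic mixture denominator for the weights, the fixed normalized step along the fitted gradient, and all names (`search`, `mixture_weights`, `weighted_linear_fit`, `solve`) are assumptions made for illustration. Here "local" means the linear fit is weighted toward the current policy by the importance weights.

```python
import math
import random


def gauss_logpdf(x, mean, sigma):
    """Log-density of the isotropic Gaussian N(mean, sigma^2 I) at x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq / sigma ** 2 - d * math.log(sigma) - 0.5 * d * math.log(2 * math.pi)


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def mixture_weights(samples, behaviours, mean, sigma):
    """Self-normalised importance weights pi(theta) / q_mix(theta).

    Assumption: q_mix is the equal-weight mixture of all upper-level policies
    used so far (balance heuristic). Equal weights assume every iteration drew
    the same number of samples; the mixture keeps individual weights bounded.
    """
    log_k = math.log(len(behaviours))
    logw = []
    for th in samples:
        log_q = logsumexp([gauss_logpdf(th, m, s) for m, s in behaviours]) - log_k
        logw.append(gauss_logpdf(th, mean, sigma) - log_q)
    norm = logsumexp(logw)
    return [math.exp(lw - norm) for lw in logw]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def weighted_linear_fit(samples, rewards, weights, mean, ridge=1e-8):
    """Fit R(theta) ~ a + g . (theta - mean) by weighted least squares; return g."""
    X = [[1.0] + [t - m for t, m in zip(th, mean)] for th in samples]
    p = len(X[0])
    # Weighted normal equations (X^T W X + ridge I) beta = X^T W r.
    A = [[sum(w * x[i] * x[j] for x, w in zip(X, weights)) + (ridge if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(w * x[i] * r for x, w, r in zip(X, weights, rewards)) for i in range(p)]
    return solve(A, b)[1:]  # drop the intercept a, keep the slope g


def search(reward, mean, sigma=0.5, n_per_iter=10, iters=30, step=0.2, seed=0):
    """Hypothetical loop: sample, reweight the whole history, fit, step the mean."""
    rng = random.Random(seed)
    mean = list(mean)
    samples, rewards, behaviours = [], [], []
    for _ in range(iters):
        behaviours.append((mean, sigma))
        for _ in range(n_per_iter):
            th = [m + rng.gauss(0.0, sigma) for m in mean]
            samples.append(th)
            rewards.append(reward(th))
        w = mixture_weights(samples, behaviours, mean, sigma)
        g = weighted_linear_fit(samples, rewards, w, mean)
        g_norm = math.sqrt(sum(gi * gi for gi in g))
        if g_norm > 0:
            # Move the mean a fixed distance along the fitted reward gradient.
            mean = [m + step * gi / g_norm for m, gi in zip(mean, g)]
    return mean


if __name__ == "__main__":
    target = [1.0, -2.0]
    final = search(lambda th: -sum((t - c) ** 2 for t, c in zip(th, target)), [0.0, 0.0])
    print("final mean:", final, "distance to optimum:", math.dist(final, target))
```

Because every stored sample is reweighted at each iteration, the cost grows with the history; a practical variant might cap or subsample it. The mixture denominator is one standard way to keep weights bounded when reusing samples from several earlier distributions; the paper may weight past samples differently.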

Bibliographic details

Source
International Conference on Systems and Control, 2018, pp. 188-193 (6 pages)
Authors
Jose Pastor; Henry Díaz; Leopoldo Armesto; Alicia Esparza; Antonio Sala
Original format: PDF
Keywords
Trajectory; Estimation; Monte Carlo methods; Linear regression; Search methods; Numerical models; Robots

Similar documents

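1. The influence of learning environment on Chinese university students' use of English learning strategies: a comparative study from a sociocultural perspective [J]. Li Chili (李池利). Chinese Journal of Applied Linguistics (English edition), 2014, no. 2.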
2. Learning in robotic manipulation: The role of dimensionality reduction in policy search methods. Comment on "Hand synergies: Integration of robotics and neuroscience for understanding the control of biological and artificial hands" by Marco Santello et al. [J]. Ficuciello Fanny, Siciliano Bruno. Physics of Life Reviews, 2016.
3. Search methods for optimising reinforcement learning policy functions [J]. Salah Aziz Rana, Malcolm Crowe, Colin Fyfe. Computing and Information Systems, 2010, no. 3.
4. Policy learning to reduce inequalities: the search for a coherent Scottish gender mainstreaming policy in a multilevel UK [J]. Cairney Paul, St Denny Emily, Kippin Sean. Territory, Politics, Governance, 2021, no. 3.
5. Learning Upper-Level Policy using Importance Sampling-based Policy Search Method [C]. Jose Pastor, Henry Díaz, Leopoldo Armesto. International Conference on Systems and Control, 2018.
6. Bayesian Methods for Knowledge Transfer and Policy Search in Reinforcement Learning [D]. Wilson, Aaron. 2012.
7. Admission policies and methods at crossroads: a review of medical school admission policies and methods in seven Asian countries [O]. Diantha Soemantri, Indika Karunathilake, Jen-Hung Yang. 2020.
8. Incremental Sampling-based Motion Planners Using Policy Iteration Methods [O]. Arslan, Oktay; Tsiotras, Panagiotis. 2016.
9. Learning to Cooperate in a Search Mission via Policy Search [R]. Martin, D. 2002.
