IEEE Transactions on Robotics

Dual REPS: A Generalization of Relative Entropy Policy Search Exploiting Bad Experiences


Abstract

Policy search (PS) algorithms are widely used for their simplicity and effectiveness in finding solutions for robotic problems. However, most current PS algorithms derive policies by statistically fitting the data from the best experiments only. This means that experiments yielding a poor performance are usually discarded or given too little influence on the policy update. In this paper, we propose a generalization of the relative entropy policy search (REPS) algorithm that takes bad experiences into consideration when computing a policy. The proposed approach, named dual REPS (DREPS) following the philosophical interpretation of the duality between good and bad, finds clusters of experimental data yielding a poor behavior and adds them to the optimization problem as a repulsive constraint. Thus, considering that there is a duality between good and bad data samples, both are taken into account in the stochastic search for a policy. Additionally, a cluster with the best samples may be included as an attractor to enforce faster convergence to a single optimal solution in multimodal problems. We first tested our proposed approach in a simulated reinforcement learning setting and found that DREPS considerably speeds up the learning process, especially during the early optimization steps and in cases where other approaches get trapped in between several alternative maxima. Further experiments in which a real robot had to learn a task with a multimodal reward function confirm the advantages of our proposed approach with respect to REPS.
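The abstract describes DREPS as a generalization of episodic REPS, so a compact illustration of the base update it builds on may help. The sketch below is a toy, assumption-laden example: it implements the standard REPS dual for the temperature and a weighted Gaussian refit using NumPy/SciPy; the function names, the KL bound `epsilon`, the quadratic toy reward, and the comment on how bad-experience clusters would enter are illustrative and not the authors' exact formulation.

```python
# A minimal, illustrative sketch of the episodic REPS update that DREPS
# generalizes. Assumptions: NumPy/SciPy only, Gaussian search distribution,
# toy quadratic reward; the repulsive handling of bad-experience clusters
# described in the abstract is only indicated in a comment, not implemented.
import numpy as np
from scipy.optimize import minimize


def reps_weights(returns, epsilon=0.5):
    """Solve the REPS dual for the temperature eta and return sample weights.

    epsilon is the KL bound limiting how far the updated search distribution
    may move from the one that generated the samples.
    """
    R = returns - np.max(returns)  # shift returns for numerical stability

    def dual(log_eta):
        eta = np.exp(log_eta).item()  # optimize log(eta) so eta stays positive
        return eta * epsilon + eta * np.log(np.mean(np.exp(R / eta)))

    res = minimize(dual, x0=np.array([0.0]), method="Nelder-Mead")
    eta = np.exp(res.x[0])
    w = np.exp(R / eta)
    return w / np.sum(w)


def weighted_gaussian_fit(params, weights):
    """Weighted maximum-likelihood refit of the Gaussian search distribution."""
    mean = weights @ params
    diff = params - mean
    cov = (weights[:, None] * diff).T @ diff
    return mean, cov + 1e-6 * np.eye(params.shape[1])  # jitter keeps cov PSD


# Toy usage: one REPS iteration over 100 sampled 2-D policy parameter vectors.
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, size=(100, 2))
returns = -np.sum((theta - np.array([1.0, -0.5])) ** 2, axis=1)  # toy reward

w = reps_weights(returns, epsilon=0.5)
# DREPS (per the abstract) would additionally cluster the worst-performing
# samples and add each cluster to the optimization as a repulsive constraint,
# optionally with the best cluster acting as an attractor; omitted here.
mean, cov = weighted_gaussian_fit(theta, w)
print("new mean:", mean)
print("new covariance:\n", cov)
```

In this toy run the weighted refit pulls the search distribution toward the high-return region; DREPS, as summarized above, additionally shapes that update using clusters of poorly performing samples.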
