International Journal of Applied Mathematics and Computer Science

An Active Exploration Method for Data Efficient Reinforcement Learning



Abstract

Reinforcement learning (RL) is an effective method for controlling dynamic systems without prior knowledge. One of the most important and difficult problems in RL is improving data efficiency. Probabilistic inference for learning control (PILCO) is a state-of-the-art data-efficient framework that uses a Gaussian process to model the system dynamics. However, it focuses only on optimizing cumulative rewards and does not consider the accuracy of the dynamics model, which is an important factor for controller learning. To further improve the data efficiency of PILCO, we propose an active exploration version (AEPILCO) that uses information entropy to describe samples. In the policy evaluation stage, we incorporate an information entropy criterion into long-term sample prediction. Through this informative policy evaluation function, our algorithm obtains informative policy parameters in the policy improvement stage. Executing the resulting policy produces an informative sample set, which helps in learning an accurate dynamics model. Thus, the AEPILCO algorithm improves data efficiency by actively selecting informative samples according to the information entropy criterion and thereby learning an accurate dynamics model. We demonstrate the validity and efficiency of the proposed algorithm on several challenging control problems: a cart pole, a pendubot, a double pendulum, and a cart double pendulum. AEPILCO learns a controller in fewer trials than PILCO, as verified by theoretical analysis and experimental results.
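The entropy-augmented policy evaluation described above can be illustrated with a minimal sketch. The paper itself does not give this code; the function names, the trade-off weight `lam`, and the exact way the entropy bonus enters the objective are assumptions for illustration. The sketch uses the standard closed-form differential entropy of a Gaussian predictive distribution, which is what a GP dynamics model produces at each step of a long-term rollout:

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy of a multivariate Gaussian N(m, cov):
    H = 0.5 * log((2*pi*e)^d * det(cov)).  Larger predictive
    covariance (a less certain dynamics model) gives larger entropy."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)  # numerically stable log-determinant
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

def informative_return(expected_costs, pred_covs, lam=0.1):
    """Hypothetical informative policy evaluation: the usual cumulative
    expected cost of a long-term GP prediction, minus lam times the summed
    predictive entropy.  Minimizing this objective prefers policies that
    also visit states where the dynamics model is uncertain, so the
    executed rollout yields informative samples."""
    cost = sum(expected_costs)
    info = sum(gaussian_entropy(S) for S in pred_covs)
    return cost - lam * info
```

With `lam = 0` this reduces to PILCO's plain cumulative-cost objective; increasing `lam` shifts the policy toward exploration of poorly modeled regions of the state space.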

