Efficient and Scalable Exploration via Estimation-Error

Abstract

Exploring efficiently in complex environments is still a challenging problem in reinforcement learning (RL). Recent exploration algorithms based on "optimism in the face of uncertainty" or on intrinsic motivation have achieved promising performance in sparse-reward settings, but they often rely on additional structures that are hard to build in large-scale problems. This makes them impractical and hinders their combination with RL algorithms. As a result, most state-of-the-art RL algorithms still use naive action-space noise as their exploration strategy. In this paper, we model the agent's uncertainty about the environment through its ability to estimate values across the state and action space. We then parameterize this uncertainty with a neural network and use it as a reward bonus that rewards uncertain states. In this way, we obtain an end-to-end bonus that scales to complex environments at a lower computational cost. To demonstrate the effectiveness of our method, we evaluate it on the challenging Atari 2600 games. Our method achieves superior or comparable exploratory performance relative to action-space noise in all environments, including those with sparse rewards. The results show that our method can motivate the agent to explore effectively even in complex environments and that it generally outperforms naive action-space noise.
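The abstract describes the bonus only at a high level. The sketch below is one possible reading, not the authors' code. It assumes (our assumption, not stated in the abstract) that "estimation error" means the magnitude of a Q-learning agent's one-step TD error, and that the uncertainty network regresses that magnitude from state-action features. A linear model stands in for the neural network, and the names `ErrorPredictor`, `td_error`, `shaped_reward`, and `beta` are hypothetical.

```python
# Hypothetical sketch: an exploration bonus from a learned estimate of the
# agent's value-estimation error. Not the authors' implementation.


class ErrorPredictor:
    """Stand-in for the uncertainty network (assumption): a linear model that
    regresses |TD error| from state-action features."""

    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, target):
        # One SGD step on the squared loss 0.5 * (prediction - target) ** 2.
        err = self.predict(x) - target
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err
        return err


def td_error(q_sa, reward, q_next_max, gamma=0.99, done=False):
    """One-step Q-learning TD error: r + gamma * max_a' Q(s', a') - Q(s, a)."""
    target = reward if done else reward + gamma * q_next_max
    return target - q_sa


def shaped_reward(reward, features, predictor, beta=0.1):
    """Extrinsic reward plus beta times the predicted estimation error,
    clipped at zero so the bonus is never negative."""
    bonus = max(0.0, predictor.predict(features))
    return reward + beta * bonus


if __name__ == "__main__":
    # Toy example with one-hot features for two state-action pairs:
    # the "familiar" pair has zero TD error, the "novel" pair keeps an error of 1.
    predictor = ErrorPredictor(n_features=2)
    familiar, novel = [1.0, 0.0], [0.0, 1.0]
    for _ in range(2000):
        predictor.update(familiar, abs(td_error(1.0, 1.0, 0.0, done=True)))
        predictor.update(novel, abs(td_error(0.0, 1.0, 0.0, done=True)))
    print(shaped_reward(0.0, familiar, predictor))
    print(shaped_reward(0.0, novel, predictor))
```

In this toy run, the pair whose value the agent keeps mis-estimating ends up with the larger shaped reward, which is the qualitative behavior the abstract describes (rewarding uncertain states). A real agent would replace the linear model with a network over raw observations and feed in TD errors from its own Q-function.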

Bibliographic Record

Source: International Joint Conference on Neural Networks, 2019, pp. 1-8 (8 pages)
Authors: Chuxiong Sun; Rui Wang; Ruiying Li; Jiao Wu; Xiaohui Hu
Original format: PDF

Similar Literature

1. Exploration of a scalable and power-efficient asynchronous Network-on-Chip with dynamic resource allocation [J]. Effiong Charles, Sassatelli Gilles, Gamatie Abdoulaye. Microprocessors and Microsystems, 2018, Issue JULa.

2. TopicLens: Efficient Multi-Level Visual Topic Exploration of Large-Scale Document Collections [J]. Minjeong Kim, Kyeongpil Kang, Deokgun Park. IEEE Transactions on Visualization and Computer Graphics, 2017, Issue 1.

3. Area-Efficient Instruction Set Extension Exploration with Hardware Design Space Exploration [J]. I-Wei Wu, Chung-Ping Chung, Jean Jyh-Jiun Shann. Journal of Information Science and Engineering, 2011, Issue 5.

4. Efficient and Scalable Exploration via Estimation-Error [C]. Chuxiong Sun, Rui Wang, Ruiying Li. International Joint Conference on Neural Networks, 2019.

5. An efficient design space exploration framework to optimize power-efficient heterogeneous many-core multi-threading embedded processor architectures [D]. Datta, Kushal. 2011.

6. The Early Identity Exploration Scale—a measure of initial exploration in breadth during early adolescence [O]. Maria Kłym, Jan Cieciuch.

7. Infrastructure for Efficient Exploration of Large Scale Linked Data via Contextual Tag Clouds [O]. Xingjian Zhang, Dezhao Song, Sambhawa Priya. 2015.
