International Conference on Machine Learning

Dead-ends and Secure Exploration in Reinforcement Learning

Abstract

Many interesting applications of reinforcement learning (RL) involve MDPs that include numerous "dead-end" states. Upon reaching a dead-end state, the agent continues to interact with the environment on a dead-end trajectory until it reaches an undesired terminal state, regardless of what actions are chosen. The situation is even worse when the existence of many dead-end states is coupled with positive rewards that are distant from any initial state (we term this the Bridge Effect). Hence, conventional exploration techniques often incur prohibitively many training steps before convergence. To deal with the bridge effect, we propose a condition for exploration, called security. We next establish formal results that translate the security condition into the learning problem of an auxiliary value function. This new value function is used to cap "any" given exploration policy and is guaranteed to make it secure. As a special case, we use this theory to introduce the secure random-walk. We next extend our results to the deep RL setting by identifying and addressing two main challenges that arise. Finally, we empirically compare the secure random-walk with standard benchmarks in two sets of experiments, including the Atari game Montezuma's Revenge.
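Based only on the description above, the following is a minimal sketch of how an auxiliary value function might be used to cap an exploration policy at a single state. The names `cap_exploration_policy` and `q_d_s`, the assumption that the auxiliary dead-end values lie in [-1, 0], and the specific per-action cap of 1 + Q_D(s, a) followed by renormalization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cap_exploration_policy(pi_s, q_d_s):
    """Cap an exploration policy at one state using auxiliary dead-end values.

    pi_s  : action probabilities of any base exploration policy
            (e.g. a uniform random walk) at state s.
    q_d_s : hypothetical auxiliary values Q_D(s, a), assumed to lie in
            [-1, 0]; Q_D(s, a) == -1 marks an action that leads to a
            dead-end with certainty.

    Each action receives at most 1 + Q_D(s, a) of probability mass, and the
    result is renormalized, so actions that certainly lead to dead-ends get
    zero probability.
    """
    caps = np.clip(1.0 + np.asarray(q_d_s, dtype=float), 0.0, 1.0)
    capped = np.minimum(np.asarray(pi_s, dtype=float), caps)
    total = capped.sum()
    if total <= 0.0:
        # Every action looks like a certain dead-end; fall back to uniform.
        return np.full_like(capped, 1.0 / len(capped))
    return capped / total

# Capping a uniform base policy, in the spirit of a secure random-walk.
pi_uniform = np.full(4, 0.25)
q_d = np.array([0.0, -1.0, -0.4, 0.0])  # hypothetical learned values
print(cap_exploration_policy(pi_uniform, q_d))
```

In this sketch, capping a uniform base policy yields a secure random-walk in the sense described above; any other exploration policy could be capped the same way, which matches the abstract's claim that the auxiliary value function can cap "any" given exploration policy.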
