Conference on Neural Information Processing Systems

Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning



Abstract

Off-policy evaluation (OPE) in both contextual bandits and reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. The problem's importance has attracted many proposed solutions, including importance sampling (IS), self-normalized IS (SNIS), and doubly robust (DR) estimates. DR and its variants ensure semiparametric local efficiency if the Q-functions are well-specified, but if they are not, they can be worse than both IS and SNIS. DR also does not enjoy SNIS's inherent stability and boundedness. We propose new estimators for OPE based on empirical likelihood that are always more efficient than IS, SNIS, and DR and satisfy the same stability and boundedness properties as SNIS. Along the way, we catalogue various properties and classify existing estimators by them. Besides the theoretical guarantees, empirical studies suggest the new estimators provide advantages.
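For context, the three baseline estimators the abstract compares against (IS, SNIS, DR) have standard forms in the contextual-bandit case. The sketch below is an illustrative assumption-laden summary of those baselines only, not the paper's new empirical-likelihood estimator; the array names and the plug-in Q-model are hypothetical.

```python
# Minimal sketch of the three baseline OPE estimators named in the abstract,
# for contextual bandits. Inputs and names are illustrative assumptions.
import numpy as np

def ope_baselines(rho, rewards, q_hat_taken=None, q_hat_target=None):
    """Compute IS, SNIS, and DR estimates of the target policy's value.

    rho          : importance weights pi_e(a_i | x_i) / pi_b(a_i | x_i), shape (n,)
    rewards      : observed rewards r_i, shape (n,)
    q_hat_taken  : Q-model at the logged action, q_hat(x_i, a_i), shape (n,)
    q_hat_target : Q-model averaged over the target policy,
                   sum_a pi_e(a | x_i) * q_hat(x_i, a), shape (n,)
    """
    is_est = np.mean(rho * rewards)                  # plain importance sampling
    snis_est = np.sum(rho * rewards) / np.sum(rho)   # self-normalized IS: weights sum to 1,
                                                     # so the estimate stays within the reward range
    dr_est = None
    if q_hat_taken is not None and q_hat_target is not None:
        # doubly robust: direct-method baseline plus an IS correction of its residual
        dr_est = np.mean(q_hat_target + rho * (rewards - q_hat_taken))
    return is_est, snis_est, dr_est
```

As the abstract notes, DR's advantage over IS and SNIS hinges on the quality of the plugged-in Q-model, and unlike SNIS it is not automatically bounded by the reward range.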
