International Conference on Machine Learning

More Efficient Off-Policy Evaluation through Regularized Targeted Learning



Abstract

We study the problem of off-policy evaluation (OPE) in Reinforcement Learning (RL), where the aim is to estimate the performance of a new policy given historical data that may have been generated by a different policy, or policies. In particular, we introduce a novel doubly-robust estimator for the OPE problem in RL, based on the Targeted Maximum Likelihood Estimation principle from the statistical causal inference literature. We also introduce several variance reduction techniques that lead to impressive performance gains in off-policy evaluation. We show empirically that our estimator uniformly wins over existing off-policy evaluation methods across multiple RL environments and various levels of model misspecification. Finally, we further the existing theoretical analysis of estimators for the RL off-policy estimation problem by showing their O_P(1/√n) rate of convergence and characterizing their asymptotic distribution.
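To make the doubly-robust idea behind the abstract concrete, here is a minimal sketch of a standard doubly-robust (DR) off-policy value estimator for finite-horizon trajectories, in the spirit of the estimator family the paper builds on. It is not the paper's regularized TMLE procedure or its variance reduction techniques; the functions `pi_e`, `pi_b`, `q_hat`, and `v_hat` are hypothetical stand-ins for the evaluation-policy and behavior-policy densities and for previously fitted value models.

```python
# A minimal sketch of a recursive doubly-robust OPE estimator (illustrative only,
# not the paper's regularized TMLE method). Assumes per-step action probabilities
# and fitted Q/V models are supplied by the caller.
import numpy as np

def dr_estimate(trajectories, pi_e, pi_b, q_hat, v_hat, gamma=0.99):
    """Doubly-robust estimate of the evaluation policy's value.

    trajectories: list of trajectories, each a list of (state, action, reward).
    pi_e(a, s), pi_b(a, s): action probabilities under the evaluation and
        behavior policies (hypothetical callables).
    q_hat(s, a), v_hat(s): fitted action-value and state-value models.
    """
    estimates = []
    for traj in trajectories:
        dr = 0.0  # DR value of the trajectory tail, built backwards in time
        for (s, a, r) in reversed(traj):
            rho = pi_e(a, s) / pi_b(a, s)  # per-step importance weight
            # Model-based baseline plus importance-weighted correction:
            # the correction vanishes in expectation if q_hat is correct,
            # and the weights are unbiased if pi_b is correct.
            dr = v_hat(s) + rho * (r + gamma * dr - q_hat(s, a))
        estimates.append(dr)
    return float(np.mean(estimates))
```

The estimator is "doubly robust" because it remains consistent if either the value models (`q_hat`, `v_hat`) or the behavior-policy model (`pi_b`) is correctly specified; the paper's contribution is a TMLE-based refitting of the value model, plus regularization, that improves the efficiency of estimators of this form.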


