Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation


Abstract

For an autonomous agent, executing a poor policy may be costly or even dangerous. For such agents, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing said policy. Current methods for exact high confidence off-policy evaluation that use importance sampling require a substantial amount of data to achieve a tight lower bound. Existing model-based methods only address the problem in discrete state spaces. Since exact bounds are intractable for many domains, we trade off strict guarantees of safety for more data-efficient approximate bounds. In this context, we propose two bootstrapping off-policy evaluation methods which use learned MDP transition models in order to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces. Since direct use of a model may introduce bias, we derive a theoretical upper bound on model bias for when the model transition function is estimated with i.i.d. trajectories. This bound broadens our understanding of the conditions under which model-based methods have high bias. Finally, we empirically evaluate our proposed methods and analyze the settings in which different bootstrapping off-policy confidence interval methods succeed and fail.
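The general idea of a bootstrapped lower confidence bound for off-policy evaluation can be sketched as follows. This is a minimal illustration, not the authors' method: it uses an ordinary importance-sampling estimator and a percentile bootstrap, whereas the abstract proposes estimators built on learned MDP transition models. All names here (`importance_weighted_return`, `is_estimate`, `bootstrap_lower_bound`, `pi_e`, `pi_b`, `delta`, `n_boot`) are hypothetical and chosen for illustration only.

```python
import random


def importance_weighted_return(trajectory, pi_e, pi_b, gamma=1.0):
    """Ordinary importance-sampled return of one trajectory.

    trajectory: list of (state, action, reward) tuples
    pi_e, pi_b: callables (state, action) -> action probability under the
                evaluation and behavior policies (assumed interfaces)
    """
    weight, ret = 1.0, 0.0
    for t, (s, a, r) in enumerate(trajectory):
        weight *= pi_e(s, a) / pi_b(s, a)  # cumulative likelihood ratio
        ret += (gamma ** t) * r            # discounted return
    return weight * ret


def is_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Average importance-weighted return over a set of trajectories."""
    total = sum(importance_weighted_return(tr, pi_e, pi_b, gamma)
                for tr in trajectories)
    return total / len(trajectories)


def bootstrap_lower_bound(trajectories, estimator, delta=0.05,
                          n_boot=2000, seed=0):
    """Approximate (1 - delta) lower bound via the percentile bootstrap.

    estimator: callable mapping a list of trajectories to a value estimate.
    The bound is approximate; it carries no strict safety guarantee.
    """
    rng = random.Random(seed)
    n = len(trajectories)
    estimates = []
    for _ in range(n_boot):
        # Resample n trajectories with replacement and re-estimate.
        resample = [trajectories[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    estimates.sort()
    # The delta-quantile of the bootstrap estimates is the lower bound.
    index = min(int(delta * n_boot), n_boot - 1)
    return estimates[index]
```

Because `estimator` is a plain callable, a model-based estimator (fit a transition model to the resampled trajectories, then evaluate the policy in that model) could in principle be passed in place of the importance-sampling one; that substitution is left as an assumption here and is not shown.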

Bibliographic record

Source
International Conference on Autonomous Agents and Multiagent Systems | 2018 | 641p | 9 pages
Authors
Josiah P. Hanna; Peter Stone; Scott Niekum
Original format: PDF
Chinese Library Classification: TP18-53
Keywords
Reinforcement learning; Off-policy evaluation; Bootstrapping
Date indexed: 2022-08-20 20:12:53
