Venue: International Conference on Autonomous Agents and Multiagent Systems

Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation



Abstract

For an autonomous agent, executing a poor policy may be costly or even dangerous. For such agents, it is desirable to determine lower confidence bounds on the performance of any given policy without executing that policy. Current methods for exact high-confidence off-policy evaluation that use importance sampling require a substantial amount of data to achieve a tight lower bound, and existing model-based methods only address the problem in discrete state spaces. Since exact bounds are intractable for many domains, we trade off strict guarantees of safety for more data-efficient approximate bounds. In this context, we propose two bootstrapping off-policy evaluation methods that use learned MDP transition models to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces. Since direct use of a model may introduce bias, we derive a theoretical upper bound on model bias when the model transition function is estimated from i.i.d. trajectories. This bound broadens our understanding of the conditions under which model-based methods have high bias. Finally, we empirically evaluate our proposed methods and analyze the settings in which different bootstrapping off-policy confidence interval methods succeed and fail.
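The general idea of bootstrapping with a learned model can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not specify. It assumes a small discrete MDP, a maximum-likelihood tabular model, a percentile bootstrap over trajectories, and zero value for state-action pairs that never appear in a resample. All function and parameter names (`fit_tabular_model`, `model_value`, `bootstrap_lower_bound`, `delta`, `n_boot`) are hypothetical.

```python
import random


def fit_tabular_model(trajectories, n_states, n_actions):
    """Count-based (maximum-likelihood) model from (s, a, r, s_next) steps."""
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    reward_sum = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for traj in trajectories:
        for s, a, r, s_next in traj:
            counts[s][a][s_next] += 1
            reward_sum[s][a] += r
            visits[s][a] += 1
    return counts, reward_sum, visits


def model_value(model, policy, start_state, horizon, n_states, n_actions):
    """Finite-horizon value of `policy` in the learned model (backward induction)."""
    counts, reward_sum, visits = model
    v = [0.0] * n_states
    for _ in range(horizon):
        new_v = [0.0] * n_states
        for s in range(n_states):
            for a in range(n_actions):
                n = visits[s][a]
                if n == 0 or policy[s][a] == 0.0:
                    continue  # assumption: unseen pairs contribute zero value
                mean_r = reward_sum[s][a] / n
                next_v = sum(counts[s][a][t] / n * v[t] for t in range(n_states))
                new_v[s] += policy[s][a] * (mean_r + next_v)
        v = new_v
    return v[start_state]


def bootstrap_lower_bound(trajectories, policy, start_state, horizon,
                          n_states, n_actions, delta=0.05, n_boot=200, seed=0):
    """Approximate (1 - delta) lower bound via a percentile bootstrap."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample whole trajectories with replacement, refit, re-evaluate.
        resample = [rng.choice(trajectories) for _ in trajectories]
        model = fit_tabular_model(resample, n_states, n_actions)
        estimates.append(
            model_value(model, policy, start_state, horizon, n_states, n_actions))
    estimates.sort()
    return estimates[int(delta * n_boot)]
```

Here `policy[s][a]` is the probability that the evaluated policy takes action `a` in state `s`, and each trajectory is a list of `(s, a, r, s_next)` tuples collected by some other behavior policy. The result is an approximate bound only: as the abstract notes, the learned model can be biased, and a percentile bootstrap carries no strict safety guarantee.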
