JMLR: Workshop and Conference Proceedings

Variance Regularized Counterfactual Risk Minimization via Variational Divergence Minimization



Abstract

Off-policy learning, the task of evaluating and improving policies using historic data collected from a logging policy, is important because on-policy evaluation is usually expensive and can have adverse impacts. One of the major challenges of off-policy learning is to derive counterfactual estimators that also have low variance and thus low generalization error. In this work, inspired by learning bounds for importance sampling problems, we present a new counterfactual learning principle for off-policy learning with bandit feedback. Our method regularizes the generalization error by minimizing the distribution divergence between the logging policy and the new policy, and removes the need to iterate over all training samples to compute the sample-variance regularizer used in prior work. With neural network policies, our end-to-end training algorithms based on variational divergence minimization show significant improvements over conventional baseline algorithms, consistent with our theoretical results.
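The learning principle described in the abstract can be illustrated with a toy script. The following is a minimal sketch, not the authors' implementation: it combines a plain inverse-propensity-scoring (IPS) risk estimate with a closed-form KL divergence against a uniform logging policy, standing in for the paper's variationally estimated divergence; the data, network, and hyperparameters are illustrative assumptions.

# Hypothetical sketch: off-policy learning from logged bandit feedback with an
# IPS risk estimate plus a divergence penalty between the new policy and the
# logging policy. Not the authors' code; KL is used in place of the paper's
# variational f-divergence estimate.
import torch
import torch.nn as nn

torch.manual_seed(0)

n, d, k = 1000, 8, 4                       # logged samples, context dim, actions
x = torch.randn(n, d)                      # contexts
a = torch.randint(0, k, (n,))              # actions chosen by the logging policy
p0 = torch.full((n,), 1.0 / k)             # logging propensities (uniform here)
r = torch.rand(n)                          # observed rewards for the logged actions

policy = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, k))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
lam = 0.1                                  # strength of the divergence regularizer

for step in range(200):
    probs = torch.softmax(policy(x), dim=-1)           # pi(a | x)
    pi_a = probs.gather(1, a.unsqueeze(1)).squeeze(1)  # prob. of the logged action
    w = pi_a / p0                                      # importance weights
    ips_risk = -(w * r).mean()                         # negative IPS reward estimate
    # KL(pi || pi_0); since pi_0 is uniform this has a closed form, whereas the
    # paper estimates the divergence variationally with a discriminator network.
    kl = (probs * (probs.clamp_min(1e-8).log() + torch.log(torch.tensor(float(k))))).sum(1).mean()
    loss = ips_risk + lam * kl
    opt.zero_grad()
    loss.backward()
    opt.step()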
