JMLR: Workshop and Conference Proceedings

Variance Regularized Counterfactual Risk Minimization via Variational Divergence Minimization



Abstract

Off-policy learning, the task of evaluating and improving policies using historic data collected from a logging policy, is important because on-policy evaluation is usually expensive and can have adverse impacts. One of the major challenges of off-policy learning is to derive counterfactual estimators that also have low variance and thus low generalization error. In this work, inspired by learning bounds for importance sampling problems, we present a new counterfactual learning principle for off-policy learning with bandit feedback. Our method regularizes the generalization error by minimizing the distribution divergence between the logging policy and the new policy, and removes the need to iterate over all training samples to compute the sample-variance regularizer used in prior work. With neural network policies, our end-to-end training algorithms based on variational divergence minimization show significant improvements over conventional baseline algorithms, consistent with our theoretical results.
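The learning principle described in the abstract can be illustrated with a toy script. The following is a minimal sketch, not the authors' implementation: it combines a plain inverse-propensity-scoring (IPS) risk estimate with a closed-form KL divergence against a uniform logging policy, standing in for the paper's variationally estimated divergence; the data, network, and hyperparameters are illustrative assumptions.

# Hypothetical sketch: off-policy learning from logged bandit feedback with an
# IPS risk estimate plus a divergence penalty between the new policy and the
# logging policy. Not the authors' code; KL is used in place of the paper's
# variational f-divergence estimate.
import torch
import torch.nn as nn

torch.manual_seed(0)

n, d, k = 1000, 8, 4                       # logged samples, context dim, actions
x = torch.randn(n, d)                      # contexts
a = torch.randint(0, k, (n,))              # actions chosen by the logging policy
p0 = torch.full((n,), 1.0 / k)             # logging propensities (uniform here)
r = torch.rand(n)                          # observed rewards for the logged actions

policy = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, k))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
lam = 0.1                                  # strength of the divergence regularizer

for step in range(200):
    probs = torch.softmax(policy(x), dim=-1)           # pi(a | x)
    pi_a = probs.gather(1, a.unsqueeze(1)).squeeze(1)  # prob. of the logged action
    w = pi_a / p0                                      # importance weights
    ips_risk = -(w * r).mean()                         # negative IPS reward estimate
    # KL(pi || pi_0); since pi_0 is uniform this has a closed form, whereas the
    # paper estimates the divergence variationally with a discriminator network.
    kl = (probs * (probs.clamp_min(1e-8).log() + torch.log(torch.tensor(float(k))))).sum(1).mean()
    loss = ips_risk + lam * kl
    opt.zero_grad()
    loss.backward()
    opt.step()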
