
Online Controlled Experiment Design: Trade-off Between Statistical Uncertainty and Cumulative Reward.


Abstract

Online experiments are widely used in online advertising and web development to compare the effects, e.g. click-through rate or conversion rate, of different versions. Among all designs, A/B testing is the most popular: it randomly segments users into two groups with equal probability and shows each group a different version. This method is easy to implement, but its shortcoming is also obvious: to measure both versions, it cannot expose all users to the best one, which leads to a potential loss of reward, e.g. clicks and conversions. Though some loss is inevitable in an experiment, it can be reduced. Reducing the loss is essentially equivalent to maximizing cumulative reward, which is also the goal of the classical multi-armed bandit problem. Thus, multi-armed bandit algorithms have been proposed to reduce the potential loss during experiments. Compared with A/B testing, multi-armed bandit algorithms produce more cumulative reward during the experiment. However, they suffer from high statistical uncertainty: for example, they need more users than A/B testing to reach a given statistical significance level.

To address this problem, this thesis builds a model to analyze two conflicting goals: reducing statistical uncertainty and maximizing cumulative reward. We develop an algorithm for online experiments that balances the trade-off between these two goals. Our analysis focuses on one kind of online experiment: the batch-updating binomial experiment. We first discuss several statistical uncertainty criteria and propose corresponding algorithms to optimize them. We then extend several multi-armed bandit algorithms to maximize cumulative reward in the batch-updating setting. In addition, we propose a new algorithm, sequential two stages (STS), to solve this problem. Finally, an improved performance evaluation method that integrates statistical uncertainty with cumulative reward is put forward. Instead of simply combining two objective functions, this new measure, the virtual future measure (VFM), connects statistical uncertainty and cumulative reward directly through virtual future reward. Compared with other methods, our proposed algorithm STS is well suited to optimizing VFM.
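The batch-updating binomial setting described above can be sketched as a small simulation. The numbers below (click-through rates of 0.05 and 0.08, 50 batches of 200 users) are hypothetical, and the Thompson-sampling policy stands in as a representative of the bandit algorithms the thesis extends; the STS algorithm and VFM themselves are not reproduced here.

```python
import random

def run_batch_experiment(true_rates, n_batches, batch_size, policy, seed=0):
    """Simulate a batch-updating binomial experiment over two versions.

    policy(succ, fail, rng) returns the probability of assigning a user
    to version 1 for the next batch; counts are only updated between
    batches, matching the batch-updating setting.
    """
    rng = random.Random(seed)
    succ = [0, 0]  # successes (e.g. clicks) per version
    fail = [0, 0]  # failures per version
    reward = 0     # cumulative reward over the whole experiment
    for _ in range(n_batches):
        p1 = policy(succ, fail, rng)
        for _ in range(batch_size):
            arm = 1 if rng.random() < p1 else 0
            if rng.random() < true_rates[arm]:
                succ[arm] += 1
                reward += 1
            else:
                fail[arm] += 1
    return reward, succ, fail

def ab_policy(succ, fail, rng):
    # Classic A/B test: every user is assigned uniformly at random.
    return 0.5

def thompson_policy(succ, fail, rng):
    # Thompson sampling with Beta(1, 1) priors: estimate the posterior
    # probability that version 1 beats version 0 by Monte Carlo, and
    # use it as the assignment probability for the next batch.
    wins = sum(
        rng.betavariate(succ[1] + 1, fail[1] + 1)
        > rng.betavariate(succ[0] + 1, fail[0] + 1)
        for _ in range(200)
    )
    return wins / 200

rates = [0.05, 0.08]  # hypothetical click-through rates
r_ab, succ_ab, fail_ab = run_batch_experiment(rates, 50, 200, ab_policy)
r_ts, succ_ts, fail_ts = run_batch_experiment(rates, 50, 200, thompson_policy)
print("A/B cumulative reward:", r_ab)
print("Thompson cumulative reward:", r_ts)
```

The bandit policy typically earns more reward by routing traffic toward the better version, but it also starves the worse version of samples, which is exactly the source of the higher statistical uncertainty discussed above.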

Bibliographic details

  • Author: Dai, Liang.
  • Affiliation: University of California, Santa Cruz.
  • Degree grantor: University of California, Santa Cruz.
  • Subject: Information science; Computer science.
  • Degree: M.S.
  • Year: 2014
  • Pages: 67 p.
  • Format: PDF
  • Language: eng
