首页> 外文会议>International conference on web and internet economics >Time-Decaying Bandits for Non-stationary Systems
【24h】

Time-Decaying Bandits for Non-stationary Systems

机译:非固定系统的时效土匪

获取原文

摘要

Contents displayed on web portals (e.g.,) are usually adaptively selected from a dynamic set of candidate items, and the attractiveness of each item decays over time. The goal of those websites is to maximize the engagement of users (usually measured by their clicks) on the selected items. We formulate this kind of applications as a new variant of bandit problems where new arms are dynamically added into the candidate set and the expected reward of each arm decays as the round proceeds. For this new problem, a direct application of the algorithms designed for stochastic MAB (e.g., UCB) will lead to over-estimation of the rewards of old arms, and thus cause a misidentification of the optimal arm. To tackle this challenge, we propose a new algorithm that can adaptively estimate the temporal dynamics in the rewards of the arms, and effectively identify the best arm at a given time point on this basis. When the temporal dynamics are represented by a set of features, the proposed algorithm is able to enjoy a sub-linear regret. Our experiments verify the effectiveness of the proposed algorithm.
机译:通常从动态的候选项目集中自适应地选择显示在门户网站(例如)上的内容,并且每个项目的吸引力随着时间而衰减。这些网站的目标是最大程度地提高用户对所选项目的参与度(通常通过其点击来衡量)。我们将这种应用程序描述为强盗问题的新变体,其中新的组被动态添加到候选集中,并且随着回合的进行,每个组的预期收益都会下降。对于这个新问题,为随机MAB设计的算法(例如UCB)的直接应用将导致对旧武器奖励的过高估计,从而导致对最佳武器的错误识别。为了解决这一挑战,我们提出了一种新算法,该算法可以自适应地估计武器奖励中的时间动态,并在此基础上有效地确定给定时间点上的最佳武器。当时间动态由一组特征表示时,所提出的算法能够享受次线性遗憾。我们的实验验证了所提算法的有效性。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号