Home > Conference Papers > International Conference on Communication Systems and Networks > A Hidden Markov Restless Multi-armed Bandit Model for Playout Recommendation Systems

A Hidden Markov Restless Multi-armed Bandit Model for Playout Recommendation Systems



Abstract

We consider a restless multi-armed bandit (RMAB) in which each arm can be in one of two states, say 0 or 1. Playing an arm generates a unit reward with a probability that depends on the state of the arm. The belief about the state of the arm can be calculated using a Bayesian update after every play. This RMAB is designed for use in recommendation systems where the user's preferences depend on the history of recommendations. In this paper we analyse the RMAB by first studying the single-armed bandit. We show that it is Whittle-indexable and obtain a closed-form expression for the Whittle index. For an RMAB to be useful in practice, we need to be able to learn the parameters of the arms. We present a Thompson sampling scheme that learns the parameters of the arms, and illustrate its performance numerically.
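The abstract describes updating the belief about an arm's hidden state with a Bayes step after every play. A minimal sketch of that update for a two-state arm, assuming reward probabilities in each state and a known state-transition matrix (the function name, parameter names, and numbers below are illustrative, not from the paper):

```python
def belief_update(pi, reward, rho, P):
    """One Bayesian belief update for a two-state hidden Markov arm.

    pi:     prior probability that the arm is in state 1.
    reward: observed reward from playing the arm (0 or 1).
    rho:    (rho0, rho1), probability of a unit reward in states 0 and 1.
    P:      2x2 transition matrix, P[s][t] = prob. of moving state s -> t.
    Returns the posterior probability of state 1 after one more transition.
    """
    rho0, rho1 = rho
    # Likelihood of the observed reward under each hidden state.
    like1 = rho1 if reward == 1 else 1.0 - rho1
    like0 = rho0 if reward == 1 else 1.0 - rho0
    # Posterior over the current state via Bayes' rule.
    p1 = pi * like1 / (pi * like1 + (1.0 - pi) * like0)
    # Propagate the posterior one step through the Markov chain.
    return (1.0 - p1) * P[0][1] + p1 * P[1][1]

# Example: a symmetric-looking setup; observing a reward pulls the
# belief toward state 1 before the transition smooths it back.
P = [[0.8, 0.2], [0.3, 0.7]]
print(belief_update(0.5, 1, (0.2, 0.8), P))  # -> 0.6
```

Repeating this update after every play yields the belief trajectory on which the Whittle index and the Thompson sampling scheme in the paper operate.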

