Conference paper: International conference on communication systems and networks

A Hidden Markov Restless Multi-armed Bandit Model for Playout Recommendation Systems



Abstract

We consider a restless multi-armed bandit (RMAB) in which each arm can be in one of two states, 0 or 1. Playing an arm generates a unit reward with a probability that depends on the state of the arm. The belief about the state of an arm can be computed with a Bayesian update after every play. This RMAB is designed for use in recommendation systems where the user's preferences depend on the history of recommendations. In this paper we analyse the RMAB by first studying the single-armed bandit. We show that it is Whittle-indexable and obtain a closed-form expression for the Whittle index. For an RMAB to be useful in practice, we need to be able to learn the parameters of the arms. We present a Thompson sampling scheme that learns these parameters, and we illustrate its performance numerically.
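The belief update mentioned in the abstract can be sketched for a single two-state arm. This is an illustrative reconstruction, not the paper's code: the parameter names (`p01`, `p11` for the state-transition probabilities and `rho0`, `rho1` for the reward probabilities in states 0 and 1) are assumptions introduced here for clarity.

```python
def belief_update(pi, reward, p01, p11, rho0, rho1):
    """One step of the Bayesian belief update for a two-state arm.

    pi     : prior belief P(state = 1) before the play
    reward : observed reward on this play (0 or 1)
    p01    : P(next state = 1 | current state = 0)
    p11    : P(next state = 1 | current state = 1)
    rho0   : P(unit reward | state = 0)
    rho1   : P(unit reward | state = 1)
    Returns the updated belief P(state = 1) for the next play.
    """
    # Bayes step: posterior probability of state 1 given the observed reward.
    if reward == 1:
        num = pi * rho1
        den = pi * rho1 + (1.0 - pi) * rho0
    else:
        num = pi * (1.0 - rho1)
        den = pi * (1.0 - rho1) + (1.0 - pi) * (1.0 - rho0)
    post = num / den
    # Prediction step: propagate the posterior through the Markov transition.
    return post * p11 + (1.0 - post) * p01
```

For example, with a flat prior `pi = 0.5`, reward probabilities `rho0 = 0.1`, `rho1 = 0.9`, and transitions `p01 = 0.2`, `p11 = 0.8`, observing a reward sharpens the posterior to 0.9 before the transition step maps it to the next-play belief.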
