Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing

Multi-Objective contextual bandits with a dominant objective

Abstract

In this paper, we propose a new contextual bandit problem with two objectives, where one objective dominates the other. Unlike single-objective bandit problems, in which the learner obtains a random scalar reward for each arm it selects, in the proposed problem the learner obtains a random reward vector, each component of which corresponds to one of the objectives. The goal of the learner is to maximize its total reward in the non-dominant objective while ensuring that it maximizes its reward in the dominant objective. In this case, the optimal arm given a context is the one that maximizes the expected reward in the non-dominant objective among all arms that maximize the expected reward in the dominant objective. For this problem, we propose the multi-objective contextual multi-armed bandit algorithm (MOC-MAB), and prove that it achieves sublinear regret with respect to the optimal context-dependent policy. Then, we compare the performance of the proposed algorithm with other state-of-the-art bandit algorithms. The proposed contextual bandit model and algorithm have a wide range of real-world applications that involve multiple and possibly conflicting objectives, ranging from wireless communication to medical diagnosis and recommender systems.
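The definition of the optimal arm in the abstract is a lexicographic rule, and a small sketch may make it concrete. The code below is only an illustration of that rule for one fixed context, assuming the expected rewards of every arm are already known; it is not MOC-MAB, which has to learn these quantities from observed reward vectors. The function name `optimal_arm`, the `(dominant, non_dominant)` tuple layout, and the tolerance `tol` are hypothetical choices made for this example.

```python
from typing import Sequence, Tuple


def optimal_arm(
    expected_rewards: Sequence[Tuple[float, float]],
    tol: float = 1e-9,
) -> int:
    """Return the index of the lexicographically optimal arm for one context.

    expected_rewards[a] is (dominant_mean, non_dominant_mean) for arm a.
    These means are assumed to be known here, which is an illustrative
    simplification; a bandit learner would only have estimates.
    """
    if not expected_rewards:
        raise ValueError("at least one arm is required")

    # Step 1: find the best achievable expected reward in the dominant objective.
    best_dominant = max(r[0] for r in expected_rewards)

    # Step 2: keep only the arms that attain it (up to a small tolerance).
    candidates = [
        a for a, r in enumerate(expected_rewards)
        if best_dominant - r[0] <= tol
    ]

    # Step 3: among those, pick the arm with the highest expected reward
    # in the non-dominant objective.
    return max(candidates, key=lambda a: expected_rewards[a][1])


if __name__ == "__main__":
    # Arm 2 has the best non-dominant mean but loses on the dominant one,
    # so the rule chooses between arms 0 and 1 only.
    arms = [(0.9, 0.2), (0.9, 0.7), (0.5, 1.0)]
    print(optimal_arm(arms))
```

Note that an arm with a very high non-dominant reward is never chosen if it is even slightly worse in the dominant objective, which is what distinguishes this setting from a weighted sum of the two objectives.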
