首页> 外文会议>IEEE Conference on Computer Communications >Combinatorial Sleeping Bandits with Fairness Constraints
【24h】

Combinatorial Sleeping Bandits with Fairness Constraints

机译:具有公平约束的组合式睡土匪

获取原文

摘要

The multi-armed bandit (MAB) model has been widely adopted for studying many practical optimization problems (network resource allocation, ad placement, crowdsourcing, etc.) with unknown parameters. The goal of the player (i.e., the decision maker) here is to maximize the cumulative reward in the face of uncertainty. However, the basic MAB model neglects several important factors of the system in many realworld applications, where multiple arms (i.e., actions) can be simultaneously played and an arm could sometimes be “sleeping” (i.e., unavailable). Besides reward maximization, ensuring fairness is also a key design concern in practice. To that end, we propose a new Combinatorial Sleeping MAB model with Fairness constraints, called CSMAB-F, aiming to address the aforementioned crucial modeling issues. The objective is now to maximize the reward while satisfying the fairness requirement of a minimum selection fraction for each individual arm. To tackle this new problem, we extend an online learning algorithm, called Upper Confidence Bound (UCB), to deal with a critical tradeoff between exploitation and exploration and employ the virtual queue technique to properly handle the fairness constraints. By carefully integrating these two techniques, we develop a new algorithm, called Learning with Fairness Guarantee (LFG), for the CSMAB-F problem. Further, we rigorously prove that not only LFG is feasibility-optimal, but it also has a time-average regret upper bounded by N/2η + β
机译:多臂强盗(MAB)模型已被广泛用于研究许多参数未知的实际优化问题(网络资源分配,广告放置,众包等)。玩家(即决策者)的目标是在面对不确定性时最大化累积奖励。但是,在许多实际应用中,基本的MAB模型忽略了系统的几个重要因素,在这些应用中,可以同时播放多个手臂(即动作),而手臂有时会“处于睡眠状态”(即不可用)。在实践中,除了奖励最大化以外,确保公平也是设计的关键问题。为此,我们提出了一种新的具有公平性约束的组合睡眠MAB模型,称为CSMAB-F,旨在解决上述关键建模问题。现在的目标是在满足每个手臂最小选择分数的公平性要求的同时最大化回报。为了解决这个新问题,我们扩展了一种在线学习算法,称为上限可信度上限(UCB),以处理开发和勘探之间的关键权衡,并使用虚拟队列技术来正确处理公平性约束。通过仔细地集成这两种技术,我们针对CSMAB-F问题开发了一种称为“公平保证学习”(LFG)的新算法。此外,我们严格证明了LFG不仅是可行性最优的,而且还具有以N /2η+β为上限的时间平均后悔

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号