Conference: Annual Conference on Neural Information Processing Systems

Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions

Abstract

Recent work on deriving O(log T) anytime regret bounds for stochastic dueling bandit problems has mostly considered Condorcet winners, which do not always exist, and, more recently, winners defined by the Copeland set, which always exist. In this work, we consider a broad notion of winners defined by tournament solutions in social choice theory. These include the Copeland set as a special case, but also several other notions of winners, such as the top cycle, the uncovered set, and the Banks set, all of which, like the Copeland set, always exist. We develop a family of UCB-style dueling bandit algorithms for such general tournament solutions and show O(log T) anytime regret bounds for them. Experiments confirm that our algorithms achieve low regret relative to the target winning set of interest.
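To make the winner notions above concrete, here is a minimal Python sketch that computes several of them from a *known* pairwise preference matrix. It assumes `P[i][j]` is the probability that arm `i` beats arm `j`, with no exact ties (`P[i][j] != 0.5` for `i != j`), so that the "beats" relation forms a tournament. The function names are hypothetical and chosen for illustration. This is not the paper's bandit algorithm, which has to estimate these sets from noisy duels. The Banks set is omitted because it is defined via maximal transitive subtournaments and is considerably harder to compute.

```python
def beats(P):
    """Dominance relation of the tournament induced by P: i beats j iff P[i][j] > 0.5."""
    n = len(P)
    return [[i != j and P[i][j] > 0.5 for j in range(n)] for i in range(n)]


def condorcet_winner(P):
    """Arm that beats every other arm, or None if no such arm exists."""
    B = beats(P)
    n = len(P)
    for i in range(n):
        if all(B[i][j] for j in range(n) if j != i):
            return i
    return None


def copeland_set(P):
    """Arms with the maximum number of pairwise wins."""
    scores = [sum(row) for row in beats(P)]
    best = max(scores)
    return {i for i, s in enumerate(scores) if s == best}


def top_cycle(P):
    """Arms from which every other arm is reachable along 'beats' edges."""
    B = beats(P)
    n = len(P)
    reach = [row[:] for row in B]
    # Warshall's transitive closure: reach[i][j] becomes True if j is reachable from i.
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return {i for i in range(n) if all(reach[i][j] for j in range(n) if j != i)}


def uncovered_set(P):
    """Arms not covered by any other arm.

    Arm i covers arm j if i beats j and i also beats every arm that j beats.
    """
    B = beats(P)
    n = len(P)

    def covers(i, j):
        return B[i][j] and all(B[i][k] for k in range(n) if B[j][k])

    return {j for j in range(n) if not any(covers(i, j) for i in range(n))}


# Hypothetical 4-arm example: (i, j) in `wins` means arm i beats arm j.
wins = {(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 0)}
P = [[0.5] * 4 for _ in range(4)]
for i, j in wins:
    P[i][j], P[j][i] = 0.6, 0.4

print(condorcet_winner(P))  # None: every arm loses at least one duel
print(copeland_set(P))      # {0, 1}
print(uncovered_set(P))     # {0, 1, 3}
print(top_cycle(P))         # {0, 1, 2, 3}
```

In this example no Condorcet winner exists, yet every tournament solution is non-empty. The sets are also nested, Copeland set ⊆ uncovered set ⊆ top cycle, which holds for tournaments in general. This is why a learner targeting a coarser solution, such as the top cycle, can tolerate more arms as "winners" than one targeting the Copeland set.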