
Copeland Dueling Bandits

Abstract

A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist. Two algorithms are proposed that instead seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist. The first, Copeland Confidence Bound (CCB), is designed for small numbers of arms, while the second, Scalable Copeland Bandits (SCB), works better for large-scale problems. We provide theoretical results bounding the regret accumulated by CCB and SCB, both of which substantially improve on existing results. Existing results either offer bounds of the form O(K log T) but require restrictive assumptions, or offer bounds of the form O(K^2 log T) without requiring such assumptions. Our results offer the best of both worlds: O(K log T) bounds without restrictive assumptions.
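
To make the distinction between the two notions of winner concrete, here is a minimal sketch that assumes the pairwise preference probabilities are known exactly. P[i][j] is the (hypothetical) probability that arm i wins a duel against arm j. An arm "beats" another when that probability exceeds 0.5, and ties count as not beating. The Copeland score used here is the raw count of arms beaten. A Condorcet winner beats all K - 1 other arms, while the Copeland winners are simply the arms with the highest score, so at least one always exists. The function names and the example matrix are illustrative assumptions. This is not an implementation of CCB or SCB, which must learn the winner from noisy duels.

```python
from typing import List, Optional

Matrix = List[List[float]]


def copeland_scores(P: Matrix) -> List[int]:
    """For each arm i, count the arms j != i that i beats (P[i][j] > 0.5)."""
    K = len(P)
    return [sum(1 for j in range(K) if j != i and P[i][j] > 0.5) for i in range(K)]


def copeland_winners(P: Matrix) -> List[int]:
    """Arms with the maximal Copeland score; never empty for K >= 1."""
    scores = copeland_scores(P)
    best = max(scores)
    return [i for i, s in enumerate(scores) if s == best]


def condorcet_winner(P: Matrix) -> Optional[int]:
    """The arm that beats every other arm, or None if no such arm exists."""
    K = len(P)
    for i, s in enumerate(copeland_scores(P)):
        if s == K - 1:
            return i
    return None


# Hypothetical 4-arm example: arms 0, 1, 2 form a cycle
# (0 beats 1, 1 beats 2, 2 beats 0), and each of them beats arm 3.
P = [
    [0.5, 0.6, 0.4, 0.7],
    [0.4, 0.5, 0.6, 0.7],
    [0.6, 0.4, 0.5, 0.7],
    [0.3, 0.3, 0.3, 0.5],
]

print(copeland_scores(P))   # [2, 2, 2, 0]
print(copeland_winners(P))  # [0, 1, 2]
print(condorcet_winner(P))  # None
```

In this example, the cycle among arms 0, 1 and 2 rules out a Condorcet winner. The Copeland winners are still well defined, which is why regret can be measured against them.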
