Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions

Abstract

Recent work on deriving O(log T) anytime regret bounds for stochastic dueling bandit problems has considered mostly Condorcet winners, which do not always exist, and more recently, winners defined by the Copeland set, which do always exist. In this work, we consider a broad notion of winners defined by tournament solutions in social choice theory, which include the Copeland set as a special case but also include several other notions of winners such as the top cycle, uncovered set, and Banks set, and which, like the Copeland set, always exist. We develop a family of UCB-style dueling bandit algorithms for such general tournament solutions, and show O(log T) anytime regret bounds for them. Experiments confirm the ability of our algorithms to achieve low regret relative to the target winning set of interest.
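
To make the winner notions named in the abstract concrete, here is a minimal sketch that computes three tournament solutions (the Copeland set, the uncovered set, and the top cycle) from a preference matrix P. It assumes P[i][j] is the probability that arm i beats arm j and that no off-diagonal entry equals exactly 1/2. It also builds optimistic pairwise estimates from win counts in the style of RUCB's upper confidence bounds. The helper names (copeland_set, uncovered_set, top_cycle, ucb_matrix) and the parameter alpha are hypothetical illustrations, not the authors' algorithm. The Banks set is omitted because computing it is considerably more involved.

```python
import math
from typing import List, Set

Matrix = List[List[float]]


def beats(P: Matrix, i: int, j: int) -> bool:
    """Arm i beats arm j in the tournament induced by P (assumes no exact ties)."""
    return i != j and P[i][j] > 0.5


def copeland_set(P: Matrix) -> Set[int]:
    """Arms with the maximum number of pairwise wins."""
    n = len(P)
    scores = [sum(beats(P, i, j) for j in range(n)) for i in range(n)]
    best = max(scores)
    return {i for i in range(n) if scores[i] == best}


def top_cycle(P: Matrix) -> Set[int]:
    """Arms that reach every other arm along a path of pairwise wins."""
    n = len(P)
    result = set()
    for s in range(n):
        seen, stack = {s}, [s]  # depth-first search over "beats" edges
        while stack:
            u = stack.pop()
            for v in range(n):
                if beats(P, u, v) and v not in seen:
                    seen.add(v)
                    stack.append(v)
        if len(seen) == n:
            result.add(s)
    return result


def uncovered_set(P: Matrix) -> Set[int]:
    """Arms not covered by any other arm: i covers j if i beats j
    and i also beats every arm that j beats."""
    n = len(P)

    def covers(i: int, j: int) -> bool:
        return beats(P, i, j) and all(
            beats(P, i, k) for k in range(n) if beats(P, j, k)
        )

    return {j for j in range(n) if not any(covers(i, j) for i in range(n))}


def ucb_matrix(wins: List[List[int]], t: int, alpha: float = 0.51) -> Matrix:
    """Hypothetical RUCB-style optimistic estimates of P from win counts,
    where wins[i][j] counts duels won by i against j and t >= 1."""
    n = len(wins)
    U = [[0.5] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            m = wins[i][j] + wins[j][i]
            # Unplayed pairs get the most optimistic value, 1.0.
            U[i][j] = 1.0 if m == 0 else wins[i][j] / m + math.sqrt(alpha * math.log(t) / m)
    return U


# Four arms with no Condorcet winner: 0>1, 1>2, 2>0, 1>3, 2>3, 3>0.
edges = [(0, 1), (1, 2), (2, 0), (1, 3), (2, 3), (3, 0)]
P = [[0.5] * 4 for _ in range(4)]
for w, l in edges:
    P[w][l], P[l][w] = 0.7, 0.3

print(copeland_set(P))   # {1, 2}
print(uncovered_set(P))  # {0, 1, 2}
print(top_cycle(P))      # {0, 1, 2, 3}
```

On this example the three solutions are nested (Copeland ⊆ uncovered ⊆ top cycle), and each is non-empty even though no Condorcet winner exists. A UCB-style learner in this spirit would reason about a tournament solution of optimistic estimates such as those from ucb_matrix rather than of the unknown P. The authors' exact selection rule is given in the paper.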

Bibliographic Record

Source: Annual Conference on Neural Information Processing Systems, 2016, pp. 694-1424, 9 pages
Authors: Siddartha Ramamohan; Arun Rajkumar; Shivani Agarwal
Original format: PDF
Chinese Library Classification: Information processing (information handling)

Similar Literature

1. Approval Voting, Borda Winners, and Condorcet Winners: Evidence from Seven Elections [J]. Michel Regenwetter, Bernard Grofman. Management Science: Journal of the Institute of Management Sciences, 1998, no. 4.
2. Condorcet-Consistent and Approximately Strategyproof Tournament Rules [J]. Jon Schneider, Ariel Schvartzman, S. Matthew Weinberg. LIPIcs: Leibniz International Proceedings in Informatics, 2017, no. 1.
3. Tournament Games and Condorcet Voting [J]. D. C. Fisher, J. Ryan. Linear Algebra and its Applications, 1995, no. 0.
4. Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions [C]. Siddartha Ramamohan, Arun Rajkumar, Shivani Agarwal. Annual Conference on Neural Information Processing Systems, 2016.
5. Adaptive Preference Learning with Bandit Feedback: Information Filtering, Dueling Bandits and Incentivizing Exploration [D]. Bangrui Chen. 2017.
6. From the Cover: Bandit Solutions Provide Unified Ethical Models for Randomized Clinical Trials and Comparative Effectiveness Research [O]. William H. Press. 2009.
7. Reducing Dueling Bandits to Cardinal Bandits [O]. Nir Ailon, Thorsten Joachims, Zohar Karnin. 2014.