Copeland Dueling Bandits

Abstract

A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist. Two algorithms are proposed that instead seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist. The first, Copeland Confidence Bound (CCB), is designed for small numbers of arms, while the second, Scalable Copeland Bandits (SCB), works better for large-scale problems. We provide theoretical results bounding the regret accumulated by CCB and SCB, both substantially improving existing results. Such existing results either offer bounds of the form O(K log T) but require restrictive assumptions, or offer bounds of the form O(K^2 log T) without requiring such assumptions. Our results offer the best of both worlds: O(K log T) bounds without restrictive assumptions.
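The abstract turns on two definitions. Assuming a known preference matrix P, where P[i][j] is the probability that arm i wins a duel against arm j, a Condorcet winner is an arm that beats every other arm (P[i][j] > 0.5 for all j ≠ i), while a Copeland winner is any arm that beats the largest number of other arms. The minimal Python sketch below computes both for a small matrix containing a cycle. The helper names (`copeland_scores`, `copeland_winners`, `condorcet_winner`) and the example matrix are hypothetical; the sketch only illustrates the target that the regret bounds refer to and is not CCB or SCB, which must estimate P from noisy duels.

```python
from typing import List, Optional

# Hypothetical representation: P[i][j] is the probability that arm i beats arm j.
# Assumes a full, known K x K matrix with P[i][j] + P[j][i] == 1.
PreferenceMatrix = List[List[float]]


def copeland_scores(P: PreferenceMatrix) -> List[int]:
    """Number of other arms that each arm beats (P[i][j] > 0.5)."""
    K = len(P)
    return [sum(1 for j in range(K) if j != i and P[i][j] > 0.5) for i in range(K)]


def copeland_winners(P: PreferenceMatrix) -> List[int]:
    """Arms with the maximal Copeland score; this set is never empty."""
    scores = copeland_scores(P)
    best = max(scores)
    return [i for i, s in enumerate(scores) if s == best]


def condorcet_winner(P: PreferenceMatrix) -> Optional[int]:
    """The arm that beats all K - 1 others, or None if no such arm exists."""
    K = len(P)
    for i, s in enumerate(copeland_scores(P)):
        if s == K - 1:
            return i
    return None


# Arms 0, 1, 2 form a cycle (0 beats 1, 1 beats 2, 2 beats 0);
# all three beat arm 3, so no arm beats every other arm.
P_cycle = [
    [0.5, 0.6, 0.4, 0.7],
    [0.4, 0.5, 0.6, 0.7],
    [0.6, 0.4, 0.5, 0.7],
    [0.3, 0.3, 0.3, 0.5],
]

if __name__ == "__main__":
    print(copeland_scores(P_cycle))   # [2, 2, 2, 0]
    print(condorcet_winner(P_cycle))  # None
    print(copeland_winners(P_cycle))  # [0, 1, 2]
```

When a Condorcet winner exists, every other arm loses to it and so has a Copeland score of at most K - 2; the Condorcet winner is then the unique Copeland winner, which is why targeting the Copeland winner generalizes the Condorcet setting.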

Bibliographic details

Source
Annual Conference on Neural Information Processing Systems (NIPS), 2015, pp. 307-315 (9 pages)
Authors
Masrour Zoghi; Zohar Karnin; Shimon Whiteson; Maarten de Rijke
Original format: PDF

Similar documents

1. Preference-based Online Learning with Dueling Bandits: A Survey [J]. Viktor Bengs, Róbert Busa-Fekete, Adil El Mesaoudi-Paul. Journal of Machine Learning Research, 2021.

2. Zeroth Order Non-convex Optimization with Dueling-Choice Bandits [J]. Yichong Xu, Aparna Joshi, Aarti Singh. JMLR: Workshop and Conference Proceedings, 2020, no. 2010.

3. Efficient Mechanisms for Peer Grading and Dueling Bandits [J]. Chuang-Chieh Lin, Chi-Jen Lu. JMLR: Workshop and Conference Proceedings, 2018, no. 2010.

4. Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm [C]. Junpei Komiyama, Junya Honda, Hiroshi Nakagawa. International Conference on Machine Learning, 2016.

5. Adaptive Preference Learning with Bandit Feedback: Information Filtering, Dueling Bandits and Incentivizing Exploration [D]. Bangrui Chen, 2017.

6. Smoking and the bandit: A preliminary study of smoker and non-smoker differences in exploratory behavior measured with a multi-armed bandit task [O]. Merideth A. Addicott, John M. Pearson, Jessica Wilson.

7. Reducing Dueling Bandits to Cardinal Bandits [O]. Nir Ailon, Thorsten Joachims, Zohar Karnin, 2014.
