JMLR: Workshop and Conference Proceedings

PAC Battling Bandits in the Plackett-Luce Model



Abstract

We introduce the probably approximately correct (PAC) \emph{Battling-Bandit} problem with the Plackett-Luce (PL) subset choice model: an online learning framework where at each trial the learner chooses a subset of $k$ arms from a fixed set of $n$ arms, and subsequently observes stochastic feedback indicating preference information about the items in the chosen subset, e.g., the most preferred item or a ranking of the top $m$ most preferred items. The objective is to identify a near-best item in the underlying PL model with high confidence. This generalizes the well-studied PAC \emph{Dueling-Bandit} problem over $n$ arms, which aims to recover the \emph{best-arm} from pairwise preference information and is known to require $O(\frac{n}{\epsilon^2} \ln \frac{1}{\delta})$ sample complexity. We study the sample complexity of this problem under various feedback models: (1) winner of the subset (WI), and (2) ranking of the top-$m$ items (TR) for $2 \le m \le k$. We show, surprisingly, that with winner-information (WI) feedback over subsets of size $2 \leq k \leq n$, the best achievable sample complexity is still $O\left(\frac{n}{\epsilon^2} \ln \frac{1}{\delta}\right)$, independent of $k$ and the same as in the Dueling-Bandit setting ($k=2$). For the more general top-$m$ ranking (TR) feedback model, we show a significantly smaller lower bound on the sample complexity of $\Omega\bigg(\frac{n}{m\epsilon^2} \ln \frac{1}{\delta}\bigg)$, which suggests a multiplicative reduction by a factor of $m$ owing to the additional information revealed by preferences among $m$ items instead of just $1$. We also propose two algorithms for the PAC problem with the TR feedback model with optimal (up to logarithmic factors) sample complexity guarantees, establishing the increase in statistical efficiency from exploiting rank-ordered feedback.
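For intuition, the sketch below simulates the two feedback models from a PL model: an item $i$ in the played subset $S$ wins with probability $\theta_i / \sum_{j \in S} \theta_j$, and a top-$m$ ranking is generated by repeatedly drawing winners without replacement. This is a minimal illustration under assumed names (`theta`, `winner_feedback`, `top_m_feedback`), not code or an algorithm from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def winner_feedback(theta, subset):
    """Winner-information (WI) feedback: the single most preferred item of the subset.

    Under the PL model, item i in subset S wins with probability theta[i] / sum_{j in S} theta[j].
    """
    scores = theta[subset]
    return rng.choice(subset, p=scores / scores.sum())

def top_m_feedback(theta, subset, m):
    """Top-m ranking (TR) feedback: a ranking of the m most preferred items of the subset.

    A PL ranking is drawn by sampling winners sequentially without replacement.
    """
    remaining = list(subset)
    ranking = []
    for _ in range(m):
        scores = theta[remaining]
        pick = rng.choice(remaining, p=scores / scores.sum())
        ranking.append(int(pick))
        remaining.remove(pick)
    return ranking

# Illustrative example: n = 6 arms with assumed PL scores (arm 0 is the best arm);
# the learner plays a subset of k = 4 arms.
theta = np.array([1.0, 0.8, 0.6, 0.5, 0.3, 0.2])
subset = np.array([0, 2, 3, 5])
print(winner_feedback(theta, subset))      # WI feedback: index of the winning arm
print(top_m_feedback(theta, subset, m=3))  # TR feedback: ranking of the top 3 arms in the subset
```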
