Journal: Performance Evaluation Review

Federated Bandit: A Gossiping Approach



Abstract

We study Federated Bandit, a decentralized Multi-Armed Bandit (MAB) problem with a set of N agents who can communicate their local data only with their neighbors, as described by a connected graph G. Each agent makes a sequence of decisions, selecting an arm from M candidates, yet has access only to local and potentially biased feedback on the true reward of each action taken. Learning only locally leads agents to sub-optimal actions, while converging to a no-regret strategy requires collecting the distributed data. Motivated by federated learning, we aim for a solution in which agents never share their local observations with a central entity and are allowed to share only a private copy of their own information with their neighbors. We first propose a decentralized bandit algorithm, Gossip_UCB, which couples variants of the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that Gossip_UCB successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret of order $O(\max\{\mathrm{poly}(N, M) \log T, \mathrm{poly}(N, M) \log_{\lambda_2^{-1}} N\})$ for all N agents, where $\lambda_2 \in (0,1)$ is the second largest eigenvalue of the expected gossip matrix, which is a function of G. We then propose Fed_UCB, a differentially private version of Gossip_UCB, in which the agents preserve $\epsilon$-differential privacy of their local data while achieving $O(\max\{\frac{\mathrm{poly}(N, M)}{\epsilon} \log^{2.5} T, \mathrm{poly}(N, M)(\log_{\lambda_2^{-1}} N + \log T)\})$ regret.
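
The coupling of gossiping with UCB described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' Gossip_UCB. It assumes a standard running-consensus update (each agent adds the change in its local sample mean to a shared estimate, then one randomly chosen edge averages its endpoints) and a textbook UCB bonus based on local pull counts. The function name, the Gaussian noise level, and the zero-sum bias pattern in the usage note are hypothetical choices made only for illustration.

```python
import math
import random


def run_gossip_ucb_sketch(means, biases, edges, horizon, seed=0):
    """Toy gossip + UCB loop (illustrative assumption, not the paper's algorithm).

    means[k]     : true mean reward of arm k
    biases[i][k] : additive bias agent i sees on arm k
    edges        : undirected edges (a, b) of a connected graph over agents
    """
    rng = random.Random(seed)
    n, m = len(biases), len(means)
    counts = [[0] * m for _ in range(n)]    # local pull counts
    local = [[0.0] * m for _ in range(n)]   # local (biased) sample means
    shared = [[0.0] * m for _ in range(n)]  # gossip-averaged estimates

    for t in range(1, horizon + 1):
        for i in range(n):
            if t <= m:
                arm = t - 1  # pull every arm once to initialise the counts
            else:
                # UCB index: shared estimate plus an exploration bonus
                arm = max(
                    range(m),
                    key=lambda k: shared[i][k]
                    + math.sqrt(2.0 * math.log(t) / counts[i][k]),
                )
            reward = means[arm] + biases[i][arm] + rng.gauss(0.0, 0.1)
            counts[i][arm] += 1
            previous = local[i][arm]
            local[i][arm] += (reward - previous) / counts[i][arm]
            # Inject only the change in the local mean into the shared state,
            # so the network-wide sum of shared estimates tracks the sum of
            # local means.
            shared[i][arm] += local[i][arm] - previous

        # Gossip step: one random edge replaces both endpoints by their average.
        a, b = rng.choice(edges)
        for k in range(m):
            midpoint = 0.5 * (shared[a][k] + shared[b][k])
            shared[a][k] = midpoint
            shared[b][k] = midpoint

    return counts, local, shared
```

For example, calling `run_gossip_ucb_sketch([0.2, 0.8], [[0.5, -0.5], [-0.5, 0.5], [0.0, 0.0]], [(0, 1), (1, 2)], 200)` simulates three agents on a path graph whose biases cancel across the network. Agent 0 alone would see arm 0 as better, while the average of the local means recovers the true ordering. Pairwise averaging preserves the sum of the shared estimates, which is why the shared state can track the network-wide average without any central entity.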
