Annual Allerton Conference on Communication, Control, and Computing

Collaborative learning of stochastic bandits over a social network


Abstract

We consider a collaborative online learning paradigm, wherein a group of agents connected through a social network are engaged in playing a stochastic multi-armed bandit game. Each time an agent takes an action, the corresponding reward is instantaneously observed by the agent, as well as by its neighbours in the social network. We perform a regret analysis of various policies in this collaborative learning setting. A key finding of this paper is that natural extensions of widely-studied single-agent learning policies to the network setting need not perform well in terms of regret. In particular, we identify a class of non-altruistic and individually consistent policies, and show, by deriving regret lower bounds, that they can suffer large regret in the networked setting. We also show that the learning performance can be substantially improved if the agents exploit the structure of the network, and we develop a simple learning algorithm based on dominating sets of the network. Specifically, we first consider a star network, which is a common motif in hierarchical social networks, and show analytically that the hub agent can be used as an information sink to expedite learning and improve the overall regret. We also derive network-wide regret bounds for the algorithm applied to general networks. We conduct numerical experiments on a variety of networks to corroborate our analytical results.
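The information-sharing setup in the abstract can be illustrated on the star network it describes. The sketch below is our own illustration under assumed details, not the paper's algorithm: each agent runs a standard UCB1 policy on statistics pooled from its neighbourhood, rewards are broadcast to neighbours after each round, and the two Bernoulli arm means are hypothetical. In a star, the hub observes every agent's plays, so its statistics grow much faster than any leaf's.

```python
import math
import random

def ucb_index(total, count, t):
    """Standard UCB1 index; unplayed arms get infinite priority."""
    if count == 0:
        return float("inf")
    return total / count + math.sqrt(2.0 * math.log(t) / count)

def simulate_star(num_leaves=3, num_arms=2, horizon=200, seed=0):
    rng = random.Random(seed)
    arm_means = [0.9, 0.5]          # hypothetical Bernoulli arm means
    agents = list(range(num_leaves + 1))  # agent 0 is the hub
    # neighbours[i] = agents whose rewards agent i observes (incl. itself)
    neighbours = {0: set(agents)}
    for leaf in range(1, num_leaves + 1):
        neighbours[leaf] = {0, leaf}
    counts = {a: [0] * num_arms for a in agents}
    sums = {a: [0.0] * num_arms for a in agents}
    for t in range(1, horizon + 1):
        # Each agent picks the arm with the highest UCB index on its pooled stats.
        plays = {a: max(range(num_arms),
                        key=lambda k: ucb_index(sums[a][k], counts[a][k], t))
                 for a in agents}
        # Rewards are broadcast to each player's neighbourhood.
        for a, arm in plays.items():
            reward = 1.0 if rng.random() < arm_means[arm] else 0.0
            for b in agents:
                if a in neighbours[b]:
                    counts[b][arm] += 1
                    sums[b][arm] += reward
    return counts

if __name__ == "__main__":
    counts = simulate_star()
    # The hub observes every play: 4 agents * 200 rounds = 800 observations,
    # while each leaf only sees its own plays plus the hub's: 400.
    print(sum(counts[0]), sum(counts[1]))  # → 800 400
```

This also makes the paper's caution concrete: a leaf that free-rides on the hub's samples is "non-altruistic", and the dominating-set idea generalises the hub's role to networks that are not stars.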
