Journal: Signal Processing, IEEE Transactions on

Distributed Online Learning via Cooperative Contextual Bandits


Abstract

In this paper, we propose a novel framework for decentralized, online learning by many learners. At each moment in time, an instance characterized by a certain context may arrive at each learner; based on the context, the learner can select one of its own actions (which gives a reward and provides information) or request assistance from another learner. In the latter case, the requester pays a cost and receives the reward, but the provider learns the information. In our framework, learners are modeled as cooperative contextual bandits. Each learner seeks to maximize the expected reward from its arrivals, which involves trading off the reward received from its own actions, the information learned from its own actions, the reward received from the actions requested of others, and the cost paid for these actions, taking into account what it has learned about the value of assistance from each other learner. We develop distributed online learning algorithms and provide analytic bounds that compare the efficiency of these algorithms with the complete knowledge (oracle) benchmark, in which the expected reward of every action in every context is known by every learner. Our estimates show that regret, the loss incurred by the algorithm, is sublinear in time. Our theoretical framework can be used in many practical applications, including Big Data mining, event detection in surveillance sensor networks, and distributed online recommendation systems.
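
The choice each learner faces, as described in the abstract, can be illustrated with a small sketch. This is not the algorithm from the paper: it is a toy epsilon-greedy learner written under several assumptions that the abstract does not state. Contexts are assumed to be discrete and hashable, the request cost is assumed to be a single constant, and the class name `CooperativeLearner`, its methods, and the `epsilon` parameter are all hypothetical. Only the requester's side is modeled; the information learned by the provider is left out.

```python
import random
from collections import defaultdict


class CooperativeLearner:
    """Toy learner: pick one of its own actions or request help from a peer.

    Illustrative only; epsilon-greedy is a stand-in, not the paper's method.
    """

    def __init__(self, own_actions, peers, request_cost, epsilon=0.1, seed=0):
        self.own_actions = list(own_actions)
        self.peers = list(peers)
        self.request_cost = request_cost  # assumed constant cost per request
        self.epsilon = epsilon            # exploration probability (assumption)
        self.rng = random.Random(seed)
        # Running statistics keyed by (context, choice).
        self.counts = defaultdict(int)
        self.means = defaultdict(float)

    def choices(self):
        """All options: own actions first, then requests to each peer."""
        own = [("own", a) for a in self.own_actions]
        ask = [("peer", p) for p in self.peers]
        return own + ask

    def net_estimate(self, context, choice):
        """Estimated reward minus the cost paid when asking a peer."""
        cost = self.request_cost if choice[0] == "peer" else 0.0
        return self.means[(context, choice)] - cost

    def select(self, context):
        """Explore with probability epsilon, otherwise take the best net estimate."""
        options = self.choices()
        if self.rng.random() < self.epsilon:
            return self.rng.choice(options)
        return max(options, key=lambda c: self.net_estimate(context, c))

    def update(self, context, choice, reward):
        """Incrementally update the sample-mean reward for (context, choice)."""
        key = (context, choice)
        self.counts[key] += 1
        self.means[key] += (reward - self.means[key]) / self.counts[key]
```

A learner built this way prefers a peer only when the estimated reward of asking, net of the cost, exceeds the estimate for its best own action. Regret in this setting would be measured against an oracle that knows every expected reward in every context, which the sketch does not compute.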
