Distributed Online Learning via Cooperative Contextual Bandits

Cem Tekin; Mihaela van der Schaar


Abstract

In this paper, we propose a novel framework for decentralized, online learning by many learners. At each time step, an instance characterized by a certain context may arrive at each learner; based on the context, the learner can select one of its own actions (which gives a reward and provides information) or request assistance from another learner. In the latter case, the requester pays a cost and receives the reward, but the provider learns the information. In our framework, learners are modeled as cooperative contextual bandits. Each learner seeks to maximize the expected reward from its arrivals, which involves trading off the reward received from its own actions, the information learned from its own actions, the reward received from the actions requested of others, and the cost paid for these actions, taking into account what it has learned about the value of assistance from each other learner. We develop distributed online learning algorithms and provide analytic bounds that compare their efficiency with the complete-knowledge (oracle) benchmark, in which the expected reward of every action in every context is known by every learner. Our estimates show that regret (the loss incurred by the algorithm relative to this benchmark) is sublinear in time. Our theoretical framework can be used in many practical applications, including Big Data mining, event detection in surveillance sensor networks, and distributed online recommendation systems.
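The choice described in the abstract (take one of your own actions, or pay a cost to ask another learner) can be illustrated with a small Python sketch. This is not the paper's algorithm: the class name `Learner`, the uniform binning of a context in [0, 1], the sample-mean estimates, the `explore_prob` parameter, and the toy `true_reward` environment are all assumptions made for illustration. The peer is modeled only as a reward source, so the provider-side learning mentioned in the abstract is left out.

```python
import random
from collections import defaultdict


class Learner:
    """Hypothetical learner keeping a sample-mean reward per (context bin, option)."""

    def __init__(self, own_actions, peers, call_cost, n_bins=4, explore_prob=0.1, seed=0):
        # An option is ("act", a) for an own action or ("ask", j) for a request to peer j.
        self.options = [("act", a) for a in own_actions] + [("ask", j) for j in peers]
        self.call_cost = call_cost
        self.n_bins = n_bins
        self.explore_prob = explore_prob
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.means = defaultdict(float)

    def _bin(self, context):
        # Assumed uniform partition of a context in [0, 1] into n_bins cells.
        return min(int(context * self.n_bins), self.n_bins - 1)

    def net_value(self, context, option):
        # Estimated reward, minus the cost paid when the option is a request to a peer.
        cost = self.call_cost if option[0] == "ask" else 0.0
        return self.means[(self._bin(context), option)] - cost

    def choose(self, context):
        b = self._bin(context)
        # Try every option once per bin, then mix random exploration with greedy choice.
        untried = [o for o in self.options if self.counts[(b, o)] == 0]
        if untried:
            return untried[0]
        if self.rng.random() < self.explore_prob:
            return self.rng.choice(self.options)
        return max(self.options, key=lambda o: self.net_value(context, o))

    def update(self, context, option, reward):
        # Incremental sample-mean update for the chosen option in this context bin.
        key = (self._bin(context), option)
        self.counts[key] += 1
        self.means[key] += (reward - self.means[key]) / self.counts[key]


def true_reward(option, context):
    # Toy environment (assumption): the peer is better for high contexts only.
    if option == ("ask", "peer"):
        return 0.9 if context >= 0.5 else 0.2
    return 0.6


learner = Learner(own_actions=["a0"], peers=["peer"], call_cost=0.1, explore_prob=0.0)
for t in range(200):
    context = (t % 10) / 10
    option = learner.choose(context)
    learner.update(context, option, true_reward(option, context))

print(learner.choose(0.1))  # ('act', 'a0')
print(learner.choose(0.9))  # ('ask', 'peer')
```

In this toy run the learner keeps its own action for low contexts and pays the cost to ask the peer for high contexts, because the peer's estimated reward there (0.9) exceeds the own action's (0.6) even after subtracting the 0.1 cost.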


Bibliographic record

Source: IEEE Transactions on Signal Processing, 2015, No. 14, pp. 3700-3714 (15 pages)
Authors: Cem Tekin; Mihaela van der Schaar
Affiliation: Department of Electrical Engineering, UCLA
Original format: PDF
Language: English
Keywords: Contextual bandits; cooperative learning; distributed learning; multi-user bandits; multi-user learning; online learning

Similar literature

1. A Privacy-Preserving Distributed Contextual Federated Online Learning Framework with Big Data Support in Social Recommender Systems [J]. Zhou Pan, Wang Kehao, Guo Linke. IEEE Transactions on Knowledge and Data Engineering, 2021, No. 3.
2. Relay Selection in Underwater Acoustic Cooperative Networks: A Contextual Bandit Approach [J]. Xinbin Li, Jiajia Liu, Lei Yan. IEEE Communications Letters, 2017, No. 2.
3. Online Residential Demand Response via Contextual Multi-Armed Bandits [J]. Chen Xin, Nie Yutong, Li Na. IEEE Control Systems Letters, 2021, No. 2.
4. Corrupted Contextual Bandits: Online Learning with Corrupted Context [C]. Djallel Bouneffouf. IEEE International Conference on Acoustics, Speech and Signal Processing, 2021.
5. Efficient Online Learning with Bandit Feedback [D]. Liu, Fang. 2020.
6. The anatomy of a distributed predictive modeling framework: online learning blockchain network and consensus algorithm [O]. Tsung-Ting Kuo. 2020.
7. Distributed Online Learning via Cooperative Contextual Bandits [O]. Cem Tekin, Mihaela Van Der Schaar. 2016.
