IEEE International Conference on Data Science and Advanced Analytics

Learning better while sending less: Communication-efficient online semi-supervised learning in client-server settings



Abstract

We consider a novel distributed learning problem: A server receives potentially unlimited data from clients in a sequential manner, but only a small initial fraction of these data are labeled. Because communication bandwidth is expensive, each client is limited to sending the server only a small (high-priority) fraction of the unlabeled data it generates, and the server is limited in the amount of prioritization hints it sends back to the client. The goal is for the server to learn a good model of all the client data from the labeled and unlabeled data it receives. This setting is frequently encountered in real-world applications and has the characteristics of online, semi-supervised, and active learning. However, previous approaches are not designed for the client-server setting and do not hold the promise of reducing communication costs. We present a novel framework for solving this learning problem in an effective and communication-efficient manner. On the server side, our solution combines two diverse learners working collaboratively, yet in distinct roles, on the partially labeled data stream. A compact, online graph-based semi-supervised learner is used to predict labels for the unlabeled data arriving from the clients. Samples from this model are used as ongoing training for a linear classifier. On the client side, our solution prioritizes data based on an active-learning metric that favors instances that are close to the classifier's decision hyperplane and yet far from each other. To reduce communication, the server sends the classifier's weight-vector to the client only periodically. Experimental results on real-world data sets show that this particular combination of techniques outperforms other approaches, and in particular, often outperforms (communication expensive) approaches that send all the data to the server.
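The client-side prioritization described above (favoring instances close to the classifier's decision hyperplane yet far from each other) can be sketched as a greedy selection rule. This is a minimal illustration, not the paper's exact algorithm: the function name `prioritize`, the trade-off weight `lam`, and the margin-minus-diversity score are assumptions for the sketch.

```python
import math

def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def dist(u, v):
    # Euclidean distance between two points.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def prioritize(w, X, k, lam=1.0):
    """Greedily choose k indices of X whose instances lie near the
    hyperplane w (small |w . x|) while staying far from instances
    already selected (diversity). Lower score = higher priority."""
    selected = []
    remaining = list(range(len(X)))
    while remaining and len(selected) < k:
        def score(i):
            margin = abs(dot(w, X[i]))  # closeness to the hyperplane
            if not selected:
                return margin
            # Penalize closeness to already-selected instances.
            diversity = min(dist(X[i], X[j]) for j in selected)
            return margin - lam * diversity
        best = min(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Example: two near-hyperplane points on opposite sides of the space
# are preferred over a redundant neighbor of the first pick.
w = [1.0, -1.0]
X = [[0.1, 0.1], [2.0, 2.0], [0.05, 0.0], [-3.0, -3.0]]
print(prioritize(w, X, 2))  # → [0, 3]
```

In the abstract's protocol, the client would run this selection between the server's periodic weight-vector updates, sending only the chosen fraction of its unlabeled stream upstream.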
