Peer-to-Peer Multi-class Boosting

Abstract

We focus on the problem of data mining over large-scale fully distributed databases, where each node stores only one data record. We assume that a data record is never allowed to leave the node it is stored at. Possible motivations for this assumption include privacy or a lack of a centralized infrastructure. To tackle this problem, earlier we proposed the generic gossip learning framework (GoLF), but so far we have studied only basic linear algorithms. In this paper we implement the well-known boosting technique in GoLF. Boosting techniques have attracted growing attention in machine learning due to their outstanding performance in many practical applications. Here, we present an implementation of a boosting algorithm that is based on FILTERBOOST. Our main algorithmic contribution is a derivation of a pure online multi-class version of FILTERBOOST, so that it can be employed in GoLF. We also propose improvements to GoLF, with the aim of maximizing the diversity of the evolving models gossiped in the network, a feature that we show to be important. We evaluate the robustness and the convergence speed of the algorithm empirically over three benchmark databases. We compare the algorithm with the sequential ADABOOST algorithm and we test its performance in a failure scenario involving message drop and delay, and node churn.
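
The setting described in the abstract (one record per node, records never leave their node, models travel between nodes and are updated online) can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the GoLF protocol or the FILTERBOOST-based learner from the paper: the function names `predict`, `online_update`, and `gossip` are hypothetical, the learner is a plain multi-class perceptron used as a stand-in, and peer selection is simplified to a uniform random choice.

```python
import random


def predict(weights, x):
    """Return the index of the class whose weight vector scores x highest."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def online_update(weights, x, y, lr=1.0):
    """One multi-class perceptron step on a single record.

    Assumption: this is a stand-in online learner, NOT FilterBoost.
    Returns a new weight matrix; the input is left unmodified.
    """
    new = [list(w) for w in weights]
    guess = predict(new, x)
    if guess != y:
        for i, x_i in enumerate(x):
            new[y][i] += lr * x_i      # pull the true class towards x
            new[guess][i] -= lr * x_i  # push the wrong guess away from x
    return new


def gossip(records, n_classes, cycles, seed=0):
    """Simulate gossip-style learning: models move, records stay local.

    records[i] is the single (features, label) pair stored at node i.
    """
    rng = random.Random(seed)
    dim = len(records[0][0])
    models = [[[0.0] * dim for _ in range(n_classes)] for _ in records]
    for _ in range(cycles):
        inbox = {}
        for sender in range(len(records)):
            receiver = rng.randrange(len(records))
            # Only the model is transmitted, never the sender's record.
            # Simplification: if several models reach one node in the
            # same cycle, the last one received wins.
            inbox[receiver] = models[sender]
        for receiver, model in inbox.items():
            x, y = records[receiver]  # the local record is read in place
            models[receiver] = online_update(model, x, y)
    return models
```

Calling `gossip(records, n_classes=2, cycles=20)` on a handful of `(features, label)` pairs returns one weight matrix per node. The sketch ignores message loss, delay, and churn, which the abstract lists as part of the evaluated failure scenario.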