Performance Evaluation Review
Distributed Statistical Machine Learning in Adversarial Settings: Byzantine Gradient Descent

Abstract

We consider the distributed statistical learning problem over decentralized systems that are prone to adversarial attacks. This setup arises in many practical applications, including Google's Federated Learning. Formally, we focus on a decentralized system that consists of a parameter server and m working machines; each working machine keeps N/m data samples, where N is the total number of samples. In each iteration, up to q of the m working machines suffer Byzantine faults: a faulty machine in the given iteration behaves arbitrarily badly against the system and has complete knowledge of the system. Additionally, the sets of faulty machines may differ across iterations. Our goal is to design robust algorithms such that the system can learn the underlying true parameter, which is of dimension d, despite the Byzantine attacks. In this paper, based on the geometric median of means of the gradients, we propose a simple variant of the classical gradient descent method. We show that our method can tolerate q Byzantine failures up to 2(1 + ε)q < m for an arbitrarily small but fixed constant ε > 0. The parameter estimate converges in O(log N) rounds with an estimation error on the order of max{√(dq/N), √(d/N)}, which is larger than the minimax-optimal error rate √(d/N) in the centralized and failure-free setting by at most a factor of √q. The total computational complexity of our algorithm is O((Nd/m) log N) at each working machine and O(md + kd log^3 N) at the central server, and the total communication cost is O(md log N). We further provide an application of our general results to the linear regression problem. A key challenge in the above problem is that Byzantine failures create arbitrary and unspecified dependency among the iterations and the aggregated gradients. To handle this issue in the analysis, we prove that the aggregated gradient, as a function of the model parameter, converges uniformly to the true gradient function.
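
The following Python sketch illustrates the aggregation rule described in the abstract (geometric median of means of the reported gradients) inside a toy Byzantine gradient descent loop for linear regression. It is a minimal illustration, not the authors' implementation: the function names, the Weiszfeld iteration used to approximate the geometric median, the way the first q machines misbehave, and the choice of passing the batch count k as a plain parameter are all assumptions made for this example.

    import numpy as np

    def geometric_median(points, iters=100, tol=1e-7):
        # Weiszfeld's algorithm: iteratively reweighted average approximating
        # the point minimizing the sum of Euclidean distances to `points`.
        z = points.mean(axis=0)
        for _ in range(iters):
            d = np.linalg.norm(points - z, axis=1)
            d = np.maximum(d, tol)          # avoid division by zero
            w = 1.0 / d
            z_new = (w[:, None] * points).sum(axis=0) / w.sum()
            if np.linalg.norm(z_new - z) < tol:
                break
            z = z_new
        return z

    def aggregate_gradients(grads, k):
        # Geometric median of means: split the m reported gradients into k
        # batches, average within each batch, then take the geometric median
        # of the k batch means.
        batches = np.array_split(grads, k)
        means = np.stack([b.mean(axis=0) for b in batches])
        return geometric_median(means)

    def byzantine_gd(X_parts, y_parts, k, q, rounds=50, lr=0.1):
        # Toy distributed linear regression with q adversarial machines.
        d = X_parts[0].shape[1]
        theta = np.zeros(d)
        rng = np.random.default_rng(0)
        for _ in range(rounds):
            grads = []
            for i, (X, y) in enumerate(zip(X_parts, y_parts)):
                if i < q:
                    # Byzantine machine: report an arbitrary (here, random) vector.
                    g = rng.normal(scale=100.0, size=d)
                else:
                    # Honest machine: local least-squares gradient on its N/m samples.
                    g = X.T @ (X @ theta - y) / len(y)
                grads.append(g)
            theta = theta - lr * aggregate_gradients(np.stack(grads), k)
        return theta

In the paper, the number of batches k is tied to the bound q on the number of Byzantine machines (the tolerance condition 2(1 + ε)q < m above); here it is simply supplied by the caller for illustration.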
