JMLR: Workshop and Conference Proceedings

DRACO: Byzantine-resilient Distributed Training via Redundant Gradients


Abstract

Distributed model training is vulnerable to Byzantine system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS). To guarantee some form of robustness, recent work suggests using variants of the geometric median as an aggregation rule, in place of gradient averaging. Unfortunately, median-based rules can incur a prohibitive computational overhead in large-scale settings, and their convergence guarantees often require strong assumptions. In this work, we present DRACO, a scalable framework for robust distributed training that uses ideas from coding theory. In DRACO, each compute node evaluates redundant gradients that are used by the parameter server to eliminate the effects of adversarial updates. DRACO comes with problem-independent robustness guarantees, and the model that it trains is identical to the one trained in the adversary-free setup. We provide extensive experiments on real datasets and distributed setups across a variety of large-scale models, where we show that DRACO is several times to orders of magnitude faster than median-based approaches.
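
To make the mechanism described in the abstract concrete, the following is a minimal Python sketch of the redundancy idea: each batch is assigned to several compute nodes, and the parameter server decodes the redundant gradient copies to cancel adversarial updates. This is not the paper's implementation; it assumes a simple repetition-style assignment with majority-vote decoding, whereas DRACO's actual constructions use more communication-efficient codes. All function names, shapes, and parameters here are hypothetical.

# Minimal sketch (not DRACO's actual codes): each batch is replicated on
# 2r+1 nodes, and the parameter server takes a majority vote per batch,
# so up to r adversarial copies per batch are outvoted by honest ones.
import numpy as np

def assign_redundant_batches(num_batches, redundancy):
    """Hypothetical cyclic assignment: node j computes gradients for
    batches j, j+1, ..., j+redundancy-1 (mod num_batches)."""
    return [[(j + k) % num_batches for k in range(redundancy)]
            for j in range(num_batches)]  # one node per batch, for simplicity

def node_report(node_id, assignment, grad_fn, adversarial=False, dim=4):
    """A node returns one gradient per assigned batch; an adversarial
    node returns arbitrary malicious vectors instead."""
    if adversarial:
        return {b: 1e3 * np.random.randn(dim) for b in assignment[node_id]}
    return {b: grad_fn(b) for b in assignment[node_id]}

def ps_decode(reports, num_batches):
    """Parameter-server decoder: majority vote over the redundant copies
    of each batch's gradient, then average, recovering the same update
    the adversary-free setup would have produced."""
    recovered = []
    for b in range(num_batches):
        copies = [rep[b] for rep in reports if b in rep]
        keys = [tuple(np.round(c, 6)) for c in copies]
        winner = max(set(keys), key=keys.count)  # honest copies agree exactly
        recovered.append(np.array(winner))
    return np.mean(recovered, axis=0)

# Toy run: 5 nodes/batches, redundancy 3, node 0 adversarial (tolerates r = 1).
true_grad = lambda b: (b + 1) * np.ones(4)
assignment = assign_redundant_batches(num_batches=5, redundancy=3)
reports = [node_report(j, assignment, true_grad, adversarial=(j == 0))
           for j in range(5)]
print(ps_decode(reports, num_batches=5))  # -> [3. 3. 3. 3.], unaffected by node 0

With a replication factor of 2r+1 per batch, the majority vote tolerates up to r adversarial copies of any batch, which illustrates (in the least efficient way) the kind of exact recovery guarantee the abstract refers to; the paper's coding schemes achieve this with much less redundancy overhead.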
