JMLR: Workshop and Conference Proceedings

DRACO: Byzantine-resilient Distributed Training via Redundant Gradients


Abstract

Distributed model training is vulnerable to Byzantine system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS). To guarantee some form of robustness, recent work suggests using variants of the geometric median as an aggregation rule, in place of gradient averaging. Unfortunately, median-based rules can incur a prohibitive computational overhead in large-scale settings, and their convergence guarantees often require strong assumptions. In this work, we present DRACO, a scalable framework for robust distributed training that uses ideas from coding theory. In DRACO, each compute node evaluates redundant gradients that are used by the parameter server to eliminate the effects of adversarial updates. DRACO comes with problem-independent robustness guarantees, and the model that it trains is identical to the one trained in the adversary-free setup. We provide extensive experiments on real datasets and distributed setups across a variety of large-scale models, where we show that DRACO is several times to orders of magnitude faster than median-based approaches.
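
To make the mechanism described in the abstract concrete, the following is a minimal Python sketch of the redundancy idea: each batch is assigned to several compute nodes, and the parameter server decodes the redundant gradient copies to cancel adversarial updates. This is not the paper's implementation; it assumes a simple repetition-style assignment with majority-vote decoding, whereas DRACO's actual constructions use more communication-efficient codes. All function names, shapes, and parameters here are hypothetical.

# Minimal sketch (not DRACO's actual codes): each batch is replicated on
# 2r+1 nodes, and the parameter server takes a majority vote per batch,
# so up to r adversarial copies per batch are outvoted by honest ones.
import numpy as np

def assign_redundant_batches(num_batches, redundancy):
    """Hypothetical cyclic assignment: node j computes gradients for
    batches j, j+1, ..., j+redundancy-1 (mod num_batches)."""
    return [[(j + k) % num_batches for k in range(redundancy)]
            for j in range(num_batches)]  # one node per batch, for simplicity

def node_report(node_id, assignment, grad_fn, adversarial=False, dim=4):
    """A node returns one gradient per assigned batch; an adversarial
    node returns arbitrary malicious vectors instead."""
    if adversarial:
        return {b: 1e3 * np.random.randn(dim) for b in assignment[node_id]}
    return {b: grad_fn(b) for b in assignment[node_id]}

def ps_decode(reports, num_batches):
    """Parameter-server decoder: majority vote over the redundant copies
    of each batch's gradient, then average, recovering the same update
    the adversary-free setup would have produced."""
    recovered = []
    for b in range(num_batches):
        copies = [rep[b] for rep in reports if b in rep]
        keys = [tuple(np.round(c, 6)) for c in copies]
        winner = max(set(keys), key=keys.count)  # honest copies agree exactly
        recovered.append(np.array(winner))
    return np.mean(recovered, axis=0)

# Toy run: 5 nodes/batches, redundancy 3, node 0 adversarial (tolerates r = 1).
true_grad = lambda b: (b + 1) * np.ones(4)
assignment = assign_redundant_batches(num_batches=5, redundancy=3)
reports = [node_report(j, assignment, true_grad, adversarial=(j == 0))
           for j in range(5)]
print(ps_decode(reports, num_batches=5))  # -> [3. 3. 3. 3.], unaffected by node 0

With a replication factor of 2r+1 per batch, the majority vote tolerates up to r adversarial copies of any batch, which illustrates (in the least efficient way) the kind of exact recovery guarantee the abstract refers to; the paper's coding schemes achieve this with much less redundancy overhead.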
