Journal of Machine Learning Research

Redundancy Techniques for Straggler Mitigation in Distributed Optimization and Learning



Abstract

Performance of distributed optimization and learning systems is bottlenecked by “straggler” nodes and slow communication links, which significantly delay computation. We propose a distributed optimization framework where the dataset is “encoded” to have an over-complete representation with built-in redundancy, and the straggling nodes in the system are dynamically treated as missing, or as “erasures,” at every iteration, whose loss is compensated by the embedded redundancy. For quadratic loss functions, we show that under a simple encoding scheme, many optimization algorithms (gradient descent, L-BFGS, and proximal gradient) operating under data parallelism converge to an approximate solution even when stragglers are ignored. Furthermore, we show a similar result for a wider class of convex loss functions when operating under model parallelism. The applicable classes of objectives cover several popular learning problems such as linear regression, LASSO, support vector machines, collaborative filtering, and generalized linear models including logistic regression. These convergence results are deterministic, i.e., they establish sample-path convergence for arbitrary sequences of delay patterns or distributions on the nodes, and are independent of the tail behavior of the delay distribution. We demonstrate that equiangular tight frames have desirable properties as encoding matrices, and propose efficient mechanisms for encoding large-scale data. We implement the proposed technique on Amazon EC2 clusters, demonstrate its performance on several learning problems, including matrix factorization, LASSO, ridge regression and logistic regression, and compare the proposed method with uncoded, asynchronous, and data replication strategies.
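As a rough illustration of the data-parallel, quadratic-loss case described in the abstract, the sketch below (not the paper's implementation) encodes a least-squares problem with a tall column-orthonormal matrix standing in for an equiangular tight frame, simulates stragglers by dropping random worker blocks at every iteration, and runs gradient descent on whatever partial gradients survive. The names `redundancy`, `n_workers`, and the 0.25 drop probability are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem setup: least-squares regression, the quadratic case covered by the paper.
n, d = 200, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

# Encode the data with a tall matrix S having orthonormal columns (S^T S = I).
# A random orthonormal construction is used here as a simple stand-in for the
# equiangular tight frames recommended in the paper.
redundancy = 2
m = redundancy * n
S = np.linalg.qr(rng.standard_normal((m, n)), mode="reduced")[0]   # m x n
X_enc, y_enc = S @ X, S @ y

# Split the encoded rows across (hypothetical) workers.
n_workers = 8
blocks = np.array_split(np.arange(m), n_workers)

w = np.zeros(d)
step = 0.1
for _ in range(300):
    # Simulate stragglers: each worker is independently "slow" with probability
    # 0.25 and its partial gradient is simply dropped (treated as an erasure).
    alive = rng.random(n_workers) > 0.25
    if not alive.any():
        continue
    grad = np.zeros(d)
    rows_used = 0
    for k, idx in enumerate(blocks):
        if alive[k]:
            grad += X_enc[idx].T @ (X_enc[idx] @ w - y_enc[idx])
            rows_used += len(idx)
    # Rescale by the surviving fraction so the effective step size stays stable:
    # the partial encoded gradient approximates (rows_used / m) times the full one.
    w -= step * grad / (rows_used / m * n)

print("relative error:", np.linalg.norm(w - w_true) / np.linalg.norm(w_true))
```

Because S has orthonormal columns, the gradient computed from all encoded blocks equals the original gradient, and dropping a random subset of blocks only rescales it approximately; this is why the iterates still approach a neighborhood of the least-squares solution even though stragglers are ignored rather than waited for.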
