首页> 外文期刊>IEEE Transactions on Information Theory >Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding
【24h】

Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding

机译:分布式矩阵乘法中的减少流浪者:基本极限和最优编码

获取原文
获取原文并翻译 | 示例

摘要

We consider the problem of massive matrix multiplication, which underlies many data analytic applications, in a large-scale distributed system comprising a group of worker nodes. We target the stragglers' delay performance bottleneck, which is due to the unpredictable latency in waiting for slowest nodes (or stragglers) to finish their tasks. We propose a novel coding strategy, named entangled polynomial code, for designing the intermediate computations at the worker nodes in order to minimize the recovery threshold (i.e., the number of workers that we need to wait for in order to compute the final output). We demonstrate the optimality of entangled polynomial code in several cases, and show that it provides orderwise improvement over the conventional schemes for straggler mitigation. Furthermore, we characterize the optimal recovery threshold among all linear coding strategies within a factor of 2 using bilinear complexity, by developing an improved version of the entangled polynomial code. In particular, while evaluating bilinear complexity is a well-known challenging problem, we show that optimal recovery threshold for linear coding strategies can be approximated within a factor of 2 of this fundamental quantity. On the other hand, the improved version of the entangled polynomial code enables further and orderwise reduction in the recovery threshold, compared to its basic version. Finally, we show that the techniques developed in this paper can also be extended to several other problems such as coded convolution and fault-tolerant computing, leading to tight characterizations.
机译:我们考虑在包含一组工作节点的大规模分布式系统中,海量矩阵乘法的问题,该问题是许多数据分析应用程序的基础。我们针对散乱者的延迟性能瓶颈,这是由于等待最慢的节点(或散乱者)完成任务时出现不可预知的延迟。我们提出了一种称为纠缠多项式代码的新颖编码策略,用于在工作节点上设计中间计算,以最小化恢复阈值(即为了计算最终输出而需要等待的工作人员数量)。我们证明了在某些情况下纠缠多项式代码的最优性,并表明它提供了比常规方案更好的有序缓解措施。此外,我们通过开发纠缠多项式代码的改进版本,在使用双线性复杂度的所有线性编码策略中,以2为因数来表征最佳恢复阈值。特别是,虽然评估双线性复杂度是一个众所周知的挑战性问题,但我们表明,线性编码策略的最佳恢复阈值可以在此基本数量的2倍内近似。另一方面,与基本版本相比,纠缠多项式代码的改进版本可以进一步有序地降低恢复阈值。最后,我们证明了本文开发的技术还可以扩展到其他几个问题,例如编码卷积和容错计算,从而实现了严格的表征。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号