...
首页> 外文期刊>Foundations and trends in communications and information theory >Coded Computing: Mitigating Fundamental Bottlenecks in Large-Scale Distributed Computing and Machine Learning
【24h】

Coded Computing: Mitigating Fundamental Bottlenecks in Large-Scale Distributed Computing and Machine Learning

机译:编码计算:减轻大型分布式计算和机器学习中的基本瓶颈

获取原文
   

获取外文期刊封面封底 >>

       

摘要

We introduce the concept of "coded computing", a novel computing paradigm that utilizes coding theory to effectively inject and leverage data/computation redundancy to mitigate several fundamental bottlenecks in large-scale distributed computing, namely communication bandwidth, straggler's (i.e., slow or failing nodes) delay, privacy and security bottlenecks. More specifically, for MapReduce based distributed computing structures, we propose the "Coded Distributed Computing" (CDC) scheme, which injects redundant computations across the network in a structured manner, such that in-network coding opportunities are enabled to substantially slash the communication load to shuffle the intermediate computation results. We prove that CDC achieves the optimal tradeoff between computation and communication, and demonstrate its impact on a wide range of distributed computing systems from cloud-based datacenters to mobile edge/fog computing platforms. Secondly, to alleviate the straggler effect that prolongs the executions of distributed machine learning algorithms, we utilize the ideas from error correcting codes to develop "Polynomial Codes" for computing general matrix algebra, and "Lagrange Coded Computing" (LCC) for computing arbitrary multivariate polynomials. The core idea of these proposed schemes is to apply coding to create redundant data/computation scattered across the network, such that completing the overall computation task only requires a subset of the network nodes returning their local computation results. We demonstrate the optimality of Polynomial Codes and LCC in minimizing the computation latency, by proving that they require the least number of nodes to return their results. Finally, we illustrate the role of coded computing in providing security and privacy in distributed computing and machine learning. In particular, we consider the problems of secure multiparty computing (MPC) and privacy-preserving machine learning, and demonstrate how coded computing can be leveraged to provide efficient solutions to these critical problems and enable substantial improvements over the state of the art. To illustrate the impact of coded computing on real world applications and systems, we implement the proposed coding schemes on cloud-based distributed computing systems, and significantly improve the run-time performance of important benchmarks including distributed sorting, distributed training of regression models, and privacy-preserving training for image classification. Throughout this monograph, we also highlight numerous open problems and exciting research directions for future work on coded computing.
机译:我们介绍了“编码计算”的概念,这是一种新颖的计算范例,利用编码理论来有效地注入和利用数据/计算冗余,以减轻大规模分布式计算中的若干基本瓶颈,即通信带宽,斯特拉格勒(即,慢速或失败节点)延迟,隐私和安全瓶颈。更具体地,对于基于映射的分布式计算结构,我们提出了“编码分布式计算”(CDC)方案,该方案以结构化的方式在网络上注入冗余计算,使得网络内编码机会能够基本上削减通信负载将中间计算结果进行洗牌。我们证明,CDC在计算和通信之间实现了最佳权衡,并展示了对从基于云的数据中心到移动边缘/雾计算平台的广泛分布式计算系统的影响。其次,为了减轻延长分布式机器学习算法的执行的争吵效果,我们利用误差校正码的思想来开发用于计算通用矩阵代数的“多项式代码”,以及用于计算任意多变量的“拉格朗加编码计算”(LCRAge Coded Computing“(LCC)多项式。这些提出的方案的核心思想是应用编码以在网络中散射的冗余数据/计算,使得完成整体计算任务仅需要返回其本地计算结果的网络节点的子集。我们通过证明它们需要最小数量的节点来返回它们的结果来证明多项式代码和LCC在最小化计算延迟时的最优性。最后,我们说明了编码计算在分布式计算和机器学习中提供安全性和隐私的作用。特别地,我们考虑安全多方计算(MPC)和隐私保存机器学习的问题,并演示如何利用编码计算以提供对这些关键问题的有效解决方案,并实现对现有技术的显着改进。为了说明编码计算对现实世界应用和系统的影响,我们在基于云的分布式计算系统上实现了所提出的编码方案,并显着提高了重要基准的运行时性能,包括分布式分类,回归模型的分布式训练,以及用于图像分类的隐私保留培训。在本专着中,我们还突出了许多打开问题和令人兴奋的研究方向,以便在编码计算上的未来工作。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号