首页> 外文期刊>Journal of Parallel and Distributed Computing >Communication lower bounds for distributed-memory matrix multiplication
【24h】

Communication lower bounds for distributed-memory matrix multiplication

机译:分布式内存矩阵乘法的通信下限

获取原文
获取原文并翻译 | 示例
           

摘要

We present lower bounds on the amount of communication that matrix multiplication algorithms must perform on a distributed-memory parallel computer. We denote the number of processors by P and the dimension of square matrices by n. We show that the most widely used class of algorithms, the so-called two-dimensional (2D) algorithms, are optimal, in the sense that in any algorithm that only uses O(n(2)/p) words of memory per processor, at least one processor must send or receive Omega(n(2)/p(1/2)) words. We also show that algorithms from another class, the so-called three-dimensional (3D) algorithms, are also optimal. These algorithms use replication to reduce communication. We show that in any algorithm that uses Omega(n(2)/p(2/3)) words of memory per processor, at least one processor must send or receive Omega(n(2)/p(2/3)) words. Furthermore, we show a continuous tradeoff between the size of local memories and the amount of communication that must be performed. The 2D and 3D bounds are essentially instantiations of this tradeoff. We also show that if the input is distributed across the local memories of multiple nodes without replication, then Omega(n(2)) words must cross any bisection cut of the machine. All our bounds apply only to conventional o(n(3)) algorithms. They do not apply to Strassen's algorithm or other Theta(n(3)) algorithms. (C) 2004 Elsevier Inc. All rights reserved.
机译:我们给出了矩阵乘法算法必须在分布式内存并行计算机上执行的通信量的下限。我们用P表示处理器数量,用n表示平方矩阵的维数。我们表明,在每个处理器仅使用O(n(2)/ p)个内存字的任何算法中,最广泛使用的一类算法(所谓的二维(2D)算法)是最佳的,至少一个处理器必须发送或接收Omega(n(2)/ p(1/2))个字。我们还显示了另一类算法,即所谓的三维(3D)算法,也是最佳的。这些算法使用复制来减少通信。我们表明,在每个处理器使用Omega(n(2)/ p(2/3))内存字的任何算法中,至少一个处理器必须发送或接收Omega(n(2)/ p(2/3))话。此外,我们显示了本地内存大小和必须执行的通信量之间的持续权衡。 2D和3D边界本质上是此折衷的实例。我们还表明,如果输入分布在多个节点的本地内存中而不进行复制,则Omega(n(2))字必须越过机器的任何二等分。我们所有的边界仅适用于常规o(n(3))算法。它们不适用于Strassen算法或其他Theta(n(3))算法。 (C)2004 Elsevier Inc.保留所有权利。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号