Matrix Multiplication on Multidimensional Torus Networks

Abstract

Blocked matrix multiplication algorithms such as Cannon's algorithm and SUMMA have a 2-dimensional communication structure. We introduce a generalized 'Split-Dimensional' version of Cannon's algorithm (SD-Cannon) with a higher-dimensional and bidirectional communication structure. This algorithm is useful for torus interconnects whose injection bandwidth exceeds the bandwidth of a single link. On a bidirectional torus network of dimension d, SD-Cannon can lower the algorithmic bandwidth cost by a factor of up to d. With rectangular collectives, SUMMA also achieves the lower bandwidth cost but has a higher latency cost. We use Charm++ virtualization to efficiently map SD-Cannon onto unbalanced and odd-dimensional torus network partitions. Our performance study on Blue Gene/P demonstrates that an MPI version of SD-Cannon can exploit multiple communication links and improve performance.
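For readers unfamiliar with the 2-dimensional communication structure mentioned above, the following is a minimal single-process sketch of the classic Cannon's algorithm on a q x q processor grid. It is not SD-Cannon and not the authors' implementation. The function names (`cannon`, `split_blocks`, `join_blocks`, `matmul`, `add_into`) are hypothetical, and the cyclic shifts that would be torus-link messages on a real machine are simulated by re-indexing Python lists.

```python
def matmul(X, Y):
    """Plain triple-loop product of two dense matrices stored as lists of rows."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for row in X]


def add_into(C, P):
    """Accumulate matrix P into matrix C in place."""
    for i, row in enumerate(P):
        for j, value in enumerate(row):
            C[i][j] += value


def split_blocks(M, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    b = len(M) // q
    return [[[row[c * b:(c + 1) * b] for row in M[r * b:(r + 1) * b]]
             for c in range(q)] for r in range(q)]


def join_blocks(blocks):
    """Inverse of split_blocks: reassemble the full matrix."""
    q, b = len(blocks), len(blocks[0][0])
    return [sum((blocks[r][c][i] for c in range(q)), [])
            for r in range(q) for i in range(b)]


def cannon(A, B, q):
    """Simulate Cannon's algorithm for C = A * B on a q x q grid (assumes q divides n)."""
    n = len(A)
    assert n % q == 0, "grid size must divide the matrix dimension"
    b = n // q
    Ab, Bb = split_blocks(A, q), split_blocks(B, q)

    # Initial skew: row i of A is shifted left by i, column j of B is shifted up by j.
    a = [[Ab[i][(j + i) % q] for j in range(q)] for i in range(q)]
    bl = [[Bb[(i + j) % q][j] for j in range(q)] for i in range(q)]
    c = [[[[0] * b for _ in range(b)] for _ in range(q)] for _ in range(q)]

    for _ in range(q):
        # Every simulated processor multiplies the pair of blocks it currently holds.
        for i in range(q):
            for j in range(q):
                add_into(c[i][j], matmul(a[i][j], bl[i][j]))
        # Cyclic shift: A blocks move one step left, B blocks move one step up.
        a = [[a[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        bl = [[bl[(i + 1) % q][j] for j in range(q)] for i in range(q)]

    return join_blocks(c)


if __name__ == "__main__":
    A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    B = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
    print(cannon(A, B, 2) == matmul(A, B))
```

In each step every simulated processor sends one block along one grid dimension for A and one along the other for B, which is the 2-dimensional pattern the abstract refers to. According to the abstract, SD-Cannon generalizes this pattern to use more dimensions and both directions of the torus; that generalization is not shown here.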
