
Parallel matrix transpose algorithms on distributed memory concurrent computers


Abstract

This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The algorithms make use of non-blocking, point-to-point communication between processors. The use of non-blocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A · B, the algorithms are used to compute parallel multiplications of transposed matrices, C = Aᵀ · Bᵀ, in the PUMMA package. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer.
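To make the data movement concrete, the following is a minimal sketch of which processor pairs must exchange blocks when a block scattered matrix is transposed. It is not the paper's algorithm: it assumes the common convention that block (I, J) lives on processor (I mod P, J mod Q), and the names `owner` and `transpose_traffic` are hypothetical helpers introduced only for illustration. It models no actual message passing, only the source and destination of each block.

```python
from collections import defaultdict


def owner(block_row, block_col, P, Q):
    """Processor coordinates holding a block, assuming the convention
    that block (I, J) is stored on processor (I mod P, J mod Q)."""
    return (block_row % P, block_col % Q)


def transpose_traffic(Mb, Nb, P, Q):
    """Group the blocks of an Mb x Nb block matrix by the
    (source, destination) processor pair needed to form the transpose.

    Block (I, J) of A becomes block (J, I) of the transpose, so it moves
    from owner(I, J) to owner(J, I).
    """
    traffic = defaultdict(list)
    for I in range(Mb):
        for J in range(Nb):
            src = owner(I, J, P, Q)
            dst = owner(J, I, P, Q)
            traffic[(src, dst)].append((I, J))
    return traffic


if __name__ == "__main__":
    # Illustrative sizes only: a 6 x 6 block matrix on a 2 x 3 template.
    traffic = transpose_traffic(Mb=6, Nb=6, P=2, Q=3)
    for (src, dst), blocks in sorted(traffic.items()):
        print(f"{src} -> {dst}: {len(blocks)} block(s)")
```

Under this assumed mapping, a square template (P = Q) pairs each processor (p, q) only with (q, p), whereas a non-square template generally requires each processor to send to several distinct destinations. That is the situation in which overlapping sends with non-blocking communication is useful.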
