
Performance of a parallel matrix multiplication routine on Intel iPSC/860

Abstract

The performance of a parallel matrix-matrix multiplication routine with the same functionality as DGEMM of BLAS3 was tested for different numbers of nodes on a 32-node iPSC/860. The routine was then tuned for maximum performance on this particular computer system. Small changes in the original code led to substantially higher performance, and in all tested configurations there is a critical matrix size n ≈ 50·np, where np is the number of processors, above which Intel's non-blocking isend is more efficient than the blocking csend. This shows that special tuning for a single machine pays off for large matrices.
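The threshold rule in the abstract can be illustrated with a small sketch. The function names `critical_size` and `preferred_send` are hypothetical and do not come from the paper, and the factor 50 is only the approximate value the abstract quotes. The abstract says only that isend wins above the critical size, so falling back to csend at or below it is an assumption of this sketch.

```python
def critical_size(num_procs, factor=50):
    """Approximate critical matrix size n ~ factor * np quoted in the abstract."""
    return factor * num_procs


def preferred_send(n, num_procs, factor=50):
    """Pick a send routine name for an n x n matrix on num_procs nodes.

    Above the critical size the abstract reports non-blocking isend as more
    efficient; choosing csend otherwise is an assumption made here.
    """
    return "isend" if n > critical_size(num_procs, factor) else "csend"


if __name__ == "__main__":
    # Node counts chosen for illustration; the machine in the abstract has 32 nodes.
    for num_procs in (4, 8, 16, 32):
        print(num_procs, critical_size(num_procs))
```

Under this rule the full 32-node machine would switch to isend only for matrices larger than roughly 1600 × 1600. The exact crossover would have to be measured on the target system.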