Journal: Parallel Computing

Parallel symmetric sparse matrix-vector product on scalar multi-core CPUs

Abstract

We present a massively parallel implementation of the symmetric sparse matrix-vector product (SpMV) for modern clusters with scalar multi-core CPUs. We study matrices with highly variable structure and density that arise from unstructured three-dimensional FEM discretizations of mechanical and diffusion problems. We introduce an effective-memory-bandwidth metric to analyze how a set of simple, well-known optimizations affects performance: matrix reordering, manual prefetching, and blocking. We also show a modification to the CRS (compressed row storage) format that improves performance on multi-core Opterons. We optimize the performance of an entire SMP blade rather than per-core performance. Even for the simplest 4-node mechanical element, our code uses close to 100% of the memory bandwidth available per blade. We show that reducing the storage requirements for symmetric matrices yields roughly a twofold speedup. Blocking brings further storage savings and a proportional performance increase. We compare our results with existing state-of-the-art SpMV implementations and with dense BLAS2 performance. On 5400 Opteron cores of the Cray XT4 cluster, parallel efficiency is around 80-90% for problems with approximately 25³ mesh nodes per core. For a problem with 820 million degrees of freedom, the code sustains 5.2 TFLOPS, over 20% of the theoretical peak.
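To make the storage argument concrete, here is a minimal serial sketch in Python. It is an illustration only: the abstract does not give the paper's implementation language or the details of its CRS modification. The sketch assumes the common convention of storing only the upper triangle of the symmetric matrix (diagonal included) in CRS arrays; the function names `dense_to_upper_crs`, `sym_spmv` and `crs_bytes`, and the 8-byte value / 4-byte index sizes, are hypothetical choices. Each stored off-diagonal entry A[i][j] is read once but used twice: for y[i], and mirrored as A[j][i] for y[j].

```python
# Serial sketch: symmetric SpMV with only the upper triangle (diagonal
# included) stored in CRS arrays. Names are illustrative, not from the paper.


def dense_to_upper_crs(a):
    """Return (row_ptr, col_idx, vals) for the nonzeros of a with j >= i."""
    n = len(a)
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        for j in range(i, n):
            if a[i][j] != 0.0:
                col_idx.append(j)
                vals.append(a[i][j])
        row_ptr.append(len(vals))  # row i occupies vals[row_ptr[i]:row_ptr[i+1]]
    return row_ptr, col_idx, vals


def sym_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A x for symmetric A given only its upper triangle."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        xi = x[i]
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a_ij = vals[k]
            acc += a_ij * x[j]       # stored entry A[i][j] contributes to y[i]
            if j != i:
                y[j] += a_ij * xi    # mirrored entry A[j][i] contributes to y[j]
        y[i] += acc
    return y


def crs_bytes(n_rows, nnz_stored, val_bytes=8, idx_bytes=4):
    """Approximate CRS footprint: values + column indices + row pointers.
    The 8-byte value and 4-byte index sizes are assumptions."""
    return nnz_stored * (val_bytes + idx_bytes) + (n_rows + 1) * idx_bytes


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0],
         [1.0, 4.0, 1.0],
         [0.0, 1.0, 4.0]]
    print(sym_spmv(*dense_to_upper_crs(A), [1.0, 2.0, 3.0]))  # [6.0, 12.0, 14.0]
```

For a matrix with n rows and nnz nonzeros in total, the upper triangle holds (nnz + n)/2 entries, so when nnz is much larger than n the stored data roughly halves. With a hypothetical 10^6 rows and 81 nonzeros per row, `crs_bytes(10**6, 81 * 10**6)` gives 976,000,004 bytes for full storage versus `crs_bytes(10**6, 41 * 10**6)` = 496,000,004 bytes for the upper triangle, about 1.97 times less. Because SpMV is typically limited by memory bandwidth, less data to stream is consistent with the roughly twofold speedup the abstract reports. Note that the scatter into y[j] is what makes a parallel symmetric SpMV harder than plain CRS SpMV: different rows may update the same y[j] concurrently, so threads need private partial results, matrix coloring, or a similar scheme. This sketch does not attempt that.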