Journal: Concurrency and Computation: Practice and Experience

Performance analysis of distributed symmetric sparse matrix vector multiplication algorithm for multi-core architectures

Abstract

Sparse matrix vector multiply (SpMVM) is an important kernel that frequently arises in high-performance computing applications. Due to its low arithmetic intensity, several approaches have been proposed in the literature to improve its scalability and efficiency in large-scale computations. In this paper, our target systems are high-end multi-core architectures, and we use the Message Passing Interface (MPI) + Open Multi-Processing (OpenMP) hybrid programming model for parallelism. We analyze the performance of a recently proposed implementation of the distributed symmetric SpMVM, originally developed for large sparse symmetric matrices arising in ab initio nuclear structure calculations. We study important features of this implementation and compare it with previously reported implementations that do not exploit the underlying symmetry. Our SpMVM implementations leverage the hybrid paradigm to efficiently overlap expensive communications with computations. Our main comparison criterion is the ‘CPU core hours’ metric, which is the main measure of resource usage on supercomputers. We analyze the effects of a topology-aware mapping heuristic using a simplified network load model. We have tested the different SpMVM implementations on two large clusters with 3D torus and Dragonfly topologies. Our results show that the distributed SpMVM implementation that exploits matrix symmetry and hides communication yields the best value for the ‘CPU core hours’ metric and significantly reduces data movement overheads.
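
To illustrate what exploiting matrix symmetry means for SpMVM, here is a minimal serial sketch in Python. It is not the implementation analyzed in the paper: it assumes the matrix is stored in CSR format with only the upper triangle (diagonal included), and the function name `symmetric_spmv` and the small 3x3 example are hypothetical. It omits the MPI distribution, OpenMP threading, and communication overlap that the abstract describes.

```python
def symmetric_spmv(n, row_ptr, col_idx, values, x):
    """Compute y = A @ x for a symmetric A of which only the upper
    triangle (diagonal included) is stored in CSR format.

    Assumption for illustration: each stored off-diagonal entry a_ij
    (j > i) also stands for a_ji, so it contributes to two rows of y.
    """
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a = values[k]
            y[i] += a * x[j]        # contribution of a_ij to row i
            if j != i:
                y[j] += a * x[i]    # mirrored contribution of a_ji to row j
    return y


# Hypothetical example: A = [[4, 1, 0],
#                            [1, 3, 2],
#                            [0, 2, 5]]
# Only the upper triangle is stored: 5 entries instead of 7.
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 1, 2, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = symmetric_spmv(3, row_ptr, col_idx, values, x)
print(y)  # [6.0, 13.0, 19.0]
```

Each off-diagonal value is read once but used twice, which is where the reduction in data movement comes from. The mirrored update `y[j] += ...` writes outside the current row, so a threaded or distributed version has to guard against concurrent updates to the same entry of `y`, for example with per-thread buffers or a reduction step.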