
Scalable parallel AMG on ccNUMA machines with OpenMP

Abstract

In many numerical simulation codes, the backbone of the application is the solution of linear systems of equations. Because these systems often arise from the discretization of differential equations, the corresponding matrices are very sparse. Multigrid methods, in particular algebraic multigrid (AMG), are a popular way to solve these sparse linear systems because of their numerical scalability. On modern multi-core architectures, however, parallel scalability also has to be taken into account. Since memory bandwidth is usually the bottleneck of sparse matrix operations, these linear solvers cannot always benefit from increasing numbers of cores. To exploit the available aggregate memory bandwidth on larger-scale NUMA machines, evenly distributed data is often more of an issue than load balancing. Additionally, with a threading model like OpenMP, one has to ensure data locality manually by explicit placement of memory pages. On non-uniform data there is always a tradeoff between these three principles (load balancing, even data distribution, and data locality), and the ideal strategy depends strongly on the machine and the application. In this paper we present benchmarks of an AMG implementation based on a new performance library. The main focus is on comparability to state-of-the-art solver packages with regard to sequential performance as well as parallel scalability on common NUMA machines. To maximize throughput on standard model problems, several thread and memory configurations have been evaluated. We show that even on large-scale multi-core architectures, easy parallel programming models like OpenMP can achieve performance competitive with more complex programming models.
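
The "explicit placement of memory pages" mentioned in the abstract is commonly achieved through first-touch initialization (one of the listed keywords). The following is only a minimal sketch of that idea and is not taken from the paper or its library: it assumes an operating system with a first-touch page placement policy, and all names (`csr_matrix`, `laplace1d_first_touch`, `csr_spmv`, `demo_first_touch_spmv`) are hypothetical. A 1D Laplacian in CSR format is filled with the same static OpenMP schedule that the later matrix-vector product uses, so each thread touches, and later reads, the same rows.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CSR container, for illustration only. */
typedef struct {
    long n;
    long *row_ptr;
    long *col_idx;
    double *val;
} csr_matrix;

/* Offset of the first entry of row i in the tridiagonal 1D Laplacian. */
static long row_start(long i) { return i == 0 ? 0 : 3 * i - 1; }

/* Allocate with malloc (which does not write the pages) and fill in
 * parallel, so that under a first-touch policy each page is placed near
 * the thread that writes it first. */
static int laplace1d_first_touch(csr_matrix *A, long n)
{
    assert(n > 0);
    long nnz = 3 * n - 2;
    A->n = n;
    A->row_ptr = malloc((size_t)(n + 1) * sizeof *A->row_ptr);
    A->col_idx = malloc((size_t)nnz * sizeof *A->col_idx);
    A->val = malloc((size_t)nnz * sizeof *A->val);
    if (!A->row_ptr || !A->col_idx || !A->val) {
        free(A->row_ptr); free(A->col_idx); free(A->val);
        return -1;
    }
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        long k = row_start(i);
        A->row_ptr[i] = k;
        if (i > 0)     { A->col_idx[k] = i - 1; A->val[k++] = -1.0; }
        A->col_idx[k] = i; A->val[k++] = 2.0;
        if (i < n - 1) { A->col_idx[k] = i + 1; A->val[k++] = -1.0; }
    }
    A->row_ptr[n] = nnz;
    return 0;
}

/* y = A * x, using the same static row partition as the initialization. */
static void csr_spmv(const csr_matrix *A, const double *x, double *y)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < A->n; ++i) {
        double sum = 0.0;
        for (long k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k)
            sum += A->val[k] * x[A->col_idx[k]];
        y[i] = sum;
    }
}

/* Builds the matrix, multiplies it by a vector of ones, and returns the
 * sum of the result entries (or -1.0 if an allocation fails). */
double demo_first_touch_spmv(long n)
{
    csr_matrix A;
    if (laplace1d_first_touch(&A, n) != 0) return -1.0;
    double *x = malloc((size_t)n * sizeof *x);
    double *y = malloc((size_t)n * sizeof *y);
    double sum = -1.0;
    if (x && y) {
        /* The vectors are first touched with the same schedule as well. */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < n; ++i) { x[i] = 1.0; y[i] = 0.0; }
        csr_spmv(&A, x, y);
        sum = 0.0;
        for (long i = 0; i < n; ++i) sum += y[i];
    }
    free(x); free(y);
    free(A.row_ptr); free(A.col_idx); free(A.val);
    return sum;
}
```

Without OpenMP enabled the pragmas are ignored and the code runs serially with the same result. Whether the placement actually helps depends on thread pinning and on the machine, which is the tradeoff the abstract describes.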

Bibliographic record

  • Source
    Computer science, 2011, No. 4, pp. 221-228 (8 pages)
  • Authors

    Malte Forster; Jiri Kraus;

  • Author affiliations

    Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Schloss Birlinghoven, 53754 Sankt Augustin, Germany;

    Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Schloss Birlinghoven, 53754 Sankt Augustin, Germany;

  • Original format: PDF
  • Text language: English
  • Keywords

    LAMA; AMG; OpenMP; ccNUMA; first touch; PETSc; hypre;

  • Date added to database: 2022-08-17 13:50:29
