Conference: JARA High-Performance Computing Symposium

Hybrid CPU-GPU Generation of the Hamiltonian and Overlap Matrices in FLAPW Methods



Abstract

In this paper we focus on the integration of high-performance numerical libraries in ab initio codes and on the portability of performance and scalability. The target of our work is FLEUR, a software package for electronic structure calculations developed at the Forschungszentrum Jülich over the course of two decades. The presented work follows up on a previous effort to modernize legacy code by re-engineering and rewriting it in terms of highly optimized libraries. We illustrate how this initial effort to obtain efficient and portable shared-memory code enables fast porting of the code to emerging heterogeneous architectures. More specifically, we port the code to nodes equipped with multiple GPUs. We divide our study into two parts. First, we show considerable speedups attained by minor and relatively straightforward code changes that off-load parts of the computation to the GPUs. Then, we identify further possible improvements to achieve even higher performance and scalability. On a system with 16 cores and 2 GPUs, we observe speedups of up to 5× with respect to our optimized shared-memory code, which in turn corresponds to speedups of between 7.5× and 12.5× with respect to the original FLEUR code.
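The two sets of speedup figures in the abstract can be related by simple arithmetic. The sketch below assumes that speedups compose multiplicatively and that the quoted 7.5× to 12.5× range is reached at the 5× upper bound; under those assumptions it derives how much faster the optimized shared-memory code would be than the original FLEUR code. The variable and function names are illustrative only and do not come from the paper.

```python
# Speedup figures quoted in the abstract.
gpu_vs_optimized = 5.0          # hybrid CPU-GPU code vs. optimized shared-memory code ("up to")
gpu_vs_original = (7.5, 12.5)   # hybrid CPU-GPU code vs. original FLEUR code


def implied_speedup(total, partial):
    """Assuming total = partial * remaining, return the remaining factor."""
    return total / partial


# Implied speedup of the optimized shared-memory code over the original code
# (an inference from the abstract's numbers, not a figure stated in it).
optimized_vs_original = tuple(
    implied_speedup(total, gpu_vs_optimized) for total in gpu_vs_original
)

print(optimized_vs_original)  # (1.5, 2.5)
```

Read this way, the earlier shared-memory modernization would account for roughly 1.5× to 2.5× of the overall gain, with the GPU off-loading contributing the remaining factor of up to 5×.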
