Hybrid CPU-GPU Generation of the Hamiltonian and Overlap Matrices in FLAPW Methods


Abstract

In this paper we focus on the integration of high-performance numerical libraries in ab initio codes and the portability of performance and scalability. The target of our work is FLEUR, a software package for electronic structure calculations developed in the Forschungszentrum Jülich over the course of two decades. The presented work follows up on a previous effort to modernize legacy code by re-engineering and rewriting it in terms of highly optimized libraries. We illustrate how this initial effort to get efficient and portable shared-memory code enables fast porting of the code to emerging heterogeneous architectures. More specifically, we port the code to nodes equipped with multiple GPUs. We divide our study into two parts. First, we show considerable speedups attained by minor and relatively straightforward code changes to off-load parts of the computation to the GPUs. Then, we identify further possible improvements to achieve even higher performance and scalability. On a system consisting of 16 cores and 2 GPUs, we observe speedups of up to 5× with respect to our optimized shared-memory code, which in turn means between 7.5× and 12.5× speedup with respect to the original FLEUR code.
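
The speedup figures quoted in the abstract can be related by simple arithmetic. The sketch below is not taken from the paper. It assumes the speedups compose multiplicatively and that the 5× figure applies at both ends of the 7.5× to 12.5× range. The variable names are illustrative. Under those assumptions it recovers the speedup the optimized shared-memory code would have over the original FLEUR code.

```python
# Speedup figures quoted in the abstract.
gpu_over_shared = 5.0               # hybrid CPU-GPU code vs. optimized shared-memory code (upper bound)
hybrid_over_original = (7.5, 12.5)  # hybrid CPU-GPU code vs. original FLEUR code

# Assumption (not stated in the abstract): speedups compose multiplicatively, i.e.
# hybrid_over_original = shared_over_original * gpu_over_shared.
shared_over_original = tuple(s / gpu_over_shared for s in hybrid_over_original)

print(shared_over_original)
```

Under these assumptions, the shared-memory rewrite alone would account for roughly a 1.5× to 2.5× speedup over the original code.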


Bibliographic record

Source: JARA High-Performance Computing Symposium, 2017, 258p, 12 pages
Authors: Diego Fabregat-Traver; Davor Davidovic; Markus Hohnerbach; Edoardo Di Napoli
Original format: PDF
Chinese Library Classification: TP301-53
Keywords: DFT; High-performance computing; Performance portability; Heterogeneous architectures; FLAPW method; FLEUR


Similar literature

1. A new era in scientific computing: Domain decomposition methods in hybrid CPU-GPU architectures [J]. M. Papadrakakis, G. Stavroulakis, A. Karatarakis. Computer Methods in Applied Mechanics and Engineering. 2011, nos. 13-16

2. An effective model Hamiltonian for heteroatom-containing molecules reflecting the structures of both overlap and one-electron density matrices. The ammonia and water molecules [J]. K. Konstantinavieius. Journal of Molecular Structure. Theochem: Applications of Theoretical Chemistry to Organic, Inorganic and Biological Problems. 1997, no. 0

3. Hybrid port-Hamiltonian systems: From parameterized incidence matrices to hybrid automata [J]. Valentin C, Magos M, Maschke B. Nonlinear Analysis: An International Multidisciplinary Journal. 2006, no. 6

4. Hybrid CPU-GPU Generation of the Hamiltonian and Overlap Matrices in FLAPW Methods [C] . Diego Fabregat-Traver, Davor Davidovic, Markus Hohnerbach, JARA High-Performance Computing Symposium . 2017

5. PDE solvers for hybrid CPU-GPU architectures [D] . Malahe, Michael. 2016

6. A separable shadow Hamiltonian hybrid Monte Carlo method [O] . Christopher R. Sweet, Scott S. Hampton, Robert D. Skeel, -1

7. Hybrid CPU-GPU generation of the Hamiltonian and overlap matrices in FLAPW methods [O] . Fabregat-Traver Diego, Davidović Davor, Höhnerbach Markus, 2017

8. Atomic Spectral Methods for Molecular Electronic Structure Calculations: Atomic-Pair Representations of Aggregate Hamiltonian Matrices (Preprint) [R] . Langhoff, P. W. , Hinde, R. J. , Mills, J. D. , 2007

