Performance portable parallel programming of heterogeneous stencils across shared-memory platforms with modern Intel processors

Szustak Lukasz; Bratek Pawel

Abstract

In this work, we take up the challenge of performance-portable programming of heterogeneous stencil computations across a wide range of modern shared-memory systems. An important example of such computations is the Multidimensional Positive Definite Advection Transport Algorithm (MPDATA), the second major part of the dynamic core of the EULAG geophysical model. To this end, we develop a set of parametric optimization techniques and a four-step procedure for customizing the MPDATA code. These techniques include the islands-of-cores strategy, (3+1)D decomposition, exploitation of data parallelism and simultaneous multithreading, data flow synchronization, and vectorization. The proposed adaptation methodology helps us automate the transformation of the MPDATA code to achieve high sustained, scalable performance on all tested ccNUMA platforms with recent generations of Intel processors. This means that, for a given platform, the sustained performance of the new code stays at a similar level regardless of the problem size. The highest performance utilization rate, about 41-46% of the theoretical peak, measured for all benchmarks, is obtained on each of the two-socket servers based on the Skylake-SP (SKL-SP), Broadwell, and Haswell CPU architectures. At the same time, the four-socket server with SKL-SP processors achieves the highest sustained performance, around 1.0-1.1 Tflop/s, which corresponds to about 33% of the peak.
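The relation between sustained performance and utilization quoted in the abstract can be checked with a little arithmetic. The sketch below is illustrative only: the function names `utilization` and `implied_peak` are hypothetical, and the peak it prints is merely what the rounded figures in the abstract (1.0-1.1 Tflop/s at about 33%) imply, not a value reported by the authors.

```python
def utilization(sustained_tflops: float, peak_tflops: float) -> float:
    """Fraction of the theoretical peak reached by the sustained performance."""
    if peak_tflops <= 0:
        raise ValueError("peak_tflops must be positive")
    return sustained_tflops / peak_tflops


def implied_peak(sustained_tflops: float, utilization_fraction: float) -> float:
    """Theoretical peak implied by a sustained rate and a utilization fraction."""
    if not 0 < utilization_fraction <= 1:
        raise ValueError("utilization_fraction must be in (0, 1]")
    return sustained_tflops / utilization_fraction


# Rounded figures from the abstract for the four-socket SKL-SP server.
# The 0.33 fraction is approximate ("about 33%"), so the result is too.
low = implied_peak(1.0, 0.33)
high = implied_peak(1.1, 0.33)
print(f"implied peak: roughly {low:.2f}-{high:.2f} Tflop/s")
```

Under these assumptions the four-socket server's theoretical peak would be on the order of 3 Tflop/s. The abstract does not say which end of the 1.0-1.1 Tflop/s range the 33% figure refers to.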


Bibliographic record

Source
Experimental Mechanics | 2019, Issue 3 | pp. 534-553 | 20 pages
Authors
Szustak Lukasz; Bratek Pawel
Affiliations

Czestochowa Tech Univ Fac Mech Engn & Comp Sci Inst Comp & Informat Sci Dabrowskiego 69 PL-42201 Czestochowa Poland;

Czestochowa Tech Univ Fac Mech Engn & Comp Sci Inst Comp & Informat Sci Dabrowskiego 69 PL-42201 Czestochowa Poland;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords
Parallel programming; performance portability; shared-memory systems; heterogeneous stencils; EULAG model; MPDATA; code parameterization; Skylake; Knights Landing;


