Parallel Computing

MPI vs Fortran coarrays beyond 100k cores: 3D cellular automata


Abstract

Fortran coarrays are an attractive alternative to MPI due to the familiar Fortran syntax, single-sided communications and implementation in the compiler. In this work the scaling of coarrays is compared to that of MPI, using cellular automata (CA) 3D Ising magnetisation miniapps built with the CASUP CA library, https://cgpack.sourceforge.io, developed by the authors. Ising energy and magnetisation were calculated with MPI_ALLREDUCE and Fortran 2018 co_sum collectives. The work was done on ARCHER (Cray XC30) up to the full machine capacity of 109,056 cores. Ping-pong latency and bandwidth results are very similar with MPI and with coarrays for message sizes from 1 B to several MB. MPI halo exchange (HX) scaled better than coarray HX, which is surprising because both algorithms use pairwise communications: MPI IRECV/ISEND/WAITALL vs Fortran sync images. Adding OpenMP to MPI or to coarrays resulted in a worse L2 cache hit ratio and lower performance in all cases, even though NUMA effects were ruled out. This is likely because the CA algorithm is network bound at scale, as further evidenced by the fact that very aggressive cache and inter-procedural optimisations led to no performance gain. The sampling and tracing analysis shows good load balancing in compute in all miniapps, but imbalance in communication, indicating that the difference in performance between MPI and coarrays is likely due to the parallel libraries (MPICH2 vs libpgas) and the Cray hardware-specific libraries (uGNI vs DMAPP). Overall, the results look promising for coarray use beyond 100k cores; however, further coarray optimisation is needed to narrow the performance gap between coarrays and MPI. (C) 2019 Elsevier B.V. All rights reserved.
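The abstract contrasts two building blocks of the miniapps: pairwise halo exchange (MPI IRECV/ISEND/WAITALL vs Fortran sync images) and global reductions (MPI_ALLREDUCE vs the Fortran 2018 co_sum collective). The sketch below is a minimal, hypothetical coarray illustration of that pattern, not the CASUP/cgpack code: the local lattice size, the periodic 1D image decomposition, and the requirement of at least three images are illustrative assumptions only.

! Hypothetical minimal sketch (not the CASUP/cgpack source): a block-decomposed
! 3D spin lattice held in a coarray, a pairwise halo exchange along one
! decomposed direction guarded by sync images, and a co_sum reduction of the
! global magnetisation. Assumes num_images() >= 3 so that left, right and the
! executing image are all distinct.
program ca_coarray_sketch
  implicit none
  integer, parameter :: n = 32                     ! local cube edge (assumed)
  integer, allocatable :: spin(:,:,:)[:]           ! spins plus a 1-cell halo
  integer :: me, np, left, right, m_total

  me = this_image()
  np = num_images()
  allocate(spin(0:n+1, 0:n+1, 0:n+1)[*])

  ! initialise interior spins to +1 (placeholder for a random Ising state)
  spin(1:n, 1:n, 1:n) = 1

  ! neighbours on a periodic 1D chain of images (illustrative decomposition;
  ! the real miniapps decompose the lattice in 3D)
  left  = merge(np, me - 1, me == 1)
  right = merge(1,  me + 1, me == np)

  ! pairwise halo exchange: one-sided gets of the neighbours' boundary planes,
  ! with sync images playing the role that IRECV/ISEND/WAITALL plays in the
  ! MPI halo-exchange variant
  sync images([left, right])
  spin(0,   1:n, 1:n) = spin(n, 1:n, 1:n)[left]
  spin(n+1, 1:n, 1:n) = spin(1, 1:n, 1:n)[right]
  sync images([left, right])

  ! global magnetisation via the Fortran 2018 collective co_sum
  ! (the MPI miniapp would call MPI_ALLREDUCE with MPI_SUM here)
  m_total = sum(spin(1:n, 1:n, 1:n))
  call co_sum(m_total)

  if (me == 1) print '(a, i0)', 'total magnetisation: ', m_total
end program ca_coarray_sketch

In the MPI variant of the same pattern, the two coindexed gets and the surrounding sync images would be replaced by non-blocking IRECV/ISEND pairs completed with WAITALL, and the co_sum call by MPI_ALLREDUCE.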
