Parallel Computing

MPI vs Fortran coarrays beyond 100k cores: 3D cellular automata


Abstract

Fortran coarrays are an attractive alternative to MPI due to their familiar Fortran syntax, single-sided communications, and implementation in the compiler. In this work the scaling of coarrays is compared to that of MPI, using 3D cellular automata (CA) Ising magnetisation miniapps built with the CASUP CA library, https://cgpack.sourceforge.io, developed by the authors. Ising energy and magnetisation were calculated with the MPI_ALLREDUCE and Fortran 2018 co_sum collectives. The work was done on ARCHER (Cray XC30) up to the full machine capacity of 109,056 cores. Ping-pong latency and bandwidth results are very similar for MPI and for coarrays over message sizes from 1 B to several MB. MPI halo exchange (HX) scaled better than coarray HX, which is surprising because both algorithms use pair-wise communications: MPI IRECV/ISEND/WAITALL vs Fortran sync images. Adding OpenMP to either MPI or coarrays resulted in a worse L2 cache hit ratio and lower performance in all cases, even though NUMA effects were ruled out. This is likely because the CA algorithm is network bound at scale, as further evidenced by the fact that very aggressive cache and inter-procedural optimisations led to no performance gain. Sampling and tracing analysis shows good load balancing in compute in all miniapps, but imbalance in communication, indicating that the difference in performance between MPI and coarrays is likely due to the parallel libraries (MPICH2 vs libpgas) and the Cray hardware-specific libraries (uGNI vs DMAPP). Overall, the results look promising for coarray use beyond 100k cores; however, further coarray optimisation is needed to narrow the performance gap between coarrays and MPI. (C) 2019 Elsevier B.V. All rights reserved.
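
The reduction mentioned in the abstract is easy to illustrate side by side. Below is a minimal sketch of the two collectives it names, assuming a simple per-image/per-rank partial sum; the program and variable names (co_sum_demo, mag) are illustrative and not taken from CASUP. The coarray form uses the Fortran 2018 co_sum intrinsic collective:

    ! Minimal sketch: each image holds a partial sum (e.g. its local
    ! Ising magnetisation) and co_sum turns it into the global total.
    ! Needs a coarray-capable compiler, e.g. Cray ftn or OpenCoarrays caf.
    program co_sum_demo
      implicit none
      real :: mag

      mag = real(this_image())   ! stand-in for the per-image partial sum

      call co_sum(mag)           ! Fortran 2018 collective: in-place all-reduce

      if (this_image() == 1) print *, 'global sum =', mag
    end program co_sum_demo

The MPI counterpart performs the same all-reduce explicitly (mpi_f08 bindings shown):

    ! The equivalent MPI reduction with MPI_ALLREDUCE.
    program allreduce_demo
      use mpi_f08
      implicit none
      integer :: rank
      real :: mag_local, mag_total

      call MPI_Init()
      call MPI_Comm_rank(MPI_COMM_WORLD, rank)

      mag_local = real(rank + 1)   ! stand-in for the per-rank partial sum

      call MPI_Allreduce(mag_local, mag_total, 1, MPI_REAL, MPI_SUM, &
                         MPI_COMM_WORLD)

      if (rank == 0) print *, 'global sum =', mag_total
      call MPI_Finalize()
    end program allreduce_demo

In both cases every image or rank ends up with the same global value, which is what the miniapps need for the Ising energy and magnetisation.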
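
The pair-wise halo exchange (HX) contrast can be sketched the same way. The following programs reduce it to a 1D chain of images/ranks with one halo cell per side; the 3D miniapps exchange faces with up to six neighbours, and the array name a and the decomposition here are assumptions made for brevity, not the CASUP kernels. The coarray version pairs one-sided gets with sync images:

    ! Minimal 1D sketch of coarray halo exchange with sync images.
    program coarray_hx
      implicit none
      integer, parameter :: n = 8
      real :: a(0:n+1)[*]    ! interior cells 1..n plus one halo cell per side
      integer :: me, np

      me = this_image()
      np = num_images()
      a(1:n) = real(me)      ! fill the interior

      sync all               ! all interiors written before any halo read

      ! one-sided gets from the neighbours' interiors into the local halos
      if (me > 1)  a(0)   = a(n)[me-1]
      if (me < np) a(n+1) = a(1)[me+1]

      ! pair-wise synchronisation with only the images actually involved;
      ! in a time-stepping loop this separates the halo reads from the
      ! next update of the interior
      if (me > 1 .and. me < np) then
        sync images ([me-1, me+1])
      else if (me > 1) then
        sync images (me-1)
      else if (me < np) then
        sync images (me+1)
      end if
    end program coarray_hx

The MPI version posts the same pair-wise transfers as non-blocking calls and completes them with MPI_WAITALL:

    ! Minimal 1D sketch of MPI halo exchange with Irecv/Isend/Waitall.
    program mpi_hx
      use mpi_f08
      implicit none
      integer, parameter :: n = 8
      real :: a(0:n+1)
      integer :: rank, np, nreq
      type(MPI_Request) :: req(4)

      call MPI_Init()
      call MPI_Comm_rank(MPI_COMM_WORLD, rank)
      call MPI_Comm_size(MPI_COMM_WORLD, np)

      a(1:n) = real(rank)
      nreq = 0

      if (rank > 0) then     ! exchange with the left neighbour
        nreq = nreq + 1
        call MPI_Irecv(a(0), 1, MPI_REAL, rank-1, 0, MPI_COMM_WORLD, req(nreq))
        nreq = nreq + 1
        call MPI_Isend(a(1), 1, MPI_REAL, rank-1, 1, MPI_COMM_WORLD, req(nreq))
      end if
      if (rank < np-1) then  ! exchange with the right neighbour
        nreq = nreq + 1
        call MPI_Irecv(a(n+1), 1, MPI_REAL, rank+1, 1, MPI_COMM_WORLD, req(nreq))
        nreq = nreq + 1
        call MPI_Isend(a(n), 1, MPI_REAL, rank+1, 0, MPI_COMM_WORLD, req(nreq))
      end if

      call MPI_Waitall(nreq, req(1:nreq), MPI_STATUSES_IGNORE)
      call MPI_Finalize()
    end program mpi_hx

Both variants communicate only between neighbouring pairs, which is why the abstract finds the scaling gap between them surprising.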