Journal: Scientific Programming

Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study


Abstract

This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.
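The difference between the sequential summation and the binary tree summation mentioned in the abstract can be illustrated with a small serial simulation. This is only a sketch, not the authors' Fortran coarray implementation: the function names `sequential_sum` and `tree_sum` are hypothetical, each list element stands in for the partial value held by one image, and the input is assumed to be non-empty. The sketch counts dependent steps rather than measuring time.

```python
def sequential_sum(partials):
    """One image adds every other image's value in turn: n - 1 dependent steps."""
    total = partials[0]
    steps = 0
    for value in partials[1:]:
        total += value
        steps += 1
    return total, steps


def tree_sum(partials):
    """Pairwise (binary tree) reduction: ceil(log2(n)) rounds.

    Within a round, all the pairwise additions are independent of each
    other, so on a real parallel machine they could proceed concurrently.
    """
    values = list(partials)  # copy so the caller's data is untouched
    n = len(values)
    rounds = 0
    stride = 1
    while stride < n:
        # Image i (0-based, a multiple of 2 * stride) receives from image i + stride.
        for i in range(0, n - stride, 2 * stride):
            values[i] += values[i + stride]
        stride *= 2
        rounds += 1
    return values[0], rounds


if __name__ == "__main__":
    partials = list(range(1, 33))  # 32 hypothetical per-image partial sums
    print(sequential_sum(partials))
    print(tree_sum(partials))
```

For 32 images the sequential version needs 31 dependent additions on a single image, while the tree version needs 5 rounds, which is consistent with the scaling improvement the abstract reports on the 32-core server.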
