A data-localization compilation scheme using partial-static task assignment for Fortran coarse-grain parallel processing

Hironori Kasahara; Akimasa Yoshida

Abstract

This paper proposes a compilation scheme for data localization using partial-static task assignment for Fortran coarse-grain parallel processing, or macro-dataflow processing, on a multiprocessor system with local memories and centralized shared memory. Data localization lets the local memories be used effectively and reduces data-transfer overhead in a dynamic task-scheduling environment. The proposed compilation scheme consists mainly of three parts: (1) loop-aligned decomposition, which decomposes loops that have data dependences among them into smaller loops and groups the decomposed loops into data-localizable groups, so that shared data among the decomposed loops inside each group can be passed via local memory and data-transfer overhead among groups is minimized; (2) partial-static task assignment, which informs the dynamic-scheduling routine generator in the macro-dataflow compiler that the decomposed loops inside each data-localizable group are to be assigned to the same processor; and (3) parallel machine-code generation, which generates parallel machine code that passes shared data inside each group through local memory and transfers data among groups through centralized shared memory. This compilation scheme has been implemented for the multiprocessor system OSCAR (Optimally SCheduled Advanced multiprocessoR), which has centralized shared memory and distributed shared memory in addition to local memory on each processor. Performance evaluation on OSCAR shows that macro-dataflow processing with the proposed data-localization scheme can reduce execution time by 20% on average compared with macro-dataflow processing without data localization.
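The interplay of loop-aligned decomposition and partial-static assignment can be illustrated with a small simulation. This is not the OSCAR compiler's implementation. It is a minimal sketch under stated assumptions: two hypothetical loops over the same index range, L1 computing `A(I) = I*I` and L2 computing `B(I) = A(I) + 1` (0-based in the sketch), a cost of one time unit per iteration, and illustrative names (`Task`, `Group`, `loop_aligned_decompose`, `run`). Chunk k of both loops touches the same elements of A, so the two chunks form one data-localizable group. A dynamic scheduler hands each whole group to whichever processor becomes idle first. A stays in that processor's local memory, and only B is written to shared memory.

```python
import heapq
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    loop: str  # which hypothetical original loop this chunk came from ("L1" or "L2")
    lo: int    # first iteration (inclusive)
    hi: int    # last iteration (exclusive)


@dataclass
class Group:
    gid: int
    tasks: list  # decomposed loops that share A(lo:hi) through local memory


def loop_aligned_decompose(n, num_groups):
    """Split L1 (writes A) and L2 (reads A) at the same iteration boundaries,
    so chunk k of L1 and chunk k of L2 form one data-localizable group."""
    size = -(-n // num_groups)  # ceiling division
    groups = []
    for g in range(num_groups):
        lo, hi = g * size, min(n, (g + 1) * size)
        if lo >= hi:  # fewer non-empty chunks than requested groups
            break
        groups.append(Group(g, [Task("L1", lo, hi), Task("L2", lo, hi)]))
    return groups


def run(n, num_groups, num_procs):
    """Simulate dynamic scheduling of groups with partial-static assignment."""
    shared_b = [None] * n                      # B lives in centralized shared memory
    ready = deque(loop_aligned_decompose(n, num_groups))
    idle = [(0, p) for p in range(num_procs)]  # (time processor becomes idle, id)
    heapq.heapify(idle)
    assignment = {}                            # group id -> processor id
    shared_writes = 0
    while ready:
        t, p = heapq.heappop(idle)             # dynamic part: earliest-idle processor
        group = ready.popleft()
        local = {}                             # this processor's local memory
        for task in group.tasks:               # static part: whole group on processor p
            for i in range(task.lo, task.hi):
                if task.loop == "L1":
                    local[i] = i * i           # A(i) produced into local memory
                else:
                    shared_b[i] = local[i] + 1 # A(i) consumed from local memory
                    shared_writes += 1         # only B(i) reaches shared memory
            t += task.hi - task.lo             # one time unit per iteration
        assignment[group.gid] = p
        heapq.heappush(idle, (t, p))
    return shared_b, assignment, shared_writes


b, assignment, writes = run(n=10, num_groups=4, num_procs=2)
print(assignment)  # {0: 0, 1: 1, 2: 0, 3: 1}
print(writes)      # 10
```

In this toy model, the version without localization would store each A(i) to shared memory and reload it, giving 3n shared-memory accesses instead of n. The sketch also runs each group's tasks back to back, which is a simplification: it only fixes which processor a group's tasks go to, not when they run relative to other tasks.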

Bibliographic details

Source: Parallel Computing, 1998, No. 4, pp. 579-596 (18 pages)
Authors: Hironori Kasahara; Akimasa Yoshida
Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Text language: English
Chinese Library Classification: Computing technology, computer technology
Keywords: Parallelizing compilers; Data localization; Automatic data distribution; Dynamic scheduling; Coarse-grain parallel processing

Similar literature

1. An overlapping task assignment scheme for hierarchical coarse-grain task parallel processing [J]. Akimasa Yoshida. Concurrency and Computation, 2006, No. 11.

2. A Data-localization Scheme among Loops for Each Layer in Hierarchical Coarse Grain Parallel Processing [J]. Akimasa Yoshida, Kenichi Koshizuka, Masami Okamoto. IPSJ Journal (情報処理学会論文誌), 1999, No. 5.

3. Domain-specific acceleration and auto-parallelization of legacy scientific code in FORTRAN 77 using source-to-source compilation [J]. Wim Vanderbauwhede, Gavin Davidson. Computers & Fluids, 2018.

4. Data-localization for Fortran macro-dataflow computation using partial static task assignment [C]. Akimasa Yoshida, Kenichi Koshizuka, Hironori Kasahara. International Conference on Supercomputing, 1996.

5. Retargetable compilation for variable-grain data-parallel execution in image processing [D]. Samuel Thomas Sander, 2002.

6. Free Will Emerges From a Multistage Process of Target Assignment and Body-Scheme Recruitment for Free Effector Selection [O]. Bauke M. de Jong, 2013.

7. Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation [O]. Wim Vanderbauwhede, Gavin Davidson, 2017.

8. Increased UAV Task Assignment Performance Through Parallelized Genetic Algorithms (Preprint) [R]. M. A. Darrah, W. M. Niland, B. M. Stolarik, 2006.
